Checking whether AI citations actually support the claim

A team member uses ChatGPT to pull together background research for a client proposal. The output arrives with five citations, all linking to credible-looking sources. The proposal goes out. Two days later the client responds. One statistic does not appear in the linked report at all, and a second URL leads to a document that contradicts the claim entirely.

This kind of mismatch is well enough documented that UK regulators, the AI vendors themselves, and independent researchers have all written about it explicitly. The pattern has a name, hallucination, and it happens with enough regularity to warrant a simple default rule: treat every AI citation as provisional until you have confirmed it against the source.

What does it mean to check whether a citation actually supports the claim?

When an AI tool provides a citation, it may be fabricating the source entirely, linking to a real document that says something different, or referencing material from the wrong jurisdiction or date. Checking whether a citation supports the claim means going beyond confirming the URL loads. It means confirming the specific number, conclusion, or assertion attributed to that source actually appears in it.

Controlled studies have found fabrication or mismatch rates between 20% and 47% depending on the domain and how the question is posed. A 2023 study tested ChatGPT on medical questions and found that 20 to 30% of references it provided were either fabricated or failed to support the claim made. A 2024 evaluation found false citation rates as high as 47% when models were asked to supply references in unfamiliar domains.

The AI vendors themselves are direct about this. OpenAI’s usage policies state that outputs may be inaccurate and recommend human review before relying on content for factual purposes. Microsoft’s Copilot documentation warns that it can “get things wrong” and that users should verify important information independently. Both warnings reflect a fundamental limitation in how language models produce text. When the companies selling these tools recommend independent verification of every output, the recommendation is grounded in how the systems actually work.

Why does this matter for your business?

An unchecked AI citation creates three types of risk:

Operational. A decision built on a wrong number takes your team in the wrong direction.
Reputational. A client who clicks the source and finds a mismatch loses confidence quickly.
Regulatory. UK law places the accuracy obligation on the organisation using the AI output, not on the tool that produced it.

The UK Information Commissioner’s Office is explicit on this point. Its guidance on AI and data protection states that organisations must ensure AI outputs are “sufficiently accurate for their intended purpose” and must maintain processes to detect and correct errors. The guidance confirms that relying on AI “does not remove or reduce your accountability obligations,” and that organisations must maintain documentation showing how AI outputs were checked before being used in decisions affecting individuals.

For owner-managed businesses in regulated sectors the exposure is higher. The FCA’s guidance on machine learning in financial services confirms that firms remain fully responsible for the accuracy of any AI-driven analysis or advice provided to customers. An AI citation in a client communication is an organisational output, not a technical artefact, and regulators treat it accordingly.

The 2023 US case Mata v. Avianca illustrated the consequences at their sharpest. Two lawyers filed a court brief containing six non-existent cases generated by ChatGPT. The judge imposed a US$5,000 sanction for submitting “non-existent judicial opinions with fake quotes and citations.” UK law firms have since issued internal guidance forbidding unchecked AI research for court submissions. The jurisdictional distance does not reduce the relevance of that lesson.

Where will you actually encounter this?

Citation checking becomes relevant any time your team uses a general-purpose AI tool to produce content that includes references, statistics, or regulatory details. The most common situations in owner-managed businesses are client-facing documents, internal reports used to justify decisions, staff-facing guidance that cites law or policy, and any marketing content that references research or sector data.

A 2024 CIPD survey found that 20% of UK employers were already using generative AI tools at work, while only 19% had provided any guidance or training on their use. That gap is where citation problems develop. Staff are producing research-backed material without a shared understanding of whether the references hold up under scrutiny.

The National Cyber Security Centre’s guidance on using public generative AI safely recommends treating all AI outputs as “unverified” by default, and checking critical information against trusted sources, particularly where legal, financial, or security implications are present. The NCSC frames citation checking as security hygiene, not a quality nicety.

Retrieval-augmented generation tools, where the AI draws answers from a specific document set, reduce citation risk but do not eliminate it. Even when a tool is working from your own knowledge base, it can misstate what a document says. The claim and the source still need to be reconciled by a human before anything goes external.

When do you need to check, and when can you reasonably take a lighter approach?

The answer depends on what the output is used for and who sees it. External communications, anything influencing a decision about a person or a regulated product, and anything taken as authoritative by a client all require a full citation check. Internal drafts used as a starting point for further research can carry lighter-touch review, provided the team knows the citations are provisional.
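The rule above can be written down as a small decision function, which some teams find easier to apply consistently than a paragraph of guidance. This is a minimal sketch, not a prescribed policy: the class name, field names, and the "full" and "light" labels are hypothetical and chosen only to mirror the three triggers described in the text.

```python
from dataclasses import dataclass


@dataclass
class OutputContext:
    # All field names are illustrative assumptions, not an established schema.
    external: bool                       # leaves the business (client, public)
    affects_person_or_regulated: bool    # influences a decision about a person
                                         # or a regulated product
    client_treats_as_authoritative: bool # a client will rely on it as fact


def review_level(ctx: OutputContext) -> str:
    """Return "full" when any trigger applies, otherwise "light"."""
    if (
        ctx.external
        or ctx.affects_person_or_regulated
        or ctx.client_treats_as_authoritative
    ):
        return "full"
    # Internal starting-point drafts: lighter review, but the citations
    # are still provisional and the team needs to know that.
    return "light"


client_proposal = OutputContext(True, False, True)
internal_notes = OutputContext(False, False, False)
print(review_level(client_proposal))
print(review_level(internal_notes))
```

Any single trigger is enough to require the full check, which matches the "all require" wording above. A "light" result still means the citations are unverified.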

A practical citation check has four steps.

Open the source. Click every link and confirm the URL loads and is the type of source the AI described.
Find the claim. Search within the page (Ctrl+F or Cmd+F) for the specific number, phrase, or conclusion the AI attributed to it.
Check the publication date and jurisdiction. A UK regulatory claim needs a UK source, and guidance from several years ago may not reflect current rules.
Record what you find. Document any corrections, as this gives you a record if a decision is ever questioned and supports your accountability obligations under UK GDPR.
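For teams that want to make the four-step check above repeatable, steps two to four can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a verification tool. It assumes a person has already opened the link and copied the page text (step one). The names `Citation`, `CheckResult`, and `check_citation` are hypothetical, and the three-year `max_age_years` default is an arbitrary stand-in for "several years". The sample text and figures are made up.

```python
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Citation:
    claim_text: str    # the exact number or phrase the AI attributed to the source
    source_text: str   # page text, opened and copied by a person (step 1)
    published: date    # publication date read from the source
    jurisdiction: str  # where the source applies, e.g. "UK"


@dataclass
class CheckResult:
    claim_found: bool
    is_current: bool
    jurisdiction_ok: bool
    notes: list = field(default_factory=list)  # step 4: what to record

    @property
    def passed(self) -> bool:
        return self.claim_found and self.is_current and self.jurisdiction_ok


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_citation(c: Citation, required_jurisdiction: str, today: date,
                   max_age_years: int = 3) -> CheckResult:
    result = CheckResult(
        # Step 2: the scripted equivalent of Ctrl+F for the attributed claim.
        claim_found=normalise(c.claim_text) in normalise(c.source_text),
        # Step 3: publication date and jurisdiction.
        is_current=(today - c.published).days / 365.25 <= max_age_years,
        jurisdiction_ok=c.jurisdiction.upper() == required_jurisdiction.upper(),
    )
    if not result.claim_found:
        result.notes.append("claim text not found in source")
    if not result.is_current:
        result.notes.append(f"source older than {max_age_years} years")
    if not result.jurisdiction_ok:
        result.notes.append(
            f"source is {c.jurisdiction}, need {required_jurisdiction}"
        )
    return result


page = "Revenue grew 12%\nin 2023."  # made-up page text
cited = Citation("grew 12% in 2023", page, date(2024, 3, 1), "UK")
print(check_citation(cited, "UK", today=date(2025, 3, 1)).passed)
```

An exact-text match is only as good as a browser search: it misses paraphrases and it cannot tell whether the surrounding passage supports or contradicts the claim. A failed check is a prompt to read the source, and a passed check does not replace reading it.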

For high-stakes contexts, an AI tool can assist with the initial verification pass. Share the paragraph and the source document with a tool and ask it to locate where the specific claim appears. Use the response as a starting point. Final verification rests with a human reading the primary source.

A simple internal rule covers the large majority of cases. All AI-generated citations and factual claims must be checked against primary sources before anything goes external. For owner-managed businesses this does not require a compliance function. It requires a shared habit.

What connects to citation checking in a broader output evaluation practice?

Citation checking is one part of a broader evaluation discipline for AI output. It connects to the practice of spotting AI outputs that are confidently wrong, to understanding when AI-generated numbers and statistics cannot be trusted, and to the question of what a proportionate review workflow looks like when your team is handling different types of AI output at volume.

The ICO and the FCA both point toward maintaining an audit trail. That means being able to demonstrate how AI outputs were reviewed before they influenced decisions, not just that individual outputs were spot-checked. The Competition and Markets Authority’s 2023 review of foundation models noted that inaccurate AI outputs can harm consumers, and that developers and deployers share responsibility for ensuring accurate information reaches users.
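One lightweight way to keep the kind of audit trail described above is to write a single structured line for every citation reviewed. The sketch below is an assumption about what such a record might contain. Neither the ICO nor the FCA prescribes this format, and the function name, field names, and outcome labels are hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical outcome labels; adapt them to your own review process.
ALLOWED_OUTCOMES = {"confirmed", "corrected", "removed"}


def audit_record(document: str, claim: str, source_url: str,
                 reviewer: str, outcome: str, note: str = "") -> str:
    """Return one JSON line describing a single citation review."""
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"outcome must be one of {sorted(ALLOWED_OUTCOMES)}")
    record = {
        "checked_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "document": document,
        "claim": claim,
        "source_url": source_url,
        "reviewer": reviewer,   # the human who read the primary source
        "outcome": outcome,
        "note": note,
    }
    return json.dumps(record, ensure_ascii=False)


print(audit_record(
    document="client-proposal-draft",      # made-up example values
    claim="Revenue grew 12% in 2023",
    source_url="https://example.org/report",
    reviewer="A. Reviewer",
    outcome="corrected",
    note="draft said 21%; source says 12%",
))
```

Appending each returned line to a shared file or spreadsheet gives a dated record of who checked what, which is the difference between a demonstrable review process and an occasional spot check.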

For owner-managed businesses with operations in the EU, the EU AI Act introduces documentation requirements for high-risk AI systems. If your business offers AI-driven services to EU clients, the documentation expectations around how your system sources and cites evidence are likely to be more demanding than current UK requirements alone.

If your team is using AI tools heavily enough that manual citation checking is becoming a bottleneck, the right conversation is about a structured output evaluation workflow. That conversation starts with the same discipline: confirming that the source actually contains what the AI reported, before the output leaves your business.
