Invented stats, fake quotes, made-up citations: an owner's field guide

TL;DR

AI tools regularly invent statistics with confident precision, attribute quotes to people who never said them, and cite sources that do not exist. For a small business, the real exposure sits in pitch decks, thought-leadership content, and client deliverables. The proportionate defence is a three-minute verification routine on any cited claim before the work leaves the firm, paired with a clear rule that the named author owns every fact, not the AI.

Key takeaways

- Fabrication clusters in three forms: invented statistics with suspicious precision, quotes that were never said, and citations to sources that do not exist.
- The risk for small firms concentrates in three places: pitch decks, thought-leadership content, and client deliverables, where one false claim can cost a deal or a relationship.
- Across six independent studies of AI-generated citations, roughly 51% were entirely fabricated, and only 7% of medical citations were both real and accurately cited.
- The fix is a three-minute routine on any cited claim: source exists, date confirms, attribution holds. Apply it to high-stakes content only, not to every internal draft.
- The team rule that makes this stick: any cited fact in client-facing work is the responsibility of the named author, not the AI. Courts have already confirmed this.

A managing director I spoke with last month had a quiet realisation a fortnight after winning a six-figure pitch. Her business partner had quoted a McKinsey statistic in the deck, a precise one, the kind that lands well on slide three. She had asked, on the train home, where it was from. He could not remember. They went back to the source. The statistic was not in any McKinsey publication. It had been generated by an AI tool a week earlier and dropped into the deck without a check.

Nothing bad happened. The client never asked. But she has been quietly uncomfortable about it ever since, because she knows the next one might land in front of a CFO who does ask.

What does AI actually invent, and in what shapes?

AI tools invent in three reliable shapes: statistics with confident precision, quotes attributed to named people, and citations to sources that do not exist. The reason is structural rather than incidental. Language models generate the next plausible word rather than looking facts up, and the three shapes share a profile of high authority and low traceability, which is why they cluster.

A precise figure carries the look of evidence. A 23.5% growth rate reads as more credible than “significant growth” even when it was generated to fit the slide rather than drawn from real data. A named quote sounds like testimony. A citation in the right shape looks like proof. The model is fluent in all three forms because they appear constantly in its training data, so it generates new versions of them with high confidence regardless of whether the underlying claim is true.

The scale is not trivial. A systematic review across six studies found approximately 51% of AI-generated citations were entirely fabricated. A 2024 analysis of ChatGPT medical references found only 7% were both real and accurately cited, with the remainder either invented outright or attached to real papers that did not make the claim. The Stanford HAI 2026 AI Index shows hallucination rates across 26 leading models in 2025 ranging from 22% to 94% depending on task. The models that fabricate least still fabricate, and they do it on the forms that look most authoritative, which is exactly where the damage lands.

Why does this matter more for a small firm than a large one?

A small firm cannot absorb the reputational cost of a single public error the way an enterprise can. A 200-person business can quietly issue a correction and move on. A 20-person services firm cannot. One discovered fabrication in a pitch, a proposal, or a published article can lose a client, sink a deal, or trigger a complaint that takes a quarter to recover from.

The legal direction of travel reinforces this. In Mata v Avianca, two US lawyers were fined $5,000 for submitting a brief with ChatGPT-invented case citations. A California court later fined two firms a combined $31,000 for the same pattern. The principle in both rulings was identical: the professional who signed the work bears responsibility, not the AI vendor, not the tool, not the assistant who ran the prompt. Professional indemnity insurance typically does not cover claims arising from unverified AI output, because the named author failed in their duty of care. Courts have made clear that AI use does not absolve professionals of accuracy obligations, and small firms have the same duty as large ones with much less margin for getting it wrong.

Where in your business will you actually meet this?

The risk concentrates in three places, in roughly descending order of cost. Pitch decks and funding proposals come first. Thought-leadership content published under your name comes second. Client deliverables and proposals come third. The shared feature of all three is that the work leaves the firm carrying cited claims, and the audience has both the ability and the motivation to check.

Pitch decks are first because an invented market-size figure, a fabricated competitor benchmark, or a regulatory reference that does not exist can lose a deal outright or create liability when the claim is later discovered. Thought-leadership content is second because a single invented quote or non-existent study, once spotted by a sophisticated reader, destroys the authority the article was built on. Client deliverables are third because an analysis citing a regulation that is not real or a financial benchmark that is invented leaves the firm liable for the error even though the AI generated it.

Internal drafts, brainstorming documents, and background research sit in a different category. The fabrication risk is still there, but the cost of being wrong is contained inside the team. The job is to mark the threshold clearly. Anything moving from internal to external should pass a check. Anything that stays internal does not need one until it is promoted into client-facing work, at which point it joins the same routine the rest of the firm follows.

When should you verify, and when can you skip it?

Verify when the cost of being wrong is high and the audience has both the ability and the motive to check. Skip the verification step when the content stays inside the team and never gets quoted externally. The cost is roughly three minutes per cited claim, which is the cheapest insurance policy a small firm can buy on its own credibility.

The three-minute routine has three checks. First, source existence: open the original report or paper and confirm the number, quote, or citation is really there. Second, date confirmation: AI often cites real sources with invented publication years that create a false sense of recency, so confirm the date against the publication itself. Third, attribution accuracy: this is where errors most commonly hide, because the source is real but the claim is misrepresented, qualified differently, or directly contradicted by the original text. The first two checks take thirty to sixty seconds each. The third takes a little longer because it requires reading the relevant section, not just confirming the source exists.

Run the routine on every cited claim in pitch decks, in published thought-leadership under your name or the firm’s, in client deliverables, in regulator submissions, and in anything quoted to the press. Skip it on internal drafts, exploratory notes, and AI-assisted research that shapes your thinking but is not quoted externally. The rule that holds the line long-term is simple and worth writing into the team’s quality standard. Any cited fact in client-facing work is the responsibility of the named author, not the AI. If you do not have time to verify it, you do not have time to send it.

The closest companion pieces sit alongside this one in the evaluating-AI-output cluster. "What is an AI hallucination?" covers the underlying mechanism that produces fabricated statistics, quotes, and citations. "Hallucinations as a business risk" frames the proportionate-controls argument at a firm level rather than at the level of an individual piece of content.

Read those for the wider picture. This post is the field guide for the specific moment when a piece of AI-generated work is about to leave the building with a cited claim in it, and someone has to decide whether the claim has been checked. The answer is always the same: if the work carries the firm's name in front of an audience that might verify it, three minutes of checking is cheaper than any other outcome.

If you want to talk through where the verification threshold should sit in your firm, and how to write it into the team’s quality standard so it sticks, book a conversation.

Sources

- Stanford HAI (2026). 2026 AI Index Report, Responsible AI chapter. Documents hallucination rates across 26 top models in 2025, ranging from 22% to 94%. https://hai.stanford.edu/ai-index/2026-ai-index-report/responsible-ai
- PubMed Central (2024). The use of artificial intelligence in writing scientific review articles. Found 47% of ChatGPT-generated medical citations entirely fabricated, 46% inaccurate, only 7% real and correctly cited. https://pmc.ncbi.nlm.nih.gov/articles/PMC10277170/
- PubMed Central (2026). Systematic review of fabricated citations in generative AI outputs across six studies, approximately 51% fabricated. https://pmc.ncbi.nlm.nih.gov/articles/PMC12826005/
- Thomson Reuters Legal (2025). From "trust but verify" to "do not trust until verified": how the legal profession is redefining AI accountability. The named-author duty to verify AI-generated content. https://legal.thomsonreuters.com/blog/from-trust-but-verify-to-do-not-trust-until-verified-how-the-legal-profession-is-redefining-ai-accountability/
- Spellbook (2024). Lawyer fined for using AI fake legal citations: Mata v Avianca and subsequent cases. Lawyers sanctioned for failure to verify, not for the AI's error. https://www.spellbook.legal/learn/lawyer-fined-using-ai-legal-fake-citations
- Science (2024). AI hallucinates because it is trained on fake answers it does not know. Plain-English account of the prediction mechanism behind invented statistics, quotes, and citations. https://www.science.org/content/article/ai-hallucinates-because-it-s-trained-fake-answers-it-doesn-t-know
- Harvard Kennedy School Misinformation Review (2024). New sources of inaccuracy: a conceptual framework for studying AI hallucinations. Why fabrication clusters around high-authority, low-traceability forms. https://misinforeview.hks.harvard.edu/article/new-sources-of-inaccuracy-a-conceptual-framework-for-studying-ai-hallucinations/
- Alera Group (2024). AI is not liable for the mistake, you are. Professional liability exposure for SMEs relying on unverified AI output. https://aleragroup.com/insights/ai-isnt-liable-mistake-you-are
- Data Journalism Handbook (2024). Creating a verification process and checklists. The source-date-attribution model adapted for AI-assisted content. https://datajournalism.com/read/handbook/verification-1/creating-a-verification-process-and-checklists/9-creating-a-verification-process-and-checklists
- Information Commissioner's Office (2024). Guidance on AI and data protection: accuracy and statistical accuracy. UK GDPR's accuracy principle applies to AI-generated content about people. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/what-do-we-need-to-know-about-accuracy-and-statistical-accuracy/

Frequently asked questions

How often do AI tools actually invent citations and statistics?

Often enough that you cannot trust an output that contains them without checking. A 2024 analysis of ChatGPT-generated medical content found 47% of references were entirely fabricated, 46% were real papers cited for claims they did not actually make, and only 7% were both real and accurately cited. Hallucination rates for specialised legal research tools run between 17% and 33% on real citation tasks. The rate is not zero on any frontier model, and it is higher than most owners assume.

Do I need to verify every piece of AI-generated content I produce?

No, and trying to will burn out the team. Apply the three-minute routine to content that carries real consequence: pitch decks, client deliverables, thought-leadership pieces published under your name, and anything sent to a regulator. Internal drafts, brainstorming notes, and background research can stay unverified at point of creation, as long as they are checked before being promoted into client-facing work.

What about specialised tools that claim to be hallucination-free?

Treat the claim with scepticism. Stanford Law audited LexisNexis and Thomson Reuters, both marketed as hallucination-free for legal research, and found both hallucinated more than 17% of the time on real legal-citation tasks. If a vendor uses the phrase, ask for the measured rate on your use case, the methodology behind that number, and what recourse you have when it fails. A vendor who cannot answer is selling marketing.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30-minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
