An owner I spoke with recently shared an AI-summarised industry report at her board meeting. Three statistics, neatly footnoted, dropped into the pre-read. Halfway through the discussion her chair flagged one of the numbers as wrong. She checked the source. The figure didn’t exist. Neither did the report it claimed to come from. She felt the temperature in the room change as she explained what had happened, and spent the next week emailing each director the corrected paper.
That moment is the reason this post exists. Much of the popular coverage of AI hallucinations treats them as comedy: the lawyer who cited fake cases, the chatbot that swore at a customer. For a firm that puts AI output in front of clients, into financial documents, into legal correspondence, or onto a public website, hallucinations are a specific class of business risk with measurable exposure and proportionate mitigations. This post sets out what the risk actually is, where it lands in your business, and what you need to decide about it explicitly rather than by default.
What is an AI hallucination, mechanically?
An AI hallucination is a confident, fluent piece of output that has no basis in reality. Language models work by predicting the next word based on statistical patterns learned during training, rather than by looking facts up. When those patterns generate something plausible-sounding but false, the model has no internal alarm that fires. The output reads as authoritative because uncertainty was never represented in the answer.
OpenAI’s 2025 research on why language models hallucinate identified a deeper cause. Standard training rewards guessing over admitting uncertainty. If a model answers “I don’t know”, it scores zero on the benchmark. If it guesses and happens to be right, it scores a point. Across thousands of test questions, models that guess look better than models that abstain, so the training signal pushes the model toward confident invention. The fluent wrong answer is the system doing exactly what it was trained to do, rather than a defect in the wiring.
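To feel how strong that incentive is, here is a toy scoring exercise. The pass/fail scoring rule and the 30% guess accuracy are illustrative assumptions for the example, not figures from OpenAI’s paper.

```python
# Toy illustration: under pass/fail scoring, a model that guesses beats one
# that abstains. The 30% guess accuracy is an assumption for the example.
import random

random.seed(0)
QUESTIONS = 1_000
GUESS_ACCURACY = 0.30  # assumed hit rate when the model guesses

guesser = sum(random.random() < GUESS_ACCURACY for _ in range(QUESTIONS))
abstainer = 0  # "I don't know" scores zero on every question

print(f"guesser:   {guesser}/{QUESTIONS}")    # roughly 300 points
print(f"abstainer: {abstainer}/{QUESTIONS}")  # 0 points, despite never lying
```

The guesser also produces roughly 700 confident wrong answers along the way, and the benchmark never penalises them any more than silence.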
For a fuller plain-English explainer of the mechanics, see “what is an AI hallucination”. The point for this post is that the behaviour is structural, not a defect the next model release will fix.
Why is this a business risk that isn’t going away?
A common assumption in small businesses meeting AI for the first time is that newer, larger models hallucinate less, so the answer is to upgrade. The data does not support that hope. Vectara’s hallucination leaderboard tracks rates across frontier models and finds them ranging from 0.7% to 13.6% depending on the model and task, with no clean trend line downward across the last two years.
On harder knowledge benchmarks the rates climb sharply. OpenAI’s own GPT-5 recorded a 9.6% hallucination rate on one test with web access enabled; the same model without internet access jumped to 47%.
Retrieval-augmented generation, where the model is given a set of source documents to draw from, is widely promoted as the solution. It helps, materially. It does not eliminate the problem. Stanford’s RegLab study of legal AI research tools found that two systems marketed as “hallucination-free” through retrieval-augmented generation still hallucinated between 17% and 33% of the time. The model can still misread the source, ignore conflicting evidence, or invent a citation when retrieval comes up empty. The practical implication for an SME is simple. You cannot buy your way out of this with a vendor switch or a feature upgrade. The risk persists across the market and has to be governed in your firm rather than waited out.
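To make the pattern concrete, here is a minimal sketch of the retrieval-augmented flow. The function names are illustrative stand-ins, not any vendor’s API, and the stubs exist only so the example runs:

```python
# A minimal sketch of the retrieval-augmented generation pattern. The two
# stub functions stand in for a real document index and a real model API
# (both hypothetical; swap in your vendor's equivalents).

def search_documents(question: str) -> list[str]:
    """Stub retrieval step: a real system would query a document index."""
    return []  # simulate retrieval coming up empty

def call_model(prompt: str) -> str:
    """Stub model call: a real system would call the vendor's API."""
    return "(model output)"

def answer_with_sources(question: str) -> str:
    sources = search_documents(question)
    if not sources:
        # Without an explicit guard like this, the model receives an empty
        # context and will often answer from its training patterns anyway,
        # one of the ways retrieval-backed systems still hallucinate.
        return "No supporting documents found; escalate to a human."
    context = "\n\n".join(sources)
    prompt = (
        "Answer only from the sources below. If they do not contain the "
        f"answer, say so.\n\nSources:\n{context}\n\nQuestion: {question}"
    )
    return call_model(prompt)  # the model can still misread or ignore sources

print(answer_with_sources("What did the industry report conclude?"))
```

Note that even with the guard, the final model call can misread a source or blend it with training-data patterns, which is exactly the failure mode the Stanford study measured.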
Where does the exposure actually land in your business?
Hallucinations fall into four categories that map directly onto how a small business uses AI in practice. Invented statistics get pasted into pitch decks, board reports and proposals. Invented quotes end up in marketing copy or LinkedIn posts. Invented citations point to studies, court cases or regulations that turn out not to exist. And invented legal or regulatory provisions appear as confident statements that a rule says X when it does not.
The damage concentrates in five surfaces of your business. Client deliverables, where a fabricated number in a report becomes a professional liability. Public-facing copy on your website or blog, where an invented quote becomes a defamation risk and a credibility problem when a reader checks. Regulatory filings, where an invented citation can void the filing and expose the firm to enforcement. Board reports, where bad data corrupts decisions you will spend months unwinding. And advice to staff, where an AI tool confidently tells a junior colleague the wrong thing about employment law or tax treatment, and they act on it. The Air Canada chatbot case in 2024 established that the firm owns its own AI’s output. The tribunal rejected the airline’s argument that the chatbot was a separate entity. If your customer-facing AI says something wrong about your pricing, your refund policy, or your service terms, you are on the hook for it.
When should you ask hard questions about AI output?
The honest answer is whenever the output will leave your firm or change a decision that matters. Internal scratchpad work where a hallucination would be caught instantly is one risk profile. A draft email to a client about their tax position is a different one. A compliance-response summary is different again. The trigger for serious review is the cost of being wrong about that specific piece of work.
Run a simple test before any AI-generated output goes out: could this become a legal or regulatory problem if a specific claim in it is wrong? Could it reach a client, a regulator, an investor, or the public? Could a colleague act on the advice without double-checking? If the answer to any of those is yes, you need a human reading it before it leaves the building, and that human needs enough domain knowledge to catch a subtle misstatement, not just a typo. For everything else, including internal first drafts, summaries of long documents you’ll read anyway, and brainstorming the structure of a piece of work, the risk is lower and lighter checks suffice. Get this distinction written down so your team isn’t guessing.
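For firms that want the test embedded in a workflow rather than left in people’s heads, here is a sketch of the three questions as a pre-send checklist. The wording of the two review levels is illustrative:

```python
# A sketch of the pre-send test: one "yes" to any of the three questions
# means a human with domain knowledge reads the output before it goes out.

def review_level(legal_or_regulatory_exposure: bool,
                 reaches_external_audience: bool,
                 colleague_may_act_on_it: bool) -> str:
    if (legal_or_regulatory_exposure
            or reaches_external_audience
            or colleague_may_act_on_it):
        return "full review by someone with domain knowledge"
    return "light check (internal drafts, summaries, brainstorming)"

# Example: a draft email to a client about their tax position.
print(review_level(legal_or_regulatory_exposure=True,
                   reaches_external_audience=True,
                   colleague_may_act_on_it=False))
```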
What are the related concepts worth knowing?
Three governance ideas sit alongside hallucination risk and are worth knowing as you set your own controls. A proportionate AI risk register is a one-page list of every AI tool the firm uses, what its output touches, and what controls apply. Retrieval-augmented generation is the most useful technical control for high-stakes use cases. An audit trail records which tool was used, what prompt was given, and who reviewed the output.
For a fuller treatment of the register, see “a proportionate AI risk register for a 5 to 50 person business”. On retrieval-augmented generation, treat vendor claims of “hallucination-free” with the same scepticism you’d apply to any vendor superlative. The Stanford evidence is clear that retrieval helps but does not eliminate the problem. On audit trails, regulated firms have no choice. For everyone else it is the first thing that protects you when something goes wrong, and it costs almost nothing to set up.
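A minimal audit trail can be a few minutes’ work. The sketch below appends one JSON line per AI-assisted output; the field names are illustrative, so keep whichever your firm will actually fill in:

```python
# A minimal audit trail: one JSON line per AI-assisted output.
import json
from datetime import datetime, timezone

def log_ai_output(tool: str, prompt: str, reviewer: str, destination: str,
                  logfile: str = "ai_audit_log.jsonl") -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,                # which AI tool produced the output
        "prompt": prompt,            # what it was asked to do
        "reviewer": reviewer,        # who read the output before it went out
        "destination": destination,  # where the output ended up
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")

log_ai_output(tool="(tool name)", prompt="(prompt text)",
              reviewer="(reviewer name)", destination="client report")
```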
The single decision every owner needs to make explicitly is which categories of AI output go out of your firm without a human reading them, and why. A meaningful share of firms have never made this decision in writing. They’ve drifted into a position where the answer is “most of it, by default”, which is the wrong answer. The right answer is small, deliberate, and on paper. If you’d like help getting that on paper for your firm, book a conversation.