Hallucinations as a business risk, not a curiosity

A business owner and a colleague checking a printed AI-generated report at a boardroom table.
TL;DR

An AI hallucination is a confident, fluent output that has no basis in reality. For a small business that puts AI work in front of clients, in financial documents, or in regulatory filings, hallucinations are a specific risk class with measurable exposure. They aren't going away with the next model release, so the response is governance, not a vendor switch. Inventory your AI uses, classify them by exposure, and put human review where the stakes justify it.

Key takeaways

- AI hallucinations are a structural feature of how language models work, not a glitch the next release will fix.
- The four categories that hurt small businesses most are invented statistics, invented quotes, invented citations, and invented legal or regulatory provisions.
- Exposure concentrates in client deliverables, public-facing copy, regulatory filings, board reports, and advice to staff.
- The Air Canada chatbot ruling in 2024 confirmed firms own the output of the AI tools they deploy, even when those tools invent policy.
- The mitigation question every owner needs to answer in writing is which categories of AI output go out without a human reading them, and why.

An owner I spoke with recently shared an AI-summarised industry report at her board meeting. Three statistics, neatly footnoted, dropped into the pre-read. Halfway through the discussion her chair flagged one of the numbers as wrong. She checked the source. The figure didn’t exist. Neither did the report it claimed to come from. She felt the temperature in the room change as she explained what had happened, and spent the next week emailing each director the corrected paper.

That moment is the reason this post exists. Much of the popular coverage of AI hallucinations treats them as comedy: the lawyer who cited fake cases, the chatbot that swore at a customer. For a firm that puts AI output in front of clients, into financial documents, into legal correspondence, or onto a public website, hallucinations are a specific business risk class with measurable exposure and proportionate mitigations. This post sets out what the risk actually is, where it lands in your business, and what you need to decide about it explicitly rather than by default.

What is an AI hallucination, mechanically?

An AI hallucination is a confident, fluent piece of output that has no basis in reality. Language models work by predicting the next word based on statistical patterns learned during training, rather than by looking facts up. When those patterns generate something plausible-sounding but false, the model has no internal alarm that fires. The output reads as authoritative because uncertainty was never represented in the answer.

OpenAI’s 2024 research on why language models hallucinate identified a deeper cause. Standard training rewards guessing over admitting uncertainty. If a model answers “I don’t know”, it scores zero on the benchmark. If it guesses and happens to be right, it scores a point. Across thousands of test questions, models that guess look better than models that abstain, so the training signal pushes the model toward confident invention. The fluent wrong answer is the system doing exactly what it was trained to do, rather than a defect in the wiring.
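The incentive is easy to see as arithmetic. Under a benchmark that scores one point for a correct answer and zero for either a wrong answer or an abstention, even a low-confidence guess has a higher expected score than honesty. A minimal sketch, with illustrative numbers of my own rather than figures from the OpenAI paper:

```python
# Illustrative only: hypothetical scoring showing why benchmarks reward
# guessing over abstaining. One point for a correct answer, zero for a
# wrong answer or an "I don't know".

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected benchmark score for a single question."""
    if abstain:
        return 0.0       # "I don't know" always scores zero
    return p_correct     # a guess scores 1 with probability p_correct

# Even a 20%-confident guess beats honesty under this rule:
print(expected_score(0.2, abstain=False))  # 0.2
print(expected_score(0.2, abstain=True))   # 0.0
```

Summed over thousands of questions, that gap is why a model trained against such benchmarks learns to produce a confident answer every time.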

For a fuller plain-English explainer of the mechanics, see what is an AI hallucination. The point for this post is that the behaviour is structural, not a defect the next model release will fix.

Why is this a business risk that isn’t going away?

A common assumption in small businesses meeting AI for the first time is that newer, larger models hallucinate less, so the answer is to upgrade. The data does not support that hope. Vectara’s hallucination leaderboard tracks rates across frontier models and finds them ranging from 0.7% to 13.6% depending on the model and task, with no clean trend line downward across the last two years.

On harder knowledge benchmarks the rates climb sharply. OpenAI’s own GPT-5 recorded a 9.6% hallucination rate on one test when given web access; the same model without internet access jumped to 47%.

Retrieval-augmented generation, where the model is given a set of source documents to draw from, is widely promoted as the solution. It helps, materially. It does not eliminate the problem. Stanford’s RegLab study of legal AI research tools found that two systems marketed as “hallucination-free” through retrieval-augmented generation still hallucinated between 17% and 33% of the time. The model can still misread the source, ignore conflicting evidence, or invent a citation when retrieval comes up empty. The practical implication for an SME is simple. You cannot buy your way out of this with a vendor switch or a feature upgrade. The risk persists across the market and has to be governed in your firm rather than waited out.

Where does the exposure actually land in your business?

Hallucinations fall into four categories that map directly onto how a small business uses AI in practice. Invented statistics get pasted into pitch decks, board reports and proposals. Invented quotes end up in marketing copy or LinkedIn posts. Invented citations point to studies, court cases or regulations that turn out not to exist. And invented legal or regulatory provisions appear as confident statements that a rule says X when it does not.

The damage concentrates in five surfaces of your business. Client deliverables, where a fabricated number in a report becomes a professional liability. Public-facing copy on your website or blog, where an invented quote becomes a defamation risk and a credibility problem when a reader checks. Regulatory filings, where an invented citation can void the filing and expose the firm to enforcement. Board reports, where bad data corrupts decisions you will spend months unwinding. And advice to staff, where an AI tool confidently tells a junior colleague the wrong thing about employment law or tax treatment, and they act on it. The Air Canada chatbot case in 2024 established that the firm owns its own AI’s output. The tribunal rejected the airline’s argument that the chatbot was a separate entity. If your customer-facing AI says something wrong about your pricing, your refund policy, or your service terms, you are on the hook for it.

When should you ask hard questions about AI output?

The honest answer is whenever the output will leave your firm or change a decision that matters. Internal scratchpad work where a hallucination would be caught instantly is one risk profile. A draft email to a client about their tax position is a different one. A compliance-response summary is different again. The trigger for serious review is the cost of being wrong about that specific piece of work.

Apply a simple test before any AI-generated output goes out. Could this become a legal or regulatory problem if a specific claim in it is wrong? Could it reach a client, a regulator, an investor, or the public? Could a colleague act on it without double-checking? If the answer to any of those is yes, a human needs to read it before it leaves the building, and that human needs enough domain knowledge to catch a subtle misstatement, not just a typo. For everything else (internal first drafts, summaries of long documents you’ll read anyway, brainstorming structure for a piece of work) the risk is lower and lighter checks suffice. Write this distinction down so your team isn’t guessing.
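The three questions amount to a simple triage rule, which some teams find easier to adopt when it is written as an explicit procedure. A minimal sketch; the function name and labels are my own, not from the post:

```python
# Hypothetical triage of the three pre-release questions described above.
# Any "yes" escalates the output to domain-expert human review.

def review_level(legal_or_regulatory_risk: bool,
                 reaches_external_audience: bool,
                 staff_may_act_unchecked: bool) -> str:
    """Return the review an AI-generated output needs before release."""
    if legal_or_regulatory_risk or reaches_external_audience or staff_may_act_unchecked:
        # High stakes: needs a reader who can catch a subtle misstatement
        return "domain-expert human review"
    # Internal drafts, summaries, brainstorming
    return "light check"

# A draft client email about their tax position trips the second question:
print(review_level(False, True, False))  # domain-expert human review
```

The point of the sketch is the asymmetry: one "yes" is enough to escalate, and nothing short of three "no"s keeps the output on the light-check path.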

Three governance ideas sit alongside hallucination risk and are worth knowing as you set your own controls. A proportionate AI risk register is a one-page list of every AI tool the firm uses, what its output touches, and what controls apply. Retrieval-augmented generation is the most useful technical control for high-stakes use cases. An audit trail records which tool was used, what prompt was given, and who reviewed the output.
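A one-page register can be nothing more than structured data. A minimal sketch with invented example entries (the tools, categories, and controls shown are hypothetical, not recommendations):

```python
# A hypothetical one-page AI risk register as plain data: each entry
# records the tool, what its output touches, and the controls applied.

risk_register = [
    {"tool": "chat assistant",
     "output_touches": "client deliverables",
     "controls": ["domain-expert review", "audit trail"]},
    {"tool": "meeting summariser",
     "output_touches": "internal notes",
     "controls": ["light check"]},
]

# Sanity check: anything whose output can leave the firm must carry
# a human-review control before release.
for entry in risk_register:
    leaves_firm = entry["output_touches"] != "internal notes"
    if leaves_firm:
        assert "domain-expert review" in entry["controls"]
```

Kept in a spreadsheet or a script like this, the register doubles as the audit trail's index: it already names every tool whose prompts and reviews need recording.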

For a fuller treatment of the register, see a proportionate AI risk register for a 5 to 50 person business. On retrieval-augmented generation, treat vendor claims of “hallucination-free” with the same scepticism you’d apply to any vendor superlative; the Stanford evidence is clear that retrieval helps but does not eliminate the problem. On audit trails, regulated firms have no choice. For everyone else it is the first thing that protects you when something goes wrong, and it costs almost nothing to set up.

The single decision every owner needs to make explicitly is which categories of AI output go out of your firm without a human reading them, and why. A meaningful share of firms have never made this decision in writing. They’ve drifted into a position where the answer is “most of it, by default”, which is the wrong answer. The right answer is small, deliberate, and on paper. If you’d like help getting that on paper for your firm, book a conversation.

Sources

- OpenAI (2024). Why language models hallucinate. Identifies training-time incentives that reward guessing over admitting uncertainty, the systemic root of hallucination behaviour. https://openai.com/index/why-language-models-hallucinate/
- Anthropic (2024). Mapping the mind of a large language model. Explains that LLMs represent concepts statistically rather than retrieving facts, the mechanical basis for fluent-but-false output. https://www.anthropic.com/research/mapping-mind-language-model
- Stanford RegLab (2024). Hallucination-free? Assessing the reliability of leading AI legal research tools. Measured 17 to 33 percent hallucination rates in legal-grade retrieval-augmented systems claiming to eliminate them. https://hai.stanford.edu/news/ai-trial-legal-models-hallucinate-1-out-6-or-more-benchmarking-queries
- Damien Charlotin (ongoing). Database of legal cases involving generative AI hallucinations. Catalogue of over 1,400 documented court cases citing fabricated AI content, with sanctions and jurisdictions. https://www.damiencharlotin.com/hallucinations/
- British Columbia Civil Resolution Tribunal (2024). Moffatt v. Air Canada decision. Established that a company is liable for the misleading output of its own customer-facing chatbot, with direct consequences for SME deployers. https://www.mccarthy.ca/en/insights/blogs/techlex/moffatt-v-air-canada-misrepresentation-ai-chatbot
- The Markup (2024). NYC's AI chatbot tells businesses to break the law. Documents a government deployment that fabricated regulatory guidance, illustrating the public-information failure mode. https://themarkup.org/artificial-intelligence/2024/03/29/nycs-ai-chatbot-tells-businesses-to-break-the-law
- Vectara (ongoing). Hallucination Leaderboard. Benchmark tracking hallucination rates across frontier LLMs from 0.7 to over 13 percent, with task-specific variance. https://github.com/vectara/hallucination-leaderboard
- National Center for State Courts (2024). A legal practitioner's guide to AI and hallucinations. Sets the "never trust, always verify" verification standard now adopted across multiple jurisdictions. https://www.ncsc.org/resources-courts/legal-practitioners-guide-ai-hallucinations
- Information Commissioner's Office (updated 2025). Guidance on AI and data protection. The UK regulator's expectations for fairness, transparency and accountability in AI use, applicable to SMEs. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/
- International Organization for Standardization (2023). ISO/IEC 42001 AI management system standard. The first international standard for AI governance, applicable to organisations of any size. https://www.iso.org/standard/42001

Frequently asked questions

Will newer AI models stop hallucinating?

No. Vectara's hallucination leaderboard shows rates between 0.7% and 13.6% across frontier models, and OpenAI's own GPT-5 jumps to 47% on knowledge tasks when web access is turned off. Hallucination is a structural property of how these systems generate text, not a defect that scales away. Treat it as a permanent risk class to govern, not a temporary bug to wait out.

Does retrieval-augmented generation solve the hallucination problem?

It reduces hallucination rates meaningfully but does not eliminate them. The Stanford RegLab study of legal AI research tools found that even systems marketed as "hallucination-free" through retrieval-augmented generation still invented citations or misread sources between 17% and 33% of the time. Retrieval is a useful control layer, not a replacement for human review on high-stakes output.

What's the minimum AI governance a small business actually needs?

An inventory of every AI tool in use, a simple risk classification of each tool by what its output touches, a written rule that high-stakes output always gets a human review before it leaves the firm, and a short staff guidance document. That's it for a starting position. The ICO's AI and data protection guidance is a sensible reference, and ISO/IEC 42001 sets a maturity ceiling if you want to grow into one.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
