What is an AI hallucination? Why it matters for your business

TL;DR

An AI hallucination is a fluent, confident output from a language model that is factually wrong. It happens because the model predicts plausible text rather than retrieves verified facts, and no model release fixes it. The practical task for a business owner is to classify each AI use by how much a wrong answer would cost, then put proportionate checks around the cases where the cost is high.

Key takeaways

- A hallucination is a confidently worded false output. The fluency is the trap: the model has no internal "I do not know" signal.
- It is structural, not a bug. Language models predict the next likely word; they do not retrieve facts. RLHF training can make confident wrong answers more common, not rarer.
- Courts in three jurisdictions now expect verification: Air Canada was held liable for a chatbot's invented policy, Mata v Avianca sanctioned lawyers for fake citations, and the UK tribunal in Harber v HMRC flagged ChatGPT-fabricated case law as plausible but invented.
- Even the best tools still hallucinate. Stanford's audit of LexisNexis and Thomson Reuters found their RAG-grounded legal tools hallucinated more than 17% of the time, well above what was advertised.
- The decision an owner has to make this quarter is which AI uses can tolerate the occasional wrong answer and which cannot, then match the controls to the cost.

A small-firm owner I spoke with last month had a customer call her in the afternoon to say the chatbot on her website had told them she offered a service she does not offer. The bot had answered fluently, with a price, with a turnaround time, with a small caveat about VAT. The customer had taken a screenshot. She now had to decide whether her firm meant what the bot said.

That moment is where the word “hallucination” stops being an industry term and becomes a question an owner has to answer this quarter. Not in theory. Not after the next model release. Now.

What is an AI hallucination?

An AI hallucination is a fluent, confident output from a language model that is simply not true. The model invents a fact, a citation, a policy, or a product detail and presents it with the same calm authority it uses for things it knows. The trap is the fluency. There is no warning tone, no caveat, no internal “I am unsure” flag.

Researchers split hallucinations into two flavours. Factuality hallucinations contradict the world, like inventing a court case or stating a wrong date. Faithfulness hallucinations contradict a source the model was given, like adding a clause to a policy document it was asked to summarise. The distinction matters because the controls for each are different, but the harm is the same: a confident wrong answer that someone acted on.

Why does it happen?

Language models hallucinate because they predict text; they do not look it up. When you ask a model a question, it generates an answer one word at a time, choosing the most likely next word given what came before. It does not ask whether the claim is true. There is no separate truth-check. The machine is optimised to produce plausible continuations, and a plausible-sounding lie is, by definition, plausible.
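To make the prediction-not-retrieval point concrete, here is a deliberately tiny sketch. It is a toy bigram table, nothing like how a real model is built, but it shows the shape of the problem: the generator only knows which words tend to follow which, so it can emit a fluent sentence such as "the capital of france is lyon" without any step that checks truth.

```python
import random

# Toy "language model": each word maps to candidate next words with counts.
# The table knows what tends to follow what. It has no notion of truth.
BIGRAMS = {
    "the": {"capital": 3, "policy": 2},
    "capital": {"of": 5},
    "of": {"france": 4, "spain": 1},
    "france": {"is": 5},
    "is": {"paris": 3, "lyon": 1},  # "lyon" is fluent but false
}

def generate(start, length=6, seed=None):
    """Sample the next word in proportion to its count, one word at a
    time. Nothing here asks whether the resulting sentence is true."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        options = BIGRAMS.get(words[-1])
        if not options:
            break
        candidates, weights = zip(*options.items())
        words.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(words)
```

Every output is a plausible chain of word-to-word transitions, which is exactly why a wrong continuation reads as smoothly as a right one.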

OpenAI’s own technical write-up on this is unusually clear. Pretraining sees only positive examples of fluent language, so the model learns statistical patterns rather than facts. Spelling and grammar follow consistent patterns and the model learns them well. Arbitrary low-frequency facts, like a specific person’s date of birth or a specific case citation, cannot be predicted from patterns. The model fills the gap with a guess that sounds like the kind of thing that would be true.

There is a second factor that matters for owners evaluating vendor claims. Reinforcement learning from human feedback, the training step where humans rate model answers, tends to reward confident, agreeable output. A model that hedges gets marked down and a model that asserts gets marked up. No model release in 2026 has fully solved this. The best summarisation models on the Vectara leaderboard still hallucinate roughly one answer in fifty.

Where this bites a small business

Four places, in rough order of pain. Customer-facing claims first: a chatbot or AI-drafted email that states a price, a policy, or a warranty term the firm did not authorise. The customer takes a screenshot. The firm has to decide whether to honour the bot or argue with the customer. Regulated advice second: financial, tax, or legal answers where a wrong reply creates duty-of-care exposure under the FCA’s Consumer Duty.

Contract and document drafting third. AI-drafted contracts that cite cases that do not exist. AI-drafted compliance documents that invent regulatory requirements. AI-summarised policies that quietly add a clause the source does not contain. The harm here is that the error survives the draft because the reviewer trusts the fluency. Internal policy and decision support fourth: AI-drafted staff handbooks, AI-summarised board papers, AI-prepared briefings. The audience is internal, but a wrong figure in a board pack is a wrong figure.

The unifying feature across all four is that the model’s confidence is unrelated to whether the model is right. A wrong answer reads exactly like a right one. That is the design of the technology, not a defect in any particular product.

What the cases tell you

Three cases show courts in three jurisdictions starting to expect verification. In Moffatt v Air Canada, a Canadian small-claims tribunal held the airline liable for negligent misrepresentation when its chatbot stated a bereavement-fare policy that did not exist. The tribunal rejected the argument that the chatbot was a separate legal entity. It was part of the firm’s website, the customer relied on it, and the firm was responsible.

In Mata v Avianca, a US federal court sanctioned two attorneys jointly after they submitted a brief containing ChatGPT-invented case citations. In Harber v HMRC, the UK First-tier Tribunal flagged that an appellant had submitted ChatGPT-fabricated case law and noted the citations were “plausible but incorrect”. The Solicitors Regulation Authority’s joint guidance with the NCSC puts the principle bluntly: LLM output “sounds right rather than is right”, and a solicitor who relies on it without verifying is in breach of professional duty.

The takeaway is not that any one of these binds a UK SME directly. It is that the direction of travel is consistent. Courts and regulators in three jurisdictions are now treating AI-generated content as the firm’s own statement, with the firm responsible for its accuracy. This is awareness, not legal advice. Speak to your solicitor and your professional indemnity broker about your specific facts and your specific cover.

What to do about it

Classify your AI uses by what a hallucination would cost. Brainstorming, internal first drafts, and rough summaries can tolerate the occasional invented answer, because a human will read and rewrite anyway. Customer-facing claims, regulated advice, contract drafting, and anything cited externally cannot. Match the controls to the cost. The highest-value control is human review before output leaves the firm in those high-cost categories.
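The classify-then-control step can be sketched as a simple lookup. The categories below follow this article; the tier assignments and the wording of the controls are illustrative assumptions, not a standard.

```python
# Illustrative triage table: map each AI use to the cost of a wrong
# answer, then to a proportionate control. Tiers and control wording
# are assumptions for the sketch, not a published framework.
RISK_TIER = {
    "customer-facing claims": "high",
    "regulated advice": "high",
    "contract drafting": "high",
    "external citations": "high",
    "brainstorming": "low",
    "internal first drafts": "low",
    "rough summaries": "low",
}

CONTROL = {
    "high": "human review before anything leaves the firm",
    "low": "spot-checks; a human reads and rewrites anyway",
}

def required_control(use: str) -> str:
    """Return the control for a known use; unknown uses default to high,
    on the principle that an unclassified use has not earned trust."""
    return CONTROL[RISK_TIER.get(use, "high")]
```

The default-to-high line is the design choice worth copying: a new AI use should inherit the strictest control until someone has deliberately classified it.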

The technical mitigations are useful but partial. Retrieval-augmented generation anchors the model in your actual documents and reduces hallucination, though Stanford’s audit showed even sophisticated RAG-grounded legal tools still hallucinated more than 17% of the time. Prompt design can ask the model to flag uncertainty or cite sources, which makes errors easier to spot. Structured output, where the model returns a fixed schema rather than free text, removes degrees of freedom and shrinks the surface area for invention. Each helps. None solves the underlying issue, which is that the model is guessing.
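As one illustration of the structured-output point, here is a minimal sketch in Python. The schema, field names, and validation rules are hypothetical, not from any specific tool: an answer is accepted only if it matches a fixed set of fields and carries a citation back to a source document, and anything else is rejected so it can be routed to a human instead of a customer.

```python
import json

# Hypothetical fixed schema for a customer-facing answer. The field
# names are illustrative; the point is that the model cannot add a
# field it invents, or skip the citation, without being rejected.
ALLOWED_FIELDS = {"service", "price_gbp", "source_document"}

def validate_answer(raw_json: str) -> dict:
    """Accept a model answer only if it is valid JSON, contains exactly
    the allowed fields, and cites a source document. Otherwise raise,
    so the answer goes to a human reviewer rather than a customer."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if set(data) != ALLOWED_FIELDS:
        raise ValueError(f"fields do not match schema: {sorted(set(data) ^ ALLOWED_FIELDS)}")
    if not data["source_document"]:
        raise ValueError("no source citation: route to a human reviewer")
    return data
```

The validation does not make the content true; it narrows what the model can say and forces every claim to point at a document a human can check.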

Treat any vendor claim of “hallucination-free” output with scepticism. Ask for their measured hallucination rate on your specific use case, the methodology, the SLA, and what recourse you have when the system fails. If the vendor cannot answer, the phrase in their pitch deck is doing work the product cannot do. The decision worth making this quarter is not whether to use AI. It is which uses you trust to a model alone, which you put a human in front of, and which you keep out of AI’s hands until the controls catch up. If you’d like a second pair of eyes on that classification for your own firm, book a conversation.

Sources

- OpenAI (2024). Why language models hallucinate. Plain-English account of the prediction-not-retrieval mechanism. https://openai.com/index/why-language-models-hallucinate/
- Information Commissioner's Office (2024). Guidance on AI and data protection: accuracy and statistical accuracy. UK GDPR's accuracy principle applies to AI-generated outputs about people. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/what-do-we-need-to-know-about-accuracy-and-statistical-accuracy/
- Solicitors Regulation Authority and NCSC (2024). Legal practitioners' guide to AI hallucinations. The line that LLM output "sounds right rather than is right". https://www.ncsc.org/resources-courts/legal-practitioners-guide-ai-hallucinations
- Stanford Law (2024). Hallucination-free? Assessing the reliability of leading AI legal research tools. Found LexisNexis and Thomson Reuters tools hallucinated more than 17% of the time. https://law.stanford.edu/publications/hallucination-free-assessing-the-reliability-of-leading-ai-legal-research-tools/
- Sackers (2023). Harber v HMRC, First-tier Tribunal, 4 December 2023. UK tribunal flagged ChatGPT-fabricated case citations as plausible but invented. https://www.sackers.com/pension/harber-v-hmrc-first-tier-tribunal-4-december-2023/
- UBC Allard School of Law (2024). Moffatt v Air Canada and chatbot liability. Canadian small-claims tribunal held the airline liable for a chatbot's invented bereavement-fare policy. https://www.litigate.com/whose-responsibility-is-it-anyway-chatbots-and-legal-issues-in-moffatt-v-air-canada/pdf
- Justia (2023). Mata v Avianca, US federal court. Two attorneys sanctioned after submitting a brief with ChatGPT-invented citations. https://law.justia.com/cases/federal/district-courts/new-york/nysdce/1:2022cv01461/575368/54/
- Vectara (2026). Hallucination leaderboard for summarisation tasks. Best frontier models hallucinate roughly 1 in 50 answers, around 1.8%. https://github.com/vectara/hallucination-leaderboard
- Financial Conduct Authority (2024). Consumer Duty overview. The regulatory backdrop for any firm whose AI tool speaks to retail customers. https://www.fca.org.uk/firms/consumer-duty/about

Frequently asked questions

Will the next model release fix hallucinations?

No. Hallucination is a structural feature of how language models work, not a bug that gets patched. Models predict plausible text from patterns, they do not look facts up. Frontier models hallucinate less than older ones on some benchmarks, but the best summarisation model on the Vectara leaderboard still gets roughly one in fifty answers wrong. Plan around the rate, do not wait for it to reach zero.

Can I rely on a vendor that says their tool is "hallucination-free"?

Treat the claim with scepticism. Stanford Law audited LexisNexis and Thomson Reuters, both marketed as hallucination-free for legal research, and found both hallucinated more than 17% of the time on legal citation tasks. If a vendor uses the phrase, ask for their measured rate on your use case, the methodology, and what recourse you have when it fails. A vendor who cannot answer is selling marketing.

Where is the legal liability if my chatbot gets something wrong?

With your firm. The Canadian tribunal in Moffatt v Air Canada held the airline liable for a chatbot's invented bereavement-fare policy and rejected the argument that the bot was a separate entity. The principle is being reflected in UK and US courts. This is awareness, not legal advice. Speak to your solicitor and your professional indemnity broker about your specific facts.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
