Why generative AI produces plausible but false answers

TL;DR

Generative AI tools produce false answers because they predict plausible text, not verified facts. A 2025 BBC and European Broadcasting Union study found roughly 45 per cent of answers from mainstream chatbots contained errors. UK regulators including the ICO and FCA make clear that firms remain accountable for decisions and communications that rely on AI-generated content, regardless of the tool used. Understanding how hallucinations happen is the first step to managing the risk.

Key takeaways

- Generative AI predicts the next plausible word, not the next correct word, which is why it can sound authoritative while being factually wrong.
- A 2025 BBC and European Broadcasting Union study found roughly 45 per cent of answers from mainstream chatbots contained errors.
- Stanford Human-Centred AI found hallucination rates of 58 to 82 per cent for general chatbots on professional legal queries, and rates still above 17 per cent for specialist retrieval-augmented generation tools.
- The ICO and FCA make clear that UK firms remain accountable for decisions and communications based on AI output, even when a third-party tool produced it.
- Effective mitigation combines three things: scoping which tasks AI is appropriate for, using retrieval-augmented generation where possible, and requiring named human sign-off on any output that reaches a client or drives a decision.

A director at a professional services firm asks an AI chatbot to summarise a recent regulatory change for a client briefing. The output is fluent, well-structured, and mentions a named enforcement case that supports the argument. The director sends it without checking. A week later, the client asks for the full case reference. The enforcement case does not exist. The AI invented it, complete with a plausible docket number and a realistic-sounding outcome.

This failure mode is common enough to have its own name: hallucination.

What is a generative AI hallucination?

Generative AI systems such as ChatGPT, Copilot, and Gemini are built to predict the most plausible next word, given everything that came before. They have no mechanism for verifying facts. When pattern-matching produces a statement that sounds right but is not grounded in reality, the result is called a hallucination. MIT’s Generative AI Working Group makes this point directly: these tools generate plausible content, not verified content.

Accuracy, when it occurs, is a coincidental side effect of plausibility. The training data compounds the problem. Generative AI is trained on internet-scale datasets that contain accurate information alongside outdated content, falsehoods, and societal biases. Because models learn correlations rather than ground truth, they can reproduce those inaccuracies with high confidence. Josh Bersin’s analysis of a 2025 BBC and European Broadcasting Union study uses the phrase “poisoned corpus” to describe this: when training data contains flawed or exaggerated information, the model’s outputs reflect those flaws, often persuasively.
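The two points above, prediction by plausibility and flawed training data, can be illustrated with a toy model. This is a minimal sketch only: real systems are vastly larger and work differently in detail, and the three-sentence corpus and the function names are invented for illustration. The model simply returns whichever word most often followed the previous one in its training text, so a falsehood that appears more often than the truth wins.

```python
from collections import Counter, defaultdict

# Invented toy corpus: the false statement appears more often than the true one.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def most_plausible_next(counts, word):
    """Return the most frequent follower of `word`. Truth is never consulted."""
    return counts[word].most_common(1)[0][0]

model = train_bigrams(corpus)
print(most_plausible_next(model, "is"))  # prints "sydney", the common but wrong answer
```

Nothing in the code checks whether the answer is correct. It only checks which continuation was seen most often, which is the sense in which accuracy is a side effect of plausibility.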

A specific form of this is what researchers at the University of Maryland call ghost citations. AI tools can invent academic articles that do not exist, pairing real authors with fabricated titles, journals, and publication dates. The output looks credible. The source is imaginary.

Why does this matter for your firm?

UK service businesses face a problem of staff over-trusting AI. The 2025 BBC and European Broadcasting Union study found roughly 45 per cent of answers from mainstream chatbots contained errors. In one example cited in Josh Bersin’s analysis of the study, Copilot wrongly claimed a bird flu vaccine trial was under way in Oxford, citing a BBC article from 2006. MIT’s research notes that the apparent objectivity of AI tools makes people less willing to question incorrect outputs.

The regulatory exposure compounds this. The Information Commissioner’s Office makes clear that organisations cannot avoid accountability by attributing an incorrect output to a third-party AI vendor. Under UK GDPR, you remain responsible for the accuracy of data used in decisions about individuals. The ICO’s guidance on the accuracy principle states that organisations must take reasonable steps to ensure accuracy of outputs used in decision-making, regardless of how those outputs were generated.

The Financial Conduct Authority has reinforced this position in its AI commentary. Existing conduct rules, including the requirement to provide information that is clear, fair, and not misleading, apply to AI-assisted communications and advice. Consumer Duty obligations do not pause because a chatbot was involved.

Where will you actually meet it?

Hallucinations surface in the kinds of tasks small firms reach for AI to help with every day: summarising a contract clause, checking what a regulation says, drafting a client briefing, or answering a staff question about policy. Stanford Human-Centred AI tested general-purpose chatbots on legal research queries and found hallucination rates of 58 to 82 per cent, high enough to make unchecked AI output a genuine liability in professional settings.

The pattern extends to more conversational uses. A 2024 peer-reviewed paper in the journal Patterns documented AI systems engaging in sycophancy, generating answers that match what the user appears to want to hear rather than what is accurate. When a team member asks AI to validate a decision already made, the tool is predisposed to agree. The NCSC advises treating generative AI outputs as untrusted until independently verified, particularly in contexts where the cost of getting it wrong is high.

Internal knowledge bases carry the same risk. Stanford’s research found that even specialist legal tools using retrieval-augmented generation, which grounds answers in a curated document set rather than the open internet, still hallucinated on more than 17 per cent of professional queries.

When should you check AI output and when can you trust it?

Risk scales with the consequences of getting it wrong. Using AI to draft an internal memo or a marketing description is low-stakes: a human edits it before it goes anywhere. Using AI to answer a client question about their legal position, their tax exposure, or their regulatory obligations is different. The output becomes advice, and the ICO, FCA, and sector regulators make clear that you remain accountable for it, regardless of which tool generated it.

A practical framework divides uses into two groups. The first group, allowed with review, covers marketing drafts, rough proposals, meeting summaries based on non-sensitive notes, and first-pass internal documentation. The second group, tightly controlled or prohibited, covers anything involving regulated advice, HR decisions, client eligibility assessments, and any output that could be treated as a statement of fact in a dispute or audit.
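For firms that want to write the two groups down in a form a tool or intake form can check, a minimal sketch might look like the following. The task labels and the `triage` function are hypothetical names chosen for illustration, and the fallback for unlisted tasks is an assumption rather than something the framework above specifies.

```python
# Hypothetical task labels based on the two groups described above.
ALLOWED_WITH_REVIEW = {
    "marketing_draft",
    "rough_proposal",
    "meeting_summary_non_sensitive",
    "internal_documentation_first_pass",
}
TIGHTLY_CONTROLLED = {
    "regulated_advice",
    "hr_decision",
    "client_eligibility_assessment",
    "statement_of_fact_for_dispute_or_audit",
}

def triage(task_type):
    """Map a task label to the group it belongs to."""
    if task_type in TIGHTLY_CONTROLLED:
        return "tightly controlled or prohibited"
    if task_type in ALLOWED_WITH_REVIEW:
        return "allowed with review"
    # Assumption: anything not explicitly listed is escalated, not silently allowed.
    return "unclassified: ask before using AI"

print(triage("marketing_draft"))
print(triage("hr_decision"))
print(triage("tax_position_summary"))
```

Defaulting unknown tasks to escalation is a design choice: it keeps new uses of AI from slipping into the low-risk group just because nobody has classified them yet.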

One rule applies across both groups. Any specific claim in an AI output, including a number, a law reference, a regulation, or a case, should be verified against a primary source before it goes anywhere. If a member of staff cannot find the original source independently, the claim should be treated as unverified, and the text should not be used as written.
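As a rough aid to that rule, a short script can highlight the kinds of specific claims a reviewer should trace to a primary source. This is a sketch under stated assumptions: the patterns and the `flag_claims` function are illustrative, they will miss many claims and flag some harmless text, and they do not verify anything. They only produce a list for a human to check.

```python
import re

# Illustrative patterns only; a real checklist would need tuning for your sector.
CLAIM_PATTERNS = {
    "number": re.compile(r"\b\d+(?:\.\d+)?(?:\s?(?:per cent|%))?"),
    "legal_reference": re.compile(r"\b(?:section|article|regulation)\s+\d+\w*", re.IGNORECASE),
    "case": re.compile(r"\b[A-Z][a-z]+ v\.? [A-Z][a-z]+"),
}

def flag_claims(text):
    """Return (label, matched_text) pairs that a human should verify."""
    flagged = []
    for label, pattern in CLAIM_PATTERNS.items():
        for match in pattern.finditer(text):
            flagged.append((label, match.group(0)))
    return flagged

sample = "Under section 12, fines rose 45 per cent after Smith v Jones."
for label, claim in flag_claims(sample):
    print(f"VERIFY [{label}]: {claim}")
```

An empty result does not mean the text is safe. It means only that none of these particular patterns matched.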

Two approaches reduce hallucination risk without abandoning the productivity gains: retrieval-augmented generation (RAG) and human-in-the-loop review. RAG grounds the AI’s answers in documents you supply, rather than letting it draw freely from general training data. A chatbot built on your vetted policies, contracts, and FAQs is less likely to invent content than one drawing from the open internet. Some residual risk remains, however, which is why human review of critical outputs stays necessary.
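The retrieval half of RAG can be sketched in a few lines. This is a deliberately simplified illustration, not how any particular product works: the documents, the word-overlap scoring, and the `min_overlap` threshold are all assumptions, and production systems typically use more sophisticated search. The idea it shows is that the question is matched to a vetted source first, and if nothing relevant is found, no prompt is built at all.

```python
import re

# Hypothetical vetted documents; in practice these would be your own policies and FAQs.
DOCUMENTS = {
    "leave-policy": "Staff accrue 25 days of annual leave per year plus bank holidays.",
    "expenses-policy": "Expenses claims must be submitted within 30 days with receipts.",
}

def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, min_overlap=2):
    """Return (doc_id, text) for the best word-overlap match, or None if too weak."""
    question_words = tokenize(question)
    best_id, best_score = None, 0
    for doc_id, text in documents.items():
        score = len(question_words & tokenize(text))
        if score > best_score:
            best_id, best_score = doc_id, score
    if best_score < min_overlap:
        return None
    return best_id, documents[best_id]

def build_prompt(question, documents):
    """Build a grounded prompt, or return None so the question goes to a person."""
    hit = retrieve(question, documents)
    if hit is None:
        return None
    doc_id, text = hit
    return (
        "Answer using only the source below. "
        "If the source does not contain the answer, say so.\n"
        f"Source [{doc_id}]: {text}\n"
        f"Question: {question}"
    )

print(build_prompt("How many days of annual leave do staff get?", DOCUMENTS))
print(build_prompt("What is the capital of France?", DOCUMENTS))  # prints None
```

Grounding the prompt narrows what the model draws on, but the model can still misread or embellish the source, which is where the residual risk comes from.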

Human-in-the-loop review means requiring a named person to sign off any AI output before it reaches a client or drives a decision. The ICO recommends meaningful human involvement in automated processes, not rubber-stamping but genuine checking by someone with the expertise to spot an error. For a small firm, the practical version is straightforward: a named reviewer for each content type, with a short checklist covering the facts that matter most.
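The named-reviewer idea above can be expressed as a simple release gate. The sketch below is one possible shape, not a prescribed design: the reviewer name, content type, and checklist items are hypothetical placeholders, and a real firm would hold these in whatever system it already uses for approvals.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical placeholders: one named reviewer and one checklist per content type.
REVIEWERS = {"client_briefing": "Named Reviewer A"}
CHECKLIST = {
    "client_briefing": {"figures verified", "citations verified", "regulations verified"},
}

@dataclass
class Draft:
    content_type: str
    text: str
    checks_done: set = field(default_factory=set)
    signed_off_by: Optional[str] = None

def sign_off(draft, reviewer):
    """Record sign-off only if the right person has completed every check."""
    if REVIEWERS.get(draft.content_type) != reviewer:
        raise PermissionError(f"{reviewer} is not the named reviewer for {draft.content_type}")
    missing = CHECKLIST.get(draft.content_type, set()) - draft.checks_done
    if missing:
        raise ValueError(f"Checks not completed: {sorted(missing)}")
    draft.signed_off_by = reviewer

def can_release(draft):
    """A draft may reach a client only after a recorded sign-off."""
    return draft.signed_off_by is not None

example = Draft("client_briefing", "AI-assisted summary of a regulatory change")
print(can_release(example))  # False: nobody has signed off yet
example.checks_done.update(CHECKLIST["client_briefing"])
sign_off(example, "Named Reviewer A")
print(can_release(example))  # True: named reviewer completed the checklist
```

Refusing sign-off while checklist items are outstanding is what separates genuine checking from rubber-stamping in this sketch.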

A short written AI policy of two or three pages covers the ground you need: which tasks AI is approved for, what review is required, and what is out of scope entirely. The FCA’s AI commentary and ICO accountability guidance both treat documented controls as a baseline expectation, which matters most when something goes wrong. The NCSC recommends staff training so your team can recognise plausible but false content and knows not to rely on a source it cannot verify. The time to put that in place is before a problem surfaces, not after.

Sources

- MIT Sloan Generative AI Working Group (2024). Addressing AI Hallucinations and Bias. Explains why large language models produce plausible but inaccurate content; cites Stanford HAI findings on hallucination rates in professional legal queries. https://mitsloanedtech.mit.edu/ai/basics/addressing-ai-hallucinations-and-bias/
- University of Maryland Libraries (2024). AI and Academic Research: Citations and Hallucinations. Documents how AI systems invent non-existent academic sources, pairing real authors with fabricated article titles and journals. https://lib.guides.umd.edu/c.php?g=1340355&p=9880574
- Goldstein et al. (2024). Lies, Deception, and AI. Peer-reviewed study in Patterns journal documenting sycophancy, confabulation, and systematic deception behaviours in large language models. https://pmc.ncbi.nlm.nih.gov/articles/PMC11117051/
- Bersin, J. (2025). BBC Finds That 45% of AI Queries Produce Erroneous Answers. Analysis of the BBC and European Broadcasting Union 2025 study on mainstream chatbot error rates, including the Copilot bird-flu fabrication. https://joshbersin.com/2025/10/bbc-finds-that-45-of-ai-queries-produce-erroneous-answers/
- Information Commissioner's Office (2024). AI and Data Protection. UK regulatory guidance on organisational accountability for accuracy in AI outputs and automated decision-making under UK GDPR. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/ai-and-data-protection/
- Information Commissioner's Office (2024). Accuracy Principle under UK GDPR. Clarifies that organisations cannot attribute inaccuracies to third-party AI vendors and must take reasonable steps to ensure output accuracy. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/data-protection-principles/accuracy/
- Financial Conduct Authority (2022). DP5/22: Artificial Intelligence and Machine Learning in Financial Services. Sets out FCA expectations that conduct obligations, including clear, fair and not misleading standards, apply to AI-assisted communications and advice. https://www.fca.org.uk/publication/discussion/dp5-22.pdf
- European Parliament (2024). EU Artificial Intelligence Act (Regulation 2024/1689). Defines high-risk AI classifications and mandatory human oversight requirements for systems used in employment, credit scoring, and essential services. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689
- National Cyber Security Centre (2023). Generative AI: Secure Use Guidance. Advises organisations to treat AI outputs as untrusted until independently verified and to establish internal policies covering acceptable uses. https://www.ncsc.gov.uk/guidance/generative-ai-secure-use

Frequently asked questions

Does AI get better at avoiding hallucinations over time?

Newer models do produce fewer errors than their 2023-era predecessors, and retrieval-augmented generation systems cut the rate further. Even so, Stanford's research found specialist RAG-based legal tools still hallucinated on more than 17 per cent of professional queries. Improvement is real, but a lower error rate is not the same as reliable enough to skip human review in a professional context.

How do I know when an AI output contains a hallucination?

You often cannot tell from the output itself. Hallucinated content reads with the same confidence and structure as accurate content. The only reliable check is verifying specific factual claims, numbers, case references, and citations against primary sources. If a claim cannot be independently verified, treat it as unconfirmed rather than true.

Does using AI for internal tasks rather than client-facing work reduce the risk?

It reduces the regulatory exposure, but the underlying problem persists. An internal decision made on the basis of a fabricated figure or misstated regulation still has consequences. The practical answer is to calibrate review effort to the stakes of the decision, not to whether the output will be seen externally.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
