Reducing AI hallucinations: a guide for UK firms

In a case that made headlines in May 2023, a New York lawyer was found to have filed a court brief citing six cases that did not exist. When the court asked to see them, they could not be produced. ChatGPT had invented every one, complete with plausible-sounding names, dates, and references. The lawyers involved were sanctioned, and the story was reported around the world within days.

If you have been following AI in practice, you have probably heard this story. For an owner-managed services firm, the question it raises is a practical one: which outputs need verification before use, and what do workable controls look like for a business without a dedicated compliance team?

What is an AI hallucination?

An AI hallucination is when a language model generates text that sounds credible but is factually wrong. The model predicts the most likely next word rather than retrieving verified facts. When that process fills a gap that needs a specific figure, name, or legal reference, it produces something plausible but invented. Model confidence tells you nothing about whether the claim is true.

The mechanism is a consequence of how large language models (LLMs) are built, rather than a bug someone will eventually patch out. The model is trained to produce fluent, contextually appropriate text. When specific knowledge is missing, it fills the space with something that reads as if it belongs. That is why false claims often arrive with the same tone and structure as accurate ones, making them genuinely hard to spot without checking the original source.

Why does it matter for your business?

In an owner-managed services business, a false claim does not stay inside the tool. It travels into a client proposal, a compliance report, an HR letter, or a pricing conversation. The ICO’s AI and data protection guidance is explicit. Organisations remain responsible for accuracy and lawful processing when using AI tools, including third-party systems. If you sent it, you own it, regardless of what generated it.

For businesses in regulated sectors such as financial services, legal, or healthcare, the consequences are more direct, and exposure begins the moment AI touches a customer-facing process. The FCA, for example, has made clear that regulated firms remain accountable for outcomes even when AI supports or automates a decision. For everyone else, the risk sits in contracts and reputation. A false claim in a proposal can create a contractual misrepresentation, and a wrong regulatory figure in a compliance document can reach an auditor before anyone notices the error.

Where will you actually run into it?

The places where hallucinations cause the greatest harm in a services firm are those where specificity matters. Legal references in contracts, financial figures in reports, regulatory requirements in compliance advice, and any claim a client might reasonably rely on are the categories to watch. Internal brainstorming and first-pass drafts carry lower risk because a person will read and rewrite them before they reach anyone outside the business.

The NCSC points out that risks worsen when staff feed stale, uncurated, or sensitive documents into AI tools, or when tools are allowed to browse uncontrolled external sources. A model drawing from your own clean, current documentation is likely to hallucinate less than one drawing from general training data and open web browsing. Source control matters as much as output checking.

The categories that need the closest attention in a typical owner-managed services firm are client-facing proposals with specific claims, any output referencing regulatory requirements, HR decisions or formal letters, financial summaries and forecasts, legal summaries, and any statement made in response to a client complaint or incident.

When should you apply the checks?

Risk varies according to what you are using AI for and who will rely on the output. The UK government’s AI Playbook treats risk assessment as a precondition for any AI deployment, rather than something to retrofit after the fact. Whatever tool produced the output, accountability for the outcome stays with the business.

A working rule for a 5-to-50 person services firm is to classify every AI-assisted output into one of two buckets before it leaves the building. Low-risk outputs are those that will go through a human editor who will read and rewrite them, where no specific factual claim is being relied on, and where the consequences of an error are limited. Everything else is high-risk. That covers anything client-facing that will be read as factual, anything citing a regulatory requirement, and anything affecting money, contracts, or people.
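The two-bucket rule is simple enough to write down as a checklist. The Python sketch below is one way to express it, assuming the person who drafted the output answers a handful of yes/no questions. The field names are hypothetical. The important design choice is the default: an output counts as low-risk only when every low-risk condition holds, and anything else falls into the high-risk bucket.

```python
from dataclasses import dataclass


@dataclass
class AIOutput:
    """Answers the drafter gives about one AI-assisted output (hypothetical fields)."""
    human_will_rewrite: bool              # a person will read and rewrite it before use
    relies_on_specific_fact: bool         # a figure, name, date or reference will be relied on
    error_consequences_limited: bool      # a mistake would be cheap and easy to put right
    client_facing_factual: bool           # a client will read it as a statement of fact
    cites_regulation: bool                # it quotes or interprets a regulatory requirement
    affects_money_contracts_people: bool  # pricing, contracts, HR or similar


def classify(output: AIOutput) -> str:
    """Return "low" only when every low-risk condition holds; everything else is "high"."""
    # Any one of these categories makes the output high-risk on its own.
    if (output.client_facing_factual
            or output.cites_regulation
            or output.affects_money_contracts_people):
        return "high"
    # Low-risk needs all three conditions from the rule, not just one of them.
    if (output.human_will_rewrite
            and not output.relies_on_specific_fact
            and output.error_consequences_limited):
        return "low"
    return "high"


brainstorm = AIOutput(
    human_will_rewrite=True, relies_on_specific_fact=False,
    error_consequences_limited=True, client_facing_factual=False,
    cites_regulation=False, affects_money_contracts_people=False,
)
proposal = AIOutput(
    human_will_rewrite=True, relies_on_specific_fact=True,
    error_consequences_limited=False, client_facing_factual=True,
    cites_regulation=False, affects_money_contracts_people=True,
)
print(classify(brainstorm))  # low
print(classify(proposal))    # high
```

In practice the same questions work just as well as a short form or a column in a tracking spreadsheet; the code only makes the order of the checks explicit.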

For low-risk outputs, a prompt that tells the model to flag unknowns and distinguish facts from assumptions is a useful starting control. For high-risk outputs, you need a source document to check against before the output leaves the firm.

Five practical controls worth putting in place

The controls are not complicated. They work as a layered system. Bind the model to your own sources, require it to cite claims, add a human reviewer before anything leaves the building, keep a short list of topics that stay off-limits for AI drafting, and log failures so you can tighten the rules over time. Many of them can be set up in an afternoon.

Curate your knowledge base

The single most reliable way to reduce hallucinations is to constrain the model to your own source material rather than its general training data. AllianceBernstein describes this as building a “fence” around the model’s knowledge source, and it is essentially what Retrieval-Augmented Generation (RAG) achieves. In practice, this means maintaining an AI-readable folder of your pricing documents, service descriptions, policies, terms, and approved FAQs, and requiring staff to feed these in rather than asking the model to draw from general knowledge.

Require citations on factual claims

Any output that contains a specific factual claim should show the source document it drew from. If the model cannot cite a source, the output is a working draft, not a finished product. This single step makes fabrication far easier to spot and stops invented statistics from travelling into client-facing material unnoticed, provided someone confirms that each cited source exists and actually supports the claim.
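Part of this check can be automated before the human review. The Python sketch below assumes a house convention (hypothetical, not a feature of any particular AI tool) in which the prompt asks the model to place a tag such as [source: pricing.pdf] inside each factual sentence, before the full stop. It flags sentences that contain a figure but no tag, and tags that point at a file outside the approved knowledge base. Treating "contains a digit" as "specific factual claim" is a deliberately crude assumption, and the script cannot tell whether a source really supports the claim, so the human check still applies.

```python
import re

# Hypothetical convention: the prompt asks the model to put a tag such as
# [source: pricing.pdf] inside every sentence that states a fact, before the full stop.
SOURCE_TAG = re.compile(r"\[source:\s*([^\]]+)\]")
# Crude stand-in for "specific factual claim": the sentence contains a digit.
HAS_FIGURE = re.compile(r"\d")


def check_citations(text: str, approved_sources: set[str]) -> list[str]:
    """List problems found; an empty list means no obvious gaps, not that the claims are true."""
    problems = []
    # Naive sentence split: break after . ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        cited = [s.strip() for s in SOURCE_TAG.findall(sentence)]
        # Remove tags before looking for figures, so digits in file names are ignored.
        body = SOURCE_TAG.sub("", sentence)
        if HAS_FIGURE.search(body) and not cited:
            problems.append(f"Uncited figure: {sentence}")
        # Every cited file must be in the firm's approved knowledge base.
        for source in cited:
            if source not in approved_sources:
                problems.append(f"Unknown source '{source}': {sentence}")
    return problems


draft = (
    "Our monthly fee is £450 [source: pricing.pdf]. "
    "We were founded in 2009. "
    "Fees rose 5% last year [source: industry-blog.html]."
)
for problem in check_citations(draft, {"pricing.pdf", "terms.pdf"}):
    print(problem)
```

Here the second sentence is flagged for an uncited figure and the third for citing a file that is not in the approved folder, which is exactly the kind of draft that should go back for checking rather than out to a client.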

Apply a maker-checker rule for client-facing outputs

One person drafts with AI, and a second checks the facts against the source before anything goes out. The ICO’s accountability guidance makes this especially sensible where personal data or customer impact is involved. A ten-minute check on a proposal catches errors before they become problems and costs far less than the alternative.

Maintain a do-not-use list

Some areas should stay outside AI’s scope for producing final output. Legal citations, regulatory interpretations, financial claims, HR decisions, and incident response statements all need source-backed human review. Communicating the list to staff before they encounter these situations is considerably cheaper than repairing the output afterwards.

Log failures and update the rules

Keep a short running note of hallucinations, wrong citations, and near-misses. Update prompts, source documents, and review rules when patterns appear. AllianceBernstein’s operational guidance specifically recommends regular expert feedback loops to improve quality over time. A brief monthly review of flagged errors is enough for many owner-managed firms and will steadily tighten the system without requiring a dedicated resource.
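A spreadsheet is enough for the log itself. For firms that prefer a script, the Python sketch below shows the monthly review step, assuming each entry records a date and a short category label. The field names, the sample entries, and the threshold of two repeats are all illustrative assumptions. Categories that recur within a month are the patterns worth acting on, by updating the prompt, the source document, or the review rule.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class FailureEntry:
    """One row of the running failure log (illustrative fields, not a standard format)."""
    logged_on: date
    category: str  # e.g. "invented citation", "wrong figure", "near-miss"
    task: str      # what the AI was being used for
    note: str      # what went wrong and how it was caught


def monthly_patterns(log: list[FailureEntry], year: int, month: int,
                     threshold: int = 2) -> dict[str, int]:
    """Count one month's entries by category and keep those seen at least `threshold` times."""
    counts = Counter(
        entry.category
        for entry in log
        if entry.logged_on.year == year and entry.logged_on.month == month
    )
    return {category: n for category, n in counts.items() if n >= threshold}


# Made-up sample entries, for illustration only.
log = [
    FailureEntry(date(2025, 2, 20), "wrong figure", "proposal", "Old day rate quoted"),
    FailureEntry(date(2025, 3, 4), "wrong figure", "proposal", "Rate not in pricing sheet"),
    FailureEntry(date(2025, 3, 18), "wrong figure", "report", "Growth rate not in source"),
    FailureEntry(date(2025, 3, 21), "invented citation", "HR letter", "Made-up guidance title"),
]
print(monthly_patterns(log, 2025, 3))
```

In this sample, the repeated wrong figures in March point to a pricing or source-document problem worth fixing at the source, while the single invented citation is noted and watched.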

Running these controls well comes down to a clear decision rule about which outputs are high-risk and a short process around them before they leave the building. SDH’s industry analysis suggests more than three-quarters of enterprises now involve human experts to review and approve AI outputs in high-risk tasks. For an owner-managed services firm, the equivalent is simpler. One person, one check, before anything sensitive reaches a client.
