A customer emails to say your chatbot told them they could return a product after 60 days. Your policy says 28. The bot did not misread your documentation. There was no relevant documentation for it to read, so it filled the gap with something that sounded plausible. That is a hallucination, and for a service firm that relies on customer trust the consequences run from complaint handling to regulatory scrutiny.
The encouraging part is that hallucinations in customer service are largely a design problem. The way you scope, feed, and govern a chatbot determines how often it invents things. Here is the sequence that works.
What does a hallucination look like in a customer service chat?
A chatbot hallucination is a confident, fluent answer that is factually wrong. In customer service, that means wrong return policies, invented order statuses, or misquoted prices. Salesforce reports hallucination rates of up to 20% in general AI use. UK contact centre provider Gnatta points out that these errors are usually caused by how the overall system is designed, not by the underlying model alone.
In practice, hallucinations in support conversations fall into a few recognisable patterns. The bot invents a procedure that does not exist. It misquotes a policy term or a price. It tells a customer their order has shipped when the fulfilment API returned an error. None of these require a technically unusual model. They require a chatbot that was given too much scope, too little structured data, and no instruction about what to do when it runs out of reliable information.
Why does this matter more for a small service firm?
In a larger business, a chatbot incident lands on a customer experience team. In a firm of ten or twenty people, a hallucination is a customer relationship broken by software you vouched for. The ICO’s generative AI guidance requires organisations to ensure AI outputs are not misleading in ways that could cause harm, and under UK GDPR you are the data controller responsible for the accuracy of personal data outputs.
There is also a consumer-facing dimension. The FCA’s Consumer Duty expects firms to avoid foreseeable harm in customer communications, and that obligation applies to digital channels and automated tools as much as to staff conversations. The CMA has flagged that firms deploying AI tools must not misrepresent the reliability of those tools to consumers. For a regulated business, or one handling financial or health information, both of those bars apply to your chatbot.
Where do the practical controls actually sit?
The most common root cause is breadth. A chatbot told to handle any question about your business using its general knowledge will hallucinate. One told to handle only order tracking, using your order management data, with an explicit hand-off trigger for anything outside that boundary, is far less likely to. The practical controls sit across five layers: scope, escalation, data access, guardrails, and testing.
Start with scope. Define one to three low-risk tasks: order status, opening hours, appointment bookings. In the system prompt, specify the bot’s role and set an explicit refusal condition for anything outside it. Gnatta’s research recommends treating separate tasks as separate agents, so a returns assistant never blends with a general FAQ bot, reducing the chance of a confused or invented answer.
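As a rough illustration of the scoping step, the sketch below assembles a system prompt from a short task list and a fixed refusal line. The function name, the wording of the prompt, and the refusal message are all assumptions made for this example, not any platform's required format; most chatbot tools accept an equivalent block of instructions in their own settings screen.

```python
from typing import Sequence

# Hypothetical task list, taken from the low-risk examples in the text.
ALLOWED_TASKS = ("order status", "opening hours", "appointment bookings")

# Hypothetical refusal line; the wording is an assumption, adjust to your tone.
REFUSAL_MESSAGE = "I can't help with that here, but I can pass you to a member of the team."


def build_system_prompt(business_name: str, tasks: Sequence[str] = ALLOWED_TASKS) -> str:
    """Return a system prompt that states the bot's role and its refusal condition."""
    task_list = "\n".join(f"- {task}" for task in tasks)
    return (
        f"You are the customer service assistant for {business_name}.\n"
        f"You may help ONLY with these tasks:\n{task_list}\n"
        "Answer only from the information supplied to you in this conversation.\n"
        "If a question falls outside these tasks, or you do not have the "
        f'information needed, reply exactly: "{REFUSAL_MESSAGE}"\n'
        "Never guess at policies, prices, or order details."
    )


if __name__ == "__main__":
    print(build_system_prompt("Example Ltd"))
```

Keeping the task list as data rather than prose makes it easier to run one narrowly scoped prompt per agent, in line with the separate-agents recommendation.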
Build human escalation from day one. Gnatta calls this the “no dead ends” principle: the AI always has a valid next step, whether that is answering, escalating, or asking for more detail. Set automatic hand-off triggers for complaints, legal or financial queries, repeated requests for a human, or when an API returns an error rather than a clean result.
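The hand-off triggers listed above could be sketched as a small routing function that always returns one of three next steps. Everything here is illustrative: the keyword lists, the `Turn` structure, and the threshold of two requests for a human are assumptions, and simple keyword matching is a crude stand-in for whatever intent detection your platform provides.

```python
from dataclasses import dataclass

# Hypothetical trigger phrases; a real deployment would tune these from transcripts.
ESCALATION_KEYWORDS = ("complaint", "legal", "solicitor", "chargeback", "compensation")
HUMAN_REQUEST_PHRASES = ("speak to a human", "real person", "talk to someone")


@dataclass
class Turn:
    message: str
    api_error: bool = False          # True if a backend call failed during this turn
    human_requests_so_far: int = 0   # earlier requests for a human in this chat


def next_step(turn: Turn) -> str:
    """Return 'answer', 'escalate', or 'clarify'. There is never a dead end."""
    text = turn.message.lower()
    # An API error means there is no clean result to quote, so hand off.
    if turn.api_error:
        return "escalate"
    # Complaints and legal or financial queries go straight to a person.
    if any(keyword in text for keyword in ESCALATION_KEYWORDS):
        return "escalate"
    # A repeated request for a human (assumed threshold: two) triggers hand-off.
    asked_now = any(phrase in text for phrase in HUMAN_REQUEST_PHRASES)
    if turn.human_requests_so_far + int(asked_now) >= 2:
        return "escalate"
    # Very short messages rarely carry enough detail to answer reliably.
    if len(text.split()) < 3:
        return "clarify"
    return "answer"
```

The checks run in order of risk, so an API failure or a complaint wins over everything else, and the function cannot return without naming a next step.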
Restrict what data the chatbot can access. Each part of your bot should only see the data it needs, applying what security teams call least privilege. A curated FAQ and a structured knowledge base outperform a bot that can browse your entire email archive. Retrieval-augmented generation (RAG) grounds answers in specific documents rather than trained generalisations, and many mid-market chatbot platforms now offer it as a standard feature.
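To make least privilege and grounding concrete, here is a minimal sketch in which each agent has an allow-list of sources and retrieval only ever searches that list. The agent names, the source names, and the sample text are invented for illustration (apart from the 28-day returns figure from the opening example), and the word-overlap matching is a deliberately naive placeholder for the retrieval a real RAG platform performs.

```python
# Hypothetical knowledge base; in practice this is your curated FAQ content.
KNOWLEDGE_BASE = {
    "returns_policy": "Items can be returned within 28 days of delivery.",
    "order_faq": "Orders are dispatched within two working days.",
    "staff_handbook": "Internal only: escalation rota and supplier contacts.",
}

# Least privilege: each agent sees only the sources it needs.
AGENT_SOURCES = {
    "returns_assistant": {"returns_policy"},
    "order_tracker": {"order_faq"},
}


def _words(text: str) -> set:
    """Lower-case the text and strip basic punctuation from each word."""
    return {word.strip("?.,!:").lower() for word in text.split()}


def retrieve(agent: str, question: str) -> list:
    """Return passages this agent is allowed to see that overlap with the question."""
    allowed = AGENT_SOURCES.get(agent, set())
    question_words = _words(question)
    hits = []
    for name in sorted(allowed):
        passage = KNOWLEDGE_BASE[name]
        # Naive relevance test: at least two shared words (an assumption).
        if len(question_words & _words(passage)) >= 2:
            hits.append(passage)
    return hits
```

An empty result is the useful signal here: if `retrieve` finds nothing, the bot has no grounded material and should hand off instead of answering from general knowledge.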
Add guardrails and logging, then test. Configure the platform to record what data was retrieved, what answer was sent, and any action taken. Add simple validation checks before the bot sends sensitive information. Then run a sample of real historic support tickets through the bot before launch, ask staff to confuse it deliberately, and re-test after any knowledge base update.
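One simple validation check of the kind described above is to confirm that every number in a draft answer also appears in the retrieved source text, and to log the outcome either way. This is a sketch under stated assumptions: the function names and fallback wording are hypothetical, and matching bare numbers is crude (it will not handle formatted prices such as 1,200 or dates), so treat it as a starting point.

```python
import json
import re
from datetime import datetime, timezone

NUMBER = re.compile(r"\d+(?:\.\d+)?")

# Hypothetical fallback wording used when a draft fails the check.
FALLBACK = "Let me pass you to a colleague who can confirm that."


def numbers_are_grounded(answer: str, retrieved: list) -> bool:
    """True if every number in the answer also appears in the retrieved text."""
    source_numbers = set(NUMBER.findall(" ".join(retrieved)))
    return set(NUMBER.findall(answer)) <= source_numbers


def log_record(question: str, retrieved: list, answer: str, action: str) -> str:
    """Serialise what was retrieved, what was sent, and the action taken."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "retrieved": retrieved,
        "answer": answer,
        "action": action,
    })


def review_before_send(question: str, retrieved: list, draft: str):
    """Send the draft only if it is grounded; otherwise escalate. Always log."""
    if retrieved and numbers_are_grounded(draft, retrieved):
        answer, action = draft, "sent"
    else:
        answer, action = FALLBACK, "escalated"
    return answer, log_record(question, retrieved, answer, action)
```

Run against the opening example, a draft promising 60 days would fail the check because the retrieved policy only mentions 28, so the customer gets a hand-off and the log shows why.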
When should you step back rather than fix?
Fixing hallucinations is achievable for many customer service use cases. For some, the more honest answer is that a chatbot should not be making the call at all. The ICO requires a Data Protection Impact Assessment for AI deployments with high-risk impacts, and the FCA’s Consumer Duty obligation to avoid foreseeable harm covers automated channels as much as staff conversations.
If your bot handles complaints, gives regulated advice, or processes anything that could constitute a binding commitment, the controls above may not be enough. Salesforce itself acknowledges that hallucinations remain an inherent risk in any generative system, and vendors who will not document how their model is grounded and monitored create an assurance gap that regulators may eventually ask you to explain.
An AI-assisted model works well here. Staff use the tool to draft or summarise replies, but a human sends the final message, keeping accuracy in human hands without losing the efficiency. That arrangement also sidesteps the ICO’s concern about AI making consequential decisions about individuals without meaningful oversight.
What concepts should you understand before building?
Three technical ideas come up in every serious conversation about reducing chatbot hallucinations. Retrieval-augmented generation (RAG) grounds answers in specific documents rather than in whatever the model absorbed during training. System prompt scoping limits what the bot will and will not discuss. Escalation design specifies exactly what happens when the bot cannot produce a reliable answer.
A fourth concept matters for UK firms specifically: the Data Protection Impact Assessment. If your chatbot accesses personal customer data, the ICO expects you to document what the risks are and how you are mitigating them. This is not a large-company formality. A one- or two-page risk log that covers scope, data access, and escalation design satisfies the spirit of the requirement for many small service deployments.
The EU AI Act’s transparency obligation is also worth knowing, even for firms trading only within the UK. Systems that deploy general-purpose AI from EU-regulated providers may carry obligations to disclose to users that they are interacting with AI. That disclosure is good practice regardless of jurisdiction and, combined with a visible option to speak with a human, is one of the most effective ways to protect customer trust when something does go wrong.