Controls and checks that keep chatbot answers accurate

TL;DR

Chatbot accuracy does not happen by default. It comes from narrowing the bot's scope, building a clean knowledge base, configuring low-risk model settings, putting humans in the loop for edge cases, and monitoring outputs over time. Rules and guidance from the ICO, FCA, and NCSC make these controls a compliance obligation as much as a quality one.

Key takeaways

- Chatbots can sound confident while giving wrong answers; accuracy requires deliberate controls, not just a capable LLM.
- Restricting the bot to a curated knowledge base using retrieval-augmented generation is the single most effective accuracy control for a small firm.
- Setting a low model temperature, defining a "no answer found" fallback, and requiring source references in answers all reduce the risk of incorrect outputs.
- Human escalation triggers for complaints, regulated topics, and low-confidence responses protect you legally and practically.
- Together, ICO and FCA requirements and NCSC guidance expect UK firms to test, monitor, and maintain accuracy controls for any chatbot that touches customers or personal data.

A professional services firm set up a chatbot to handle routine client questions. The answers sounded authoritative. Several were wrong. One quoted a fee structure that had changed six months earlier. The client acted on it. The firm spent an afternoon managing the fallout. The model had found an old version of the fee schedule and returned it as current.

That gap between fluent and accurate is the central problem with LLM-powered chatbots, and one that business owners tend to underestimate on first deployment.

What does chatbot accuracy actually mean?

Accuracy for a business chatbot means the same thing it means for a member of staff: the answer is factually correct, current, and relevant to the question. A member of staff who is unsure will pause and check. A large language model will produce a fluent, confident answer whether the underlying information is solid or not.

A 2024 study in BMJ Public Health, reported across industry press, found that ChatGPT and similar tools regularly gave inaccurate, incomplete, or potentially unsafe answers to consumer health queries. Health questions are an extreme case, but the failure mode applies broadly: the model optimises for fluency, not truth.

The UK government’s public Ask GOV.UK pilot illustrated how much design choices matter. Before the team tuned prompts and model configuration, accuracy sat at around 76%. After deliberate optimisation, it rose to roughly 90%. That gain came from decisions about what the bot was allowed to do and how it was set up, not from a newer model alone.
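The size of that gain is clearer when you count wrong answers, which is what users actually experience. A short calculation, assuming an illustrative volume of 1,000 questions a month (a made-up figure, not one from the pilot):

```python
def wrong_answers(accuracy: float, questions: int) -> int:
    """Expected number of wrong answers at a given accuracy rate."""
    return round((1 - accuracy) * questions)


MONTHLY_QUESTIONS = 1_000  # illustrative volume, not from the Ask GOV.UK pilot

before = wrong_answers(0.76, MONTHLY_QUESTIONS)
after = wrong_answers(0.90, MONTHLY_QUESTIONS)

print(f"Before tuning: {before} wrong answers a month")
print(f"After tuning:  {after} wrong answers a month")
print(f"Wrong answers cut by {1 - after / before:.0%}")
```

Going from 240 to 100 wrong answers cuts errors by more than half, even though the headline accuracy figure moves by only 14 percentage points.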

Why does accuracy matter for your firm?

A wrong answer from a chatbot is more than an embarrassment. If a customer acts on incorrect information from a channel your firm controls, you carry the legal exposure. UK risk consultancy URM Consulting notes that firms can face misrepresentation and negligence risk where customers rely on inaccurate AI-generated advice, particularly when personal or financial decisions are involved.

The FCA’s Consumer Duty makes this explicit for regulated firms. Communications must be fair, clear, and not misleading. Using an LLM to generate those communications does not shift the responsibility to the model provider. The firm remains accountable for what the chatbot says.

Internal chatbots carry a different but real risk. If a member of staff acts on incorrect HR policy guidance from an internal tool, the error still belongs to the business. The ICO’s guidance on AI and data protection stresses that organisations must test, monitor, and maintain the accuracy of AI-generated outputs, not just at launch but on an ongoing basis.

The practical upside is that accuracy is largely controllable. A chatbot designed with the right constraints performs reliably within its scope. The problems arise when scope is left undefined, when source documents are out of date, or when no one is checking what the bot actually says.

Where do accuracy problems actually come from?

Accuracy failures in business chatbots typically trace back to one of three sources: the model is hallucinating because no relevant document exists in the knowledge base, the model is drawing on a document that is out of date or conflicts with a newer one, or the bot’s scope is too broad and it attempts to answer questions it was never equipped to handle.

The Department for Science, Innovation and Technology (DSIT) addressed all three when building its internal Ask Ops chatbot. The team scraped vetted intranet documents into a vector database and instructed the model to answer only from that corpus. They also set the model temperature to 0.1, which makes answers close to deterministic rather than creatively varied. When no relevant document is retrieved, the model is instructed to return “No answer found” rather than attempt a response.
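A minimal sketch of that pattern is below. Only the 0.1 temperature and the "No answer found" wording come from the DSIT record. Everything else is an assumption: a crude word-count similarity stands in for the vector database and embedding model, the 0.2 similarity threshold is illustrative, and `fake_generate` is a hypothetical placeholder for a real LLM call.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable

NO_ANSWER = "No answer found"  # fallback wording from the DSIT record


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class Settings:
    temperature: float = 0.1     # DSIT's documented setting
    min_similarity: float = 0.2  # assumed threshold; tune on your own documents


def vectorise(text: str) -> Counter:
    """Word counts: a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, corpus: list[Document], settings: Settings) -> list[Document]:
    """Return vetted documents above the similarity threshold, best match first."""
    q = vectorise(question)
    scored = [(cosine(q, vectorise(doc.text)), doc) for doc in corpus]
    hits = [pair for pair in scored if pair[0] >= settings.min_similarity]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]


def answer(question: str, corpus: list[Document], settings: Settings,
           generate: Callable[..., str]) -> str:
    """Answer only from retrieved documents; otherwise return the fallback."""
    docs = retrieve(question, corpus, settings)
    if not docs:
        return NO_ANSWER  # decline instead of letting the model guess
    return generate(question, docs, temperature=settings.temperature)


def fake_generate(question: str, docs: list[Document], temperature: float) -> str:
    """Hypothetical placeholder for a real LLM call: quote and cite the top document."""
    return f"{docs[0].text} [source: {docs[0].doc_id}]"


corpus = [
    Document("expenses-policy-v3", "Expense claims must be submitted within 30 days."),
    Document("leave-policy-v2", "Annual leave requests need manager approval."),
]
settings = Settings()
print(answer("When must expense claims be submitted?", corpus, settings, fake_generate))
print(answer("What is the capital of France?", corpus, settings, fake_generate))
```

The first question returns the expenses policy with its source reference attached. The second matches nothing in the vetted corpus, so the bot returns "No answer found" instead of drawing on general knowledge.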

That “no answer found” policy matters more than it sounds. A chatbot that admits it cannot help is far less dangerous than one that fills the gap with a confident approximation. The NCSC, in its guidance on using AI safely, explicitly warns that LLMs “can confidently state incorrect information as fact” and recommends that organisations constrain models to vetted sources and make it easy for users to flag errors.

Out-of-date documents create a quieter failure mode. If your knowledge base contains a 2022 pricing page alongside a 2024 pricing page, the model may retrieve the older version and present it as current. DSIT tackled this by adding conflict-resolution rules to its prompts: the model is told how to handle superseded or contradictory guidance.
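Prompt rules can be backed up by filtering superseded versions in code before anything reaches the model. The sketch below assumes each retrieved chunk carries a hypothetical `family` (which document it is a version of) and an `effective` date. DSIT's record describes prompt rules, so this code filter is a complementary technique, not their method. The fees and the prompt wording are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    family: str      # hypothetical: which document this chunk is a version of
    effective: date  # hypothetical: date this version took effect
    text: str


def latest_versions(chunks: list[Chunk]) -> list[Chunk]:
    """Drop chunks from superseded versions, keeping every chunk of the newest one."""
    newest: dict[str, date] = {}
    for chunk in chunks:
        if chunk.family not in newest or chunk.effective > newest[chunk.family]:
            newest[chunk.family] = chunk.effective
    return [c for c in chunks if c.effective == newest[c.family]]


# Illustrative prompt rule for any conflict that slips through (not DSIT's wording).
CONFLICT_RULE = (
    "If two sources disagree, rely on the one with the later effective date "
    "and state that date in your answer."
)

retrieved = [
    Chunk("pricing", date(2022, 4, 1), "Standard review fee: £400."),
    Chunk("pricing", date(2024, 4, 1), "Standard review fee: £450."),
    Chunk("opening-hours", date(2023, 1, 9), "Open 9am to 5pm, Monday to Friday."),
]
for chunk in latest_versions(retrieved):
    print(chunk.effective, chunk.text)
```

The 2022 pricing chunk is dropped, while the only version of the opening-hours document passes through untouched.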

What controls should you put in place?

The controls that work are design decisions about scope, source quality, model configuration, and human review. In sequence: define what the bot is allowed to answer, build a small clean knowledge base, configure the model conservatively, specify when a human takes over, and monitor outputs over time. Each step reduces a distinct failure mode.

Start with scope. Define the topics the bot is authorised to answer and set a standard fallback message for anything outside that list. Follow the DSIT pattern: give the model a clear role and restrict it to the provided documents. Anything outside that scope routes to a person.
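A minimal sketch of a scope gate follows. It assumes a hypothetical keyword allow-list and illustrative fallback wording. A production bot might classify topics with the model itself or a dedicated intent classifier; keywords are used here only to keep the example self-contained.

```python
import re
from typing import Optional

# Hypothetical allow-list: each authorised topic and words that signal it.
AUTHORISED_TOPICS = {
    "opening_hours": {"open", "opening", "hours", "closed"},
    "services": {"service", "services", "offer"},
    "pricing": {"price", "prices", "fee", "fees", "cost"},
}

# Standard fallback for anything outside the list (illustrative wording).
FALLBACK = ("I can only help with opening hours, our services and pricing. "
            "I've passed your question to a colleague.")


def classify(question: str) -> Optional[str]:
    """Return the first authorised topic the question mentions, or None."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for topic, keywords in AUTHORISED_TOPICS.items():
        if words & keywords:
            return topic
    return None


def route(question: str) -> tuple[str, Optional[str]]:
    """('bot', topic) for in-scope questions; ('human', None) for everything else."""
    topic = classify(question)
    return ("bot", topic) if topic else ("human", None)


for q in ["What are your fees for a tax return?", "Can you help with my divorce?"]:
    handler, topic = route(q)
    reply = FALLBACK if handler == "human" else f"(answer from {topic} documents only)"
    print(f"{q} -> {handler}: {reply}")
```

Keeping the allow-list explicit also gives you a written record of what the bot is authorised to answer, which is useful when reviewing controls later.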

Knowledge base quality is the second lever. Start with a small, verified set: current service descriptions, pricing, FAQs, key policies. Remove outdated versions before ingestion. Version your documents so that retrieval and the model's instructions can favour the most recent one. The DSIT experience reinforces a broader principle: how you structure and ingest source material into the knowledge base matters more to accuracy than the choice of underlying LLM.
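Removing outdated versions is easier to enforce with a check that runs before every ingestion. A minimal sketch, assuming a hypothetical document manifest in which each entry records its version, whether it has been superseded, and a review-by date set by its owner. The documents and dates are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceDoc:
    name: str
    version: int
    superseded: bool  # hypothetical flag, set when a newer version is published
    review_by: date   # hypothetical date by which the owner must re-check the content


def ready_for_ingestion(docs: list[SourceDoc], today: date) -> tuple[list[SourceDoc], list[str]]:
    """Split a manifest into documents safe to ingest and reasons for excluding the rest."""
    keep: list[SourceDoc] = []
    problems: list[str] = []
    for doc in docs:
        if doc.superseded:
            problems.append(f"{doc.name} v{doc.version}: superseded, remove before ingestion")
        elif doc.review_by < today:
            problems.append(f"{doc.name} v{doc.version}: review overdue since {doc.review_by}")
        else:
            keep.append(doc)
    return keep, problems


manifest = [
    SourceDoc("pricing", 3, superseded=True, review_by=date(2025, 1, 31)),
    SourceDoc("pricing", 4, superseded=False, review_by=date(2025, 12, 31)),
    SourceDoc("complaints-policy", 2, superseded=False, review_by=date(2025, 3, 31)),
]
keep, problems = ready_for_ingestion(manifest, today=date(2025, 6, 1))
print([f"{d.name} v{d.version}" for d in keep])
for problem in problems:
    print(problem)
```

Only pricing v4 is ingested. The superseded pricing page and the complaints policy whose review date has passed are reported back to their owners rather than silently loaded.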

Human escalation paths matter as much as any technical control. Identify the categories that warrant a handoff: complaints, regulatory language, expressions of dissatisfaction, requests for refunds. When these appear, the bot routes to a person and the conversation is flagged for review. Click4Assistance, which builds chatbots for UK contact centres, emphasises that pre-approved scripted answers and full interaction logging are compliance controls, not just quality improvements.
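A minimal sketch of those triggers follows. It assumes hypothetical keyword patterns for each category and a confidence score supplied by the retrieval step, for example the similarity of the best-matching document. The 0.7 threshold is illustrative. Keyword matching will miss paraphrases, so treat the pattern list as a starting point to extend from real conversation logs.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger patterns for each escalation category; refine from your own logs.
TRIGGERS = {
    "complaint": r"\bcomplain\w*",
    "regulatory": r"\b(ombudsman|fca|regulator|legal action)\b",
    "dissatisfaction": r"\b(unhappy|disappointed|not good enough)\b",
    "refund": r"\brefund\w*",
}
MIN_CONFIDENCE = 0.7  # assumed threshold on the bot's confidence score


@dataclass
class Decision:
    escalate: bool
    reasons: list[str]


def check_escalation(message: str, confidence: float) -> Decision:
    """Escalate on any trigger category or when confidence is too low."""
    text = message.lower()
    reasons = [name for name, pattern in TRIGGERS.items() if re.search(pattern, text)]
    if confidence < MIN_CONFIDENCE:
        reasons.append("low_confidence")
    return Decision(escalate=bool(reasons), reasons=reasons)


review_queue: list[tuple[str, list[str]]] = []  # conversations flagged for a person


def handle(message: str, confidence: float) -> str:
    """Route to a person and flag for review, or let the bot answer."""
    decision = check_escalation(message, confidence)
    if decision.escalate:
        review_queue.append((message, decision.reasons))
        return "I'll pass this to a colleague who can help."
    return "(bot answers from its documents)"


print(handle("I want to complain and get a refund", confidence=0.9))
print(handle("What time do you open?", confidence=0.95))
print(review_queue)
```

Recording the reasons alongside each flagged message gives the reviewer context and builds the interaction log that compliance checks rely on.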

On monitoring, run a weekly test of ten to fifteen standard questions and score the answers for accuracy. Store interaction logs. Add a simple thumbs-up or thumbs-down rating. When accuracy drops after a model or document update, you want to find out before a customer does.
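The weekly check can be partly automated. A minimal sketch, assuming a hypothetical `bot` callable that wraps your chatbot and a test set in which each standard question lists facts a correct answer must contain. Checking for required facts is a crude proxy; a person reading the answers remains the better judge. Three questions are shown for brevity rather than ten to fifteen, and the questions, facts, and 5-point tolerance are illustrative.

```python
from typing import Callable

# Hypothetical test set: each standard question lists facts a correct answer must contain.
TEST_SET = [
    ("What is your standard review fee?", ["£450"]),
    ("How long do refunds take?", ["14 days"]),
    ("Do you give investment advice?", ["No answer found"]),  # out of scope: bot should decline
]


def score(bot: Callable[[str], str], test_set=TEST_SET) -> float:
    """Fraction of standard questions whose answer contains every required fact."""
    passed = 0
    for question, required in test_set:
        reply = bot(question)
        if all(fact.lower() in reply.lower() for fact in required):
            passed += 1
        else:
            print(f"FAIL: {question!r} -> {reply!r}")
    return passed / len(test_set)


def check_drift(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """True if accuracy has fallen by more than the tolerance since the baseline."""
    return baseline - current > tolerance


# Canned replies standing in for a live bot; the refund answer is deliberately wrong.
canned = {
    "What is your standard review fee?": "Our standard review fee is £450.",
    "How long do refunds take?": "Refunds are processed within 30 days.",
    "Do you give investment advice?": "No answer found",
}
this_week = score(lambda q: canned.get(q, "No answer found"))
print(f"Accuracy this week: {this_week:.0%}")
if check_drift(this_week, baseline=1.0):
    print("Accuracy dropped: investigate before customers notice.")
```

Run it after every model or document update as well as weekly, and keep each week's score so the baseline reflects the last known good state.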

What do UK regulations require on accuracy?

Three UK bodies set expectations that directly affect chatbot accuracy. The accuracy principle in UK GDPR, which the ICO enforces, requires that personal data used or generated by AI systems is accurate and correctable. The FCA’s Consumer Duty requires fair, clear, and not misleading customer communications regardless of what technology delivers them. The NCSC, which issues guidance rather than binding rules, advises treating all LLM outputs as unverified and applying human oversight to anything consequential.

For financial services firms, the FCA has been direct: AI does not reduce your obligations under Consumer Duty. You must have systems to test and monitor output accuracy, correct issues when they arise, and record how you do so. In practice, that means a named person responsible for the chatbot’s content, not just the technology running it.

The EU AI Act adds a layer for UK firms with customers in the European Union. For limited-risk chatbots, the Act requires that users are told they are interacting with AI. For high-risk use cases such as credit decisions or employment screening, requirements for accuracy documentation, human oversight, and output traceability are considerably more demanding.

The CMA has also signalled a direction of travel through its AI foundation models programme. SMEs deploying consumer-facing chatbots built on large foundation models are expected to be transparent and avoid misleading outputs. That expectation is likely to become more formal over time.

For the typical owner-managed UK firm, the practical starting point is NCSC and ICO guidance. Treat model outputs as drafts to be validated. Document your controls. Review them when the underlying model, the document set, or the scope of the bot changes.
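Spotting when a review is due can be automated by fingerprinting the settings your controls cover and comparing the result with the fingerprint recorded at the last sign-off. A minimal sketch, assuming you can read the model name, temperature, document versions, and scope list from your own configuration; every value shown is hypothetical.

```python
import hashlib
import json


def fingerprint(model: str, temperature: float, documents: dict[str, int], scope: list[str]) -> str:
    """Stable hash of the settings a control review covers."""
    payload = json.dumps(
        {"model": model, "temperature": temperature,
         "documents": documents, "scope": sorted(scope)},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Recorded when the controls were last reviewed and signed off (hypothetical values).
reviewed = fingerprint("example-model-v1", 0.1, {"pricing": 4, "faq": 7}, ["pricing", "services"])

# Current live configuration: the pricing document has moved to version 5.
live = fingerprint("example-model-v1", 0.1, {"pricing": 5, "faq": 7}, ["pricing", "services"])

if live != reviewed:
    print("Configuration changed since last review: re-run tests and update the control record.")
```

Any change to the model, the document set, or the scope produces a different fingerprint, which maps directly onto the three review triggers above.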

Sources

- DSIT (2025). Ask Ops Chatbot algorithmic transparency record. UK government transparency record documenting RAG design, low-temperature settings, and "no answer found" policy in a production internal chatbot. https://www.gov.uk/algorithmic-transparency-records/dsit-ask-ops-chatbot
- The Register (2026). GOV.UK chatbot gets smarter but slower as LLMs improve. Reports accuracy improvement from 76% to 90% in the public Ask GOV.UK pilot after model and prompt tuning. https://www.theregister.com/on-prem/2026/03/19/govuk-chatbot-gets-smarter-but-slower-as-llms-improve/5229770
- ICO (2024). Guidance on AI and data protection. Sets out the accuracy principle under UK GDPR and organisations' obligations to test and monitor AI outputs, including a requirement for human review in high-risk use cases. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/
- NCSC (2023). Using AI safely and securely: security considerations of large language models. Advises organisations that LLMs can confidently produce incorrect information and recommends human review, constrained vetted sources, and user feedback mechanisms. https://www.ncsc.gov.uk/collection/large-language-models
- FCA (2023). Consumer Duty: guidance for firms. Requires firms using AI in customer communications to ensure outputs are fair, clear, and not misleading and to maintain effective systems and controls. https://www.fca.org.uk/firms/consumer-duty
- EU AI Act (2024). Consolidated text: Regulation on Artificial Intelligence. Requires human oversight, logging, and transparency for high-risk AI systems; mandates disclosure that users are interacting with AI even for limited-risk chatbots affecting EU users. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689
- CMA (2023). AI foundation models: initial report. Signals regulatory direction of travel: SMEs deploying consumer-facing chatbots on foundation models are expected to be transparent and avoid misleading outputs. https://www.gov.uk/government/publications/ai-foundation-models-initial-report
- URM Consulting (2024). Chatbots and personal data: benefits and risks. Legal and risk commentary on misrepresentation exposure, UK GDPR obligations, and the need for disclaimers and access controls when deploying chatbots in client-facing contexts. https://www.urmconsulting.com/blog/chatbots-and-personal-data-benefits-and-risks
- Independent Living (2024). AI chatbots 'confident but wrong'. Reports on a BMJ Public Health study finding ChatGPT gave inaccurate, incomplete, or potentially unsafe answers to consumer health queries, illustrating the fluency-accuracy gap. https://www.independentliving.co.uk/industry-news/ai-chatbots-confident-but-wrong/
- Click4Assistance (2024). How AI chatbot software reduces compliance risks for UK contact centres. Details compliance controls including pre-approved answer scripts, interaction logging, and real-time monitoring for non-compliant content in customer-facing chatbots. https://www.click4assistance.co.uk/how-ai-chatbot-software-reduces-compliance-risks-for-uk-contact-centres

Frequently asked questions

How do I stop my chatbot making up answers?

The most reliable approach is retrieval-augmented generation (RAG), where the bot answers only from a defined set of vetted documents rather than relying on its general training. Pair this with a low temperature setting (0 to 0.3), an explicit "no answer found" fallback for questions with no matching document, and a weekly test of standard questions to catch drift. The first three measures mirror what the UK government's DSIT team uses for its own internal chatbot.

Do UK regulations require chatbot accuracy controls?

Yes, several do. The ICO's UK GDPR accuracy principle requires that personal data used or generated by AI systems is kept accurate and correctable. The FCA's Consumer Duty requires chatbots in financial services to give fair, clear, and not misleading information. The NCSC advises organisations to treat LLM outputs as unverified and to apply human oversight for anything consequential. Together, these make accuracy controls a compliance obligation as much as a quality preference.

What is a realistic accuracy level for a small business chatbot?

The UK government's public Ask GOV.UK pilot achieved around 90% accuracy after tuning, up from 76% before optimisation. For an owner-managed firm using a well-scoped chatbot with a clean knowledge base, 85 to 90% on standard questions is achievable with the right controls in place. Accuracy will vary depending on how well-structured the source documents are and whether the bot's scope is tightly defined.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
