Why AI gives wrong answers from your company files

TL;DR

AI tools that read your company files typically work by searching for relevant passages and asking a language model to draft an answer from what they find. If your files are outdated, contradictory, or poorly structured, the model builds its answer from that material and fills any gaps with plausible guesses. Under UK GDPR, and for regulated firms under FCA rules, responsibility for those errors sits with your firm, not the AI vendor.

Key takeaways

- AI tools that read your files typically use retrieval-augmented generation (RAG): the system searches for relevant document passages and asks a language model to draft an answer, but the model predicts plausible text rather than verifying facts.
- A 2025 BBC/EBU study found roughly 45% of AI answers to news questions contained errors across four major tools; the same failure modes appear when AI is built on top of company files.
- Under UK GDPR, and for regulated firms under FCA rules, responsibility for wrong AI outputs sits with your firm, not the vendor.
- The most common failure modes in owner-managed firms are outdated documents, conflicting versions, poorly structured files, and vague prompts that give the AI no context about jurisdiction or timeframe.
- Document hygiene comes before AI configuration: establish one authoritative location per content type, archive old versions clearly, and add effective dates and clear headings before connecting any AI tool.

You set up an AI assistant connected to your SharePoint or Google Drive. Staff start asking it questions about HR policies, pricing, and service delivery procedures. The answers arrive in seconds, well-formatted and confident. Within the first week, someone gets wrong information about sick pay, drawn not from your current policy but from a handbook three years out of date. The 2024 update is in the same library. The AI retrieved the older version.

The scenario is predictable rather than exceptional, and it comes from how these tools fundamentally work. Understanding that is the first step to doing something useful about it.

What is actually happening when AI reads your files?

These tools typically work through retrieval-augmented generation (RAG). When a staff member asks a question, the system searches your documents for relevant passages, then asks a language model to draft an answer from what it found. The model predicts plausible text rather than verifying facts. If the retrieved passages are incomplete, outdated, or contradictory, the model builds its answer from that material and fills any gaps rather than admitting it does not know.
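
To make the two steps concrete, here is a minimal sketch of the retrieve-then-draft flow in Python. It assumes simple keyword overlap as the search method (real products use more sophisticated search) and stops at assembling the prompt; the call to a language model is left out, and every function and file name is hypothetical. What it shows is that the model only ever sees whatever the search step hands it.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens: a crude stand-in for a real search index."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, passages: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Rank passages by how many of their words also appear in the question."""
    q_words = set(tokenize(question))
    scored = []
    for source, text in passages.items():
        score = sum(1 for word in tokenize(text) if word in q_words)
        scored.append((score, source, text))
    # Highest overlap first; nothing here checks dates, versions or accuracy.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(source, text) for score, source, text in scored[:k] if score > 0]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Assemble the text a language model would be asked to answer from."""
    context = "\n".join(f"[{source}] {text}" for source, text in hits)
    return (
        "Answer using only the passages below. "
        "If they do not answer the question, say you do not know.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    passages = {
        "handbook_2021.docx": "Sick pay: statutory sick pay only, paid from day four.",
        "pricing_2024.xlsx": "Audit day rate is 900 pounds.",
    }
    question = "How is sick pay paid?"
    print(build_prompt(question, retrieve(question, passages)))
```

Whatever `retrieve` returns is all the model has to work with. If an outdated passage ranks highest, the prompt is built around it, and the instruction to admit gaps is a request to the model, not a guarantee.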

Language models are designed to generate fluent, contextually appropriate text. Guidance from the University of Maryland Libraries notes that AI tools cannot reliably identify where in their training data an answer comes from, and that the same query can produce different citations on different runs while the answer text stays identical. NIST evaluation work has similarly flagged hallucination and factual inaccuracy as systemic issues in large language model assessments, not edge cases.

A 2025 BBC/EBU study tested ChatGPT, Microsoft Copilot, Gemini, and Perplexity on news questions and found roughly 45% of answers contained errors, including on basic factual questions about public figures. Those were public web tools, not internal document assistants, but the underlying failure modes are the same: the model can sound authoritative while being wrong. The RAG architecture does not eliminate this. It changes which source material the errors are drawn from.

Why does this matter for your business?

The risk extends beyond confused staff. Under UK GDPR and the Data Protection Act 2018, your firm is responsible for the accuracy of AI outputs that affect people. The ICO has made this explicit in its generative AI guidance, and the FCA has told regulated firms that if AI gives wrong guidance to a client, responsibility sits with you, not the vendor.

For an owner-managed professional services firm, this plays out in three areas. In HR, wrong answers about holiday entitlement, sick pay, or disciplinary procedures can lead to decisions that create employment claims. On client advice, if a staff member uses an AI assistant to answer a client query and the answer is wrong, the firm carries the professional liability. On pricing and contracting, errors in how the AI interprets scope or pricing documents can create commercial disputes before anyone notices.

The pattern of over-trusting AI-generated text is well documented. In 2023, the law firm Levidow, Levidow and Oberman in New York was sanctioned after a lawyer used ChatGPT to draft a court filing and the model invented case law citations that did not exist. The tool was not reading internal documents, but the lesson holds. AI presents fabricated content with the same formatting and apparent confidence as accurate content, and professionals can miss it.

Where do you actually meet this problem?

The failure modes in a small services firm tend to fall into a handful of patterns. An employee handbook from 2021 sits in your SharePoint alongside the 2024 update; the AI retrieves whichever passage scores as more relevant to the question, often the one with more matching wording, not necessarily the more recent one. Pricing documents exist in three versions with no clear label. HR guidance is buried in scanned PDFs that were never properly indexed.
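
A toy illustration of the handbook problem, assuming plain keyword-overlap scoring. Real tools typically combine keyword and semantic search, but neither is aware of effective dates unless documents are labelled and filtered for them. The handbook titles and wording below are hypothetical.

```python
import re


def overlap_score(question: str, text: str) -> int:
    """Count the words in a passage that also appear in the question."""
    q_words = set(re.findall(r"[a-z]+", question.lower()))
    return sum(1 for word in re.findall(r"[a-z]+", text.lower()) if word in q_words)


# Hypothetical passages from two versions of the same handbook, stored side by side.
library = {
    "Employee handbook 2021": (
        "Sick pay: company sick pay is paid for ten days. "
        "After ten days, sick pay is statutory sick pay."
    ),
    "Employee handbook 2024": "From 1 April 2024, full pay is paid for the first 15 days of absence.",
}

question = "What sick pay do I get?"
scores = {name: overlap_score(question, text) for name, text in library.items()}
best = max(scores, key=scores.get)  # the 2021 handbook scores higher
print(scores, "->", best)
```

The older handbook wins simply because it repeats the words in the question. Nothing in the score knows that the 2024 version supersedes the 2021 one.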

If your key guidance is stored inside tables or images within a PDF, it may not be indexed correctly at all, particularly if the document originated as a physical form that was scanned rather than written digitally. The relevant text is invisible to the retrieval system.
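
One practical check is to list the files that yield little or no text. This sketch assumes you have already run your files through whatever text extractor your AI tool or indexing step uses; the 200-character threshold and the file names are illustrative assumptions, not a standard.

```python
def flag_unindexable(extracted: dict[str, str], min_chars: int = 200) -> list[str]:
    """Return files whose extracted text is too short to be searchable.

    `extracted` maps each file name to the text your extraction step produced.
    A scanned, image-only PDF with no OCR text layer typically comes back empty.
    The 200-character default is an arbitrary illustrative threshold.
    """
    return sorted(name for name, text in extracted.items() if len(text.strip()) < min_chars)


if __name__ == "__main__":
    sample = {
        "sickness_absence_form_scan.pdf": "",
        "holiday_policy_2024.docx": "Holiday entitlement is 25 days plus bank holidays. " * 10,
        "pricing_table.pdf": "Page 1 of 2\n",
    }
    for name in flag_unindexable(sample):
        print("Check or OCR:", name)
```

A file such as the hypothetical `pricing_table.pdf`, where only a page footer comes through, is exactly the case where the real content sits in an image or table the retrieval system never sees.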

Prompt quality compounds the problem. “What is our refund policy?” gives the AI no indication of which service line, which client type, or which jurisdiction applies. Without constraints on which folder or index to search, the retrieval step picks any vaguely relevant document rather than the authoritative one, and the model answers from that. Vague questions get approximate answers.
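
A sketch of what a scoped question might look like, assuming your documents carry hypothetical metadata fields for library, service line and jurisdiction. The filter narrows what retrieval may search; the prompt states the context that the bare question leaves out.

```python
from dataclasses import dataclass


@dataclass
class ScopedQuestion:
    text: str
    service_line: str    # hypothetical label, e.g. "payroll"
    jurisdiction: str    # e.g. "England and Wales"
    source_library: str  # the only library retrieval is allowed to search


def in_scope(doc_meta: dict, q: ScopedQuestion) -> bool:
    """Keep a document only if its metadata matches the question's stated scope."""
    return (
        doc_meta.get("library") == q.source_library
        and doc_meta.get("service_line") == q.service_line
        and doc_meta.get("jurisdiction") == q.jurisdiction
    )


def scoped_prompt(q: ScopedQuestion) -> str:
    """Spell out the context that a bare question like 'What is our refund policy?' omits."""
    return (
        f"Service line: {q.service_line}. Jurisdiction: {q.jurisdiction}. "
        f"Use only documents from the '{q.source_library}' library.\n"
        f"Question: {q.text}"
    )


if __name__ == "__main__":
    q = ScopedQuestion("What is our refund policy?", "payroll", "England and Wales", "Client Terms (controlled)")
    print(scoped_prompt(q))
```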

There is also a permissions issue in tools such as Microsoft Copilot. If a manager’s SharePoint library contains the latest pricing model but many staff cannot see it, different staff members asking the same question receive different answers depending on what each person has access to.
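
The effect is easy to see in a simplified model of permission-aware retrieval. Microsoft documents that Microsoft 365 Copilot only surfaces content the individual user already has permission to view; the sketch below mimics that idea with hypothetical group names and documents, and is not how Copilot is implemented.

```python
from typing import Optional

# Hypothetical documents with the groups allowed to open them.
documents = [
    {"name": "Pricing model 2023", "year": 2023, "groups": {"all-staff"}},
    {"name": "Pricing model 2025", "year": 2025, "groups": {"managers"}},
]


def visible_to(user_groups: set[str]) -> list[dict]:
    """Permission-trimmed retrieval: only documents the user could already open."""
    return [doc for doc in documents if doc["groups"] & user_groups]


def newest_visible(user_groups: set[str]) -> Optional[str]:
    """Name of the most recent pricing model this user's search can reach."""
    docs = visible_to(user_groups)
    return max(docs, key=lambda doc: doc["year"])["name"] if docs else None


if __name__ == "__main__":
    print("Manager sees:", newest_visible({"all-staff", "managers"}))
    print("Staff member sees:", newest_visible({"all-staff"}))
```

Each person gets an answer grounded in a different document, and neither answer looks wrong to the person receiving it.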

When does the risk go up, and when does it go down?

The size of the problem depends on how well curated your documents are, and how consequential the answers are. A version-controlled, clearly labelled document library reduces the risk significantly. So does using AI only for lower-stakes tasks, where a staff member with domain knowledge reviews the output before acting. The risk climbs sharply when answers can reach clients or influence employment or pricing decisions.

There are genuine circumstances where the concern is smaller. A single, well-maintained set of standard operating procedures with no duplicates makes retrieval more reliable and gives the model less room to improvise. If you are using AI to help senior staff draft or summarise documents, and those staff know the domain well enough to spot errors, the failure risk is lower.

The risk is highest when a staff member without deep domain knowledge acts on the AI’s answer without reading the underlying document, or when an answer reaches a third party, whether a client, a regulator, or an employment tribunal.
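
The factors above can be written down as a simple review rule. This is a hypothetical sketch of one possible policy, with made-up domain labels; where exactly you draw the lines is a judgement for your firm.

```python
# Hypothetical labels for the higher-stakes areas discussed above.
HIGH_STAKES_DOMAINS = {"hr", "client_advice", "pricing"}


def needs_review(domain: str, reaches_third_party: bool, asker_knows_domain: bool) -> bool:
    """Decide whether an AI answer must be checked by someone with domain knowledge before use."""
    if reaches_third_party:
        # Clients, regulators or a tribunal: always review first.
        return True
    if domain in HIGH_STAKES_DOMAINS and not asker_knows_domain:
        # Employment or pricing decisions made by someone who may not spot an error.
        return True
    return False
```

Under this rule, a senior colleague summarising an internal HR document for their own use would not trigger a review, while the same answer forwarded to a client always would.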

What should you do about it first?

The fix starts with your content, not your AI tool. Establish one authoritative location for each type of document, archive old versions clearly, and add basic structure, including headings, effective dates, and scope notes. IBM reportedly manages around 6,000 HR policies through its internal AI by assigning each one a named accountable owner, and that discipline is a large part of why it works. Document quality is the foundation AI accuracy is built on.
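
Before connecting a tool, a short audit script can catch the most common hygiene gaps. This sketch assumes you have recorded, for each document, a content type, a status, an effective date and a named owner; those field names are hypothetical, and in practice the data might come from metadata columns in your document library.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PolicyDoc:
    title: str
    content_type: str                # hypothetical label, e.g. "employee_handbook"
    status: str                      # "current" or "archived"
    effective_date: Optional[date]
    owner: Optional[str]             # named accountable owner


def hygiene_issues(docs: list[PolicyDoc]) -> list[str]:
    """List gaps that make retrieval unreliable: no date, no owner, or rival 'current' versions."""
    issues: list[str] = []
    current: dict[str, list[str]] = {}
    for doc in docs:
        if doc.status != "current":
            continue  # archived versions are fine as long as the AI tool cannot search them
        current.setdefault(doc.content_type, []).append(doc.title)
        if doc.effective_date is None:
            issues.append(f"{doc.title}: no effective date")
        if not doc.owner:
            issues.append(f"{doc.title}: no named owner")
    for content_type, titles in current.items():
        if len(titles) > 1:
            issues.append(f"{content_type}: {len(titles)} versions marked current ({', '.join(titles)})")
    return issues
```

Run against the handbook example from earlier, it would report both the missing owner and the fact that two versions still claim to be current.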

The second layer is system configuration. Any AI assistant built on company files should let you restrict what folders or libraries it draws from for specific question types. Configure policy answers to come only from a designated, version-controlled library, not the entire drive. Where possible, require the tool to cite the specific document it used, not just a text summary. If it cannot do this, it is the wrong tool for this use case.
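
In outline, the two configuration rules look like this. The mapping of question types to libraries and the shape of the `answer` dictionary are hypothetical; real products expose these settings differently, if at all.

```python
# Hypothetical mapping from question type to the one approved, version-controlled library.
APPROVED_LIBRARY = {
    "hr_policy": "HR Policies (controlled)",
    "pricing": "Commercial (controlled)",
}


def allowed_sources(question_type: str, docs: list[dict]) -> list[dict]:
    """Restrict retrieval to the approved library; unknown question types get nothing."""
    library = APPROVED_LIBRARY.get(question_type)
    return [doc for doc in docs if library is not None and doc["library"] == library]


def accept_answer(answer: dict, sources: list[dict]) -> bool:
    """Accept an answer only if it cites at least one document, all from the allowed set."""
    allowed = {doc["name"] for doc in sources}
    cited = set(answer.get("citations", []))
    return bool(cited) and cited <= allowed
```

An answer with no citation, or one citing a file from outside the approved library, is rejected rather than shown to the staff member.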

The third layer is governance. Assign a named owner per content domain, write a short usage policy for staff, and keep a simple log of cases where the AI gave a wrong answer. The NCSC recommends a human-in-the-loop approach for any generative AI system where outputs inform decisions. That means someone with relevant knowledge reviews the answer before it is acted on, particularly in HR, client-facing, and compliance contexts.
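
The wrong-answer log does not need special software; a shared spreadsheet works. As a sketch, assuming a CSV file and hypothetical column names, appending an incident could look like this:

```python
import csv
import os
from datetime import date

# Hypothetical column set; adapt to whatever your content owners need to follow up.
FIELDS = ["date", "domain", "question", "wrong_answer", "document_cited", "correct_document", "owner"]


def log_wrong_answer(path: str, **entry: str) -> None:
    """Append one incident to a CSV log, writing the header row if the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    row = {field: "" for field in FIELDS}
    row["date"] = date.today().isoformat()
    row.update(entry)  # csv.DictWriter rejects any key not in FIELDS, which catches typos
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
```

Each row records which document the tool relied on, so the named owner for that domain can see which files keep producing wrong answers.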

If you are processing personal data through AI, you may need a Data Protection Impact Assessment under UK GDPR; the ICO's AI guidance expects most uses of AI involving personal data to require one. Where a document library contains personal data, connecting it to an AI tool is processing under UK GDPR and carries the same accountability as any other form of processing in your organisation.

Sources

- NIST (2023). Evaluating generative AI systems for misinformation. US National Institute of Standards and Technology research examining hallucination and factual inaccuracy as systemic issues in large language model evaluations. https://www.nist.gov/news-events/news/2023/10/nist-evaluates-generative-ai-systems-misinformation
- NCSC (2023). Guidelines for secure AI system development. UK National Cyber Security Centre guidance on safe AI deployment, including retrieval constraints, access controls, and human-in-the-loop review. https://www.ncsc.gov.uk/collection/guidelines-for-secure-ai-system-development
- NCSC (2023). Using ChatGPT and other generative AI securely. NCSC advice on staff training, human review of outputs, and clear usage policies for generative AI in organisations. https://www.ncsc.gov.uk/guidance/using-chatgpt-and-other-generative-ai-securely
- ICO (2023). Guidance on AI and data protection. ICO statement that organisations remain responsible under UK GDPR for the accuracy of personal data processing, including AI-generated outputs about individuals. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/
- FCA (2023). Regulating AI in financial services. FCA position that regulated firms remain responsible for AI outcomes and that systems must be monitorable, explainable, and accountable. https://www.fca.org.uk/news/speeches/regulating-ai-financial-services
- Lewis et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Foundational peer-reviewed research establishing the RAG architecture and its core failure modes when retrieval is incomplete or inconsistent. https://arxiv.org/abs/2005.11401
- CMA (2023). Review of AI foundation models. Competition and Markets Authority report highlighting hallucinations and misleading AI outputs as consumer protection concerns requiring governance. https://www.gov.uk/government/publications/cma-review-of-foundation-models-update-paper
- University of Maryland Libraries (2024). Evaluating AI research tools. Library guidance documenting LLM citation fabrication, noting that footnotes change between runs for the same query while the answer text stays identical. https://lib.guides.umd.edu/c.php?g=1340355&p=9880574
- New York Times (2023). Lawyers sanctioned after AI-invented case citations in court filing. Report on the Levidow, Levidow and Oberman case where ChatGPT fabricated legal citations used in a court document, resulting in sanctions against the lawyers. https://www.nytimes.com/2023/05/27/nyregion/chatgpt-lawyer-court-filing.html
- Bersin, J. (2025). BBC finds that 45% of AI queries produce erroneous answers. Analysis of the BBC/EBU accuracy study testing ChatGPT, Microsoft Copilot, Gemini, and Perplexity across news questions, finding a roughly 45% error rate including on basic factual questions. https://joshbersin.com/2025/10/bbc-finds-that-45-of-ai-queries-produce-erroneous-answers/

Frequently asked questions

Why does AI give wrong answers even when it has access to my documents?

These tools typically use retrieval-augmented generation: the system searches your documents for relevant passages and asks a language model to draft an answer from what it found. The model predicts plausible text rather than checking facts. If it retrieves an outdated document, a conflicting version, or a poorly structured file, it builds an answer from that material. If the retrieved content does not fully answer the question, the model fills the gap with its best guess.

Is this a UK GDPR problem?

It can be. The ICO has stated that organisations remain responsible under UK GDPR and the Data Protection Act 2018 for the accuracy of AI outputs about individuals. If your AI tool answers questions about staff or clients and gives wrong information, your firm bears the liability. The FCA has made the same point for regulated firms: AI errors do not transfer responsibility to the vendor. A Data Protection Impact Assessment may be required where AI outputs can affect individuals' rights.

What is the single most important thing to fix first?

Start with your documents, not your AI configuration. Establish one authoritative location for each type of content, remove or archive superseded versions, and add basic structure such as clear headings, effective dates, and scope notes. Poor document hygiene is typically the primary driver of AI retrieval errors in owner-managed firms. A clean, well-labelled library does more for AI accuracy than prompt engineering or model-level configuration changes.

Written by Dr Dave Heath, AI consultant and business strategist.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.


Ready to talk it through?

If any of this sounds familiar, let's talk.


Related reading

Find the shadow AI in your agency before a client's data leaks through it

A four-tier data map so your team knows what AI can touch

Capture the shop-floor knowledge before it retires
