You set up an AI assistant connected to your SharePoint or Google Drive. Staff start asking it questions about HR policies, pricing, and service delivery procedures. The answers arrive in seconds, well-formatted and confident. Within the first week, someone gets wrong information about sick pay, drawn not from your current policy but from a handbook three years out of date. The 2024 update is in the same library. The AI retrieved the older version.
This scenario is predictable, not exceptional: it follows directly from how these tools work. Understanding the mechanism is the first step to doing something useful about it.
What is actually happening when AI reads your files?
These tools typically work through retrieval-augmented generation (RAG). When a staff member asks a question, the system first searches your documents for passages that look relevant, then asks a language model to draft an answer from what it found. The model predicts plausible text; it does not verify facts. If the retrieved passages are incomplete, outdated, or contradictory, the model builds its answer from that material anyway, and it tends to fill gaps with plausible-sounding text rather than admit it does not know.
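To make the two stages concrete, here is a minimal Python sketch of a RAG pipeline. It is a toy under stated assumptions, not how any commercial assistant is built: relevance is scored by counting shared words (real systems typically use vector embeddings or keyword ranking such as BM25), the file names and policy text are invented, and the call to the language model itself is left out. What it does show is the key constraint: the model only ever sees the passages the retrieval step hands it.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str  # hypothetical document identifier, e.g. a file path
    text: str


def tokenize(text: str) -> set[str]:
    """Lower-case word set. A toy stand-in for embeddings or BM25 ranking."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    """Stage 1: rank passages by the number of words they share with the question."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p.text)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[Passage]) -> str:
    """Stage 2 input: the only text the language model will see."""
    sources = "\n".join(f"[{p.doc_id}] {p.text}" for p in hits)
    return (
        "Answer using only the sources below and cite the [doc_id] you used. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Invented sample library.
library = [
    Passage("hr/sick-pay.docx", "Staff receive statutory sick pay after three waiting days."),
    Passage("sales/pricing.xlsx", "Standard audit fee starts at 2,500 pounds."),
    Passage("ops/onboarding.docx", "New clients complete an onboarding form."),
]

question = "How much sick pay do staff get?"
hits = retrieve(question, library, k=1)
prompt = build_prompt(question, hits)
print(prompt)  # this text, not the whole library, is what gets sent to the model
```

Whatever ends up in `prompt` is all the model has to work with. If the retrieval step hands it an outdated handbook, an instruction to answer "only from the sources" will tend to make it repeat the outdated policy faithfully.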
Language models are designed to generate fluent, contextually appropriate text. Researchers at the University of Maryland have noted that AI models cannot reliably identify where in their training data an answer comes from, and that the same query can produce different citations on different runs while the answer text stays identical. NIST evaluation work has similarly flagged hallucination and factual inaccuracy as systemic issues in large language model assessments, not edge cases.
A 2025 BBC/EBU study tested ChatGPT, Microsoft Copilot, Gemini, and Perplexity on news questions and found that roughly 45% of answers had at least one significant issue, including errors on basic factual questions about public figures. Those were public web tools, not internal document assistants, but many of the underlying failure modes are the same: the model can sound authoritative while being wrong, and a RAG architecture does not eliminate this. It changes which source material the errors are drawn from.
Why does this matter for your business?
The risk extends beyond confused staff. Under UK GDPR and the Data Protection Act 2018, your firm, as data controller, is responsible for the accuracy of the personal data it processes, and that includes AI outputs about identifiable people. The ICO has made this explicit in its generative AI guidance. The FCA has likewise made clear that regulated firms remain accountable under existing rules: if AI gives a client wrong guidance, responsibility sits with you, not the vendor.
For an owner-managed professional services firm, this plays out in three areas. HR: wrong answers about holiday entitlement, sick pay, or disciplinary procedures can lead to decisions that create employment claims. Client advice: if a staff member uses an AI assistant to answer a client query and the answer is wrong, the firm carries the professional liability. Pricing and contracting: errors in how the AI interprets scope or pricing documents can create commercial disputes before anyone notices.
The pattern of over-trusting AI-generated text is well documented. In 2023, the New York law firm Levidow, Levidow & Oberman and two of its lawyers were sanctioned after ChatGPT was used for legal research on a court filing and the filing cited cases the model had invented. The tool was not reading internal documents, but the lesson holds: AI presents fabricated content with the same formatting and apparent confidence as accurate content, and professionals can miss it.
Where do you actually meet this problem?
The failure modes in a small services firm tend to fall into a handful of patterns. An employee handbook from 2021 sits in your SharePoint alongside the 2024 update; retrieval ranks passages by how closely they match the wording of the question, not by date, so the older handbook can win simply because it uses more of the same words. Pricing documents exist in three versions with no clear label. HR guidance is buried in scanned PDFs and never properly indexed.
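The ranking problem can be shown in a few lines of Python. This is an illustration under simplifying assumptions: the two handbook extracts are invented, and relevance is a crude count of shared words rather than the embedding or keyword ranking a real product would use. The point carries over, though: nothing in the score looks at effective dates, so the wordier 2021 text outranks the 2024 update.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(question: str, text: str) -> int:
    """Crude relevance: how many distinct question words appear in the text."""
    return len(words(question) & words(text))


# Invented extracts from two versions of the same handbook.
library = {
    "Employee Handbook 2021.pdf": (
        "Sick pay: employees who are off sick will receive company sick pay "
        "at full pay for the first two weeks of sickness absence."
    ),
    "Employee Handbook 2024 update.docx": (
        "Sickness absence: statutory sick pay only, from the fourth day."
    ),
}

question = "How much sick pay do employees get when off sick?"
# Rank by relevance alone; no date appears anywhere in the score.
ranked = sorted(library, key=lambda name: overlap_score(question, library[name]), reverse=True)
for name in ranked:
    print(overlap_score(question, library[name]), name)
# prints:
# 4 Employee Handbook 2021.pdf
# 2 Employee Handbook 2024 update.docx
```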
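If your key guidance sits inside tables or images within a PDF, it may not be indexed correctly at all. This is especially likely when the document began as a paper form that was scanned rather than written digitally: unless the scan has been through optical character recognition (OCR), the file holds a picture of the text rather than the text itself, and the relevant wording is invisible to the retrieval system.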
Prompt quality compounds the problem. “What is our refund policy?” gives the AI no indication of which service line, client type, or jurisdiction applies. Without constraints on which folder or index to draw from, the system may pull any vaguely relevant document rather than the authoritative one. Vague questions get approximate answers.
There is also a permissions issue in tools such as Microsoft Copilot, which only draw on content the person asking is already allowed to see. That protects confidential files, but it has a side effect: if a manager’s SharePoint library holds the latest pricing model and most staff cannot open it, different staff members asking the same question receive different answers, built from whichever versions each person can see.
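A small Python sketch shows why permission trimming alone produces different answers. It is not how Copilot is implemented; the group names, file names, day rates, and the `best_available` helper are all hypothetical. To isolate the permissions effect, it assumes an idealised retriever that always picks the newest document the user is allowed to see.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Doc:
    name: str
    year: int
    allowed_groups: frozenset[str]  # hypothetical: groups allowed to open the file
    text: str


# Invented library: the current price list is shared with managers only.
LIBRARY = [
    Doc("Pricing 2023.xlsx", 2023, frozenset({"all-staff"}), "Day rate: 650 pounds."),
    Doc("Pricing 2025 (managers).xlsx", 2025, frozenset({"managers"}), "Day rate: 750 pounds."),
]


def best_available(user_groups: set[str]) -> Optional[Doc]:
    """Permission trimming, then an idealised retriever that picks the newest visible doc."""
    visible = [d for d in LIBRARY if d.allowed_groups & user_groups]
    return max(visible, key=lambda d: d.year, default=None)


for person, groups in [("manager", {"all-staff", "managers"}), ("associate", {"all-staff"})]:
    doc = best_available(groups)
    if doc is None:
        print(f"{person}: nothing visible to answer from")
    else:
        print(f"{person}: {doc.text} (from {doc.name})")
```

Even with perfect ranking, the associate gets the superseded rate, because the current file is outside what they can see. The remedy is a permissions and publishing decision (put the current version where everyone who needs it can read it), not a better prompt.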
When does the risk go up, and when does it go down?
The size of the problem depends on two things: how well curated your documents are, and how consequential the answers are. A version-controlled, clearly labelled document library reduces the risk significantly. So does using AI only for lower-stakes tasks, where a staff member with domain knowledge reviews the output before acting. The risk climbs sharply when answers can reach clients or influence employment or pricing decisions.
There are genuine circumstances where the concern is smaller. A single, well-maintained set of standard operating procedures with no duplicates makes retrieval more reliable and gives the model less room to improvise. If you are using AI to help senior staff draft or summarise documents, and those staff know the domain well enough to spot errors, the failure risk is lower.
Where the risk is highest: any situation where a staff member without deep domain knowledge acts on the AI’s answer without reading the underlying document, and any answer that reaches a third party, whether a client, a regulator, or an employment tribunal.
What should you do about it first?
The fix starts with your content, not your AI tool. Establish one authoritative location for each type of document, archive old versions clearly, and add basic structure: headings, effective dates, scope notes. IBM manages 6,000 HR policies through its internal AI by assigning each one a named accountable owner, and that ownership discipline is a large part of why it works. Document quality is the foundation AI accuracy is built on.
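Much of this clean-up can be checked mechanically. The sketch below assumes you can export a simple register of policy documents (from a SharePoint view or a spreadsheet, for instance) listing a document family, path, effective date, owner, and archived flag; the `PolicyRecord` fields and the `audit` function are hypothetical names, not part of any product. It flags live documents with no effective date or no named owner, and any family with more than one live version.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PolicyRecord:
    family: str                # hypothetical grouping, e.g. "employee-handbook"
    path: str
    effective: Optional[date]  # None if the document states no effective date
    owner: Optional[str]       # named accountable owner, None if unassigned
    archived: bool = False


def audit(records: list[PolicyRecord]) -> list[str]:
    """Return human-readable problems found in a register of policy documents."""
    problems: list[str] = []
    live: dict[str, list[PolicyRecord]] = defaultdict(list)
    for r in records:
        if r.archived:
            continue  # archived versions are fine as long as they are marked
        live[r.family].append(r)
        if r.effective is None:
            problems.append(f"{r.path}: no effective date")
        if not r.owner:
            problems.append(f"{r.path}: no named owner")
    for family, docs in live.items():
        if len(docs) > 1:
            paths = ", ".join(sorted(d.path for d in docs))
            problems.append(f"{family}: {len(docs)} live versions ({paths})")
    return problems


# Invented register: the 2021 handbook was never archived.
records = [
    PolicyRecord("employee-handbook", "HR/Handbook 2021.pdf", date(2021, 4, 1), None),
    PolicyRecord("employee-handbook", "HR/Handbook 2024.docx", date(2024, 4, 1), "HR Manager"),
    PolicyRecord("pricing", "Sales/Pricing v3.xlsx", None, "Finance Director"),
]
for line in audit(records):
    print(line)
```

Run against the invented register, it reports the unowned 2021 handbook, the undated price list, and the two live handbook versions, which are exactly the conditions that let retrieval surface the wrong document.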
The second layer is system configuration. Any AI assistant built on company files should let you restrict what folders or libraries it draws from for specific question types. Configure policy answers to come only from a designated, version-controlled library, not the entire drive. Where possible, require the tool to cite the specific document it used, not just a text summary. If it cannot do this, it is the wrong tool for this use case.
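The configuration described above can be sketched in Python as a routing table plus a scoped search. The library names, question types, and helper functions are hypothetical, and real assistants expose this through their own admin settings rather than code. The sketch searches only the designated library for a question type, skips archived files, attaches a citation to every answer, and declines when no authoritative source matches.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    library: str  # hypothetical SharePoint library or Drive folder name
    path: str
    text: str
    archived: bool = False


# Hypothetical routing table: the one authoritative library per question type.
AUTHORITATIVE = {
    "hr-policy": "HR Policies (controlled)",
    "pricing": "Pricing (controlled)",
}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def scoped_search(question: str, question_type: str, docs: list[Doc]) -> Optional[Doc]:
    """Search only the designated library, skip archived files; None if nothing matches."""
    library = AUTHORITATIVE.get(question_type)  # None for unrouted question types
    candidates = [d for d in docs if d.library == library and not d.archived]
    q = words(question)
    best = max(candidates, key=lambda d: len(q & words(d.text)), default=None)
    if best is None or not (q & words(best.text)):
        return None
    return best


def answer_with_citation(question: str, question_type: str, docs: list[Doc]) -> str:
    hit = scoped_search(question, question_type, docs)
    if hit is None:
        return "No authoritative source found; please check with the policy owner."
    # A real assistant would pass hit.text to the model; here the passage stands in.
    return f"{hit.text} (Source: {hit.library}/{hit.path})"


docs = [
    Doc("Shared Documents", "Handbook 2021.pdf",
        "Sick pay: employees who are off sick receive full company sick pay for two weeks."),
    Doc("HR Policies (controlled)", "Handbook 2021.pdf",
        "Sick pay: full company sick pay for two weeks.", archived=True),
    Doc("HR Policies (controlled)", "Handbook 2024.docx",
        "Sick pay: statutory sick pay from the fourth day of absence."),
]
print(answer_with_citation("How much sick pay do employees get?", "hr-policy", docs))
print(answer_with_citation("What is our refund policy?", "refunds", docs))
```

The stray handbook in the general shared folder is never considered, and a question type with no designated library gets a refusal rather than a guess. Declining is a deliberate design choice here: for policy questions, no answer is safer than an approximate one.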
The third layer is governance. Assign a named owner per content domain, write a short usage policy for staff, and keep a simple log of cases where the AI gave a wrong answer. The NCSC recommends a human-in-the-loop approach for any generative AI system where outputs inform decisions. That means someone with relevant knowledge reviews the answer before it is acted on, particularly in HR, client-facing, and compliance contexts.
If you are processing personal data through AI, a Data Protection Impact Assessment (DPIA) may be required under UK GDPR. The ICO’s guidance is clear: where a document library contains personal data, as HR files almost always do, connecting it to an AI tool counts as processing under UK GDPR and carries the same accountability as any other processing in your organisation.



