How AI handles stale documents, contradictions, and version drift

TL;DR

When an AI system reads your internal document store, it may retrieve a genuine but superseded policy rather than the current version. This is version drift, distinct from hallucination. The fix is document governance: one authoritative home per document, clear status labels, and old files removed from the searchable index before AI is layered on top. UK GDPR's accuracy standard makes this a compliance matter, not just a productivity risk.

Key takeaways

- Version drift is when AI retrieves a genuine but superseded document; it is distinct from hallucination and needs different controls.
- UK GDPR's accuracy principle means stale documents in an AI system are a compliance risk where personal data is involved, not just a productivity issue.
- Tools like Microsoft 365 Copilot and Google Workspace Gemini read directly from your file stores, so document hygiene is a direct control on AI output quality.
- Version drift does not apply when AI is used only for generic drafting with no internal document retrieval; the risk is live the moment you connect AI to your own files.
- The cheapest fix is document governance: one location per policy or template, clear status labels, and obsolete files removed from the searchable index before any AI layer is added.

A firm sets up Microsoft 365 Copilot and, over the following weeks, staff begin using it for routine questions. The first sign that something is wrong comes from a client: the pricing quoted in an email does not match the current schedule. Copilot had pulled its answer from a proposal sitting in SharePoint, one that predated a fee increase made six months earlier. The file was genuine. It was simply no longer current.

This is not a hallucination. The document existed and the words were real. The problem has a different name, a different cause, and a different fix.

What actually happens when AI reads an old document?

Version drift is when an AI system retrieves a genuine document that is no longer the current, authoritative version. In a retrieval-augmented generation system, the AI searches your file store to answer a question, then generates a response from what it finds. The system has no built-in sense of which version takes precedence unless you build that rule into the retrieval layer.
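To make the mechanism concrete, here is a deliberately simplified retrieval step. It is a toy keyword-overlap ranker, not how Copilot or any specific product ranks results; the `Doc` class, file paths and fee figures are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    path: str
    text: str
    updated: date  # stored, but never consulted by the ranker below


def score(query: str, doc: Doc) -> int:
    """Toy relevance: how many query words also appear in the document."""
    return len(set(query.lower().split()) & set(doc.text.lower().split()))


def retrieve(query: str, docs: list[Doc]) -> Doc:
    """Return the best-matching document. Recency plays no part."""
    return max(docs, key=lambda d: score(query, d))


# Hypothetical file store: an old proposal and the current fee schedule.
DOCS = [
    Doc("Proposals/2023-client-proposal.docx",
        "Our standard hourly fee for advisory work is 120 pounds",
        date(2023, 3, 1)),
    Doc("Policies/fee-schedule.docx",
        "Fee schedule: advisory rate 140 pounds per hour",
        date(2024, 9, 1)),
]
QUERY = "What is the standard hourly fee for advisory work"

best = retrieve(QUERY, DOCS)
print(best.path, best.updated)
```

In this example the older proposal wins because its wording happens to match the question more closely. Nothing in the ranking step looks at the `updated` field, which is the gap the paragraph above describes.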

This is different from hallucination, where the model invents content with no source. Version drift produces an answer grounded in a real document; the document is simply superseded. A related failure is contradiction: if your file store holds two versions of the same policy or process template, the AI may blend them or choose one based on retrieval ranking. Ask the same question twice in slightly different words and you may get different answers, each referencing a real file.

Prompt engineering does not solve this. A more precise question changes how the model constructs its answer but does not force the retrieval system to prefer the newer file when both are indexed. Post-generation fact-checking has similar limits: a checker can confirm that the cited document exists, not that it is the current version. A superseded but genuine policy can pass a basic accuracy check while producing an operationally wrong answer.

Why does version drift matter for your business?

Small service firms typically hold years of documents in shared drives, email threads, and copied folders, with old proposals, superseded contracts and previous process templates sitting alongside current ones. An AI system ingesting that environment has no automatic preference for the newer file. Poor document hygiene directly determines how often the AI retrieves the wrong version and returns it as confident, authoritative fact.

UK GDPR requires that personal data be accurate and, where necessary, kept up to date. The Information Commissioner’s Office makes clear this is a legal obligation, not a discretionary hygiene standard. When an AI system is summarising client records, case notes or staff files, version drift converts a document governance failure into a data protection failure. That applies to any service firm handling personal data, not only those in regulated sectors.

For regulated firms, the exposure goes further. The FCA’s research on AI in UK financial services identifies data quality and governance as the core controls for firms deploying AI in advice, compliance or client communications. A firm whose AI assistant draws on superseded compliance templates when handling a complaint or a know-your-customer query has a governance failure, not just a bad output.

The Competition and Markets Authority has also noted that AI-enabled tools must not mislead consumers. If a client-facing system draws on stale internal documents to generate pricing or service scope information, the consequences extend beyond a remedial email.

Where will you actually meet this in your business?

Version drift surfaces wherever AI reads from your internal document store rather than a clean, controlled source. Tools like Microsoft 365 Copilot and Google Workspace Gemini sit directly on top of your files, emails and chats, so document discipline matters as much as model quality. Staff questions about current policies, pricing or processes are where the exposure is sharpest.

The highest-risk situations share three features. First, your firm holds the same document in multiple locations: one version in SharePoint, a copy forwarded by email, a third saved to a local or shared drive. Second, files are never formally archived or labelled as superseded, so every version looks active to the retrieval system. Third, staff are using AI to answer questions they would previously have asked a senior colleague, who would have known intuitively which version applied.
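The first of those features, the same document living in several places, can often be spotted from file names alone. The sketch below is one rough way to do that, assuming you can export a list of file paths from your stores. The paths and the list of "noise" words it strips (copy, final, old, version numbers) are illustrative guesses, and name matching will miss renamed copies.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Words and suffixes people add when copying a file (an assumed, partial list).
_NOISE = re.compile(r"\b(copy|final|old|v\d+)\b|\(\d+\)")


def normalise(path: str) -> str:
    """Reduce a file path to a bare, comparable document name."""
    stem = PurePosixPath(path).stem.lower()
    stem = re.sub(r"[_\-]+", " ", stem)   # treat _ and - as spaces
    stem = _NOISE.sub(" ", stem)          # drop copy/version markers
    return " ".join(stem.split())         # collapse repeated spaces


def scattered_copies(paths: list[str]) -> dict[str, list[str]]:
    """Group paths by normalised name and keep names found more than once."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        groups[normalise(p)].append(p)
    return {name: found for name, found in groups.items() if len(found) > 1}


# Hypothetical export of file paths from three locations.
PATHS = [
    "SharePoint/Policies/Pricing_Policy_v2_FINAL.docx",
    "Email attachments/pricing-policy (1).docx",
    "Shared drive/Pricing Policy copy.docx",
    "SharePoint/Policies/Complaints_Procedure.docx",
]

for name, found in scattered_copies(PATHS).items():
    print(name, "->", found)
```

A list like this is a starting point for a human decision about which copy is authoritative, not a substitute for it.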

The problem scales with the sensitivity of the use case. A team using AI to draft first-pass emails or summarise meeting notes faces lower exposure than one using it to answer questions about client obligations, complaint procedures or pricing. The NCSC’s guidance on safe AI adoption frames the management of document inputs as part of the broader trust and security problem around AI deployment, not simply a productivity concern.

When does this apply, and when is it not your problem?

If your team uses AI only for drafting and brainstorming without a connected document store, version drift does not apply. The risk is live the moment you connect AI to a knowledge base, SharePoint, Google Drive or a similar system. It scales with how much staff rely on the answers for operational decisions, client communications or compliance work.

Three situations keep the risk low: your AI use is confined to generic text generation with no internal document retrieval; your firm already maintains one controlled, versioned document store with clear archive rules; or AI outputs are never used for client commitments, compliance decisions or operational procedures. Each of these reduces the version problem to a background consideration rather than an active failure mode.

The EU AI Act, adopted in 2024, treats data quality and governance as required controls in higher-risk AI deployments. Even before any direct regulatory requirement applies to a small UK firm, the underlying logic is the same: what you give the system determines what comes out. A firm that has already built document lifecycle practices, with status fields, review dates and archive processes, may find version drift is a residual rather than active risk. The gap, for small service firms, is usually that these processes exist informally in someone’s head rather than in the file system itself.

What can you do about it from Monday?

The cheapest intervention for any owner-operated firm is to fix the document layer before adding any AI on top. Give each policy or template one home with a visible status label such as approved, draft or superseded, and move obsolete files out of the searchable index. Industry estimates put up to 80 per cent of AI project time in data preparation rather than model work, and the same logic applies here: most of the effort belongs in the documents, not the model.
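The "one home, visible status" rule can be checked mechanically once you have a simple register of documents. The following is a minimal sketch, assuming a hand-maintained register of (policy name, path, status) rows; the register format, the `audit` function and the example rows are hypothetical.

```python
from collections import Counter

VALID_STATUSES = {"approved", "draft", "superseded"}


def audit(register: list[tuple[str, str, str]]) -> list[str]:
    """Report files with no usable status and policies without exactly one approved copy."""
    problems = []
    approved = Counter()
    for name, path, status in register:
        if status not in VALID_STATUSES:
            problems.append(f"{path}: missing or unknown status {status!r}")
        elif status == "approved":
            approved[name] += 1
    for name in sorted({name for name, _, _ in register}):
        if approved[name] != 1:
            problems.append(f"{name}: {approved[name]} approved copies, expected exactly 1")
    return problems


# Hypothetical register rows: (policy name, path, status).
REGISTER = [
    ("Fee schedule", "SharePoint/Policies/fee-schedule.docx", "approved"),
    ("Fee schedule", "Shared drive/Old/fee-schedule-2023.docx", "approved"),
    ("Complaints procedure", "SharePoint/Policies/complaints.docx", ""),
    ("Leave policy", "SharePoint/Policies/leave.docx", "approved"),
    ("Leave policy", "Archive/leave-2022.docx", "superseded"),
]

for line in audit(REGISTER):
    print(line)
```

Here the fee schedule is flagged for having two approved copies and the complaints procedure for having no status at all, while the leave policy passes.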

Version metadata is the next layer. Modern document stores allow you to tag files with creation dates, review dates and document status. When an AI system retrieves from that store, it can be configured to prefer current-approved files and exclude superseded ones from retrieval. This is not a default setting in all workplace AI tools, so it is worth asking your vendor or IT provider whether it is active.
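As a rough illustration of what "prefer current-approved files and exclude superseded ones" means in practice, the sketch below filters and orders candidate files by a status tag before anything reaches the AI. The `DocRecord` class and status values are assumptions for the example; real tools expose this through their own settings or metadata fields, if at all.

```python
from dataclasses import dataclass

# Lower number = preferred. "superseded" is deliberately absent, so it is excluded.
PRIORITY = {"approved": 0, "draft": 1}


@dataclass
class DocRecord:
    path: str
    status: str  # hypothetical tag: "approved", "draft" or "superseded"


def retrieval_candidates(docs: list[DocRecord]) -> list[DocRecord]:
    """Drop superseded or untagged files, then list approved files before drafts."""
    kept = [d for d in docs if d.status in PRIORITY]
    return sorted(kept, key=lambda d: PRIORITY[d.status])


# Hypothetical tagged files.
STORE = [
    DocRecord("Policies/fee-schedule-draft-2025.docx", "draft"),
    DocRecord("Archive/fee-schedule-2023.docx", "superseded"),
    DocRecord("Policies/fee-schedule.docx", "approved"),
]

for doc in retrieval_candidates(STORE):
    print(doc.status, doc.path)
```

Treating untagged files as excluded is a design choice in this sketch: it means a file has to earn its place in the index rather than appear by default.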

Testing for contradiction costs nothing. Ask the AI the same policy question in three different ways and compare the answers. Inconsistent responses are the first visible sign that the knowledge base holds competing versions of the same document.
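The same test can be scripted if you have programmatic access to your assistant. The sketch below assumes a hypothetical `ask` function that takes a question and returns the answer as text; here it is replaced by canned answers so the example runs on its own. Comparing only the numbers in each answer is a crude shortcut that suits pricing questions better than policy wording.

```python
import re
from typing import Callable


def figures(answer: str) -> frozenset[str]:
    """Pull out the numbers in an answer, e.g. fees or notice periods."""
    return frozenset(re.findall(r"\d+(?:\.\d+)?", answer))


def consistency_check(ask: Callable[[str], str],
                      phrasings: list[str]) -> tuple[bool, dict[str, str]]:
    """Ask each phrasing and report whether all answers quote the same figures."""
    answers = {p: ask(p) for p in phrasings}
    distinct = {figures(a) for a in answers.values()}
    return len(distinct) == 1, answers


# Three phrasings of one question, with canned answers standing in for the AI.
PHRASINGS = [
    "What is our hourly advisory fee?",
    "How much do we charge per hour for advisory work?",
    "What is the current advisory rate?",
]
CANNED = dict(zip(PHRASINGS, [
    "The fee is £140 per hour.",
    "We charge £120 per hour.",
    "The advisory rate is £140.",
]))

consistent, answers = consistency_check(CANNED.__getitem__, PHRASINGS)
print("consistent:", consistent)
for question, answer in answers.items():
    print(question, "->", answer)
```

A mismatch does not tell you which answer is right, only that more than one version is probably indexed and that the document store needs attention.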

For client-facing or regulated outputs, keep humans in the review loop until your document governance is confirmed clean. The NCSC frames the management of data inputs as an organisational responsibility, not a technical one. The fix for version drift sits with whoever owns the document lifecycle in your firm, not with whoever chose the model.

Sources

- ICO (2024). UK GDPR guidance: accuracy. Sets out the legal requirement that personal data must be accurate and, where necessary, kept up to date, directly relevant when AI systems summarise client or staff records. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/accuracy/
- ICO (2024). AI and data protection guidance. Covers accuracy, fairness, transparency and human oversight when using AI with personal data, the regulatory framework for AI systems reading internal documents. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/ai-and-data-protection/
- FCA (2024). AI and machine learning in UK financial services. Identifies data quality, governance and explainability as the core controls for regulated firms deploying AI in advice, compliance and client communications. https://www.fca.org.uk/publications/research/ai-machine-learning-uk-financial-services
- NCSC (2024-2025). Guidance on the secure adoption of AI. Frames the management of AI data inputs, including stale and conflicting documents, as part of the broader trust and security problem around AI deployment. https://www.ncsc.gov.uk/guidance/artificial-intelligence-security
- CMA (2023-2025). Foundation models: AI in competition and consumer markets. Notes that AI-enabled tools must not mislead consumers, relevant where AI-generated client-facing content draws on stale internal documents. https://www.gov.uk/government/publications/ai-foundations-models-in-aid-of-competition-consumer-and-market-outcomes
- European Parliament (2024). EU AI Act (Regulation 2024/1689). Treats data quality and governance as required controls in higher-risk AI deployments, reinforcing the principle that input quality determines output reliability. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
- Microsoft (2025). Microsoft 365 Copilot product documentation. Describes Copilot as working across an organisation's files, emails and chats, illustrating why version control in SharePoint and OneDrive is a direct AI control. https://www.microsoft.com/en-gb/microsoft-365/copilot
- IBM (2024). What is data cleansing? Cites the industry estimate that up to 80 per cent of time in AI projects is spent on data preparation rather than model work, supporting the case for fixing the document layer first. https://www.ibm.com/topics/data-cleansing

Frequently asked questions

What is the difference between AI version drift and a hallucination?

Hallucination is when an AI model invents content with no source at all. Version drift is different: the system retrieves a genuine, real document, but that document is no longer the current authoritative version. The output looks credible and cites something that exists. The problem is that the policy or process was updated and the old file stayed searchable. Both produce wrong answers but they need different fixes.

Does version drift apply if I am only using ChatGPT or Claude for drafting?

If you are using a general-purpose AI tool only for drafting, brainstorming or summarising text you paste in manually, version drift does not apply. The risk arises when AI is connected to your internal document store, shared drive, knowledge base or email archive and set to retrieve and answer from your own files. That is the configuration where document hygiene becomes a direct control on AI accuracy.

What is the quickest fix for version drift in a small firm?

Start with one clear rule: each policy or template lives in one place, carries a visible status label such as approved or superseded, and old versions are moved out of the searchable index rather than left in place. Version drift is almost always a document governance problem rather than an AI problem. A firm that controls what the retrieval layer can see reduces the error rate without touching the model.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.
