What is model drift? Why it matters for your business

TL;DR

Model drift is the slow degradation of an AI system's accuracy or behaviour over time. The classical version is a model going stale on new data. The 2026 version, which almost no SME monitors, is foundation-model version drift, where your vendor releases a new model underneath your prompt and your tool starts behaving differently without anyone on your side changing a thing.

Key takeaways

- Model drift comes in three classical flavours: concept drift (the world changed), data drift (the inputs look different), and performance decay (accuracy just dropped).
- The 2026 flavour worth tracking is foundation-model version drift, where a vendor upgrade behind your prompt changes how your AI tool behaves.
- Drift is not hallucination. RAG and grounding fix hallucination. Monitoring, version-pinning, and retraining fix drift.
- The fix depends on which kind you have: fine-tuned drift versus foundation-version drift. If you trained the model, retrain it. If your vendor swapped it under you, pin the version, test before upgrade, and revisit your prompt.
- You do not need an MLOps platform. A monthly 30-minute spot-check, a small golden-dataset test, and a pinned model version cover the typical owner-led firm.

A small marketing agency owner showed me a Claude-based proposal-writing tool she had built in late 2024. It had worked beautifully for a year. By spring 2026 the proposals were coming out longer, blander, and oddly formal. She was convinced one of the team must have edited the prompt. Nobody had. The model under the hood had been upgraded twice while she was busy running the firm.

That is the version of model drift almost nobody is monitoring. The textbook version, a credit-scoring model going stale on post-pandemic data, is real and well documented. The version that catches owner-led businesses in 2026 is quieter, and few firms have a process for spotting it.

What is model drift?

Model drift is the slow, often invisible degradation of an AI system’s accuracy or behaviour over time. It has three classical flavours plus a fourth that is specific to the way owner-led firms buy AI in 2026. Concept drift is when the world changes and the model’s learned patterns no longer hold. Data drift is when the inputs look different. Performance decay is when accuracy simply drops.

A short non-financial example: a restaurant reservation system learned from 2019 to 2022 data, when fine dining was booked solid. In 2023 the restaurant opens a patio. The system still accepts reservations, but it predicts capacity using rules that no longer fit. That is concept drift. In 2024 the booking app goes international and starts taking requests in different time zones with different party sizes, which is data drift. By 2025 customers turn up to find no table, which is performance decay.

The fourth flavour is the one almost nobody is watching. Foundation-model version drift is when your AI vendor releases a new version of the model your tool sits on top of. Your prompt has not changed. Your data has not changed. The model has, and your tool now behaves differently. Anthropic moved from Claude 3 to 3.5 to 4 to 4.7 across 2024 to 2026. OpenAI moved from GPT-4 to 4o to GPT-5. If you built a tool in 2024 and have not pinned the version, it is running on something different from what you tested.
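If your tool calls the API directly, pinning is one line. Below is a minimal sketch, assuming the Anthropic Python SDK; the dated model string is illustrative, so check your vendor's current model list rather than copying it.

```python
# Minimal sketch of version-pinning with the Anthropic Python SDK.
# The dated model string is illustrative; check your vendor's model
# list for the snapshots currently on offer.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    # A dated snapshot stays fixed until it is formally deprecated.
    # A floating alias such as "claude-3-5-sonnet-latest" upgrades
    # silently underneath you, which is exactly the drift described above.
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Draft a one-paragraph proposal summary."}],
)
print(response.content[0].text)
```

The pinned version will eventually be deprecated, at which point you migrate deliberately, on a day you chose, after testing. That is the whole point.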

Why does it matter for your business?

Drift matters because the failure mode is silent. The tool still answers, the proposals still come out, nobody gets an error message. What changes is the quality, the tone, or the accuracy of the decision underneath, and you only notice when something goes wrong further downstream. For a regulated firm that is a compliance gap. For a services firm it is client complaints, or a quiet drop in conversion two quarters later.

The financial precedents are easy to find. Zillow wrote off around US$304 million in inventory in late 2021 when its iBuyer pricing model failed to keep up with post-pandemic property markets, and cut a quarter of its workforce. Ofqual’s A-level algorithm collapsed in August 2020 when it could not handle a cohort that did not match historical patterns, and was withdrawn within days. Both are extreme cases, and neither is a 10-person agency. The lesson is the same. Models that go unmonitored eventually meet a reality they were not built for.

The regulatory picture is worth holding gently. The PRA's Supervisory Statement SS1/23 sets binding model risk expectations for banks, including ongoing performance monitoring. The ICO's accuracy principle under UK GDPR Article 5(1)(d) applies to anyone using personal data in automated decisions, and the ICO has flagged drift specifically as a concern. The EU AI Act's accuracy and quality-management requirements, applicable from 2 August 2026 for high-risk systems, reach UK firms that sell into the EU. For a 10-person services firm with no regulated decisions, none of these is a binding duty, but they have become the baseline that auditors and insurers now ask about.

Where will you actually meet it?

Owner-led businesses meet drift in three places, and only one of them looks like the textbook. The first is a vendor deprecation notice. OpenAI and Anthropic both publish model sunset schedules. When a model your tool depends on is retired, you are forced onto a newer version, and the tool’s behaviour can shift. That forced migration is the new normal of the API economy, and your monitoring needs to expect it.
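Where your tool calls a vendor API directly, one cheap habit catches both a forced migration and a silently resolved alias: record which model actually answered each call. A minimal sketch follows, again assuming the Anthropic Python SDK; the log path, the prompt, and the expected model string are placeholders.

```python
# Minimal sketch: log which model actually served each call, so a
# forced migration or a silently resolved alias shows up in your records.
# The log path and expected model string below are placeholders.
import datetime
import json
import anthropic

EXPECTED_MODEL = "claude-3-5-sonnet-20241022"  # the version you signed off
client = anthropic.Anthropic()

response = client.messages.create(
    model=EXPECTED_MODEL,
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarise this brief in three bullets."}],
)

served = response.model  # the model that actually answered
if served != EXPECTED_MODEL:
    # In practice, send an email or Slack alert; printing keeps the sketch simple.
    print(f"WARNING: expected {EXPECTED_MODEL}, got {served}")

with open("model_log.jsonl", "a") as f:
    f.write(json.dumps({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": served,
    }) + "\n")
```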

The second is a silent SaaS upgrade. The customer service platform you bought in 2023 swaps its underlying model on a Tuesday morning. Nobody told you. Your team starts noticing that the AI agent’s escalation logic feels different, or that summaries are formatted in a new way. The vendor counts this as a routine upgrade. From your operational seat it is a behaviour change you did not authorise, and the symptoms look exactly like drift.

The third is the gut-feel staff complaint. “The system feels different this month.” It is the easiest signal to dismiss and often the most useful one. The team using the tool every day notices a pattern shift before any dashboard does. The firms that catch drift early treat that complaint as a structured signal worth investigating, not a moan to manage. The firms that miss it do not.

When to act, and when to ignore

The action depends on which flavour of drift you have. Fine-tuned drift, where you trained the model on your own data, is fixed by retraining on fresher data. You own the model, you own the fix. Foundation-version drift, where your vendor swapped the model underneath you, calls for different levers: pin the version where the API allows it, test the new version before letting it through, and revisit the prompt.

Ignore the deeper machinery. Population Stability Index, Kolmogorov-Smirnov tests, and Jensen-Shannon divergence are real metrics, used in regulated credit risk and insurance modelling. For a 10-person agency or a 30-person professional services firm, they are overkill. The proportionate response is an inventory of your AI tools, a risk-rank by what each one decides, a pinned model version where you can set one, a small golden-dataset test you can re-run before any vendor upgrade, and a 30-minute monthly spot-check of recent outputs. No MLOps platform required.
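For the golden-dataset test, here is a minimal sketch of what "re-run before any vendor upgrade" can mean in practice. The prompts and the call_tool function are placeholders for however your own tool is invoked, and the substring checks are deliberately crude; at this scale, crude is the point.

```python
# Minimal golden-dataset gate to run before accepting a vendor upgrade.
# GOLDEN_SET and call_tool are placeholders: swap in prompts you know
# well and however your own tool is actually invoked.

GOLDEN_SET = [
    # (input you know the right answer to, phrase the output must contain)
    ("What services do we offer?", "brand strategy"),
    ("Quote our standard day rate.", "per day"),
]

def call_tool(prompt: str) -> str:
    # Placeholder: replace with your tool's API call or an exported output.
    return "We offer brand strategy and design at a fixed rate per day."

def run_gate() -> bool:
    failures = [
        (prompt, phrase)
        for prompt, phrase in GOLDEN_SET
        if phrase.lower() not in call_tool(prompt).lower()
    ]
    for prompt, phrase in failures:
        print(f"FAIL: output for {prompt!r} no longer mentions {phrase!r}")
    return not failures

if __name__ == "__main__":
    print("Upgrade looks safe" if run_gate() else "Hold the upgrade and review")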

The line worth drawing is between systems that drive a regulated or material decision and systems that do not. A lending or pricing model carrying a six-figure decision warrants formal monitoring, documented thresholds, and a named owner. A proposal-writing assistant warrants the monthly spot-check and a pinned version, no more. Match the governance to the consequence, then spend your effort where it earns its keep.

Key terms

Concept drift is the textbook flavour where the relationship between inputs and outputs has changed. The world has moved, your model has not, and the patterns it learned no longer fit. It is the version that broke credit-scoring models after the 2020 to 2022 macroeconomic shifts.

Data drift is the version where the inputs look different even if the rules have not. New customer demographics, new product lines, or a sloppier CRM data feed are all common causes. The model is still doing the right thing, it is just being shown a population it was not trained for.

Foundation-model version drift is the 2026 flavour. Your vendor releases a new model version, your prompt or fine-tuned wrapper sits on top, and behaviour shifts without anyone changing your code. It is distinct from concept and data drift because the model itself has changed, not the world or the data.

Hallucination is a different failure mode. A hallucinating tool invents content not grounded in your data. A drifting tool produces content that has changed in quality or behaviour over time. Retrieval-augmented generation reduces hallucination. It does not address drift.

Fine-tuning and foundation models are the two surfaces drift sits on. If you fine-tuned the model, retraining is the lever. If you sit on a foundation model you did not commission, version-pinning, testing, and prompt revision are the levers.

The honest test of any AI tool you have been running for more than six months is the regression check. Take a small set of inputs you know the right answer to, run them through the current version, and compare to the answers it gave when you signed it off. If the answers have shifted in ways you cannot explain, you have drift.
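A minimal sketch of that regression check is below, assuming you saved the signed-off answers in a JSON file at the time. Python's difflib gives a rough textual similarity score, not a semantic one, but it is enough to flag the outputs worth reading by eye.

```python
# Minimal sketch of the regression check described above: compare what
# the tool says today with the answers you archived at sign-off.
# Assumes a baseline.json mapping each test input to its approved output.
import difflib
import json

def similarity(a: str, b: str) -> float:
    # Rough character-level similarity between two outputs, 0.0 to 1.0.
    return difflib.SequenceMatcher(None, a, b).ratio()

def check_drift(baseline_path: str, current_outputs: dict[str, str],
                threshold: float = 0.8) -> list[str]:
    # Anything scoring below the threshold gets flagged for human review.
    with open(baseline_path) as f:
        baseline = json.load(f)
    drifted = []
    for prompt, approved in baseline.items():
        score = similarity(approved, current_outputs.get(prompt, ""))
        if score < threshold:
            print(f"{score:.2f}  drifted: {prompt!r}")
            drifted.append(prompt)
    return drifted
```

The threshold is a judgment call, not a standard; start loose, and tighten it once you have seen a few months of normal variation.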

Sources

Bank of England (2023). Supervisory Statement SS1/23, Model Risk Management Principles for Banks. Effective 17 May 2024, sets the model performance monitoring expectation referenced in the post. https://www.bankofengland.co.uk/prudential-regulation/publication/2023/may/model-risk-management

Information Commissioner's Office (2024). Guidance on AI and data protection. Confirms the UK GDPR Article 5(1)(d) accuracy principle applies to model drift. https://ico.org.uk/for-organisations/uk-gdpr/guidance-index/ai-and-data-protection/

Financial Conduct Authority and Bank of England (2024). Artificial intelligence in financial services, joint survey of 85 firms. Source for the 64 per cent monitoring and 18 per cent retraining-policy stats. https://www.fca.org.uk/news/news-stories/fca-bank-of-england-ai-survey-2024

European Union (2024). Regulation (EU) 2024/1689, the EU AI Act, Articles 15 and 17 on accuracy, robustness, and quality management. https://eur-lex.europa.eu/eli/reg/2024/1689/oj

Zillow Group (2021). iBuyer wind-down disclosures, US$304m inventory write-down, November 2021. https://www.zillow.com/research/zillow-ibuyer-writedown-2021/

Anthropic (2026). Claude model release notes. Reference for the Claude 3 to 4.7 release cadence cited in the foundation-model version drift section. https://www.anthropic.com/news

OpenAI (2026). API model deprecation schedule. Reference for vendor sunset notices cited in the "where you will meet it" section. https://platform.openai.com/docs/deprecations

SAS Institute. Population Stability Index methodology and threshold guidance, the formal metric named in passing for readers who want to know one exists. https://www.sas.com/content/dam/SASWeb/en_us/doc/whitepaper/psi-stability-index.pdf

UK Government (2020). Ofqual A-level and GCSE algorithm withdrawal, August 2020. Named UK incident anchor for algorithmic decision-making failure. https://www.gov.uk/government/organisations/ofqual/about

Frequently asked questions

How is model drift different from hallucination?

They are different failure modes with different fixes. Hallucination is when an AI tool invents content that is not grounded in your data. Drift is when a tool that used to work accurately starts working less well, either because the world has shifted or because the underlying model has been updated. Retrieval-augmented generation reduces hallucination. Drift needs monitoring, version-pinning, and retraining.

Do I need a formal MLOps platform to manage drift?

For the typical owner-led business, no. A 10-person services firm with a few prompt-engineered tools does not need Evidently AI or Fiddler. A pinned model version, a small golden-dataset test you can re-run, and a 30-minute monthly spot-check of outputs covers the practical risk. Reserve formal monitoring tooling for systems that drive regulated decisions or material financial outcomes.

Does PRA SS1/23 apply to my business?

Only if you are a bank or large building society regulated by the PRA. SS1/23 is binding on those firms. For everyone else it is reference material, not duty. The cross-sector hook is the ICO's accuracy principle under UK GDPR Article 5(1)(d), which applies to anyone using personal data in automated decisions. If you sell into the EU, the EU AI Act adds further requirements for high-risk systems.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
