What to track when an AI model is in production

TL;DR

Once an AI model is running live in your business, it needs the same ongoing attention you would give any critical process. UK regulators including the ICO, FCA, and NCSC already expect firms to track accuracy, detect data drift, log decisions, and monitor for security anomalies. A lean monitoring dashboard covering six dimensions, from business outcomes and model performance to cost and compliance, is achievable for any SME and increasingly hard to avoid.

Key takeaways

- Once an AI model is in production, it can degrade silently; without structured monitoring, problems often go unnoticed until they cause harm.
- UK regulators including the ICO, FCA, and NCSC treat AI monitoring as an extension of existing data protection, financial governance, and cybersecurity obligations.
- A practical SME monitoring dashboard covers six areas: business outcomes, model performance, data quality and drift, human interaction signals, compliance and logging, and cost.
- The gap between intended and actual human oversight is where many monitoring failures start; "someone always reviews it" is the plan, but not always the practice.
- If a model's errors could cause regulatory, customer, or commercial damage before anyone notices, active monitoring is necessary; lighter manual audits may suffice for low-stakes internal tools.

A small professional services firm deployed an AI model to classify incoming client enquiries and route them to the right team member. It performed well in testing. Six months later, a routine conversation with a long-standing client revealed that several urgent queries had been arriving in a catch-all inbox instead of the sales team. The model’s routing logic had quietly degraded as the firm’s service offer evolved. Nobody had noticed because nobody was watching. There was no accuracy check, no alert, no log to review.

What does “running AI in production” actually mean?

“Running AI in production” means your model is doing real work with real consequences, not generating test outputs nobody acts on. A model is in production when its outputs influence live processes: routing enquiries, scoring leads, generating documents, flagging anomalies. Once it reaches that point, errors carry weight. The accuracy level that seemed fine in testing becomes a business risk if left unwatched.

The distinction matters because a model behaves differently once it encounters real data. In testing, inputs are controlled and outputs are checked. In production, the model handles edge cases you did not anticipate, user behaviour that was not in the training set, and a business context that may have shifted since the model was built.

A 2024 survey by OutSystems and IT Brief found that 91% of UK enterprises report moving AI projects into production, yet only 41% say more than half of those projects were successful. Much of that gap opens up after deployment. Many firms deploy a model and then treat it like a piece of static software, checking in only when something breaks visibly. AI models can degrade in ways that produce no visible error message. Outputs drift, data distributions shift, and performance slides quietly. The survey does not isolate a single cause, but structured monitoring is one of the clearest differences between projects that keep delivering and projects that quietly stop.

Why does tracking AI outputs matter for your business?

Tracking AI outputs matters because a model you cannot measure is a risk you cannot manage. The ICO, the FCA, and the NCSC already expect firms to monitor AI-assisted processes as part of their existing obligations on data protection, financial governance, and cybersecurity. Beyond compliance, monitoring is how you distinguish AI that is genuinely adding value from AI that has quietly stopped working as intended.

The UK Government’s 2024 guidance on AI implementation explicitly advises organisations to define their success measures and monitoring arrangements from the start of any AI project. The ONS’s 2023 analysis of UK firms found that businesses with stronger management practices were more likely to adopt AI and to track performance systematically.

The cost of not monitoring can be severe. The 2020 A-level grading algorithm in England shows what happens without oversight. Ofqual deployed a statistical model to replace exam results when Covid cancelled exams. The algorithm systematically downgraded pupils from disadvantaged backgrounds while benefiting those from schools with stronger historical results. No monitoring caught it. By the time the decision was reversed, hundreds of students had lost their university places. The system had no mechanism to check whether the model was behaving fairly once it was live.

What should your monitoring dashboard cover?

For an owner-managed firm, a practical monitoring dashboard covers six areas. Business outcomes show whether the AI is contributing to revenue, efficiency, or quality. Model performance reveals accuracy, error rates, and whether outputs are drifting. Data quality flags whether inputs are still representative of what the model was trained on. Human interaction signals, compliance logs, and cost complete the picture.

Business outcomes are the baseline: how much time is the model saving per month, and have conversion rates or error rates shifted since deployment? Track before-and-after figures tied to real cost lines.

Model performance means accuracy against known ground truth where that exists, plus error rates categorised by severity. For a customer-facing model, the staff override rate is a useful proxy: when staff start correcting AI outputs more frequently, accuracy has often already declined.

Data quality and drift is where many silent failures begin. If your customer mix has shifted, your products have changed, or seasonal patterns have moved, the model’s inputs may no longer match what it was trained on. A periodic check that input distributions look similar to your training period can catch this early.
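
One way to make that periodic check concrete is the Population Stability Index (PSI), a widely used drift measure that this article does not otherwise prescribe. The sketch below is a minimal Python illustration, assuming your inputs can be grouped into categories. The enquiry types, the counts, and the 0.2 threshold are illustrative assumptions, not recommendations.

```python
import math
from collections import Counter


def category_shares(values, categories, floor=1e-6):
    """Return each category's share of `values`, floored to avoid log(0)."""
    counts = Counter(values)
    total = len(values)
    return {c: max(counts.get(c, 0) / total, floor) for c in categories}


def population_stability_index(baseline, recent):
    """PSI between two lists of categorical values (higher means more drift)."""
    categories = set(baseline) | set(recent)
    base = category_shares(baseline, categories)
    new = category_shares(recent, categories)
    return sum((new[c] - base[c]) * math.log(new[c] / base[c]) for c in categories)


# Hypothetical enquiry types seen during the training period vs. last month.
training_period = ["tax"] * 50 + ["payroll"] * 30 + ["advisory"] * 20
last_month = ["tax"] * 25 + ["payroll"] * 25 + ["advisory"] * 50

psi = population_stability_index(training_period, last_month)
# 0.2 is a commonly quoted rule of thumb, not a regulatory figure.
drifted = psi > 0.2
print(f"PSI = {psi:.3f}, investigate = {drifted}")
```

A score near zero means the recent mix looks like the training period; the larger the score, the further the mix has moved. Numeric inputs such as invoice values would need to be bucketed into ranges first.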

Human interaction signals show how often staff override, question, or ignore AI recommendations. A rising override rate often signals a problem before any automated alert would.
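
The override rate can be tracked with very little tooling. The following is a rough sketch, assuming you record each week how many outputs the model produced and how many staff corrected. The `WeeklyReview` structure, the week labels, the baseline, and the five-point tolerance are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WeeklyReview:
    week: str          # hypothetical label, e.g. "W03"
    ai_outputs: int    # outputs the model produced that week
    overridden: int    # outputs staff corrected or replaced


def override_rate(review):
    """Share of AI outputs that staff overrode in one week."""
    if review.ai_outputs == 0:
        return 0.0
    return review.overridden / review.ai_outputs


def weeks_to_flag(reviews, baseline_rate, tolerance=0.05):
    """Return weeks whose override rate exceeds the baseline by more than tolerance."""
    return [r.week for r in reviews if override_rate(r) > baseline_rate + tolerance]


# Illustrative numbers only.
history = [
    WeeklyReview("W01", ai_outputs=200, overridden=10),
    WeeklyReview("W02", ai_outputs=180, overridden=9),
    WeeklyReview("W03", ai_outputs=210, overridden=32),
]
flagged = weeks_to_flag(history, baseline_rate=0.05)
print(flagged)
```

The baseline would normally come from the first few weeks after deployment, when the model was known to be behaving as expected.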

Compliance and logging means keeping records of which model version made which decision, and on what inputs. The ICO requires firms to be able to explain AI-assisted decisions under UK GDPR. The EU AI Act requires audit trails for high-risk systems. Without logs, neither obligation can be met.
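
As a sketch of what such a record might contain, the Python below appends one line per decision to a log file. The field names, the version label, and the file location are assumptions made for illustration; neither the ICO nor the EU AI Act prescribes this particular layout.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_log_entry(model_version, inputs, output, reviewer=None):
    """Build one audit record for a single AI-assisted decision."""
    canonical = json.dumps(inputs, sort_keys=True)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "inputs": inputs,
        # Fingerprint of the inputs, so later changes to a record can be spotted.
        "inputs_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "output": output,
        "human_reviewer": reviewer,
    }


def append_entry(path, entry):
    """Append the record as one JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")


# Temp directory for demonstration only; a real log needs durable,
# access-controlled storage.
log_path = Path(tempfile.gettempdir()) / "decision_log.jsonl"

entry = build_log_entry(
    model_version="enquiry-router-1.3",  # hypothetical version label
    inputs={"subject": "Urgent: filing deadline", "channel": "email"},
    output={"route_to": "tax-team"},
)
append_entry(log_path, entry)
```

Because logged inputs often contain personal data, the log itself falls under the same data protection obligations as the system it records.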

Cost means monthly API spend, vendor licences, and compute tracked against the business value delivered, so you can identify whether the model is still earning its keep.
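
The arithmetic behind "earning its keep" is simple enough to sketch. The example below assumes the main benefit is staff time saved; every figure in it is made up for illustration, and a real calculation would use your own cost lines.

```python
def monthly_net_value(hours_saved, hourly_cost, api_spend, licence_fees, compute):
    """Value of staff time saved minus the model's running costs for one month."""
    value_delivered = hours_saved * hourly_cost
    running_cost = api_spend + licence_fees + compute
    return value_delivered - running_cost


# Illustrative figures in pounds.
net = monthly_net_value(
    hours_saved=40,
    hourly_cost=35,
    api_spend=220,
    licence_fees=150,
    compute=60,
)
print(f"Net monthly value: £{net}")
```

A figure that trends towards zero, or below it, is a prompt to review whether the model is still worth running in its current form.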

When is a full dashboard genuinely necessary?

A full monitoring dashboard adds clear value when your AI model is making or influencing decisions that directly affect customers, revenue, or compliance. For a model that produces first-draft documents for a human to review and approve, a weekly spot-check of a random sample may be enough. The deciding factor is whether errors could cause damage before anyone notices them.

For a typical owner-operated services firm, a useful starting point is two questions. First: if this model gave a wrong answer 10% of the time, would anyone notice within a week? Second: could a sustained error cause a regulatory problem, a customer complaint, or a commercial loss?

If the answer to the second question is yes, active monitoring is worth setting up, even if it starts with a spreadsheet log and a monthly review. If your AI sits behind a human who always reviews outputs carefully, you may be able to run lighter oversight for a period.

The caveat is that “a human always reviews it” is often the plan rather than the reality. Staff find workarounds. Reviewers start trusting the model and check less carefully over time. Deltek’s 2026 research shows that only 12% of UK firms currently report significant measurable ROI from AI. Part of that gap starts with the distance between intended oversight and actual practice.

What else connects to this?

AI model monitoring sits within a broader discipline called MLOps, machine learning operations, which covers how models are built, deployed, retrained, and retired. For large language models the same field is sometimes called LLMOps. Drift detection is the practice of identifying when a model’s inputs or outputs diverge from its training conditions. Explainability covers the ability to show why a model reached a specific output.

The enterprise-grade MLOps stack shows the mature end of the monitoring curve: end-to-end pipelines with automated drift monitoring, model versioning, experiment tracking, and retraining cycles, as offered by integrators like Capgemini for their enterprise clients. For a typical owner-managed business, the entry point is a small set of manually reviewed metrics on a regular schedule, with clear ownership of who reviews them and what triggers escalation.

Two concepts from the mature end are worth knowing earlier than you might expect. Model versioning, recording which version of a model produced which output, becomes critical the first time a client questions a decision and you need to reconstruct it. Explainability becomes relevant the first time the ICO or an unhappy customer asks how the model made a specific call.

If you are working out where to start, book a conversation and we can map your current AI deployment against the monitoring basics that matter most for your sector.

Sources

- Information Commissioner's Office (2023). Guidance on AI and data protection. ICO guidance covering accuracy, bias monitoring, and the UK GDPR accountability principle for AI-assisted processes. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/ai-and-data-protection/
- Information Commissioner's Office (2020). Explaining decisions made with AI. ICO guidance on obligations to explain AI-assisted decisions and maintain audit trails. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/explaining-decisions-made-with-ai/
- Bank of England and Financial Conduct Authority (2022). Artificial intelligence and machine learning in financial services. Survey and discussion paper setting out governance, validation, and oversight expectations for AI models in financial services firms. https://www.bankofengland.co.uk/report/2022/artificial-intelligence-and-machine-learning-in-financial-services
- UK Government (2024). Planning and preparing for artificial intelligence implementation. Government guidance advising organisations to define success measures and monitoring arrangements from the start of any AI project. https://www.gov.uk/guidance/planning-and-preparing-for-artificial-intelligence-implementation
- Office for National Statistics (2025). Management practices and the adoption of technology and artificial intelligence in UK firms, 2023. ONS analysis linking stronger management practices with systematic performance tracking and AI adoption. https://www.ons.gov.uk/economy/economicoutputandproductivity/productivitymeasures/articles/managementpracticesandtheadoptionoftechnologyandartificialintelligenceinukfirms2023/2025-03-24
- National Cyber Security Centre (2024). Guidelines for secure AI system development. NCSC guidance covering monitoring of access logs, API usage patterns, and anomaly detection for deployed AI systems. https://www.ncsc.gov.uk/collection/guidelines-for-secure-ai-system-development
- UK Parliament, House of Commons Library (2020). Awarding qualifications in summer 2020: research briefing CBP-8985. Parliamentary briefing on the 2020 A-level grading algorithm and the absence of oversight mechanisms during deployment. https://commonslibrary.parliament.uk/research-briefings/cbp-8985/
- EUR-Lex (2021). Proposal for a Regulation on harmonised rules on artificial intelligence (EU AI Act). Legislative text requiring post-market monitoring, incident logging, and audit trails for high-risk AI systems. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52021PC0206
- OutSystems and IT Brief (2024). UK enterprises move AI projects into production. Survey finding that 91% of UK enterprises report AI in production but only 41% say the majority of those projects succeed. https://itbrief.ie/story/uk-enterprises-move-ai-projects-into-production
- Deltek (2026). UK firms move from AI experimentation to measurable results. Press release noting that 29% of UK project-based businesses now prioritise operationalising AI, and only 12% report significant measurable ROI. https://www.deltek.com/en/about/media-center/press-releases/2026/uk-firms-move-from-ai-experimentation-to-measurable-results

Frequently asked questions

How often should I check whether my AI model is still performing well?

For customer-facing or decision-influencing AI, weekly automated checks combined with monthly human reviews are a reasonable baseline. For AI that feeds into regulated processes, you need near-real-time alerts for anomalies and formal incident logging. For low-stakes internal tools where a human reviews every output, a monthly spot-check of a sample is often enough to catch significant deterioration early.

Do I need a dedicated MLOps platform to monitor production AI?

For an SME with one or two models, a dedicated platform is not necessary at the start. A spreadsheet log, a scheduled review process, and clear ownership of who checks performance are enough to get going. As your AI footprint grows, platforms such as Azure ML or AWS SageMaker add automation without requiring an in-house data science team.

What does the ICO expect from firms using AI to make decisions?

The ICO's guidance on AI and data protection requires firms to explain AI-assisted decisions to the individuals affected, demonstrate ongoing accuracy and fairness, and maintain audit trails under the UK GDPR accountability principle. For decisions with significant effects on individuals, meaningful human review is expected. The monitoring implication is that you need logs of what the model did, which version, and on what inputs.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
