A simple review process for catching AI mistakes

TL;DR

Large language models produce confident output that may contain invented facts, wrong formulas, or fabricated references. A simple review process covering accuracy, completeness, compliance, bias, and security is both good practice and a legal expectation for UK firms under ICO guidance and FCA rules. Classify your AI outputs by risk level, use a checklist, log what you check, and make sure the person reviewing has the expertise to challenge what they find.

Key takeaways

- AI tools generate text that sounds plausible but may contain invented facts, fabricated references, or misapplied data; treat every output as a first draft until a human has checked the key claims against source documents or authoritative references.
- The UK ICO requires meaningful human review for AI-assisted decisions, meaning the reviewer must have the authority and expertise to change the outcome, not simply approve it.
- Classify AI outputs by risk level: high-risk work such as client contracts, regulated advice, and financial statements needs senior review before leaving the firm; lower-risk work needs at least a self-check checklist.
- Log every high-risk review in a simple spreadsheet recording the date, tool used, reviewer, and any issue found. This record protects you if a client or regulator asks questions later.
- Review processes fail when the reviewer lacks domain expertise, when time pressure leads to rubber-stamping, or when firms assume AI error patterns are stable and stop monitoring over time.

A finance manager at a small professional services firm submitted a client report last month. The AI tool had drafted it cleanly, the language was polished, the figures looked plausible. She sent it without reading past the first page. Two days later, the client called. One of the contractual references was wrong, and a formula in the spreadsheet appendix had applied data from the wrong column. The mistake had been invisible because it looked right.

This is the specific failure mode AI creates: confident-sounding output that contains genuine errors. Plenty of small businesses have no systematic way of catching them before they leave the building.

What makes AI output risky without a review step?

Large language models generate text that sounds plausible, not verified facts. They hallucinate: inventing clauses in contracts, fabricating references, misreading spreadsheet logic. The model has no access to your firm’s policies, your client records, or current law. It predicts what sounds right. That means every output is a first draft, and treating it otherwise is where things go wrong.

One finance professional documented an AI tool that applied the wrong data validation across spreadsheet cells and then failed to catch its own mistake when asked to review. The error only surfaced when a human checked the output line by line.

The pattern is consistent across use cases. AI models are trained to predict the next plausible word, not to verify whether a contractual clause exists in law, whether a figure matches your records, or whether a regulatory reference is still current. That gap between plausible and accurate is where errors live. Because the output reads confidently, the errors tend to travel further than they should.

The UK ICO’s AI audit framework is explicit: human review of AI decisions must be meaningful, not mechanical. If someone approves AI output without the authority or competence to change it, that doesn’t satisfy the requirement. UK GDPR Article 22 gives individuals the right not to be subject to a solely automated decision with legal or similarly significant effects.

For financial services firms, the FCA’s Consumer Duty rules require that AI-assisted recommendations and communications produce good outcomes for retail customers and that the process generating those outcomes is auditable. The FCA’s own research on large language models, published in May 2025, concluded that validating AI output requires both human judgement and automated tools on an ongoing basis.

Beyond regulated sectors, the principle holds. Many firms currently review around 3% of AI-assisted interactions for quality. Aveni, which analyses AI governance in financial advice, argues that sampling rate is insufficient when AI can scale errors across thousands of interactions. Even firms outside regulated industries should document what they check, who checks it, and what they do when they find a problem.

Where in your workflow do errors actually surface?

Errors tend to cluster around the same use cases in small professional services firms: documents with factual or legal content, structured data analysis, and anything referencing regulations or external standards. A contract clause invented by the AI looks identical to a real one in the draft. A formula misapplied in a spreadsheet produces numbers that look plausible until someone checks the source data.

The risk isn’t evenly distributed. Internal brainstorming, early-stage idea lists, and lightly structured internal memos carry low stakes if they contain an error. A client proposal, an employment letter, or a compliance report carries high stakes. The EU AI Act uses exactly this kind of risk classification for AI systems affecting credit, hiring, or access to services. You can mirror that logic internally with a simple three-band approach.

High-risk outputs go to a senior decision-maker for review before they leave the firm. Medium-risk work, such as marketing copy or non-regulated proposals, gets a peer check. Low-risk internal work gets a self-check against a brief checklist. A rough three-band ranking by risk level is all you need to get started.
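For firms that track documents in a script or spreadsheet macro, the three bands can be written down as a small lookup. The sketch below is only an illustration: the document-type names are hypothetical labels based on the examples in this article, and sending unknown types to the high-risk band is an assumption made here for caution, not a regulatory rule.

```python
from enum import Enum


class RiskBand(Enum):
    HIGH = "senior review before it leaves the firm"
    MEDIUM = "peer check"
    LOW = "self-check against a brief checklist"


# Hypothetical document types, grouped using the examples in this article.
HIGH_RISK = {"client contract", "regulated advice", "financial statement",
             "employment letter", "compliance report"}
MEDIUM_RISK = {"marketing copy", "non-regulated proposal"}
LOW_RISK = {"internal brainstorm", "idea list", "internal memo"}


def classify(document_type: str) -> RiskBand:
    """Return the review band for a document type."""
    name = document_type.strip().lower()
    if name in LOW_RISK:
        return RiskBand.LOW
    if name in MEDIUM_RISK:
        return RiskBand.MEDIUM
    # Assumption: anything listed as high risk, or not listed at all,
    # gets the strictest review until someone decides otherwise.
    return RiskBand.HIGH


print(classify("Marketing copy").value)
```

Treating unlisted document types as high risk means a new kind of output gets a senior look at least once before anyone assigns it a lighter band.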

What does a practical review process look like for a small firm?

A workable process for a small firm has three parts: a review checklist, a log, and a clear escalation rule. The ICO explicitly recommends standardised checklists and simple documented procedures, partly to reduce automation bias, the tendency for reviewers to trust plausible-looking output rather than challenge it. The same guidance suggests building in a fallback when AI output falls below an acceptable quality threshold.

The checklist covers five areas. Accuracy: does the output match the source documents or data you provided? Any regulatory references need verifying directly against legislation.gov.uk, the ICO website, or the FCA register. Completeness: has the AI assumed information you didn’t provide? Compliance: does the output align with your existing policies? Bias: does the output treat similar customers differently without a legal basis? Security: has any client data been passed to an external AI tool without a proper risk assessment? The NCSC’s guidance on secure AI systems is clear that unmanaged third-party access to sensitive data warrants its own risk assessment, regardless of how clean the final output looks.
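The five areas above can be kept as a short structure that a reviewer fills in, so that nothing is signed off with an area left blank. This is a minimal sketch, not an official ICO or FCA template: the wording of each prompt is paraphrased from this article, and the function name is hypothetical.

```python
# The five review areas, each phrased so that True means "checked and fine".
CHECKLIST = {
    "accuracy": "Output matches the source documents and verified references",
    "completeness": "No information has been assumed that was not provided",
    "compliance": "Output aligns with existing firm policies",
    "bias": "Similar customers are not treated differently without a legal basis",
    "security": "No client data went to an external tool without a risk assessment",
}


def outstanding_areas(results: dict) -> list:
    """Return the areas that failed or were never checked, in checklist order."""
    return [area for area in CHECKLIST if not results.get(area, False)]


review = {"accuracy": True, "completeness": True, "compliance": True, "bias": False}
for area in outstanding_areas(review):
    print(f"Still open: {area} ({CHECKLIST[area]})")
```

An area that is missing from the results counts as open, which keeps a half-completed review from passing by accident.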

Log every high-risk review in a simple spreadsheet: date, document, tool used, reviewer, and any issue found. The FCA’s review of automated financial advice found that undocumented human interventions created serious evidential gaps when compliance questions arose later. A basic log closes that gap at minimal cost.
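If a shared spreadsheet feels too manual, the same log can be kept as a CSV file that any spreadsheet program opens. The sketch below uses the five columns named above; the file name, function name, and example values are hypothetical, and a real firm would also want to think about where the file lives and who can edit it.

```python
import csv
from datetime import date
from pathlib import Path

# Columns taken from the article: date, document, tool used, reviewer, issue.
FIELDS = ["date", "document", "tool", "reviewer", "issue"]


def log_review(path, document, tool, reviewer, issue="", on=None):
    """Append one review record, writing the header row if the file is new."""
    log_file = Path(path)
    is_new = not log_file.exists()
    with log_file.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (on or date.today()).isoformat(),
            "document": document,
            "tool": tool,
            "reviewer": reviewer,
            "issue": issue,  # leave empty when nothing was found
        })


# Example call with made-up values:
# log_review("ai_review_log.csv", "Q3 client report", "drafting assistant",
#            "J. Smith", "Formula in appendix used the wrong column")
```

Recording reviews that found nothing matters as much as recording the ones that did, since the log is evidence that the check happened.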

If you find more than two significant issues in a week for the same task type, pause and revisit your prompting approach. Errors tend to repeat, and a pattern in your log is a signal worth acting on.
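That escalation rule is simple enough to check automatically. The sketch below assumes two things that go beyond the article: that each logged issue also records a task type (a hypothetical extra column), and that "a week" means an ISO calendar week running Monday to Sunday.

```python
from collections import Counter
from datetime import date

# "More than two significant issues in a week for the same task type."
THRESHOLD = 2


def task_types_to_pause(issues):
    """Take (date, task_type) pairs for significant issues and return the
    (ISO year, ISO week, task_type) combinations that exceed the threshold."""
    counts = Counter()
    for found_on, task_type in issues:
        iso_year, iso_week, _ = found_on.isocalendar()
        counts[(iso_year, iso_week, task_type)] += 1
    return sorted(key for key, total in counts.items() if total > THRESHOLD)


# Made-up example: three contract issues and two spreadsheet issues in one week.
example = [
    (date(2025, 3, 3), "contract summary"),
    (date(2025, 3, 4), "contract summary"),
    (date(2025, 3, 6), "contract summary"),
    (date(2025, 3, 4), "spreadsheet analysis"),
    (date(2025, 3, 5), "spreadsheet analysis"),
]
print(task_types_to_pause(example))
```

In this example only the contract summaries cross the line, because two issues in a week does not count as "more than two".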

What makes a review process fail?

The most common failure is running a review that doesn’t work. A junior team member checking complex regulatory content they can’t evaluate. A reviewer under time pressure who approves quickly to keep the project moving. A firm that checks AI output once and assumes the model behaves the same way three months later, when the prompts, the data, or the underlying model may have shifted.

The ICO is specific: human review must be carried out by someone with appropriate authority and competence to override the AI decision, not simply to sign off on it. If the person reviewing an AI-generated employment letter has no knowledge of employment law, the review provides no meaningful protection.

The NIST AI Risk Management Framework addresses the drift problem directly. It recommends continuous monitoring and recording of inputs and outputs so that if error patterns shift, you catch them early. A monthly 30-minute review of your AI logs to look for trends costs almost nothing and catches problems before they scale.

There is also the confidentiality question. Reviewing final text for accuracy won’t address a data protection breach if client information was pasted into an unmanaged external AI tool earlier in the process. Review and access control need to happen together.

If your review process is perceived as slow or bureaucratic, staff will bypass it. The Cabinet Office’s Mitigating Hidden AI Risks toolkit documents this pattern in government AI deployments: under pressure, teams cut corners on oversight. The answer is keeping the process genuinely lightweight, not demanding perfect compliance with a heavy one.

The review doesn’t have to be long. A one-page checklist and a short log spreadsheet are enough for a small firm to document meaningful human oversight in a way that satisfies the ICO, the FCA, and your own professional standards. The point is to make it real rather than nominal. If the finance manager in the opening had kept a checklist on her desk, the client wouldn’t have called.

If you’d like to work through what this looks like for your firm, book a conversation.

Sources

- ICO (2024). Artificial intelligence audit framework: human review toolkit. ICO guidance on meaningful human oversight of AI decisions, covering UK GDPR Articles 5, 13, 14, 15, and 22 requirements for review authority and competence. https://ico.org.uk/for-organisations/advice-and-services/audits/data-protection-audit-framework/toolkits/artificial-intelligence/human-review/
- FCA (2025). Our approach to AI: speech on large language models in financial services. Covers the requirement for both human judgement and automated tools to validate AI outputs on an ongoing basis in regulated contexts. https://www.fca.org.uk/news/speeches/our-approach-ai
- ICO (2024). AI and data protection. Guidance on lawful, fair, and transparent AI use under UK GDPR, including documentation requirements, DPIAs, and logging of human overrides. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/data-protection-and-ai/ai-and-data-protection/
- NIST (2023). AI Risk Management Framework (AI RMF 1.0). US framework widely referenced by UK risk teams, covering continuous monitoring, input/output logging, and human-in-the-loop review as core controls. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
- UK Cabinet Office (2023). The Mitigating Hidden AI Risks toolkit. Documents automation bias, over-trust in AI, and lack of fall-back processes as key risk drivers in UK public-sector AI deployments, with structured mitigations. https://www.gov.uk/government/publications/a-human-centred-approach-to-scaling-and-de-risking-ai-tools/the-mitigating-hidden-ai-risks-toolkit-html
- NCSC (2023). Guidelines for secure AI system development. UK cyber authority guidance on logging, access control, and secure use of AI tools, including risks of exposing sensitive data to third-party AI services. https://www.ncsc.gov.uk/collection/guidelines-for-secure-ai-system-development
- FCA (2023). Consumer Duty. Regulatory requirement for firms to deliver and monitor good outcomes for retail customers, including where AI is used to generate recommendations or communications. https://www.fca.org.uk/firms/consumer-duty
- Aveni (2024). AI governance in financial services. Analysis of oversight gaps in AI-enabled financial advice, including the inadequacy of 3% sampling as a sole oversight control when AI scales errors across large volumes. https://aveni.ai/blog/ai-governance-financial-services/
- GDPR Local (2026). AI compliance guide for UK companies. Recommended timeline and approach for UK SME AI governance programmes, including AI inventory, gap analysis, staff training, and ongoing monitoring. https://gdprlocal.com/ai-compliance-uk-companies/
- Financial Professionals International (2024). How to catch AI errors before they become business problems. Practitioner accounts of AI hallucination in finance, including spreadsheet formula errors and contract clause fabrication. https://www.financialprofessionals.org/training-resources/resources/articles/Details/how-to-catch-ai-errors-before-they-become-business-problems

Frequently asked questions

How often should I review AI output for errors?

The frequency depends on risk level. High-risk outputs such as client contracts, regulated advice, or financial statements should be reviewed before every use by someone with the expertise to challenge what they find. For lower-risk work, sampling is acceptable, though Aveni's analysis cautions that reviewing only around 3% of interactions is insufficient when AI can scale errors across large volumes.

Does my small business need a formal AI review process?

If you're using AI to produce content that goes to clients, touches regulated areas, or influences decisions affecting individuals, yes. The ICO expects documented human review procedures, and the FCA's Consumer Duty rules apply to AI-assisted communications regardless of firm size. The process doesn't need to be complicated: a one-page checklist and a simple log spreadsheet will meet the core requirement.

What happens if my AI review process misses an error and a client is affected?

The consequences depend on the context. A factual error in a client contract could lead to a dispute. A compliance failure in a regulated output could attract regulatory scrutiny. The FCA's review of automated financial advice found that firms unable to evidence their human review process faced serious difficulties during examinations. Keeping a log of what was checked, by whom, and when is your most reliable protection.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
