A simple review process for catching AI mistakes

A finance manager at a small professional services firm submitted a client report last month. The AI tool had drafted it cleanly: the language was polished and the figures looked plausible. She sent it without reading past the first page. Two days later, the client called. One of the contractual references was wrong, and a formula in the spreadsheet appendix had pulled data from the wrong column. Both mistakes had gone unnoticed because they looked right.

AI produces confident-sounding output that contains genuine errors. That is the specific failure mode, and plenty of small businesses have no systematic way of catching it before it leaves the building.

What makes AI output risky without a review step?

Large language models generate text that sounds plausible, not verified facts. They hallucinate: inventing clauses in contracts, fabricating references, and misreading spreadsheet logic. Unless you supply them, the model has no access to your firm’s policies, your client records, or current law. It predicts what sounds right. That means every output is a first draft, and treating it otherwise is where things go wrong.

One finance professional documented an AI tool that applied the wrong data validation across spreadsheet cells and then failed to catch its own mistake when asked to review. The error only surfaced when a human checked the output line by line.

The pattern is consistent across use cases. AI models are trained to predict the next plausible word, not to verify whether a contractual clause exists in law, whether a figure matches your records, or whether a regulatory reference is still current. That gap between plausible and accurate is where errors live. Because the output reads confidently, the errors tend to travel further than they should.

Why does your business face a legal obligation to check?

The UK ICO’s AI audit framework is clear that human review of AI decisions must be meaningful, not mechanical. If someone approves AI output without the authority or competence to change it, that doesn’t satisfy the requirement. UK GDPR Article 22 gives individuals the right not to be subject to a solely automated decision with legal or similarly significant effects.

For financial services firms, the FCA’s Consumer Duty rules require that AI-assisted recommendations and communications produce good outcomes for retail customers and that the process generating those outcomes is auditable. The FCA’s own research on large language models, published in May 2025, concluded that validating AI output requires both human judgement and automated tools on an ongoing basis.

Beyond regulated sectors, the principle holds. Many firms currently review around 3% of AI-assisted interactions for quality. Aveni, which analyses AI governance in financial advice, argues that this sampling rate is insufficient when AI can scale errors across thousands of interactions. Even firms outside regulated industries should document what they check, who checks it, and what they do when they find a problem.

Where in your workflow do errors actually surface?

Errors tend to cluster around the same use cases in small professional services firms. Documents with factual or legal content, structured data analysis, and anything referencing regulations or external standards are where they show up most. A contract clause invented by the AI looks identical to a real one in the draft. A formula misapplied in a spreadsheet produces numbers that look plausible until someone checks the source data.

The risk isn’t evenly distributed. Internal brainstorming, early-stage idea lists, and lightly structured internal memos carry low stakes if they contain an error. A client proposal, an employment letter, or a compliance report carries high stakes. The EU AI Act uses exactly this kind of risk classification for AI systems affecting credit, hiring, or access to services. You can mirror that logic internally with a simple three-band approach.

High-risk outputs go to a senior decision-maker for review before they leave the firm. Medium-risk work, such as marketing copy or non-regulated proposals, gets a peer check. Low-risk internal work gets a self-check against a brief checklist. A rough three-band ranking by risk level is all you need to get started.
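For firms that track work in a script or spreadsheet macro, the three bands can be written down as a small lookup. The sketch below is illustrative only: the document types are the examples from this article, the names `DOCUMENT_BANDS` and `review_route` are hypothetical, and defaulting unknown document types to the high-risk route is an assumption chosen so that unclassified work is never under-reviewed.

```python
from enum import Enum


class RiskBand(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical mapping from document type to risk band; adjust to your firm.
DOCUMENT_BANDS = {
    "client proposal": RiskBand.HIGH,
    "employment letter": RiskBand.HIGH,
    "compliance report": RiskBand.HIGH,
    "marketing copy": RiskBand.MEDIUM,
    "non-regulated proposal": RiskBand.MEDIUM,
    "internal brainstorm": RiskBand.LOW,
    "internal memo": RiskBand.LOW,
}

# Review route for each band, following the three-band approach above.
REVIEW_ROUTES = {
    RiskBand.HIGH: "senior decision-maker review",
    RiskBand.MEDIUM: "peer check",
    RiskBand.LOW: "self-check against checklist",
}


def review_route(document_type: str) -> str:
    """Return the review route for a document type.

    Unknown types fall back to HIGH (an assumption, not a rule from any
    regulator) so that new kinds of work get the strictest review.
    """
    band = DOCUMENT_BANDS.get(document_type.strip().lower(), RiskBand.HIGH)
    return REVIEW_ROUTES[band]
```

Calling `review_route("marketing copy")` returns the peer-check route, while a type that is not in the table is sent to a senior reviewer until someone classifies it.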

What does a practical review process look like for a small firm?

A workable process for a small firm has three parts: a review checklist, a log, and a clear escalation rule. The ICO explicitly recommends standardised checklists and simple documented procedures, partly to reduce automation bias, the tendency for reviewers to trust plausible-looking output rather than challenge it. The same guidance suggests building in a fallback when AI output falls below an acceptable quality threshold.

The checklist covers five areas. First, accuracy. Does the output match the source documents or data you provided? Any regulatory references need verifying directly against legislation.gov.uk, the ICO website, or the FCA register. Second, completeness. Has the AI assumed information you didn’t provide? Third, compliance. Does the output align with your existing policies? Fourth, bias. Does the output treat similar customers differently without a legal basis? Fifth, security. Has any client data been passed to an external AI tool without a proper risk assessment? The NCSC’s guidance on secure AI systems is clear that unmanaged third-party access to sensitive data warrants its own risk assessment, regardless of how clean the final output looks.
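If you would rather keep the checklist in a structured form than on paper, the five areas map naturally onto a small record. This is a minimal sketch: the five questions are paraphrased from the paragraph above, while the class name `ChecklistReview` and its methods are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

# The five checklist areas and their questions, paraphrased from the text.
CHECKLIST = {
    "accuracy": "Does the output match the source documents or data provided?",
    "completeness": "Has the AI assumed information that was not provided?",
    "compliance": "Does the output align with existing policies?",
    "bias": "Does it treat similar customers differently without a legal basis?",
    "security": "Was client data passed to an external AI tool without a risk assessment?",
}


@dataclass
class ChecklistReview:
    """One review of one document; `issues` maps a checklist area to a note."""

    document: str
    reviewer: str
    issues: dict[str, str] = field(default_factory=dict)

    def flag(self, area: str, note: str) -> None:
        """Record a problem found under one of the five areas."""
        if area not in CHECKLIST:
            raise ValueError(f"Unknown checklist area: {area}")
        self.issues[area] = note

    def passed(self) -> bool:
        """A review passes only when no area has been flagged."""
        return not self.issues
```

A reviewer would create one `ChecklistReview` per document, call `flag` for each problem found, and release the document only when `passed()` is true.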

Log every high-risk review in a simple spreadsheet, recording the date, document, tool used, reviewer, and any issue found. The FCA’s review of automated financial advice found that undocumented human interventions created serious evidential gaps when compliance questions arose later. A basic log closes that gap at minimal cost.
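The log itself can be a plain CSV file that opens in any spreadsheet tool. The sketch below uses only Python's standard library; the column names follow the five fields listed above, and the function name `append_review` and the file name in the usage note are hypothetical.

```python
import csv
import io
from datetime import date

# Columns named in the text: date, document, tool used, reviewer, issue found.
LOG_FIELDS = ["date", "document", "tool", "reviewer", "issue"]


def append_review(log_file, document, tool, reviewer, issue="", when=None):
    """Append one review row to an open text file or file-like object.

    `when` defaults to today's date. Leave `issue` empty when the review
    found nothing, so that clean reviews are recorded too.
    """
    writer = csv.DictWriter(log_file, fieldnames=LOG_FIELDS)
    # Write the header row only if nothing has been written yet.
    if log_file.tell() == 0:
        writer.writeheader()
    writer.writerow({
        "date": (when or date.today()).isoformat(),
        "document": document,
        "tool": tool,
        "reviewer": reviewer,
        "issue": issue,
    })
```

In practice you might open a file such as `ai_review_log.csv` with `open(path, "a", newline="")` and pass it to `append_review` after each high-risk review.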

If you find more than two significant issues in a week for the same task type, pause and revisit your prompting approach. Errors tend to repeat, and a pattern in your log is a signal worth acting on.
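The escalation rule is simple enough to check automatically against the log. The sketch below assumes that "a week" means an ISO calendar week (Monday to Sunday) and that each logged issue carries a task type; both are assumptions made for illustration, and the function name `task_types_to_pause` is hypothetical.

```python
from collections import Counter
from datetime import date


def task_types_to_pause(issues, threshold=2):
    """Return task types with more than `threshold` issues in one ISO week.

    `issues` is an iterable of (date, task_type) pairs, one pair per
    significant issue found during review.
    """
    counts = Counter()
    for found_on, task_type in issues:
        iso = found_on.isocalendar()  # (ISO year, ISO week number, weekday)
        counts[(iso[0], iso[1], task_type)] += 1
    # "More than two" in the text means strictly greater than the threshold.
    return sorted({task for (_, _, task), n in counts.items() if n > threshold})
```

Three issues logged against the same task type within one calendar week would put that task type on the returned list, which is the cue to pause and revisit the prompt. Exactly two issues would not.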

What makes a review process fail?

The most common failure is a review that exists on paper but cannot catch anything. That might be a junior team member checking complex regulatory content they can’t evaluate, or a reviewer under time pressure who approves quickly to keep the project moving. It might also be a firm that checks AI output once and assumes the model behaves the same way three months later, when the prompts, the data, or the underlying model may have shifted.

The ICO is specific on this. Human review must be carried out by someone with appropriate authority and competence to override the AI decision, not simply to sign off on it. If the person reviewing an AI-generated employment letter has no knowledge of employment law, the review provides no meaningful protection.

The NIST AI Risk Management Framework addresses the drift problem directly. It recommends continuous monitoring and recording of inputs and outputs so that if error patterns shift, you catch them early. A monthly 30-minute review of your AI logs to look for trends costs almost nothing and catches problems before they scale.

There is also the confidentiality question. Reviewing final text for accuracy won’t address a data protection breach if client information was pasted into an unmanaged external AI tool earlier in the process. Review and access control need to happen together.

If your review process is perceived as slow or bureaucratic, staff will bypass it. The Cabinet Office’s Mitigating Hidden AI Risks toolkit documents this pattern in government AI deployments. Under pressure, teams cut corners on oversight. The answer is keeping the process genuinely lightweight, not demanding perfect compliance with a heavy one.

The review doesn’t have to be long. A one-page checklist and a short log spreadsheet are enough for a small firm to document meaningful human oversight in a way that satisfies the ICO, the FCA, and your own professional standards. The point is to make it real rather than nominal. If the finance manager in the opening had worked through a checklist before sending the report, the client would probably not have needed to call.

If you’d like to work through what this looks like for your firm, book a conversation.
