The ninety-day reflective audit on AI recommendations

TL;DR

The ninety-day reflective audit is a two-hour conversation in which the owner and one or two senior people list every material AI-influenced decision of the last quarter, score each outcome honestly, and decide for each category whether to keep it, retire it, or run it with heavier review. Owners who run the audit recalibrate fast. Owners who skip it accumulate bad calls.

Key takeaways

- Ninety days is the right cadence because outcomes have had time to show themselves but the team can still recall what the AI said and why they trusted it.
- The audit runs on three columns: the decision, an honest red/amber/green outcome score, and codes for AI confidence and human review effort.
- The pattern that emerges decides three things: which categories continue as they are, which need heavier review discipline, and which should be retired.
- Two hours, owner plus one or two senior people, no software and no separate report. A one-page output and a comparison at the next audit.
- The audit only works if the team can score red without anyone feeling blamed. Frame it as a review of the discipline, not the decision-makers.

The owner I am thinking of is three months into heavier AI use across her team. Her marketing lead is using it to draft copy and segment audiences, her operations manager is using it on supplier comparisons, her finance person is using it for first-pass analysis on monthly numbers. Things feel faster. A few decisions have clearly landed well. One or two have quietly cost money. She cannot tell you which is which without scrolling back through her calendar and her inbox, and even then she is not sure she would be honest with herself about it. The hardest AI evaluation move is the one you can only run after ninety days of real decisions, and almost nobody runs it. This is the reflective audit: two hours, with the team you already have.

What is the ninety-day reflective audit?

The ninety-day reflective audit is a structured two-hour conversation in which the owner and one or two senior people list every material AI-influenced decision of the last quarter, score the outcome honestly, and decide what to keep, retire, or run with heavier review. Think of it as a recalibration of the team’s discipline against the decisions it has actually made, sitting alongside but separate from a programme review or a post-mortem.

The output is one page. Three columns of decisions with outcome codes, and a short list of category-level decisions for the next quarter. No software, no separate report, no consultants. The audit sits inside the firm’s existing quarterly rhythm or, where one does not exist, becomes the seed of one. Owners who run it land in the next quarter with a sharper sense of where AI is helping and where it is quietly costing them. The BCG industry baseline puts the failure rate of AI pilots that lack this kind of measurement discipline at around 70 per cent, and the audit’s job is to keep the firm out of that pool.

Why does it matter for your business?

It matters because the cost of a bad AI-influenced decision in an owner-operated firm rarely shows up as a single visible loss. It shows up as drift. Margin slips a few points across a quarter, a recruitment round produces a hire who does not stick, a pricing change quietly underperforms, an analysis built on hallucinated data shapes an April strategy meeting you only question in July. None of it gets caught by the dashboard.

Enterprises tolerate this drift because they have scale to absorb it. A fifty-person agency cannot. The audit’s commercial logic is asymmetric. The cost of running it is two hours of senior time once a quarter. The cost of not running it is the slow accumulation of bad calls that nobody flagged because nobody looked. Harvard Business Review’s 2026 work on the return on AI investments points to the same conclusion from a different angle. The firms that tie AI use to business outcomes scale impact faster than the firms that simply tie it to tool usage.

Where will you actually meet it?

You will meet the audit on a Thursday afternoon in a meeting room with the owner, one or two senior people, a laptop or a flip chart, and ninety days of AI-influenced decisions to walk through. The first twenty minutes are spent listing every material decision the team accepted with AI input. A typical SME running AI across marketing, operations, sales, and finance for the quarter produces fifteen to thirty items on that list.

A material decision is one that shaped work, budget, strategy, or what went out to customers. A one-off AI brainstorm for three subject lines does not make the cut. A repricing recommendation that changed margins across a product category does. A candidate shortlist ranked by an AI hiring tool does. A monthly financial commentary that landed in the board pack does.

The next sixty minutes work through each decision. The person closest to it names the outcome. Did it work out, did it cost time or money to fix, would the team have made the same call without the AI input? Green for net positive, amber for net neutral, red for net negative. Then two short codes per decision. AI confidence at the time (high, moderate, low) and the team’s review effort (rigorous, spot-check, near-automatic). The last twenty minutes are pattern reading. Where do the greens cluster? Where do the reds cluster? Which category combines low AI confidence and near-automatic acceptance, and what outcomes did that combination produce? The University of Washington’s 2025 study on hiring AI showed that humans mirror AI bias when review effort is light, and the audit makes that pattern visible in your own data rather than the academic kind.
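To make the three columns concrete, here is a sketch of what a few rows of the one-page output might look like; the decisions themselves are hypothetical.

- Repricing of the mid-tier product range: amber outcome, high AI confidence, spot-check review.
- Shortlist for the operations hire: red outcome, low AI confidence, near-automatic review.
- Monthly financial commentary in the board pack: green outcome, high AI confidence, rigorous review.

The middle row is exactly the combination the pattern-reading step exists to surface: low AI confidence, near-automatic acceptance, and a red outcome.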

When to ask vs when to ignore

The audit asks three questions per category of recommendation, and the answer to each is binary. First, are the greens clustered enough to keep accepting these recommendations with the current level of review? Second, are the reds clustered enough to retire the AI in this category? Third, is the picture mixed enough that the right answer is to keep using the AI but raise the review effort?

If the greens cluster, continue and document the discipline. If the reds cluster, retire the tool for that category. Retiring is not a verdict on AI as a whole, only on the fit between this tool and this task at this firm. The MIT Sloan work on AI-generated hiring rankings is a good reminder that retiring a tool is sometimes the discipline-led answer, not a failure. If the picture is mixed, the next quarter runs that category at rigorous review rather than spot-check, and the next audit tells you whether the heavier review was enough.

Ignore the audit only if your firm has not yet been using AI for ninety days. Run it light if the team is small enough that the owner already sees every decision. Skip it entirely if you genuinely do not care whether the AI is paying off, but at that point the question is not the audit, it is why you bought the tools. The Information Commissioner’s Office guide to AI audits sets the regulatory floor for firms whose AI use touches personal data, and the reflective audit naturally sits on top of that floor rather than replacing it.

The ninety-day reflective audit sits inside a small family of evaluation moves that an owner-operated firm runs at different cadences. The two-question evaluation method runs at the point of acceptance, the day three tool review runs at the start of an adoption, and the twelve-month programme review runs at the end of a cycle. The reflective audit is the quarterly bridge between them.

The post on owner-operated AI evaluation explains the four standing exposures the audit is mostly watching for: confidently wrong content, fabricated numbers, recommendations dressed as facts, and silent drift over time.

Two governance frameworks are worth knowing as you mature the audit. The NIST AI Risk Management Framework gives a vocabulary for talking about AI risk at any size of firm, and the OECD's 2025 work on AI adoption by SMEs grounds the ninety-day cadence in the quarterly rhythm that owner-operated businesses already run. Neither replaces the audit. They give it a wider context if and when you need to explain the discipline to a client, a partner, or a regulator.

If you have been making AI-influenced decisions for the last quarter and you have never sat down to look back, the next two hours are the highest-value time you will spend on AI all year. Book a conversation if you would like a second pair of eyes on the first one.

Sources

- BCG (2025). The AI adoption puzzle: why usage is up but business impact is not. Industry primary research on the gap between AI rollout and measurable outcome, cited for the pattern that 70 to 88 per cent of pilots fail to scale without measurement discipline. https://www.bcg.com/publications/2025/ai-adoption-puzzle-why-usage-up-impact-not
- Harvard Business Review (2026). Seven factors that drive returns on AI investments, according to a new survey. Cited for the finding that organisations tying AI adoption to business outcomes scale impact faster. https://hbr.org/2026/03/7-factors-that-drive-returns-on-ai-investments-according-to-a-new-survey
- Harvard Business Review (2026). What's the ROI on AI? Reference for measurement discipline as the differentiator between successful and stalled AI programmes. https://hbr.org/2026/02/whats-the-roi-on-ai
- University of Washington (2025). People mirror AI systems' hiring biases, study finds. Cited for the pattern that humans accept biased AI recommendations without resistance unless deliberate evaluation effort is applied. https://www.washington.edu/news/2025/11/10/people-mirror-ai-systems-hiring-biases-study-finds/
- MIT Sloan (2025). Practical AI implementation: success stories. Cited for the pattern that financial workflows using AI without verification accumulate significant errors per quarter. https://mitsloan.mit.edu/ideas-made-to-matter/practical-ai-implementation-success-stories-mit-sloan-management-review
- OECD (2025). AI adoption by small and medium-sized enterprises. Industry baseline on quarterly review rhythms as the standard SME governance cadence. https://www.oecd.org/content/dam/oecd/en/publications/reports/2025/12/ai-adoption-by-small-and-medium-sized-enterprises_9c48eae6/426399c1-en.pdf
- Information Commissioner's Office (2024). A guide to AI audits. Practical UK regulator guidance on audit structure for AI deployments. https://ico.org.uk/media2/migrated/4022651/a-guide-to-ai-audits.pdf
- National Institute of Standards and Technology (2023). AI Risk Management Framework (NIST AI RMF 1.0). Reference framework for governing AI risk, useful for translating the audit pattern into a recurring discipline. https://www.nist.gov/itl/ai-risk-management-framework
- MIT Sloan (2025). AI is reinventing hiring with the same old biases. Cited for the audit pattern around AI-generated candidate rankings and the cost of skipping rigorous evaluation. https://mitsloan.mit.edu/ideas-made-to-matter/ai-reinventing-hiring-same-old-biases-heres-how-to-avoid-trap
- Tendem (2025). The true cost of AI hallucinations in business data. Industry analysis of hallucination rates in data-analysis tasks and the slow accumulation of cost when verification is absent. https://tendem.ai/blog/true-cost-ai-hallucinations-business-data

Frequently asked questions

When should an SME run its first reflective audit on AI recommendations?

Once the team has been making AI-influenced decisions consistently for around ninety days. Earlier than that and outcomes have not had time to land. Much later than that and people start to lose the detail of what the AI said and how confidently they accepted it. If the firm has been using AI tools across several functions for the past quarter, this week is the right time to look back.

How honest does the scoring need to be for the audit to work?

Honest enough that someone in the room can mark a recommendation red without anyone feeling blamed. That is a leadership move before it is a process move. The owner sets it up explicitly as a review of the team's evaluation discipline, not of the individual who accepted the recommendation. If amber becomes the polite default, the audit has no signal and you might as well not run it.

What if the audit reveals that an AI tool has been net negative across most of its uses?

Retire it for those uses. This is the part owners find hardest, because the tool felt useful in the moment, and the spend has already gone in. Sunk cost is irrelevant. If a recommendation category has produced worse outcomes than the team's independent judgment would have, the discipline-led answer is to stop accepting those recommendations and reuse the time saved on the categories where AI input is genuinely paying off.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
