Twelve-month AI review: keep it, fix it, or kill it

The renewal notice arrives by email. Twelve months of licence fees, automatic renewal in thirty days, and somewhere in the inbox a vague awareness that this should probably be looked at before it charges again.

The tool is embedded in several workflows. People are broadly using it. Nobody has stopped to ask whether it’s earning what it costs.

For the delegate who owns the AI mandate, this is more than an admin task. A year in, the business deserves a real decision, not a shrug and not a habit renewal.

What does the twelve-month AI review actually cover?

The twelve-month AI review is a structured decision meeting held one year after an AI tool went live. It examines four dimensions: how well the tool was adopted, whether the output quality was acceptable, what happened to working hours, and what happened to the financials. It always ends with a written decision from four choices: continue the tool, expand it, contract the scope, or kill it.

The review draws from the post-implementation review framework used in formal project management, adapted for a single AI tool at owner-managed business scale. The core questions are the same: what was the plan, what actually happened, why was there a variance, and what should we do now? Applied to an AI deployment, those questions produce a clear picture. In practice, post-implementation reviews of software tools are often skipped entirely; scheduling this one makes any skip a deliberate choice rather than an oversight.

This review is distinct from a post-mortem on a failed pilot. A failed pilot is an unplanned stop, typically made under pressure when results disappoint in the first few weeks. A twelve-month review is a scheduled assessment of a tool that has been running with real users in real workflows. The scope is narrow and defined: one tool, twelve months, four questions, one decision.

Why does this review matter for your business?

Without this review, renewal decisions default to habit rather than evidence. Industry surveys consistently show that between 60% and 75% of owner-managed businesses that buy AI tools cannot reliably quantify financial impact within twelve months of going live. A tool that has survived a year on goodwill and familiarity may or may not be earning what it costs. The review forces the question before inertia answers it for you.

The alternative is common enough to be worth naming. A tool gets renewed because people are used to it. Budget gets allocated because it was allocated before. Two years in, the business is paying for three AI tools, one of which has never been seriously used and two of which are doing overlapping work. A twelve-month review breaks that pattern before it compounds.

Accountability is the second reason this review matters for anyone holding an AI mandate. If the delegate is the person who championed a tool, the twelve-month review is the natural moment to demonstrate that the investment was sound, or to show that the decision to keep it, change it, or stop it is grounded in evidence rather than defensiveness. That discipline is what separates a credible AI lead from a tool buyer.

What do the four review questions actually test?

Four dimensions together give you a defensible picture of what a year with the tool produced. Adoption tells you whether people actually used it and at what rate. Quality tells you whether the output was good enough to trust. Time tells you whether the hours saved materialised and held up under measurement. Financial impact tells you whether that time saving reached the bottom line or was absorbed elsewhere.

Adoption is the easiest to measure but often the most revealing. If intended users are at 50% or below after twelve months, something is wrong that usage data alone cannot explain. The question is whether low adoption reflects a tool limitation, a training gap, or a workflow mismatch, because each has a different fix.
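As a rough illustration of the adoption check, the sketch below computes the share of intended users who actually used the tool and flags a rate at or below 50%. The function names, the user lists, and the definition of an "active" user are assumptions made for the example; they do not come from any particular tool's reporting.

```python
def adoption_rate(intended_users, active_users):
    """Share of intended users who actually used the tool (0.0 to 1.0)."""
    intended = set(intended_users)
    if not intended:
        raise ValueError("intended_users must not be empty")
    # Only count active users who were on the intended list.
    return len(intended & set(active_users)) / len(intended)


def needs_investigation(rate, threshold=0.5):
    """True when adoption is at or below the 50% level discussed above."""
    return rate <= threshold


# Hypothetical data: eight people were meant to use the tool.
intended = ["ana", "ben", "cat", "dev", "eli", "fay", "gus", "hal"]
active = ["ana", "cat", "gus", "zoe"]  # "zoe" was never an intended user

rate = adoption_rate(intended, active)
print(f"Adoption rate: {rate:.1%}, investigate: {needs_investigation(rate)}")
```

The number only tells you that something needs explaining. Whether the cause is a tool limitation, a training gap, or a workflow mismatch still has to come from talking to the people behind the figures.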

Quality requires an honest look at output samples. Research on AI productivity effects from Stanford’s Digital Economy Lab consistently finds that users overestimate the quality of AI output because it looks plausible and they do not scrutinise it as they would their own work. A twelve-month review is the right moment to assess a sample against a defined standard, rather than relying on the absence of complaints as a proxy for acceptability.

Time and financial impact are the two questions that many reviews handle least rigorously. Time saved, when reported from memory, is unreliable; studies on human time-estimation find that people are poor judges of their own time allocation. Where possible, triangulate using actual output volumes, any time-log data captured during the year, and a clear account of where the freed-up hours went. Financial impact is only visible if someone tracked where freed-up capacity was redirected. In owner-managed professional services firms, it commonly disappears into expanded workloads or improved service quality rather than direct cost reduction, which is still a benefit worth naming.
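The arithmetic behind the time and financial questions can be sketched as follows. Everything here is an assumption for illustration: the function names, the figures, and in particular the choice to take the lowest of the available hours estimates, which is one cautious convention rather than a prescribed method. The `share_redirected` parameter stands for the fraction of freed-up hours whose destination someone can actually account for.

```python
def conservative_hours_saved(self_reported, from_output_volumes, from_time_logs=None):
    """Triangulate by taking the lowest available estimate (a deliberately cautious choice)."""
    estimates = [self_reported, from_output_volumes]
    if from_time_logs is not None:
        estimates.append(from_time_logs)
    return min(estimates)


def net_annual_impact(hours_saved, hourly_cost, share_redirected, annual_licence_fee):
    """Value of the traceable hours saved, minus the licence fee."""
    if not 0.0 <= share_redirected <= 1.0:
        raise ValueError("share_redirected must be between 0 and 1")
    value = hours_saved * hourly_cost * share_redirected
    return value - annual_licence_fee


# Hypothetical figures for one tool over twelve months.
hours = conservative_hours_saved(
    self_reported=400,        # what users remember saving
    from_output_volumes=260,  # implied by documents actually produced
    from_time_logs=300,       # from any time-log data captured
)
net = net_annual_impact(
    hours_saved=hours,
    hourly_cost=45.0,
    share_redirected=0.5,     # half the freed-up hours have a known destination
    annual_licence_fee=6000.0,
)
print(f"Hours used: {hours}, net annual impact: {net:.2f}")
```

A negative result here does not settle the question on its own. Hours absorbed into expanded workloads or better service quality sit outside this calculation and should be named separately in the review.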

When does the sunk-cost check change the answer?

The sunk-cost check is a single question, stated plainly before anyone starts interpreting data: knowing what you now know, would you buy this tool again today? The question separates the decision about the future from the money already spent in the past. If the answer is no, that is the finding, regardless of how long the tool has been running.

The sunk-cost question is hardest to apply when the person leading the review is also the person who championed the original purchase. That is why the review should include one person who was not involved in the original decision. A workable panel is an operations director who owns the mandate, a finance manager who can speak to the numbers, and an independent voice who can ask the questions the other two will tend to avoid. This is not about blame. It is about making sure the review produces a finding rather than a rationalisation.

Research on technology project outcomes from the Standish Group’s annual CHAOS report finds that approximately a third of technology projects deliver the projected benefits, roughly half deliver substantially less, and a material minority are effectively unused. Owner-managed businesses that have AI tools in the ‘substantially less’ category often keep them running past twelve months because stopping feels like admitting the purchase was wrong. The sunk-cost question is what gives the review permission to call that clearly, and to free the budget for something that will perform better.

What does the decision look like written down?

The output of the review is a written decision with a rationale. The decision takes one of four forms: continue at current scope, expand to new users or use cases, contract down to the scenarios where the data shows it earning its keep, or kill it. Writing it down matters because the next renewal should be a judgement made against a record, not a subscription that renews on assumption.

The rationale is as important as the decision. Continue and expand decisions benefit from a written success case, capturing what the tool delivered against the original plan and what the business will hold it to in the next twelve months. Contract and kill decisions benefit from a written record of why, so the same mistake is not repeated when a vendor’s pitch arrives with similar claims next year.
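For teams that keep such records in a script or spreadsheet export rather than a document, the written decision could be captured in a small structure like the one below. This is a minimal sketch: the class names, fields, and sample values are hypothetical, chosen to mirror the four decision forms and the requirement that a rationale always accompanies the decision.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    EXPAND = "expand"
    CONTRACT = "contract"
    KILL = "kill"


@dataclass(frozen=True)
class ReviewRecord:
    tool: str
    review_date: date
    decision: Decision
    would_buy_again: bool  # answer to the sunk-cost question
    rationale: str

    def __post_init__(self):
        # A decision without a written rationale is not a usable record.
        if not self.rationale.strip():
            raise ValueError("a written rationale is required")

    def summary(self):
        return (
            f"{self.review_date.isoformat()} {self.tool}: "
            f"{self.decision.value} ({self.rationale})"
        )


# Hypothetical example record.
record = ReviewRecord(
    tool="Drafting assistant",
    review_date=date(2025, 3, 1),
    decision=Decision.CONTRACT,
    would_buy_again=False,
    rationale="earns its keep on proposals only",
)
print(record.summary())
```

Keeping the record immutable is a design choice for the example: next year's review should be compared against what was actually written, not an edited version of it.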

A delegate who can produce this record at the next board review, or when the founder asks why that tool is still on the budget, is in a markedly stronger position than one who cannot. The review takes two or three hours when the data is roughly in order. The written decision takes thirty minutes. That is a small investment for a clean conscience and a freed-up budget line, and it is the kind of discipline that earns the right to a bigger AI mandate next year.
