The twelve-month AI review: keep it, fix it, or kill it

TL;DR

At twelve months, every AI tool in an owner-managed business deserves a formal review. The review asks four questions covering adoption, quality, time-saved, and financial impact against the original plan, then runs a single sunk-cost check. The output is a documented decision to continue, expand, contract, or kill the tool. Without it, tools renew on inertia, budgets accumulate dead weight, and the delegate who owns the AI mandate cannot demonstrate whether the investment was sound.

Key takeaways

- The twelve-month AI review is a formal decision meeting that produces a documented outcome, whether to continue, expand, contract, or kill the tool.
- The sunk-cost question, 'knowing what you now know, would you buy this tool again today?', is the most important single test.
- Four dimensions give a complete picture: adoption rate, output quality, time-saved under measurement, and financial impact on the bottom line.
- Include one person in the review who had no involvement in the original purchase decision; they will ask the questions the champion avoids.
- Write the decision and its rationale down so the next renewal is a judgement made against a record, not a subscription that renews unchecked.

The renewal notice arrives by email. Twelve months of licence fees, automatic renewal in thirty days, and somewhere in the inbox a vague awareness that this should probably be looked at before it charges again.

The tool is embedded in several workflows. People are broadly using it. Nobody has stopped to ask whether it’s earning what it costs.

For the delegate who owns the AI mandate, this is more than an admin task. A year in, the business deserves a real decision, not a shrug and not a habit renewal.

What does the twelve-month AI review actually cover?

The twelve-month AI review is a structured decision meeting held at the one-year mark after an AI tool went live. It examines four dimensions: how well the tool was adopted, whether the output quality was acceptable, what happened to working hours, and what actually happened to the financials. It always ends with a written decision from four choices: continue the tool, expand it, contract the scope, or kill it.

The review draws from the post-implementation review framework used in formal project management, adapted for a single AI tool at owner-managed business scale. The core questions are the same. What was the plan, what actually happened, why was there a variance, and what should we do now? Applied to an AI deployment, those questions produce a clear picture. The difference is that post-implementation reviews on software tools are often skipped entirely; this review makes the skip a deliberate choice rather than an oversight.

This review is distinct from a post-mortem on a failed pilot. A failed pilot is an unplanned stop, typically made under pressure when results disappoint in the first few weeks. A twelve-month review is a scheduled assessment of a tool that has been running with real users in real workflows. The scope is narrow and defined. One tool, twelve months, four questions, one decision.

Why does this review matter for your business?

Without this review, renewal decisions default to habit rather than evidence. Industry surveys consistently show that between 60% and 75% of owner-managed businesses that buy AI tools cannot reliably quantify financial impact within twelve months of going live. A tool that has survived a year on goodwill and familiarity may or may not be earning what it costs. The review forces the question before inertia answers it for you.

The alternative is common enough to be worth naming. A tool gets renewed because people are used to it. Budget gets allocated because it was allocated before. Two years in, the business is paying for three AI tools, one of which has never been seriously used and two of which are doing overlapping work. A twelve-month review breaks that pattern before it compounds.

Accountability is the second reason this review matters for anyone holding an AI mandate. If the delegate is the person who championed a tool, the twelve-month review is the natural moment to demonstrate that the investment was sound, or to show that the decision to keep it, change it, or stop it is grounded in evidence rather than defensiveness. That discipline is what separates a credible AI lead from a tool buyer.

What do the four review questions actually test?

Four dimensions together give you a defensible picture of what a year with the tool produced. Adoption tells you whether people actually used it and at what rate. Quality tells you whether the output was good enough to trust. Time tells you whether hours-saved materialised and held up under measurement. Financial tells you whether that time saving reached the bottom line or was absorbed elsewhere.

Adoption is the easiest to measure but often the most revealing. If intended users are at 50% or below after twelve months, something is wrong that usage data alone cannot explain. The question is whether low adoption reflects a tool limitation, a training gap, or a workflow mismatch, because each has a different fix.
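The adoption check described above can be sketched in a few lines. This is an illustration only: the class name, field names, and headcounts are hypothetical, and the only figure taken from the text is the 50% threshold. It says nothing about why adoption is low; that diagnosis still has to happen in the room.

```python
from dataclasses import dataclass

# Threshold from the text: 50% or below after twelve months signals a problem.
LOW_ADOPTION_THRESHOLD = 0.50


@dataclass
class AdoptionSnapshot:
    # Both fields are hypothetical names for figures you would pull
    # from the original business case and the vendor's usage logs.
    intended_users: int  # people the tool was bought for
    active_users: int    # people who actually used it in the review window

    def rate(self) -> float:
        if self.intended_users <= 0:
            raise ValueError("intended_users must be positive")
        return self.active_users / self.intended_users

    def needs_diagnosis(self) -> bool:
        # True when adoption is at or below the 50% line.
        return self.rate() <= LOW_ADOPTION_THRESHOLD


# Illustrative numbers only, not data from any real deployment.
snapshot = AdoptionSnapshot(intended_users=12, active_users=5)
print(f"Adoption rate: {snapshot.rate():.0%}")
if snapshot.needs_diagnosis():
    print("Check for: tool limitation, training gap, or workflow mismatch")
```

How you define an "active" user (weekly login, a minimum number of outputs, or something else) is a judgement call. Agree it before you look at the number.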

Quality requires an honest look at output samples. Research on AI productivity effects from Stanford’s Digital Economy Lab consistently finds that users overestimate the quality of AI output because it looks plausible and they do not scrutinise it as they would their own work. A twelve-month review is the right moment to assess a sample against a defined standard, rather than relying on the absence of complaints as a proxy for acceptability.

Time and financial impact are the two questions that many reviews handle least rigorously. Time-saved reported from memory is unreliable; studies on human time-estimation find that people are poor judges of their own time allocation. Where possible, triangulate using actual output volumes, any time-log data captured during the year, and a clear account of where the freed-up hours went. Financial impact is only visible if someone tracked where freed-up capacity was redirected. In owner-managed professional services firms, it commonly disappears into expanded workloads or improved service quality rather than direct cost reduction, which is still a benefit worth naming.
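One way to make the triangulation concrete is to put the independent estimates side by side and look at the gap between them. The sketch below assumes three sources (a recalled figure, an output-volume calculation, and an extrapolated time log) and uses the median as a cautious working number. Every figure, function name, and the "redirected share" parameter is a hypothetical chosen for illustration, not a benchmark.

```python
from statistics import median


def hours_from_volume(items: int, minutes_before: float, minutes_after: float) -> float:
    """Hours saved implied by output volume and per-item handling time."""
    return items * (minutes_before - minutes_after) / 60


def triangulate(estimates: dict) -> tuple:
    """Return (median, spread) across independent hours-saved estimates."""
    values = list(estimates.values())
    return median(values), max(values) - min(values)


def net_annual_value(hours_saved: float, redirected_share: float,
                     hourly_value: float, annual_cost: float) -> float:
    """Value of the hours that were demonstrably redirected, minus tool cost."""
    return hours_saved * redirected_share * hourly_value - annual_cost


# All inputs below are made-up illustrations.
estimates = {
    "self_reported": 400.0,                             # recalled by users
    "output_volume": hours_from_volume(1200, 25, 12),   # from work counts
    "time_log": 6.0 * 46,                               # logged weekly saving x working weeks
}

working_hours, spread = triangulate(estimates)
print(f"Working estimate: {working_hours:.0f} h (sources differ by {spread:.0f} h)")

# Only count hours with a clear account of where they went.
value = net_annual_value(working_hours, redirected_share=0.5,
                         hourly_value=40.0, annual_cost=3600.0)
print(f"Net annual value under these assumptions: {value:.0f}")
```

A wide spread between sources is itself a finding: it usually means the recalled figure is carrying the case. The redirected share is the number most worth arguing about, because capacity that was absorbed into better service still counts as a benefit but will not show up as cost reduction.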

When does the sunk-cost check change the answer?

The sunk-cost check is a single question, stated plainly before anyone starts interpreting data. Knowing what you now know, would you buy this tool again today? The question separates the decision about the future from the money already spent on the past. If the answer is no, that is the finding, regardless of how long the tool has been running.

The sunk-cost question is hardest to apply when the person leading the review is also the person who championed the original purchase. That is why the review should include one person who was not involved in the original decision. A workable panel is three people: an operations director who owns the mandate, a finance manager who can speak to the numbers, and an independent voice who can ask the questions the other two will tend to avoid. This is not about blame. It is about making sure the review produces a finding rather than a rationalisation.

Research on technology project outcomes from the Standish Group’s annual CHAOS report finds that approximately a third of technology projects deliver the projected benefits, roughly half deliver substantially less, and a material minority are effectively unused. Owner-managed businesses that have AI tools in the ‘substantially less’ category often keep them running past twelve months because stopping feels like admitting the purchase was wrong. The sunk-cost question is what gives the review permission to call that clearly, and to free the budget for something that will perform better.

What does the decision look like written down?

The output of the review is a written decision with a rationale. The decision takes one of four forms: continue at current scope, expand to new users or use cases, contract down to the scenarios where the data shows it earning its keep, or kill it. Writing it down matters because the next renewal should be a judgement made against a record, not a subscription that renews on assumption.

The rationale is as important as the decision. Continue and expand decisions benefit from a written success case, capturing what the tool delivered against the original plan and what the business will hold it to in the next twelve months. Contract and kill decisions benefit from a written record of why, so the same mistake is not repeated when a vendor’s pitch arrives with similar claims next year.
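If it helps to see the shape of that record, here is a minimal sketch of one as a data structure. The field names, the tool name, and the example wording are all hypothetical. The consistency rule (a "no" on the sunk-cost question cannot sit alongside a continue or expand decision) is an assumption layered on top of the text, not a rule it states.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    EXPAND = "expand"
    CONTRACT = "contract"
    KILL = "kill"


@dataclass
class ReviewRecord:
    tool: str
    review_date: date
    decision: Decision
    would_buy_again: bool  # answer to the sunk-cost question
    rationale: str
    findings: dict = field(default_factory=dict)  # adoption, quality, time, financial

    def __post_init__(self):
        # A decision with no written reasoning is not a record.
        if not self.rationale.strip():
            raise ValueError("rationale must not be empty")
        # Assumed rule: a 'no' on the sunk-cost check rules out continue/expand.
        if not self.would_buy_again and self.decision in (Decision.CONTINUE, Decision.EXPAND):
            raise ValueError("sunk-cost answer conflicts with the decision")

    def summary(self) -> str:
        return (f"{self.review_date.isoformat()} | {self.tool} | "
                f"{self.decision.value.upper()} | {self.rationale}")


# Hypothetical example entry.
record = ReviewRecord(
    tool="DraftAssist",
    review_date=date(2025, 3, 14),
    decision=Decision.CONTRACT,
    would_buy_again=True,
    rationale="Clear value in client correspondence; little use elsewhere.",
    findings={"adoption": "5 of 12 intended users active"},
)
print(record.summary())
```

A shared document or a spreadsheet row does the same job. What matters is that the four fields (decision, sunk-cost answer, rationale, findings) are all filled in and dated, so the next renewal has something to be judged against.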

A delegate who can produce this record at the next board review, or when the founder asks why that tool is still on the budget, is in a measurably different position than one who cannot. The review takes two or three hours when the data is roughly in order. The written decision takes thirty minutes. That is a small investment for a clean conscience and a freed-up budget line, and it is the kind of discipline that earns the right to a bigger AI mandate next year.

Sources

- Brynjolfsson, E., Li, D., and Raymond, L. (2023). Generative AI at Work. NBER Working Paper 31161. Measures AI productivity effects in AI-assisted customer service; finds an average productivity gain of about 14%, rising to roughly 34% for novice and lower-skilled workers, with minimal effect on the most experienced. https://www.nber.org/papers/w31161
- McKinsey & Company (2024). The State of AI in 2024. Annual survey on AI adoption, measurement discipline, and return on investment patterns; documents the gap between adoption and rigorous value measurement across organisations. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
- Standish Group (2024). CHAOS Report 2024. Annual technology project outcomes survey; finds approximately 35% of projects deliver expected benefits, roughly 50% deliver substantially less, with sunk-cost reasoning identified as a key barrier to corrective action. https://www.standishgroup.com
- Project Management Institute (2023). Pulse of the Profession 2023. Guidance on post-implementation review discipline and structured project evaluation frameworks; relevant to the PIR structure applied to AI deployments. https://www.pmi.org/learning/thought-leadership/pulse
- ICAEW (2024). Technology and the Profession: AI in Accountancy Practice. Guidance on AI tool evaluation, adoption monitoring, and professional quality standards for owner-managed accountancy practices. https://www.icaew.com/technical/technology/artificial-intelligence
- IDC (2023). IDC Business Value of AI. Analysis of AI ROI patterns across enterprise and owner-managed deployments; informs the realistic first-year ROI range for well-implemented deployments. https://www.idc.com/research/artificial-intelligence
- Deloitte (2024). State of AI in the Enterprise. AI economics framework distinguishing cost reduction, revenue enhancement, and risk mitigation as ROI pathways; used to frame the financial dimension of the twelve-month review. https://www2.deloitte.com/us/en/insights/focus/cognitive-technologies/state-of-ai-and-intelligent-automation-in-business-survey.html
- BCG (2023). The CEO's Guide to the AI Value Agenda. Framework for sequencing AI value realisation; emphasises that freed-up capacity must be deliberately redirected for financial impact to materialise. https://www.bcg.com/publications/2023/ceo-guide-to-ai-value-agenda
- Stanford Human-Centred Artificial Intelligence (2024). AI Index Report 2024. Annual assessment of AI adoption, productivity effects, and measurement challenges across sectors; documents the pattern of users overestimating AI output quality. https://hai.stanford.edu/research/ai-index

Frequently asked questions

How do I know if the twelve-month review needs a time-study or if survey responses are good enough?

Survey responses on hours-saved are a starting point but not a measurement. Memory of time saved is unreliable, and people tend to overstate it when the tool was their idea. For a defensible number, use vendor usage logs, any output-volume data, and at minimum a two-week log from users in the final quarter. That combination beats a one-question survey almost every time.

What if the review finds mixed results, good on adoption but poor on financial impact?

That is a contract decision, not necessarily a kill. Identify the specific use cases where adoption is real and output is useful, and restrict the tool to those. If one function is getting clear value and another is not, the tool itself is not the problem. Stop paying for the low-value deployment and keep the one that is earning.

Do I need external help to run a twelve-month AI review?

The three people you need are the delegate who owns the tool, the finance manager who can speak to cost and margin, and one person who had no involvement in the original decision. External support is useful if you want validated benchmarks or an independent quality assessment, but it is not a prerequisite for a useful review.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.
