Where you sit on the AI ROI maturity ladder

TL;DR

UK SMEs sit on a five-level AI ROI maturity ladder. About half are at Level 1, with adoption tracked and impact not. Level 3, the threshold for board credibility, takes 40 to 60 hours of internal work over 12 to 18 months.

Key takeaways

- Level 1 (anecdotal) is the base case for 50 to 60 percent of UK SMEs that have bought AI.
- Level 2 (hours-saved by survey) covers another 25 to 35 percent. The number exists but is not validated.
- Level 3 (defendable methodology) is the floor for board and CFO credibility, occupied by 10 to 15 percent of firms.
- Level 4 (decision-grade ROI with formal review) and Level 5 (portfolio-wide governance) are rare and rarer.
- Climbing Level 1 to Level 3 takes around 40 to 60 hours of internal work over 12 to 18 months, no consulting required.

Picture a managing partner I’ll call Mark. Forty-fee-earner firm, five-figure annual AI spend across Copilot and a document-review tool, both rolled out in the last twelve months. Last Tuesday his finance director asked him for the ROI on the AI rollout for the next board pack. He sat down to write the answer and noticed what he had. The operations lead’s recollection of last quarter. A vendor case study from a peer firm. A general feeling that adoption was decent. The board meeting is in nine days and that pile is what he has to defend.

He’s at Level 1 on a five-level ladder. Most firms are. The ladder is diagnostic, not a moral grading. Until a partner can locate themselves on it accurately, the question of how to climb is the wrong question to be asking.

What are the five levels actually?

The ladder is drawn from technology-investment maturity research, adapted to AI specifically. Level 1 is anecdotal: adoption tracked, impact not. Level 2 is hours-saved by survey, with a number that exists but lacks validation. Level 3 is defendable methodology: time-study, quality assessment, leakage tracked, documented protocol. Level 4 is decision-grade ROI with formal review. Level 5 is portfolio-wide governance across all technology investments.

Each level has its own characteristic sentence. A Level 1 firm says: “We implemented this six months ago and we think it’s working, but we haven’t formally measured the impact.” A Level 2 firm says: “We measure hours-saved monthly by asking users, and we know the numbers aren’t precise.” A Level 3 firm says: “We measured time-saved through a two-week time-study before and after deployment, with five to ten professionals. Our methodology has estimated error bars of plus or minus 15 to 20 percent, but the conclusion holds within that range.” A Level 4 firm has a quarterly cadence and a 12-month review with explicit go-forward decisions. A Level 5 firm runs the same discipline across every technology investment in the portfolio.

The levels describe measurement reality, not aspirational steps.

Where does most of the market actually sit?

About 50 to 60 percent of UK SMEs that have bought AI sit at Level 1. They have the tool. They have adoption metrics. They don’t have impact metrics. The number who can answer “what did the AI deliver in pounds last quarter?” with anything more than impression and inference is small. The honest reading is that most owner-led firms work from numbers that would not survive forty minutes of CFO scrutiny.

Roughly 25 to 35 percent are at Level 2. The hours-saved survey has been done. The number sits in a cell on a spreadsheet somewhere. It is articulated when asked. It has not been validated by anyone, the methodology has not been written down, and if the CFO probed for an hour the number would dissolve. Most of the SMEs reporting AI ROI in vendor surveys are working from Level 2 numbers.

Approximately 10 to 15 percent are at Level 3. Time-study or activity-log measurement, rubric-based quality assessment, value leakage tracked. The methodology is documented. The error bars are explicit. This is the floor for board credibility.

Level 4 is rare, around 3 to 5 percent of SMEs. Level 5 is rarer still, under 1 percent, and most large enterprises do not reach it either.

That distribution is the base case, not a failure. The question is whether the firm wants to keep operating at the base case or move.

Why is Level 3 the credibility floor?

Levels 1 and 2 share a common feature: the measurement methodology was either absent or informal. A CFO who asks “how did you measure that?” gets either no answer or a description of a survey. Both produce the same outcome. The number gets discounted, the case for the AI weakens, and the next renewal happens on gut feel rather than evidence.

Level 3 is structurally different. The methodology is written down. It can be audited. Its error bars are explicit, which means the conclusions can be tested against them. A CFO asking “how did you measure that?” gets an answer that holds up: “two-week time-study before and after, blinded quality assessment on thirty samples, leakage tracked through a survey of where the freed-up hours went.” The CFO does not have to take the AI’s value on faith. They can see the firm has measured it carefully.

The CMM-derived research finding underneath this is concrete. Level 3 firms achieve roughly 20 to 30 percent higher ROI on technology investments than Level 1 firms. The gain comes from better selection, because Level 3 firms catch unsuitable technologies in pilot. It comes from faster correction when something is off. And it comes from board confidence that releases capital for the next investment when the case is real.

A firm climbing from Level 1 to Level 3 measures better. The downstream effect is better decisions about what to deploy and what to kill. The measurement discipline becomes the operating discipline.

What does the climb to Level 3 actually involve?

The work is concrete and modest. About 40 to 60 hours of internal effort, conducted over 12 to 18 months. That works out at an hour or two a fortnight, with finance-manager support. Most SMEs have the capacity if they have decided the discipline is worth having.

There are five components. First, a baseline measurement of current state for each AI use case before deployment: hours per task, error rates, customer satisfaction, the numbers that anchor everything else. Second, a structured time-study or activity-log methodology applied four to six weeks after deployment, with a defined sample of five to ten people across two weeks. Third, a rubric-based quality assessment on a stratified sample of around thirty items, blinded, scored by someone not involved in the AI procurement. Fourth, leakage tracking, asking where the freed-up hours went: cost reduction, work expansion, or slack. Fifth, written documentation of the methodology so the next review can replicate it.
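The arithmetic those components feed is simple enough to sketch. A minimal illustration in Python, with entirely invented figures; the helper name `roi_with_error_bars` and every number below are placeholders for this post, not benchmarks or a prescribed tool:

```python
# Illustrative Level 3 arithmetic: time-study savings, leakage adjustment,
# and an explicit error band around the conclusion. All inputs are invented.

def roi_with_error_bars(
    hours_before: float,       # avg hours per task, pre-deployment baseline
    hours_after: float,        # avg hours per task from the post-deployment time-study
    tasks_per_year: int,       # annual task volume across the measured group
    hourly_rate: float,        # blended cost per professional hour, GBP
    leakage: float,            # share of freed-up hours lost to slack (0.0 to 1.0)
    annual_spend: float,       # annual AI licence and rollout cost, GBP
    error_margin: float = 0.20,  # top of the article's 15 to 20 percent error range
):
    hours_saved = (hours_before - hours_after) * tasks_per_year
    realised_hours = hours_saved * (1 - leakage)      # leakage-tracking step
    gross_value = realised_hours * hourly_rate
    net = gross_value - annual_spend
    band = (gross_value * (1 - error_margin) - annual_spend,
            gross_value * (1 + error_margin) - annual_spend)
    return net, band

net, (low, high) = roi_with_error_bars(
    hours_before=3.0, hours_after=2.0, tasks_per_year=500,
    hourly_rate=60.0, leakage=0.3, annual_spend=15_000,
)
print(f"Net value: £{net:,.0f} (range £{low:,.0f} to £{high:,.0f})")
# → Net value: £6,000 (range £1,800 to £10,200)
```

Note that in this made-up example the net value stays positive across the whole error band, which is exactly the Level 3 sentence from earlier: the conclusion holds within the stated range.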

None of this requires consulting support. None of it requires expensive tooling. What it requires is a decision that the firm will operate on measured evidence rather than impression. The hours-cost is real but bounded. The pay-off is access to the question Mark needed to answer last Tuesday: what did the spend deliver?

If you are sitting where Mark was, with a board pack to write and a pile of impressions to draw from, that is the diagnostic. The next twelve months can put you at Level 3, or they can leave you at Level 1 with a bigger AI estate and the same defensibility problem you had before.

If you’d like to talk through what climbing the ladder looks like for your firm specifically, book a conversation.

Sources

  • Software Engineering Institute, Carnegie Mellon University (1993). Capability Maturity Model for Software, Version 1.1. The foundational five-level process maturity framework that underpins technology-investment maturity adapted to AI.
  • Software Engineering Institute, Carnegie Mellon University (2006). Calculating CMMI-Based ROI. Empirical evidence that organisations moving from Level 1 to Level 3 process maturity reduce cost of quality from 65 per cent to 40 per cent and gain 50 per cent in productivity, the basis for the 20 to 30 per cent ROI uplift figure.
  • MIT CISR (Woerner, Sebastian, Weill and Kaganer, 2025). Grow Enterprise AI Maturity for Bottom-Line Impact. Stage 3 enterprises achieve growth 11.3 percentage points and profit 8.7 percentage points above industry average, while Stage 1 firms underperform on both.
  • McKinsey & Company (2025). The State of AI Global Survey. 88 per cent of organisations now use AI in at least one function, but only 39 per cent report any enterprise-level EBIT impact, the measurement gap the maturity ladder addresses.
  • McKinsey & Company (2024). From Promise to Impact: How Companies Can Measure and Realise the Full Value of AI. Five-layer measurement framework spanning technical performance, adoption, operational KPIs, strategic outcomes and financial impact.
  • Boston Consulting Group (2025). Are You Generating Value from AI? The Widening Gap. Five per cent of "future-built" firms achieve five times the revenue gains and three times the cost reductions of peers, while 60 per cent of firms report almost no material value from AI investment.
  • Standish Group, CHAOS Report (2020). Long-running benchmark of IT-project outcomes: 31 per cent succeed on contemporary definitions, 50 per cent are challenged and 19 per cent fail outright, the historical baseline for technology-investment measurement maturity.
  • Kaplan, R. and Norton, D. (1992). The Balanced Scorecard: Measures That Drive Performance, Harvard Business Review. Foundational article establishing multi-dimensional performance measurement across financial, customer, internal-process and learning perspectives.

Frequently asked questions

What is the AI ROI maturity ladder for SMEs?

A five-level diagnostic that locates a firm's AI measurement maturity. Level 1 is anecdotal, with adoption tracked but impact not. Level 2 is hours-saved by survey. Level 3 is defendable methodology with time-study and quality assessment. Level 4 is decision-grade ROI with formal review. Level 5 is portfolio-wide governance across all technology investments.

Why does Level 3 matter for board defence of AI spend?

Levels 1 and 2 produce numbers that do not survive CFO scrutiny because the methodology was absent or informal. Level 3 has a written methodology, explicit error bars, and an audit trail. The CFO can interrogate it and the case still holds. That is what board credibility actually requires.

How long does it take to move from Level 1 to Level 3 on AI ROI measurement?

About 40 to 60 hours of internal work, over 12 to 18 months. The work covers baseline measurement, time-study or activity-log methodology after deployment, rubric-based quality assessment, leakage tracking, and written documentation. Most SMEs can do this without consulting support.

What ROI uplift do firms see at higher maturity levels?

Capability Maturity Model research applied to technology investments shows Level 3 firms achieve roughly 20 to 30 percent higher ROI than Level 1 firms. The gain comes from better selection (avoiding unsuitable tools), faster correction when something is off, and board confidence that releases capital for the next deployment.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30-minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
