Four proxy metrics when AI financial ROI is unmeasurable

TL;DR

Some AI deployments cannot be financially measured cleanly. Track adoption rate, sustained use after 90 days, voluntary expansion to new use cases, and complaint rate. If all four hold positive, financial ROI almost always follows even when it cannot be directly measured. Sustained adoption is the cleanest proxy because it is hardest to game.

Key takeaways

- Some AI deployments cannot be financially measured cleanly because they touch multiple processes, the data is messy, or year-one numbers sit on the dip.
- The four proxy metrics are adoption rate, sustained use after 90 days, voluntary expansion to new use cases, and complaint rate.
- The 90-day cliff is the strongest proxy. In failed rollouts, adoption typically drops 50 to 60 percent between month two and month four.
- Sustained adoption is hardest to game because users vote with their hands, not with a survey response.
- Aggregate to weekly or monthly, track cohorts not individuals, and investigate when any of the four turns negative.

Picture an operations director I’ll call Sunita. £8m turnover services firm. Their AI deployment touches client onboarding, contract review, and proposal generation simultaneously. The finance team cannot cleanly attribute margin impact to any one of them; the data infrastructure is not built for that level of attribution and would take a year to rebuild. The CFO has asked Sunita for an ROI number for the next quarterly board meeting. Sunita has six metrics, none of which speak directly to financial impact, and a meeting in two weeks where someone is going to ask whether to renew.

She does not have a clean financial ROI number, and she will not have one in the next two weeks. What she does have is access to four proxy metrics that, taken together, give a credible read on whether the deployment is working. The proxies are not as good as a clean financial figure. They are good enough to make a defensible call.

Why is some AI ROI structurally unmeasurable in pounds?

Several things resist clean financial attribution. AI that touches multiple processes simultaneously breaks attribution; the firm cannot isolate the financial impact of one process from another without a model it does not have. Data infrastructure built for partner-equity decisions, not process-level ROI tracking, hides the financial signal. Year-one numbers sitting on the J-curve dip understate what the deployment will eventually deliver.

In these cases, demanding financial ROI as the primary measure produces either nothing or a fabricated number. Neither is useful. Proxy metrics give the firm a defensible read that does not depend on the financial signal being clean.

The proxies are not a substitute for financial ROI when financial ROI is measurable. They are the right approach when it is not.

What are the four proxy metrics?

Adoption rate is the first. The proportion of eligible users who are actively using the tool, typically measured weekly. A target of 70 to 80 percent adoption among eligible users is the threshold for a healthy deployment. Below that, the tool is not embedded enough to be producing meaningful value.

Sustained use after 90 days is the second, and the strongest. The proportion of users who started with the tool in the initial pilot phase and are still actively using it in month four. This is the metric that filters honeymoon-period adoption from genuine embedding. A drop-off of 30 percent or less between month two and month four is healthy. A drop-off of 50 percent or more is a strong signal the tool is not providing value.

Voluntary expansion is the third. The number of users using the tool for new use cases beyond the original deployment scope. When users find their own additional applications for the tool without being prompted, the tool is producing value the team can identify. When voluntary expansion is zero, the tool has not crossed the threshold of perceived usefulness.

Complaint rate is the fourth. The volume of complaints, support tickets, or user feedback indicating the tool is creating friction. A rising complaint rate signals adoption pain that has not been resolved. A flat or falling complaint rate signals the tool has settled into the workflow.

If all four hold positive, financial ROI almost always follows. If any of the four turns negative, the firm should investigate before committing to continued spend.
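As a concrete sketch, the first three proxies can be computed from a simple usage log. Everything here is illustrative, not a real system's schema: the log shape, the user names, the week indices, and the `proxy_snapshot` function are all hypothetical, and complaint rate would come from a ticketing system rather than the usage log.

```python
# Hypothetical usage log: (user_id, week_index, use_case) tuples.
# All names, weeks, and thresholds below are illustrative.
USAGE = [
    ("ana", 1, "onboarding"), ("ana", 8, "onboarding"), ("ana", 14, "proposals"),
    ("ben", 1, "onboarding"), ("ben", 8, "onboarding"), ("ben", 14, "onboarding"),
    ("cai", 1, "onboarding"),  # started in the pilot, then dropped off
]
ELIGIBLE = {"ana", "ben", "cai", "dee"}       # everyone who could be using the tool
ORIGINAL_SCOPE = {"onboarding"}               # use cases in the original deployment

def proxy_snapshot(usage, eligible, original_scope, current_week=14):
    """Compute adoption, sustained use, and voluntary expansion from the log.
    Complaint rate is excluded; it lives in the ticketing system, not here."""
    active_now = {u for u, w, _ in usage if w == current_week}
    pilot_users = {u for u, w, _ in usage if w <= 4}          # started in the pilot
    sustained = pilot_users & active_now                      # still active in month four
    expanded = {u for u, _, c in usage if c not in original_scope}
    return {
        "adoption_rate": len(active_now) / len(eligible),
        "sustained_after_90d": len(sustained) / len(pilot_users),
        "voluntary_expansion": len(expanded),
    }
```

With the toy log above, adoption sits at 50 percent of eligible users, two of the three pilot users are still active in month four, and one user has expanded beyond the original scope.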

What is the 90-day cliff and why does it matter?

In technology rollouts that ultimately fail, adoption typically drops 50 to 60 percent between month two and month four. The drop is so reliable across studies of SaaS adoption, technology rollouts, and AI deployments that sustained adoption past month four is, on its own, a strong proxy for value.

The reasoning is concrete. Initial adoption can be driven by enthusiasm, mandate from leadership, or honeymoon-period curiosity. None of these are durable. Sustained use through month four means the user has integrated the tool into their actual workflow and continues to find it useful when the initial enthusiasm fades. Users vote with their hands. They do not keep using a tool that does not help them.

This is why sustained adoption is the cleanest proxy of the four. Users can be asked to fill out satisfaction surveys positively, particularly if their manager championed the tool. They cannot be asked to keep using something that wastes their time. The behavioural signal is much harder to game than the survey signal.

For Sunita’s situation, the 90-day cliff diagnostic is concrete. If month-four adoption is 75 to 85 percent of month-two adoption, the tool is embedded. If it is 40 to 60 percent, the tool is in trouble regardless of what the satisfaction survey says.
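The diagnostic reduces to a single ratio. A minimal sketch, with the bands taken from the thresholds described above (they are rough bands, not hard rules, and the function name is illustrative):

```python
def cliff_diagnostic(month2_active: int, month4_active: int) -> str:
    """Classify month-four retention against the 90-day cliff bands."""
    if month2_active == 0:
        return "no baseline"
    ratio = month4_active / month2_active
    if ratio >= 0.70:
        return "embedded"        # drop-off of 30 percent or less: healthy
    if ratio <= 0.60:
        return "in trouble"      # the 40-60 percent failure pattern
    return "watch closely"       # between the bands: investigate
```

For example, 80 month-four users against 100 month-two users classifies as embedded; 45 against 100 classifies as in trouble, whatever the satisfaction survey says.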

How do you separate signal from noise in proxy data?

Two disciplines matter most. The first is aggregation level. Daily metrics are noisy from project scheduling, holiday patterns, and individual workload variation. Weekly aggregates smooth most of this noise. Monthly aggregates smooth almost all of it. For AI ROI proxy tracking, weekly is usually the right cadence; daily is too noisy and monthly is too slow to detect emerging problems.
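Rolling daily events up to weeks is mechanical. A minimal sketch, assuming usage events arrive as (user, date) pairs (an illustrative shape, not any particular analytics tool's format):

```python
from datetime import date

def weekly_active_users(events):
    """Roll daily usage events up to ISO-week granularity.
    `events` is an iterable of (user_id, date) pairs."""
    weekly = {}
    for user, day in events:
        key = day.isocalendar()[:2]              # (ISO year, ISO week number)
        weekly.setdefault(key, set()).add(user)  # count each user once per week
    return {week: len(users) for week, users in sorted(weekly.items())}
```

Counting each user once per week is what does the smoothing: a user who logs in five times on Monday and not again looks identical, at weekly granularity, to one who uses the tool every day.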

The second is cohort tracking. Individual user variation is high and uncorrelated with whether the tool is working. Cohort tracking groups users by when they were trained or when they started using the tool, which surfaces the underlying signal. If the cohort trained in March is at 78 percent adoption in month four and the cohort trained in June is at 45 percent in month four, that is a signal worth investigating. The training programme changed, or the use case changed, or something else shifted between the two cohorts.
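The cohort comparison above can be sketched in a few lines. The record shape and field names are hypothetical; the point is only that adoption is computed per training cohort, not per individual:

```python
def cohort_month4_adoption(users):
    """Month-four adoption rate per training cohort.
    `users` is a list of dicts with illustrative keys:
    {"id": ..., "cohort": ..., "active_month4": bool}."""
    totals, active = {}, {}
    for u in users:
        totals[u["cohort"]] = totals.get(u["cohort"], 0) + 1
        if u["active_month4"]:
            active[u["cohort"]] = active.get(u["cohort"], 0) + 1
    # Adoption rate per cohort: active users over total trained
    return {c: active.get(c, 0) / n for c, n in totals.items()}
```

A March cohort at 78 percent against a June cohort at 45 percent falls straight out of this kind of table, and the gap is the thing to investigate.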

Tracked at the right aggregation and the right cohort granularity, the four proxies produce a clear story. Tracked at the wrong level, they produce noise that obscures the story.

When do you investigate?

Any of the four going negative should trigger investigation, ideally before the next quarterly review rather than after. Adoption falling below the agreed target is the earliest warning. Sustained-use drop-off approaching the 50 to 60 percent failure pattern signals the tool has not embedded. Voluntary expansion stalling at zero suggests the team cannot identify new value the tool produces. Complaint rate climbing tells you adoption pain has not been resolved.

The investigation does not need to be elaborate. A short qualitative round with five to ten users, asking what is working and what is not, surfaces most of the underlying issues. The diagnostic typically points to one of four things: training was not deep enough, the workflow was not redesigned around the tool, the use case was not as suitable as it looked at the proposal stage, or the tool itself has limitations that need a different workaround.

The point of proxy metrics is that they let the firm catch the problem at month four rather than at month twelve. Catching it at month four leaves time to fix the surrounding work and stay on the J-curve. Catching it at month twelve means the renewal decision is being made on a deployment that has already failed.

If you are looking at an AI deployment where financial ROI is structurally hard to compute and you want to set up proxy metrics that give you a defensible read, book a conversation.

Sources

  • Brynjolfsson, E. J-curve year-one productivity research: adoption ramp dynamics that produce a flat or negative year-one financial signal in successful deployments.
  • McKinsey & Company (2025). The State of AI Global Survey. 88 per cent of organisations now use AI in at least one function but only 39 per cent report enterprise-level EBIT impact, the measurement gap that maturity frameworks address.
  • McKinsey & Company (2024). From Promise to Impact: How Companies Can Measure and Realise the Full Value of AI. Five-layer measurement framework spanning technical performance, adoption, operational KPIs, strategic outcomes, and financial impact.
  • MIT CISR (Woerner, Sebastian, Weill and Kaganer, 2025). Grow Enterprise AI Maturity for Bottom-Line Impact. Stage 3 enterprises achieve growth 11.3 percentage points and profit 8.7 percentage points above industry average; Stage 1 firms underperform on both.
  • Boston Consulting Group (2025). Are You Generating Value from AI? The Widening Gap. Five per cent of future-built firms achieve five times the revenue gains and three times the cost reductions of peers, with 60 per cent reporting almost no material value from AI investment.
  • Standish Group, CHAOS Report (2020). Long-running benchmark of IT-project outcomes: 31 per cent succeed on contemporary definitions, 50 per cent are challenged, and 19 per cent fail outright, the historical baseline for technology-investment measurement maturity.
  • Brynjolfsson, E., Li, D. and Raymond, L. (2023). Generative AI at Work, NBER Working Paper 31161. Empirical productivity study showing a 14 per cent average gain, with 34 per cent for low-skilled workers, the basis for the J-curve and heterogeneity findings in AI productivity.
  • Kaplan, R. and Norton, D. (1992). The Balanced Scorecard: Measures That Drive Performance, Harvard Business Review. Foundational article on multi-dimensional performance measurement and the leading-versus-lagging-indicator distinction.

Frequently asked questions

What are proxy metrics for AI ROI?

Indirect measurements that signal whether an AI deployment is creating value, when financial ROI cannot be measured directly. The four most useful are adoption rate, sustained use after 90 days, voluntary expansion to new use cases, and complaint rate. If all four are positive, financial ROI almost always follows even when it cannot be directly measured.

Why is sustained use after 90 days the cleanest proxy?

Because users can be asked to fill out satisfaction surveys positively, but they cannot be asked to keep using a tool that does not help them. The behavioural signal of continued use is harder to game than any survey signal. In failed rollouts, adoption typically drops 50 to 60 percent between month two and month four; sustained use past month four is a strong indicator the tool is providing value.

When should I rely on proxy metrics rather than financial ROI?

When financial measurement is structurally hard. Multiple processes touched at once. Data infrastructure too thin to isolate the financial signal. Year-one numbers still sitting on the J-curve dip. Internal accounting that does not track gross margin per process. In these cases, proxies give a credible read while financial signal develops.

What's the discipline for separating signal from noise in proxy data?

Aggregate to weekly or monthly rather than daily, because daily numbers are noisy from project scheduling. Track cohorts (users trained in the same batch) rather than individuals, because individual variation drowns the underlying signal. Investigate when any of the four proxies turns negative.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
