Four proxy metrics when AI financial ROI is unmeasurable

TL;DR

Some AI deployments cannot be financially measured cleanly. Track adoption rate, sustained use after 90 days, voluntary expansion to new use cases, and complaint rate. If all four hold positive, financial ROI almost always follows even when it cannot be directly measured. Sustained adoption is the cleanest proxy because it is hardest to game.

Key takeaways

- Some AI deployments cannot be financially measured cleanly because they touch multiple processes, the data is messy, or year-one numbers sit on the dip.
- The four proxy metrics are adoption rate, sustained use after 90 days, voluntary expansion to new use cases, and complaint rate.
- The 90-day cliff is the strongest proxy. In failed rollouts, adoption typically drops 50 to 60 percent between month two and month four.
- Sustained adoption is hardest to game because users vote with their hands, not with a survey response.
- Aggregate to weekly or monthly, track cohorts not individuals, and investigate when any of the four turns negative.

Picture an operations director I’ll call Sunita. £8m turnover services firm. Their AI deployment touches client onboarding, contract review, and proposal generation simultaneously. The finance team cannot cleanly attribute margin impact to any one of them; the data infrastructure is not built for that level of attribution and would take a year to rebuild. The CFO has asked Sunita for an ROI number for the next quarterly board meeting. Sunita has six metrics, none of which speak directly to financial impact, and a meeting in two weeks where someone is going to ask whether to renew.

She does not have a clean financial ROI number, and she will not have one in the next two weeks. What she does have is access to four proxy metrics that, taken together, give a credible read on whether the deployment is working. The proxies are not as good as a clean financial figure. They are good enough to make a defensible call.

Why is some AI ROI structurally unmeasurable in pounds?

Several things resist clean financial attribution. AI that touches multiple processes simultaneously breaks attribution; the firm cannot isolate the financial impact of one process from another without a model it does not have. Data infrastructure built for partner-equity decisions, not process-level ROI tracking, hides the financial signal. Year-one numbers sitting on the J-curve dip understate what the deployment will eventually deliver.

In these cases, demanding financial ROI as the primary measure produces either nothing or a fabricated number. Neither is useful. Proxy metrics give the firm a defensible read that does not depend on the financial signal being clean.

The proxies are not a substitute for financial ROI when financial ROI is measurable. They are the right approach when it is not.

What are the four proxy metrics?

Adoption rate is the first. The proportion of eligible users who are actively using the tool, typically measured weekly. A target of 70 to 80 percent adoption among eligible users is the threshold for a healthy deployment. Below that, the tool is not embedded enough to be producing meaningful value.

Sustained use after 90 days is the second, and the strongest. The proportion of users who started with the tool in the initial pilot phase and are still actively using it in month four. This is the metric that filters honeymoon-period adoption from genuine embedding. A drop-off of 30 percent or less between month two and month four is healthy. A drop-off of 50 percent or more is a strong signal the tool is not providing value.

Voluntary expansion is the third. The number of users using the tool for new use cases beyond the original deployment scope. When users find their own additional applications for the tool without being prompted, the tool is producing value the team can identify. When voluntary expansion is zero, the tool has not crossed the threshold of perceived usefulness.

Complaint rate is the fourth. The volume of complaints, support tickets, or user feedback indicating the tool is creating friction. A rising complaint rate signals adoption pain that has not been resolved. A flat or falling complaint rate signals the tool has settled into the workflow.

If all four hold positive, financial ROI almost always follows. If any of the four turns negative, the firm should investigate before committing to continued spend.
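As a concrete sketch, the first three proxies can be computed from a simple usage log. Everything here is illustrative, not a real system's schema: the log shape, the user names, the week indices, and the `proxy_snapshot` function are all hypothetical, and complaint rate would come from a ticketing system rather than the usage log.

```python
# Hypothetical usage log: (user_id, week_index, use_case) tuples.
# All names, weeks, and thresholds below are illustrative.
USAGE = [
    ("ana", 1, "onboarding"), ("ana", 8, "onboarding"), ("ana", 14, "proposals"),
    ("ben", 1, "onboarding"), ("ben", 8, "onboarding"), ("ben", 14, "onboarding"),
    ("cai", 1, "onboarding"),  # started in the pilot, then dropped off
]
ELIGIBLE = {"ana", "ben", "cai", "dee"}       # everyone who could be using the tool
ORIGINAL_SCOPE = {"onboarding"}               # use cases in the original deployment

def proxy_snapshot(usage, eligible, original_scope, current_week=14):
    """Compute adoption, sustained use, and voluntary expansion from the log.
    Complaint rate is excluded; it lives in the ticketing system, not here."""
    active_now = {u for u, w, _ in usage if w == current_week}
    pilot_users = {u for u, w, _ in usage if w <= 4}          # started in the pilot
    sustained = pilot_users & active_now                      # still active in month four
    expanded = {u for u, _, c in usage if c not in original_scope}
    return {
        "adoption_rate": len(active_now) / len(eligible),
        "sustained_after_90d": len(sustained) / len(pilot_users),
        "voluntary_expansion": len(expanded),
    }
```

With the toy log above, adoption sits at 50 percent of eligible users, two of the three pilot users are still active in month four, and one user has expanded beyond the original scope.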

What is the 90-day cliff and why does it matter?

In technology rollouts that ultimately fail, adoption typically drops 50 to 60 percent between month two and month four. The drop is so reliable across studies of SaaS adoption, technology rollouts, and AI deployments that sustained adoption past month four is, on its own, a strong proxy for value.

The reasoning is concrete. Initial adoption can be driven by enthusiasm, mandate from leadership, or honeymoon-period curiosity. None of these are durable. Sustained use through month four means the user has integrated the tool into their actual workflow and continues to find it useful when the initial enthusiasm fades. Users vote with their hands. They do not keep using a tool that does not help them.

This is why sustained adoption is the cleanest proxy of the four. Users can be asked to fill out satisfaction surveys positively, particularly if their manager championed the tool. They cannot be asked to keep using something that wastes their time. The behavioural signal is much harder to game than the survey signal.

For Sunita’s situation, the 90-day cliff diagnostic is concrete. If month-four adoption is 75 to 85 percent of month-two adoption, the tool is embedded. If it is 40 to 60 percent, the tool is in trouble regardless of what the satisfaction survey says.
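The diagnostic reduces to a single ratio. A minimal sketch, with the bands taken from the thresholds described above (they are rough bands, not hard rules, and the function name is illustrative):

```python
def cliff_diagnostic(month2_active: int, month4_active: int) -> str:
    """Classify month-four retention against the 90-day cliff bands."""
    if month2_active == 0:
        return "no baseline"
    ratio = month4_active / month2_active
    if ratio >= 0.70:
        return "embedded"        # drop-off of 30 percent or less: healthy
    if ratio <= 0.60:
        return "in trouble"      # the 40-60 percent failure pattern
    return "watch closely"       # between the bands: investigate
```

For example, 80 month-four users against 100 month-two users classifies as embedded; 45 against 100 classifies as in trouble, whatever the satisfaction survey says.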

How do you separate signal from noise in proxy data?

Two disciplines matter most. The first is aggregation level. Daily metrics are noisy from project scheduling, holiday patterns, and individual workload variation. Weekly aggregates smooth most of this noise. Monthly aggregates smooth almost all of it. For AI ROI proxy tracking, weekly is usually the right cadence; daily is too noisy and monthly is too slow to detect emerging problems.
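Rolling daily events up to weeks is mechanical. A minimal sketch, assuming usage events arrive as (user, date) pairs (an illustrative shape, not any particular analytics tool's format):

```python
from datetime import date

def weekly_active_users(events):
    """Roll daily usage events up to ISO-week granularity.
    `events` is an iterable of (user_id, date) pairs."""
    weekly = {}
    for user, day in events:
        key = day.isocalendar()[:2]              # (ISO year, ISO week number)
        weekly.setdefault(key, set()).add(user)  # count each user once per week
    return {week: len(users) for week, users in sorted(weekly.items())}
```

Counting each user once per week is what does the smoothing: a user who logs in five times on Monday and not again looks identical, at weekly granularity, to one who uses the tool every day.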

The second is cohort tracking. Individual user variation is high and uncorrelated with whether the tool is working. Cohort tracking groups users by when they were trained or when they started using the tool, which surfaces the underlying signal. If the cohort trained in March is at 78 percent adoption in month four and the cohort trained in June is at 45 percent in month four, that is a signal worth investigating. The training programme changed, or the use case changed, or something else shifted between the two cohorts.
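The cohort comparison above can be sketched in a few lines. The record shape and field names are hypothetical; the point is only that adoption is computed per training cohort, not per individual:

```python
def cohort_month4_adoption(users):
    """Month-four adoption rate per training cohort.
    `users` is a list of dicts with illustrative keys:
    {"id": ..., "cohort": ..., "active_month4": bool}."""
    totals, active = {}, {}
    for u in users:
        totals[u["cohort"]] = totals.get(u["cohort"], 0) + 1
        if u["active_month4"]:
            active[u["cohort"]] = active.get(u["cohort"], 0) + 1
    # Adoption rate per cohort: active users over total trained
    return {c: active.get(c, 0) / n for c, n in totals.items()}
```

A March cohort at 78 percent against a June cohort at 45 percent falls straight out of this kind of table, and the gap is the thing to investigate.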

Tracked at the right aggregation and the right cohort granularity, the four proxies produce a clear story. Tracked at the wrong level, they produce noise that obscures the story.

When do you investigate?

Any of the four going negative should trigger investigation, ideally before the next quarterly review rather than after. Adoption falling below the agreed target is the earliest warning. Sustained-use drop-off approaching the 50 to 60 percent failure pattern signals the tool has not embedded. Voluntary expansion stalling at zero suggests the team cannot identify new value the tool produces. Complaint rate climbing tells you adoption pain has not been resolved.

The investigation does not need to be elaborate. A short qualitative round with five to ten users, asking what is working and what is not, surfaces most of the underlying issues. The diagnostic typically points to one of four things: training was not deep enough, the workflow was not redesigned around the tool, the use case was not as suitable as it looked at the proposal stage, or the tool itself has limitations that need a different workaround.

The point of proxy metrics is that they let the firm catch the problem at month four rather than at month twelve. Catching it at month four leaves time to fix the surrounding work and stay on the J-curve. Catching it at month twelve means the renewal decision is being made on a deployment that has already failed.

If you are looking at an AI deployment where financial ROI is structurally hard to compute and you want to set up proxy metrics that give you a defensible read, book a conversation.

Sources

  • Brynjolfsson, E. J-curve year-one productivity research: adoption ramp dynamics that produce a flat or negative year-one financial signal in successful deployments.
  • McKinsey & Company (2025). The State of AI Global Survey. 88 per cent of organisations now use AI in at least one function but only 39 per cent report enterprise-level EBIT impact, the measurement gap that maturity frameworks address.
  • McKinsey & Company (2024). From Promise to Impact: How Companies Can Measure and Realise the Full Value of AI. Five-layer measurement framework spanning technical performance, adoption, operational KPIs, strategic outcomes, and financial impact.
  • MIT CISR (Woerner, Sebastian, Weill and Kaganer, 2025). Grow Enterprise AI Maturity for Bottom-Line Impact. Stage 3 enterprises achieve growth 11.3 percentage points and profit 8.7 percentage points above industry average; Stage 1 firms underperform on both.
  • Boston Consulting Group (2025). Are You Generating Value from AI? The Widening Gap. Five per cent of future-built firms achieve five times the revenue gains and three times the cost reductions of peers, with 60 per cent reporting almost no material value from AI investment.
  • Standish Group, CHAOS Report (2020). Long-running benchmark of IT-project outcomes: 31 per cent succeed on contemporary definitions, 50 per cent are challenged, and 19 per cent fail outright, the historical baseline for technology-investment measurement maturity.
  • Brynjolfsson, E., Li, D. and Raymond, L. (2023). Generative AI at Work, NBER Working Paper 31161. Empirical productivity study showing a 14 per cent average gain, with 34 per cent for low-skilled workers, the basis for the J-curve and heterogeneity findings in AI productivity.
  • Kaplan, R. and Norton, D. (1992). The Balanced Scorecard: Measures That Drive Performance, Harvard Business Review. Foundational article on multi-dimensional performance measurement and the leading-versus-lagging-indicator distinction.

Frequently asked questions

What are proxy metrics for AI ROI?

Indirect measurements that signal whether an AI deployment is creating value, when financial ROI cannot be measured directly. The four most useful are adoption rate, sustained use after 90 days, voluntary expansion to new use cases, and complaint rate. If all four are positive, financial ROI almost always follows even when it cannot be directly measured.

Why is sustained use after 90 days the cleanest proxy?

Because users can be asked to fill out satisfaction surveys positively, but they cannot be asked to keep using a tool that does not help them. The behavioural signal of continued use is harder to game than any survey signal. In failed rollouts, adoption typically drops 50 to 60 percent between month two and month four; sustained use past month four is a strong indicator the tool is providing value.

When should I rely on proxy metrics rather than financial ROI?

When financial measurement is structurally hard. Multiple processes touched at once. Data infrastructure too thin to isolate the financial signal. Year-one numbers still sitting on the J-curve dip. Internal accounting that does not track gross margin per process. In these cases, proxies give a credible read while financial signal develops.

What's the discipline for separating signal from noise in proxy data?

Aggregate to weekly or monthly rather than daily, because daily numbers are noisy from project scheduling. Track cohorts (users trained in the same batch) rather than individuals, because individual variation drowns the underlying signal. Investigate when any of the four proxies turns negative.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
