The renewal sat on Edward’s desk for three weeks before he opened it. He runs a 42-person services firm. Twelve months ago he signed off a Copilot rollout because the business case said the average knowledge worker would save several hours a week. A year in, nobody is working fewer hours. Output is higher, the team feels busier, and he can’t tell the board whether the rollout has paid for itself. The licence renewal is real money. He’s stalling because he hasn’t got a clean answer.
Edward has a measurement problem, and it’s arriving on a lot of desks at once.
Why do the survey numbers and the behavioural data disagree?
Both numbers are real, and the gap between them doesn’t close by declaring one of them wrong. The Small Business and Entrepreneurship Council survey found a median of thirteen hours a week saved per employee. ActivTrak analysed 443 million hours of actual work across 1,111 organisations and 163,638 employees over three years. After AI adoption, every measured category of work went up, by between twenty-seven and three hundred and forty-six per cent.
The reconciliation is that workers are using AI to do more, not to do the same in less time. A teacher who used to spend an hour producing one differentiated worksheet now produces a worksheet in twenty minutes. She reports forty minutes saved. But she’s now producing five worksheets instead of one, plus reviewing them for accuracy. The net effect on her week is plus eighty minutes, not minus forty.
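A rough tally makes that arithmetic concrete. The eight-minute review per worksheet below is an illustrative assumption of mine; the other figures are the ones in the paragraph above.

```python
# Rough tally of the teacher example. The eight-minute review per worksheet
# is an assumed figure for illustration; everything else is from the text.
minutes_before = 1 * 60        # one hand-built worksheet
draft_minutes = 20             # AI-assisted draft
review_minutes = 8             # assumed accuracy check per worksheet
worksheets_now = 5

minutes_after = worksheets_now * (draft_minutes + review_minutes)   # 140
net_change = minutes_after - minutes_before                         # +80

print(f"Reported saving per worksheet: {minutes_before - draft_minutes} minutes")
print(f"Net change to her week: {net_change:+d} minutes")
```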
That distinction matters because it changes the business case entirely. AI as a time-savings investment is one thing, justified one way, measured one way, explained to the team one way. AI as a capacity-expansion investment is a different thing, justified differently, measured differently, explained differently. Many founders bought the first and got the second, and they’re now trying to evaluate the second using the language of the first.
Where does the cost actually land?
The licence is the smallest part. Glean’s analysis of AI total cost of ownership found that more than half of organisations miss their AI cost forecasts by eleven to twenty-five per cent, and nearly one in four miss by more than fifty per cent. The reason is consistent: the budget at sign-off is the licence, and the licence is roughly thirty per cent of the real cost in year one.
The breakdown that holds up across SME deployments is software at thirty per cent, integration at forty per cent, training and change management at twenty per cent, and ongoing operations at ten per cent. On legacy systems, data preparation alone can consume up to eighty per cent of project resources. gigcmo’s SME-specific research found that up to seventy per cent of SME AI initiatives are abandoned before reaching production, with costs routinely overrunning budget by twenty to seventy per cent.
For Edward, the implication is straightforward. The £50,000 licence renewal is a marker for an investment that probably cost £150,000 to £200,000 once integration, training, and oversight are honestly counted. The renewal decision needs to be made against that figure, not the headline figure on the invoice.
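A back-of-envelope check of that figure, assuming the £50,000 renewal is roughly what the year-one licence cost and applying the thirty per cent rule of thumb from the breakdown above:

```python
# Back-of-envelope: if the licence is ~30% of true year-one cost, a £50,000
# licence implies a total spend near £167,000. The shares are the rule-of-thumb
# breakdown quoted above; renewal-equals-year-one-licence is an assumption.
licence = 50_000
shares = {
    "software": 0.30,
    "integration": 0.40,
    "training and change": 0.20,
    "ongoing operations": 0.10,
}
implied_total = licence / shares["software"]
for item, share in shares.items():
    print(f"{item:22s} ~£{implied_total * share:,.0f}")
print(f"{'implied total':22s} ~£{implied_total:,.0f}")
```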
What’s the cost that doesn’t appear on any invoice?
The cost nobody priced in is cognitive overhead. BCG’s study of 1,488 US workers found that high AI-oversight workloads produced thirty-nine per cent more major errors, twelve per cent more mental fatigue, and significantly higher information overload. Workers spent more time monitoring outputs than they used to spend producing them. The expectation that AI would reduce cognitive burden inverts when staff become quality-assurance layers between the model and the deliverable.
Microsoft’s own Work Trend Index data shows focus time falling to a three-year low even as AI adoption climbed. The average focused session is now thirteen minutes and seven seconds, down nine per cent year on year, while collaboration surged thirty-four per cent and multitasking rose twelve per cent. The Microsoft researchers were honest about the ambiguity: AI may be absorbing the cognitive load that focus time used to carry, or it may be adding faster, more frequent attention shifts. The distinction determines whether the productivity gain is real.
This cost shows up as quality drift, not as a line item. Errors that take an extra hour to spot. Drafts that need a second pass. Decisions made on AI-summarised input that turn out to have missed the nuance. None of it lands in the AI budget. All of it lands somewhere.
What should you actually measure?
Stop counting hours. Three questions matter, and the answers are uncomfortable to gather but produce a real ROI conversation. First, what additional output is the team producing that wasn’t being produced before? Be specific. New reports, more proposals, broader audit scope, expanded service offering. Second, what quality cost does that additional output carry, including review time and error rate? Third, does that additional output have a buyer, internal or external?
If all three answers are clear and the maths works, the rollout is paying off, regardless of whether anyone’s hours went down. If the answer to the first question is “we’re producing the same thing faster”, the conversation is about cost takeout, and the metric is cost per unit of output. If the answer is “we’re now producing X that we couldn’t before”, the conversation is about growth, and the metric is revenue or value attributable to the new output.
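A minimal sketch of the two metrics, with hypothetical figures standing in for the answers to the three questions; neither function is anything more than the arithmetic described above.

```python
# Hypothetical illustration of the two framings. All figures are invented.

def cost_per_unit(total_cost: float, units: int) -> float:
    """Cost-takeout framing: the same output, produced more cheaply."""
    return total_cost / units

def capacity_roi(new_output_value: float, quality_cost: float, ai_cost: float) -> float:
    """Growth framing: value of output that didn't exist before, net of quality cost."""
    return (new_output_value - quality_cost - ai_cost) / ai_cost

# "Same thing, faster": cost per proposal, before and after the rollout.
before = cost_per_unit(total_cost=120_000, units=80)     # £1,500 per proposal
after = cost_per_unit(total_cost=130_000, units=120)     # ~£1,083 per proposal

# "Producing X we couldn't before": does the new output have a buyer worth more than it costs?
roi = capacity_roi(new_output_value=90_000, quality_cost=15_000, ai_cost=50_000)

print(f"Cost per proposal: £{before:,.0f} -> £{after:,.0f}")
print(f"Capacity-expansion return: {roi:+.0%}")
```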
Fortune’s reporting on the AI productivity paradox names this pattern in real companies. AES converted a fourteen-day audit and data-entry process into a one-hour task. Rather than send staff home early, they expanded audit scope and frequency. Google reports that AI now writes fifty per cent of its code, with a velocity gain of over ten per cent across tens of thousands of engineers. The result wasn’t smaller engineering teams; it was faster shipping of more features. The productivity gain was real. The time gain to employees was not.
What does Edward do with the renewal on Monday?
The honest move is to pick one process where Copilot is in real use, sit with the team for an hour, and ask what they’re now producing that they weren’t producing twelve months ago. Then price what that extra output is worth to the firm. The answer is rarely zero. It’s often more than the renewal cost. It’s almost never expressible as hours saved.
If the team can name three pieces of additional output and roughly cost them, Edward can defend the renewal at the next board meeting in language the board will accept. If the team can’t name them, the rollout has been expanding effort without expanding value, and the renewal is a different conversation. Pause it, renegotiate, or kill it and redeploy the budget where the output question has a cleaner answer.
The frame is doing the work here. The hours-saved figure was always going to disappoint, because it was measuring something that wasn’t happening. Capacity expansion was happening. It still is. Whether it’s worth the renewal depends on whether the firm can name the buyer for the expanded capacity. That’s the conversation worth having before the contract gets signed for another year.
If the renewal sitting on your desk feels like Edward’s, and you’re trying to work out what to actually measure before the next board meeting, book a conversation.



