You’re three weeks into the mandate when someone slides the MIT statistic into a board pack. Ninety-five per cent of generative AI pilots show no measurable impact on the profit and loss. Nobody comments on it. The room moves on. You sit with that number for the rest of the meeting, calculating whether you’re about to be part of the majority.
That calculation is worth doing properly. The figure is considerably more useful once you understand what it actually counts.
What does the 95% figure actually measure?
The MIT NANDA GenAI Divide research tracked generative AI pilots across a range of organisations and found that roughly 95% showed no measurable profit-and-loss impact within their measurement window. That figure includes every pilot that produced genuine learning, built capability, or changed a workflow but had not yet shown up as cost savings or revenue. It measures how quickly pilots translate into commercial results, not effort or the quality of the approach.
What the statistic does not tell you is whether the organisation running the pilot is on a trajectory toward results. A meaningful share of the 95% are early-stage learning investments that will show returns in year two or three. Others are genuine failures with identifiable causes. Others still are initiatives that worked but were abandoned before their value was properly measured. The MIT figure bundles all three groups into a single column, which is why it reads as more alarming than the underlying situation may warrant. Your job as the delegate is to know which of the three groups you are in, not to avoid the figure.
The Korn Ferry research on what they call the AI readiness paradox adds a useful frame. Organisations frequently assign AI leadership to trusted operators who lack AI-specific competencies, creating high expectations in a role with low preparation. That expectation gap accounts for a significant share of the 95%. The conditions for translating technology into results are often absent from the start, and that is a design gap, not a technology gap.
Why does this failure rate matter if you’re holding the mandate?
When an AI pilot stalls, the organisation rarely concludes that the technology was the problem. The person leading the initiative becomes the natural explanation. Spencer Stuart’s research found that delegates who manage this well set the frame early, naming what the pilot will prove and why commercial results typically take twelve to twenty-four months to appear. That frame, established before results are due, separates a confident position from a defensive one.
The professional risk is concrete. Korn Ferry documents how stalled initiatives can turn employee scepticism into leadership scepticism, with the AI mandate-holder as the visible target. A substantial proportion of executives in this position report concerns about career consequences if AI adoption fails on their watch. That context explains why the credibility conversation matters from day one. Treat it not as a source of anxiety but as a reason to establish clarity early, before the gap appears.
Where do AI pilots actually break down in practice?
The breakdown almost never happens at the technology layer. The model works; the integration does not. Pilots stall when grafted onto poorly documented workflows, when no one owns the outcome once the demo ends, and when the scope was defined around a tool rather than a specific business problem. BCG research confirms that rising AI usage consistently fails to translate into measurable business impact, and that the gap lies in process and ownership rather than in what the technology can do.
OECD research on AI adoption in owner-managed businesses confirms the pattern from a different angle. The constraint is almost always capability and process readiness rather than access to tools. Firms that have not documented their workflows, addressed data quality problems, or designated a named owner for the outcome are solving a workflow problem and calling it an AI project. The practical consequence is a pilot that performs in a controlled environment and breaks apart when it meets the real operation.
The ownership question deserves a direct answer before any initiative is approved. Many failed pilots have no clear account of who is accountable when the project finishes and the vendor has left. The most reliable predictor of whether a pilot survives contact with the real organisation is whether someone has their name on the outcome before the work begins.
What do the 5% who show measurable results do differently?
The firms that consistently show P&L results from AI share a small set of habits. They choose domain-specific, well-defined problems over broad capability roll-outs. They start in back-office functions, where returns are consistently higher, rather than in sales and marketing. They name a real person as accountable for the business outcome, not just the initiative. And they use a measurement frame that shows progress before the financial results appear.
The counterintuitive MIT finding is worth registering. Back-office automation delivers the highest returns from AI, while sales and marketing initiatives receive the most investment but show the lowest ROI. Many delegates start where the AI conversation is loudest, which turns out to be the hardest place to demonstrate early value. Starting in accounts payable, document processing, or customer service handling is less visible but far more likely to produce a result you can report to the board within the first year. That sequence, back-office first and then customer-facing tools once the operating model is proven, is what shows up repeatedly in the firms that get results.
The dual measurement frame is also practical here. Tracking trending ROI, including leading indicators such as time saved, error rates, and processing cycle times, alongside realised ROI gives you something concrete at every board update. Meaningful financial returns from AI commonly take twelve to twenty-four months to appear. A measurement approach that acknowledges this openly is more useful than one that promises financial impact on a timeline the evidence does not support.
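For readers who want to see the arithmetic, the dual frame can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the research cited above: the field names, the choice to value time saved at an assumed loaded hourly cost, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PilotSnapshot:
    """Cumulative position of one pilot at a board update (illustrative fields)."""
    cost_to_date: float         # total spend on the pilot so far
    realised_benefit: float     # savings or revenue already visible in the P&L
    hours_saved_to_date: float  # leading indicator: staff hours saved so far
    hourly_cost: float          # assumed loaded cost of one staff hour


def _roi(benefit: float, cost: float) -> float:
    """Plain return on spend: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost_to_date must be positive")
    return (benefit - cost) / cost


def realised_roi(s: PilotSnapshot) -> float:
    """ROI counting only benefits that have reached the P&L."""
    return _roi(s.realised_benefit, s.cost_to_date)


def trending_roi(s: PilotSnapshot) -> float:
    """ROI if time saved is valued at the assumed hourly cost (an estimate)."""
    return _roi(s.hours_saved_to_date * s.hourly_cost, s.cost_to_date)


# Hypothetical figures for a document-processing pilot.
snapshot = PilotSnapshot(
    cost_to_date=40_000,
    realised_benefit=5_000,
    hours_saved_to_date=600,
    hourly_cost=50,
)
print(f"Realised ROI: {realised_roi(snapshot):.1%}")  # Realised ROI: -87.5%
print(f"Trending ROI: {trending_roi(snapshot):.1%}")  # Trending ROI: -25.0%
```

In this made-up example both numbers are still negative, but the trending figure shows the pilot closing the gap well before the realised figure does. Reporting the two side by side, with the valuation assumption stated, is the point of the dual frame.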
What should you ask before any initiative goes live?
Addepar’s test for any AI investment is straightforward. Would this initiative still matter if it did not use AI? If the answer is no, the initiative is solving a technology problem rather than a business one. If yes, the underlying business case is genuine and the technology is a means of addressing it. The question takes thirty seconds and filters out much of the enthusiasm that absorbs budget without producing results.
The same logic applies to scope. Initiatives that try to address an enterprise-wide problem from the start tend to collapse on complexity before they reach the technology. The firms that get into the 5% run their first initiative in a single function, with a defined start and end state, and a named person who can describe the before-and-after in business terms rather than AI terms. Narrow scope, genuine business problem, named owner, measured outcome. Those four elements appear consistently in the pilots that produce results.
If the mandate you’ve been given does not yet have all four in place, that is the first conversation to have rather than tool selection. Define the problem, document the workflow, name the owner, set the measurement frame. The technology can wait a few weeks; the clarity cannot. Book a conversation if you want to work through that sequence in your specific situation.



