Why 95% of AI pilots stall, and what the 5% do differently

TL;DR

Roughly 95% of generative AI pilots show no measurable profit-and-loss impact, according to MIT NANDA research. The failure is almost always a workflow integration problem, not a technology one. The firms that consistently show results narrow the problem scope, start in back-office functions rather than sales and marketing, name a clear owner for the outcome, and measure progress before the financial results arrive. Understanding these patterns is what separates a delegate with a defensible position from one who can only wait.

Key takeaways

- The MIT NANDA GenAI Divide research found roughly 95% of AI pilots show no measurable P&L impact; this measures commercial translation speed, not effort or the quality of the technology itself.
- Pilots almost always stall on workflow integration and named ownership, not on the technology layer.
- Back-office AI automation delivers consistently higher ROI than sales and marketing initiatives, which receive the most investment but show the lowest returns.
- The firms that get into the 5% narrow their scope, name a real person as accountable for the business outcome, and track leading indicators before the financial results appear.
- Apply this test before any initiative starts: would this still matter if it did not use AI? A yes means the business case is genuine.

You’re three weeks into the mandate when someone slides the MIT statistic into a board pack. Ninety-five per cent of generative AI pilots show no measurable impact on the profit and loss. Nobody comments on it. The room moves on. You sit with that number for the rest of the meeting, calculating whether you’re about to be part of the majority.

That calculation is worth doing properly. The figure is considerably more useful once you understand what it actually counts.

What does the 95% figure actually measure?

The MIT NANDA GenAI Divide research tracked generative AI pilots across a range of organisations and found that roughly 95% showed no measurable profit-and-loss impact within their measurement window. That figure includes every pilot that produced genuine learning, built capability, or changed a workflow but had not yet shown up as cost savings or revenue. It measures commercial translation speed, not effort or the quality of the approach.

What the statistic does not tell you is whether the organisation running the pilot is on a trajectory toward results. A meaningful share of the 95% are early-stage learning investments that will show returns in year two or three. Others are genuine failures with identifiable causes. Others still are initiatives that worked but were abandoned before their value was properly measured. The MIT figure bundles all three into a single number, which is why it reads as more alarming than the underlying situation may warrant. Your job as the delegate is to know which of the three groups you are in, not to avoid the figure.

The Korn Ferry research on what they call the AI readiness paradox adds a useful frame. Organisations frequently assign AI leadership to trusted operators who lack AI-specific competencies, creating high expectations in a role with low preparation. That expectation gap accounts for a significant share of the 95%. The conditions for translating technology into results are often absent from the start, and that is a design gap, not a technology gap.

Why does this failure rate matter if you’re holding the mandate?

When an AI pilot stalls, the organisation rarely concludes that the technology was the problem. The person leading the initiative becomes the natural explanation. Spencer Stuart’s research found that delegates who manage this well set the frame early, naming what the pilot will prove and why commercial results typically take twelve to twenty-four months to appear. That frame, established before results are due, separates a confident position from a defensive one.

The professional risk is concrete. Korn Ferry documents how stalled initiatives can turn employee scepticism into leadership scepticism, with the AI mandate-holder as the visible target. A substantial proportion of executives in this position report concerns about career consequences if AI adoption fails on their watch. That context is useful for understanding why the credibility conversation matters from day one, not as a source of anxiety but as a reason to get clarity established early, before the gap appears.

Where do AI pilots actually break down in practice?

The breakdown almost never happens at the technology layer. The model works; the integration does not. Pilots stall when grafted onto poorly documented workflows, when no one owns the outcome once the demo ends, and when the scope was defined around a tool rather than a specific business problem. BCG research confirms that rising AI usage consistently fails to translate into measurable business impact, and that the gap lies in process and ownership rather than in what the technology can do.

OECD research on AI adoption in owner-managed businesses confirms the pattern from a different angle. The constraint is almost always capability and process readiness rather than access to tools. Firms that have not documented their workflows, addressed data quality problems, or designated a named owner for the outcome are solving a workflow problem and calling it an AI project. The practical consequence is a pilot that performs in a controlled environment and breaks apart when it meets the real operation.

The ownership question deserves a direct answer before any initiative is approved. Many failed pilots have no clear account of who is accountable when the project finishes and the vendor has left. The most reliable predictor of whether a pilot survives contact with the real organisation is whether someone has their name on the outcome before the work begins.

What do the 5% who show measurable results do differently?

The firms that consistently show P&L results from AI share a small set of habits. They choose domain-specific, well-defined problems over broad capability roll-outs. They start in back-office functions rather than sales and marketing, where returns are consistently higher. They name a real person as accountable for the business outcome, not just the initiative. And they use a measurement frame that shows progress before the financial results appear.

The counterintuitive MIT finding is worth registering. Back-office automation delivers the highest returns from AI, while sales and marketing initiatives receive the most investment but show the lowest ROI. Many delegates start where the AI conversation is loudest, which turns out to be the hardest place to demonstrate early value. Starting in accounts payable, document processing, or customer service handling is less visible but far more likely to produce a result you can report to the board within the first year. That sequence, back-office first and then customer-facing tools once the operating model is proven, is what shows up repeatedly in the firms that get results.

The dual measurement frame is also practical here. Tracking trending ROI, including leading indicators such as time saved, error rates, and processing cycle times, alongside realised ROI gives you something concrete at every board update. Meaningful financial returns from AI commonly take twelve to twenty-four months to appear. A measurement approach that acknowledges this openly is more useful than one that promises financial impact on a timeline the evidence does not support.
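As a rough illustration of the dual frame, the sketch below reports leading indicators against their baseline next to realised ROI. Everything in it is an assumption made for the example: the `Indicator` class, the `realised_roi_pct` helper, and all of the figures are hypothetical and do not come from the research cited here.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    """One leading indicator, measured at baseline and at the latest update."""
    name: str
    baseline: float
    current: float
    lower_is_better: bool = True  # true for time taken, error rates, cycle times

    def improvement_pct(self) -> float:
        """Percentage improvement against baseline (positive means better)."""
        change = (self.baseline - self.current) / self.baseline * 100
        return change if self.lower_is_better else -change


def realised_roi_pct(realised_benefit: float, total_cost: float) -> float:
    """Realised ROI: financial benefit booked so far, net of cost, as a % of cost."""
    return (realised_benefit - total_cost) / total_cost * 100


# Hypothetical accounts-payable pilot, six months in (illustrative numbers only).
indicators = [
    Indicator("minutes per invoice", baseline=12.0, current=9.0),
    Indicator("error rate (%)", baseline=4.0, current=3.0),
    Indicator("processing cycle (days)", baseline=10.0, current=8.0),
]

for ind in indicators:
    print(f"{ind.name}: {ind.improvement_pct():.1f}% better than baseline")

print(f"realised ROI so far: {realised_roi_pct(20_000, 50_000):.1f}%")
```

In this made-up case the leading indicators have improved by 20 to 25 per cent while realised ROI is still negative, which is the position a board update inside the first twelve months is likely to describe.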

What should you ask before any initiative goes live?

Addepar’s test for any AI investment is straightforward. Would this initiative still matter if it did not use AI? If the answer is no, the initiative is solving a technology problem rather than a business one. If yes, the underlying business case is genuine and the technology is a means of addressing it. The question takes thirty seconds and filters out much of the enthusiasm that absorbs budget without producing results.

The same logic applies to scope. Initiatives that try to address an enterprise-wide problem from the start tend to collapse on complexity before they reach the technology. The firms that get into the 5% run their first initiative in a single function, with a defined start and end state, and a named person who can describe the before-and-after in business terms rather than AI terms. Narrow scope, genuine business problem, named owner, measured outcome. Those four elements appear consistently in the pilots that produce results.
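The four elements, together with the would-it-still-matter test, can be written down as a simple pre-approval check. This is only a sketch: the `Initiative` fields, the `missing_elements` function, and the example initiative are hypothetical names chosen for illustration, not part of any cited framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Initiative:
    name: str
    matters_without_ai: bool       # would this still matter if it did not use AI?
    single_function: bool          # narrow scope: one function, defined start and end state
    named_owner: Optional[str]     # person accountable for the business outcome
    outcome_metric: Optional[str]  # how the before-and-after will be measured


def missing_elements(initiative: Initiative) -> list[str]:
    """Return the elements that are not yet in place for this initiative."""
    gaps = []
    if not initiative.matters_without_ai:
        gaps.append("genuine business problem")
    if not initiative.single_function:
        gaps.append("narrow scope")
    if not initiative.named_owner:
        gaps.append("named owner")
    if not initiative.outcome_metric:
        gaps.append("measured outcome")
    return gaps


# Hypothetical example: a well-scoped pilot that nobody owns yet.
pilot = Initiative(
    name="Invoice matching in accounts payable",
    matters_without_ai=True,
    single_function=True,
    named_owner=None,
    outcome_metric="processing cycle time",
)

print(missing_elements(pilot))
```

An empty list means all four elements are in place; anything else names what still has to be settled before approval.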

If the mandate you’ve been given does not yet have all four in place, that is the first conversation to have rather than tool selection. Define the problem, document the workflow, name the owner, set the measurement frame. The technology can wait a few weeks; the clarity cannot. Book a conversation if you want to work through that sequence in your specific situation.

Sources

- BCG (2025). "The AI Adoption Puzzle: Why Usage Is Up But Impact Is Not." Confirms the consistent gap between rising AI usage and measurable business impact across organisations. https://www.bcg.com/publications/2025/ai-adoption-puzzle-why-usage-up-impact-not
- McKinsey and Company (2025). "Superagency in the Workplace." Research on the gap between AI tool adoption and measurable outcomes in the workplace. https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work
- Korn Ferry (2025). "6 Signs Leaders Lack AI Readiness and How to Fix It." Documents the AI readiness paradox: organisations assign AI leadership to strong operators who lack the specific competencies the role requires. https://www.kornferry.com/insights/featured-topics/gen-ai-in-the-workplace-articles/6-signs-leaders-lack-ai-readiness-and-how-to-fix-it
- Spencer Stuart (2025). "Don't Delegate AI: A Power User Playbook for CEOs." Research on how effective AI mandate-holders manage early-stage expectations and credibility risk. https://www.spencerstuart.com/research-and-insight/dont-delegate-ai-a-power-user-playbook-for-ceos
- OECD (2025). "AI Adoption by Small and Medium-Sized Enterprises." Finds that capability and process readiness, not tool access, is the primary constraint on AI adoption in owner-managed businesses. https://www.oecd.org/en/publications/2025/12/ai-adoption-by-small-and-medium-sized-enterprises_9c48eae6.html
- SRA Analytics (2025). "Why 95% of AI Projects Fail." Covers the MIT NANDA GenAI Divide finding that roughly 95% of generative AI pilots show no measurable P&L impact, with analysis of the primary failure categories. https://sranalytics.io/blog/why-95-of-ai-projects-fail/
- Addepar (2025). "Questions Executives Should Ask Before Adopting AI." Source of the business-first test for AI investment: would this initiative still matter if it did not use AI? https://addepar.com/blog/questions-executives-should-ask-before-adopting-ai
- Propeller (2025). "Measuring AI ROI: How to Build an AI Strategy That Captures Business Value." Covers the dual-ROI measurement framework and the typical 12-24 month timeline for meaningful AI returns. https://propeller.com/blog/measuring-ai-roi-how-to-build-an-ai-strategy-that-captures-business-value

Frequently asked questions

Why do 95% of AI pilots fail to show measurable results?

According to MIT NANDA research, the failure is overwhelmingly a workflow integration and ownership problem, not a technology one. Pilots stall when they are grafted onto poorly documented processes, when no one with real accountability owns the outcome once the demo ends, and when the scope was defined around a tool rather than a specific business problem. The technology works; the conditions for it to show value do not exist.

What do the businesses that successfully show AI results do differently?

They narrow the scope to a domain-specific, well-understood problem rather than a broad capability roll-out. They start in back-office functions where AI ROI is consistently higher, name a person accountable for the business outcome, and use a dual measurement frame that tracks leading indicators alongside financial results. This gives them something to report at every board update rather than nothing until the P&L eventually moves.

How long does it take for AI investments to show measurable ROI?

Meaningful financial returns from AI commonly take twelve to twenty-four months to appear, with some implementations taking longer. The delegates who manage board expectations well are those who set this frame before results are due rather than explaining the gap after it appears. Tracking trending ROI alongside realised ROI gives a more honest picture of progress during the early period.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30-minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.
