The 95 per cent AI pilot failure rate, and what it actually means for your firm

TL;DR

The MIT NANDA finding that around 95 per cent of generative AI pilots fail to deliver measurable bottom-line impact is real, and it has been quoted in every boardroom since. Read carefully, the figure describes how pilots get scoped and measured, not whether AI works. Five failure modes drive the rate, and an owner-managed firm can head off the worst of them with proportionate discipline.

Key takeaways

- The MIT NANDA "GenAI Divide" report measured rapid revenue acceleration in around 300 enterprise deployments and found roughly 5 per cent achieving it, with the rest stalling. The figure is correct, the casual headline reading is not.
- The same pattern shows up in McKinsey's 2025 State of AI, Stratify Insights' 2026 benchmark, and Sinch's customer-communications survey, so it is not a one-study artefact.
- Five failure modes drive the rate: pilots scoped without a clear business problem, budget pointed at customer-facing tools before the back-office, off-the-shelf tools used as if they were workflow-ready, no investment in skills or change, and measuring the wrong thing at the wrong horizon.
- For an owner-managed firm with one or two initiatives in flight, the practical question is not whether you will fall in the 95 per cent, it is which failure mode is most likely to catch you.
- The discipline that heads off most of it is not exotic. Pick one problem, allocate to internal productivity first, build the skill before the rollout, write down the measure before the pilot starts.

An owner-manager I sat with this month had the same line quoted back at her twice in a fortnight, by two different advisers. Ninety-five per cent of AI pilots fail. The implication, in both conversations, was that she should slow down. She has two small AI initiatives running in the firm. One is cleaning up first drafts of client correspondence before a partner reviews them. The other is summarising long supplier emails into a short list of action items each morning. Both are visibly saving time across the team. She did not know whether the headline applied to her firm and she did not yet have the language to argue back without sounding defensive.

The headline is real. It comes from a 2025 report by the MIT NANDA initiative called “The GenAI Divide: State of AI in Business 2025”, built on 150 interviews, 350 employee surveys, and an analysis of around 300 enterprise deployments. The corroborating numbers from McKinsey, Stratify Insights, and Sinch all point in the same direction. So the figure is not a hot take. What the casual reading misses is what the study set out to measure, and what it deliberately did not.

That gap between the figure and the casual reading is where an owner-managed firm gets either confidently moving or unhelpfully stuck. The useful framing for an owner is to take the number at face value, look at what it actually measured, and work through the five failure modes that drive it, only some of which apply to a 30-person firm.

What does the 95 per cent number actually measure?

The MIT NANDA “GenAI Divide” report measured whether enterprise generative AI pilots delivered rapid revenue acceleration or measurable bottom-line impact inside the study window. Of around 300 deployments analysed, roughly 5 per cent did. The other 95 per cent fell short on that specific test. The authors attribute the gap to a “learning gap” between flexible generic tools and the reality of integrating them into enterprise workflows, data, and governance.

So the number is sound. It is also narrower than the headline suggests. It does not measure smaller internal-productivity gains, which is where many owner-managed firms see their first useful AI wins. It does not measure pilots that continued past the study window. And it is built on enterprise data, where pilot scope tends to be bigger and slower than anything a 30-person firm would run. The number is real, but the part of it that compares fairly with a UK SME is smaller than the headline implies.

Why does the same pattern show up in every other study?

The same shape appears across other large surveys. McKinsey’s 2025 State of AI reports nearly 90 per cent of organisations now use AI, while two-thirds remain in pilot phases and only a third have scaled. Stratify Insights’ 2026 benchmark puts deployment failure at 60 to 80 per cent. Sinch found 74 per cent of organisations have rolled back at least one AI agent on governance grounds, with mature firms rolling back more.

The consistent pattern is that pilots stall at the scaling step, where governance, data quality, ownership, and integration become binding. The studies measure slightly different things and converge on the same answer. Generic AI tools work fine for individuals. The work that turns a tool into a business outcome is the work around it, and that is exactly the work that gets skipped in a pilot scoped as “do something with AI”.

Where will you actually meet this risk?

Five failure modes drive the rate, and a firm tends to meet two or three of them rather than all five. The first is scoping a pilot without a clear business problem, where the technology goes looking for a use. The second is misallocating budget to sales and marketing tools when the higher-return work is back-office. The third is treating off-the-shelf tools as workflow-ready when they are not.

The fourth is neglecting skills and change, expecting tools to land in a team without preparation. The fifth is measuring the wrong thing, or at the wrong horizon, so that a pilot saving fifteen hours a week looks like a failure on a revenue measure that was never the right one. The Zillow Offers case is the extreme version of failure mode five. An advisory pricing model was used as a binding purchase tool, with no monitoring for changing market conditions, ending in an 881 million dollar loss in 2021 and around 2,000 redundancies. SME stakes are smaller, but the structural error is identical, and the prevention is the same in a 30-person firm as in Zillow: a written success measure, agreed before the pilot starts, with a horizon that matches the kind of value the pilot is meant to produce.
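For a firm that keeps its pilot notes in a script or notebook, a written success measure can be as small as the sketch below. It is an illustration only: the `SuccessMeasure` structure, the `verdict` function, and every figure except the fifteen hours a week are assumptions made for the example, not anything taken from the research.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SuccessMeasure:
    """A success measure written down before the pilot starts (hypothetical structure)."""
    description: str    # what is measured, in plain words
    target: float       # the level that counts as success
    horizon_weeks: int  # how long the pilot runs before it is judged


def verdict(measure: SuccessMeasure, observed: float, weeks_elapsed: int) -> Optional[bool]:
    """Return None before the horizon is reached, else whether the target was met."""
    if weeks_elapsed < measure.horizon_weeks:
        return None  # too early to call it a success or a failure
    return observed >= measure.target


# Hypothetical figures: only the fifteen hours a week comes from the text above.
hours_saved = SuccessMeasure("staff hours saved per week", target=10.0, horizon_weeks=8)
new_revenue = SuccessMeasure("new monthly revenue in pounds", target=5000.0, horizon_weeks=8)

print(verdict(hours_saved, observed=15.0, weeks_elapsed=8))  # judged on time saved
print(verdict(new_revenue, observed=0.0, weeks_elapsed=8))   # same pilot, wrong measure
print(verdict(hours_saved, observed=15.0, weeks_elapsed=3))  # horizon not yet reached
```

The same pilot passes on the measure it was built for and fails on a revenue measure it was never meant to move, which is failure mode five in miniature. Returning `None` before the horizon keeps a pilot from being judged early.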

When should you act on the headline, and when can you ignore it?

Act on it when a pilot has been running over three months without a written success measure, when budget sits on customer-facing tools before any back-office work, or when nobody on the team has had time to learn the tools properly. Those are the failure modes the research describes, and they burn time and credibility. The 95 per cent figure is a useful prompt to inspect those three.
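Those three checks are simple enough to run against a list of pilots. The sketch below is one hypothetical way to do it: the `Pilot` record, its field names, and the example data are assumptions for illustration, and the three-month threshold is the only value taken from the paragraph above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pilot:
    """One AI initiative in flight (hypothetical record, not from the research)."""
    name: str
    months_running: float
    has_written_measure: bool
    customer_facing: bool
    team_has_learned_tool: bool


def warning_signs(pilots: List[Pilot]) -> List[str]:
    """List which of the three act-on-it conditions apply to a set of pilots."""
    signs: List[str] = []
    for pilot in pilots:
        if pilot.months_running > 3 and not pilot.has_written_measure:
            signs.append(f"{pilot.name}: over three months with no written success measure")
        if not pilot.team_has_learned_tool:
            signs.append(f"{pilot.name}: team has not had time to learn the tool")
    # Budget check: spend on customer-facing tools with no back-office pilot at all.
    has_customer_facing = any(p.customer_facing for p in pilots)
    has_back_office = any(not p.customer_facing for p in pilots)
    if has_customer_facing and not has_back_office:
        signs.append("portfolio: customer-facing tools funded before any back-office work")
    return signs


# Hypothetical example data for a firm with two back-office pilots.
pilots = [
    Pilot("correspondence drafts", months_running=4, has_written_measure=False,
          customer_facing=False, team_has_learned_tool=True),
    Pilot("supplier email summaries", months_running=2, has_written_measure=True,
          customer_facing=False, team_has_learned_tool=True),
]
for sign in warning_signs(pilots):
    print(sign)
```

An empty result does not prove a pilot is healthy. It only means none of the three conditions named above applies.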

Ignore the headline when an adviser uses it as a reason to delay any AI work at all. That reading misuses the number. The same studies show the firms doing well are running a small number of disciplined initiatives, not standing still. Holding off because of an enterprise-pilot failure rate is like a small firm refusing to use email in 1998 because of a paper on enterprise email integration projects.

Related concepts and what to read next

Three threads connect to this piece if the failure-mode framing is useful. The first is reading any AI case study without getting lied to by survivorship and selection bias. The second is the AI ROI sanity-check material, which decomposes how the loud vendor ROI numbers are built. The third is the vendor case-study survivorship work, on reading a polished AI customer story for what it does and does not say.

Each deepens one layer of the picture this post sketches. The 95 per cent number is the headline. The case-study reading post is the verification discipline. The ROI sanity-check is the financial discipline. The vendor survivorship piece is the vendor-deck discipline. Together they give an owner-manager enough apparatus to disagree well with whichever adviser quotes the next headline at her.

If you want a thirty-minute conversation about which of the five failure modes is most likely to bite in your firm, book a conversation.

Sources

- MIT NANDA initiative (2025). The GenAI Divide: State of AI in Business 2025. Source for the 95 per cent figure, the 150 interviews, 350 employee surveys, and 300 deployments analysed, and the "learning gap" framing. https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/
- McKinsey (2025). The state of AI in 2025: Global Survey. Source for the finding that nearly 90 per cent of organisations now use AI, that around two-thirds remain in pilot phases, and that over half report at least one negative consequence. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
- Stratify Insights (2026). AI project failure benchmark. Source for the 60 to 80 per cent failure-to-production estimate and the 70 per cent "structural rather than model" attribution. https://www.stratifyinsights.ai/ai-project-failure-rate
- Sinch / TechRadar (2025). Customer-communications AI agent survey. Source for the finding that 74 per cent of organisations have rolled back at least one AI agent on governance grounds, and that more mature organisations had higher rollback rates. https://www.techradar.com/pro/the-most-advanced-organizations-arent-failing-less-theyre-seeing-failures-sooner-many-firms-are-already-having-to-roll-back-ai-customer-service-tools
- Insurance Business UK (2025). Why AI stalls in insurance: the cultural and operational hurdles. Source for the sector pattern of abundant pilots and slow production, used as a worked example of the scaling gap. https://www.insurancebusinessmag.com/uk/news/technology/why-ai-stalls-in-insurance-the-cultural-and-operational-hurdles-548973.aspx
- CIO.com (2025). Retail AI has a data problem, here is how to fix it. Source for the agentic-commerce pattern, used to illustrate that customer-facing AI struggles when underlying data is not unified and real-time. https://www.cio.com/article/4168980/retail-ai-has-a-data-problem-heres-how-to-fix-it.html
- UK NCSC (2023). AI cyber security case study. Source for the "secure by design" framing and the obligation to be comfortable with worst-case behaviour of any AI system in your workflow. https://www.ncsc.gov.uk/collection/annual-review-2023/technology/case-study-cyber-security-ai
- IBM Institute for Business Value. Thought leadership and research on enterprise AI value realisation. Source for the recurring "value gap" framing in enterprise AI research. https://www.ibm.com/thought-leadership/institute-business-value/en-us
- Shackleford (2022 post-mortem). Zillow Offers and the 881 million dollar AI loss. Source for the canonical example of a model that "worked" while the governance around it failed, illustrating failure mode five (measuring the wrong thing at the wrong horizon). https://www.shackleford.coach/ai-leadership-insights/zillow-lost-881-million-the-ai-was-working-perfectly

Frequently asked questions

Is the 95 per cent figure reliable?

It is from the MIT NANDA initiative's 2025 report "The GenAI Divide: State of AI in Business 2025", based on 150 interviews, 350 employee surveys, and analysis of around 300 deployments. The methodology measured rapid revenue acceleration or measurable bottom-line impact inside the study window. The number is sound for what it set out to measure. It does not measure smaller internal-productivity AI use cases that save time without crossing the P&L line, which is where many owner-managed firms see early wins.

Does this mean we should hold off on AI?

No. It means scope tightly, point budget at internal productivity before customer-facing applications, and write down the success measure before the pilot starts. McKinsey's 2025 State of AI shows nearly 90 per cent of organisations now using AI, with the value gap concentrated at the scaling step rather than the experimentation step. The firms quietly doing well are not the ones avoiding AI, they are the ones running a small number of disciplined initiatives.

Which of the five failure modes catches owner-managed firms most often?

In my experience the first two: a pilot scoped to "do something with AI" rather than to solve a named business problem, and budget pointed at sales and marketing tooling because that is where the loud vendors are, when the higher-return work is usually back-office. Both can be headed off with a thirty-minute conversation before any money is spent.

Written by Dr Dave Heath, AI consultant and business strategist.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.
