An owner-manager I sat with this month had the same line quoted back at her twice in a fortnight, by two different advisers. Ninety-five per cent of AI pilots fail. The implication, in both conversations, was that she should slow down. She has two small AI initiatives running in the firm. One is cleaning up first drafts of client correspondence before a partner reviews them. The other is summarising long supplier emails into a short list of action items each morning. Both are visibly saving time across the team. She did not know whether the headline applied to her firm and she did not yet have the language to argue back without sounding defensive.
The headline is real. It comes from a 2025 report by the MIT NANDA initiative called “The GenAI Divide: State of AI in Business 2025”, built on 150 interviews, 350 employee surveys, and an analysis of around 300 enterprise deployments. The corroborating numbers from McKinsey, Stratify Insights, and Sinch all point in the same direction. So the figure is not a hot take. What the casual reading misses is what the study set out to measure, and what it deliberately did not.
That gap between the figure and the casual reading is where an owner-managed firm either gets moving with confidence or gets unhelpfully stuck. The useful framing for an owner is to take the number at face value, look at what it actually measured, and work through the five failure modes that drive it, only some of which apply to a 30-person firm.
What does the 95 per cent number actually measure?
The MIT NANDA “GenAI Divide” report measured whether enterprise generative AI pilots delivered rapid revenue acceleration or measurable bottom-line impact inside the study window. Of around 300 deployments analysed, roughly 5 per cent did. The other 95 per cent fell short on that specific test. The authors attribute the gap to a “learning gap” between flexible generic tools and the reality of integrating them into enterprise workflows, data, and governance.
So the number is sound. It is also narrower than the headline suggests. It does not measure smaller internal-productivity gains, which is where many owner-managed firms see their first useful AI wins. It does not measure pilots that quietly continued past the study window. And it is built on enterprise data, where pilot scope tends to be bigger and slower than anything a 30-person firm would run. The number is real, but the comparable population for a UK SME is narrower than the headline implies.
Why does the same pattern show up in every other study?
The same shape appears across other large surveys. McKinsey’s 2025 State of AI reports nearly 90 per cent of organisations now use AI, while two-thirds remain in pilot phases and only a third have scaled. Stratify Insights’ 2026 benchmark puts deployment failure at 60 to 80 per cent. Sinch found 74 per cent of organisations have rolled back at least one AI agent on governance grounds, with mature firms rolling back more.
The consistent pattern is that pilots stall at the scaling step, where governance, data quality, ownership, and integration become binding. The studies measure slightly different things and converge on the same answer. Generic AI tools work fine for individuals. The work that turns a tool into a business outcome is the work around it, and that is exactly the work that gets skipped in a pilot scoped as “do something with AI”.
Where will you actually meet this risk?
Five failure modes drive the rate, and a firm tends to meet two or three of them rather than all five. The first is scoping a pilot without a clear business problem, so that the technology goes looking for a use. The second is misallocating budget to sales and marketing tools when the higher-return work is back-office. The third is treating off-the-shelf tools as workflow-ready when they are not.
The fourth is neglecting skills and change, expecting tools to land in a team without preparation. The fifth is measuring the wrong thing, or at the wrong horizon, so that a pilot quietly saving fifteen hours a week looks like a failure on a revenue measure that was never the right one. The Zillow Offers case is the extreme version of failure mode five. An advisory pricing model was used as a binding purchase tool, with no monitoring for changing market conditions, ending in an 881 million dollar loss in 2021 and around 2,000 redundancies. SME stakes are smaller, but the structural error is identical, and the prevention is the same in a 30-person firm as at Zillow: a written success measure, agreed before the pilot starts, with a horizon that matches the kind of value the pilot is meant to produce.
When should you act on the headline, and when can you ignore it?
Act on it when a pilot has been running for more than three months without a written success measure, when budget sits on customer-facing tools before any back-office work has been funded, or when nobody on the team has had time to learn the tools properly. Those are the failure modes the research describes, and they quietly burn time and credibility. The 95 per cent figure is a useful prompt to inspect those three.
Ignore the headline when an adviser uses it as a reason to delay any AI work at all. That reading misuses the number. The same studies show the firms doing well are running a small number of disciplined initiatives, not standing still. Holding off because of an enterprise-pilot failure rate is like a small firm refusing to use email in 1998 because of a paper on enterprise email integration projects.
Related concepts and what to read next
Three threads connect to this piece if the failure-mode framing is useful. The first is reading any AI case study without getting lied to by survivorship and selection bias. The second is the AI ROI sanity-check material, which decomposes how the loud vendor ROI numbers are built. The third is the vendor case-study survivorship work, on reading a polished AI customer story for what it does and does not say.
Each deepens one layer of the picture this post sketches. The 95 per cent number is the headline. The case-study reading post is the verification discipline. The ROI sanity-check is the financial discipline. The vendor survivorship piece is the vendor-deck discipline. Together they give an owner-manager enough apparatus to disagree well with whichever adviser quotes the next headline at her.
If you want a thirty-minute conversation about which of the five failure modes is most likely to bite in your firm, book a conversation.



