How to read AI vendor case studies sceptically

TL;DR

A case study claiming 3x ROI in year one with 90 percent adoption represents a customer drawn from the top tail of the distribution, not a typical outcome. Five biases stack: survivorship, opt-in reporting, vendor selection, measurement, and timeframe. The right way to read a case study is as an existence proof, not as a baseline expectation.

Key takeaways

- Survivorship bias: case studies come from customers who succeeded with the tool, not from those who abandoned it. The set is filtered.
- Opt-in reporting bias: customers who agree to participate are more satisfied than the average. The set is doubly filtered.
- Vendor selection bias: among willing customers, vendors choose the most flattering numbers. The set is triply filtered.
- The true distribution for comparable customers is roughly 1.2x to 2.5x ROI with 50 to 70 percent adoption, not 3x with 90 percent.
- The same scepticism applies to statistics that flatter your position. The "73 percent of UK SMEs cannot demonstrate AI ROI" figure is itself low-rigour.

Picture a managing partner I’ll call Chris, reviewing a vendor’s case-study deck across the meeting room table with two of his partners. Three success stories. Each shows 80 percent plus adoption, 2x to 3x ROI in year one, and glowing quotes from peer firms in the same sector. Chris has talked to those peers privately. The numbers his peers describe are not the numbers in the deck. He wants to challenge the deck without sounding hostile to the engagement, and he is not yet sure how.

Vendor case studies are not fabricated. They are also not deceptive in any single sentence. They are systematically biased upward by the structure of how case studies get selected, reported, and published. Five biases stack. None of them is a defect of the vendor; together they make the published numbers a poor guide to typical customer experience.

What is the survivorship bias problem?

Survivorship bias is the dominant filter. Vendors publish case studies from customers who succeeded with the tool, not from customers who abandoned it. The case-study set is drawn entirely from the top of the performance distribution. Customers who tried the tool and abandoned it are absent. Customers who used it modestly and saw modest outcomes are absent. Only the visible successes make it into the deck.

Survivorship bias shows up across many domains, not just in AI vendor decks. The same pattern makes restaurant guidebooks misleading (only successful restaurants survive long enough to be reviewed) and makes investment fund track records misleading (closed underperforming funds vanish from the data). It is the structural result of how publishing decisions get made, and it produces a systematic upward bias in any aggregate of published cases.

For an SME reading a vendor deck, the implication is concrete. The customers in the deck are the visible successes. The customers who would have been the median experience are not visible.

What other biases compound the survivorship problem?

Opt-in reporting is the second filter. Among customers the vendor invites to participate in a case study, only some agree. The ones who agree are more likely to be satisfied with the tool than the average customer. Saying yes to a case study is a positive signal. The case-study population is therefore self-selected to be above-average satisfied, on top of being already top-of-distribution.

Vendor selection is the third filter. Among the willing satisfied customers, vendors choose the ones whose numbers look most flattering. A customer with 1.6x ROI may be perfectly happy and willing to participate, but the vendor will pick the customer with 2.4x ROI instead. The case-study set is therefore the top of the top of the willing.

Measurement bias is the fourth. Vendors report what is easy to measure (productivity gains, hours saved) rather than what is most meaningful to the firm (margin impact, quality cost, reallocation outcome). The case-study figures often miss the parts of ROI that matter most.

Timeframe bias is the fifth. Vendors report figures over the window that flatters the tool. If first-year ROI is disappointing, three-year ROI is reported. If three-year ROI is mixed, a “leading indicator” that is currently strong is reported. The chosen timeframe is rarely the one that would best serve the reader.

The five biases stack multiplicatively, not additively. A case study that survives all five is sitting in the top single-digit percentage of the customer distribution.
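The multiplicative stacking can be made concrete with a toy calculation. The pass rates below are illustrative assumptions chosen for the sketch, not measured figures from any vendor:

```python
# Toy model of how publication filters stack multiplicatively.
# Every pass rate here is an ILLUSTRATIVE assumption, not measured data.
filters = {
    "kept using the tool (survivorship)": 0.50,
    "agreed to be a case study (opt-in)": 0.30,
    "chosen by the vendor (selection)": 0.30,
}

share = 1.0
for stage, pass_rate in filters.items():
    share *= pass_rate
    print(f"after '{stage}': {share:.1%} of all customers remain")

# At these assumed rates, three filters already leave only 4.5% of
# customers eligible -- before measurement and timeframe biases skew
# the figures that those remaining customers report.
```

At these assumed rates, only about one customer in twenty-two is even eligible to appear in the deck, which is where the "top single-digit percentage" framing comes from.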

What does the actual customer distribution look like?

The true distribution for comparable customers is roughly 1.2x to 2.5x ROI with 50 to 70 percent adoption, with a long left tail of underperformers and abandoners. Case studies showing 3x ROI with 90 percent adoption represent the top decile of customer outcomes, not the median.

This is not a small gap. A firm that benchmarks its expectations against case-study figures is planning for the top decile and will be disappointed by anything closer to typical. That disappointment is the source of much of the AI ROI scepticism currently circulating in SME circles. The marketed numbers are top-decile, the lived experience is median, and the gap between the two is what the disappointment lives in.

The MIT NANDA failure analysis describes the matching pattern from the failure side: roughly 60 to 70 percent of technology projects fail to deliver expected returns, with “failure” defined as delivering less than 70 percent of projection. The distribution is heavily left-tailed.

What questions should you ask of any case study?

Four questions cut through. What is the sample size, and what is the customer-set selection process for these case studies? Across all your customers, not just your case studies, what is the median ROI at twelve months and at twenty-four months? What is the lower quartile? And what proportion of customers abandoned the tool within twelve months?

A vendor who cannot answer these questions has told you something useful. A vendor who can answer them but whose answers diverge sharply from the case-study figures has also told you something useful. The case studies are not lying; they are showing you the upper tail of the distribution. The aggregate numbers, if available, show you the median.

The reframe for the firm reading the deck is concrete. A case study is useful as an existence proof: “this outcome is achievable in this context with these conditions.” It is not useful as an expectation: “this is what we should plan for.” A firm that benchmarks against case studies plans for the top decile and is disappointed by the median.

What about statistics that flatter your sceptical position?

Symmetric scepticism matters. The “73 percent of UK SMEs cannot demonstrate financial AI ROI within twelve months” figure that circulates in industry commentary deserves the same scrutiny as a vendor’s claim. It appears mostly in practitioner writing rather than in academic research with disclosed methodology. The likely source is aggregated survey data from consulting firms or analyst organisations, though the exact methodology is rarely disclosed.

Treated honestly, the figure indicates a tendency, not a precise finding. The likely range is 50 to 75 percent, with the higher figures coming from surveys that count “cannot measure” as equivalent to “did not deliver.”

The principle applies to any statistic that suits the reader’s position. The discipline of asking “where does this number come from?” is the same whether the number flatters the vendor or flatters the sceptic. Numbers earn their place by methodology, not by which side of the argument they support.

If you are reading a vendor deck and trying to work out what to believe, the questions in this post are the place to start. If you’d like to talk through a specific deck or proposal you are evaluating, book a conversation.

Sources

  • MIT NANDA failure analysis: roughly 60-70% of technology projects fail to deliver expected returns ("failure" defined as delivering less than 70% of projection); distribution heavily left-tailed. Source.
  • Survivorship-bias literature: pattern across vendor case studies, restaurant guidebooks, and investment fund track records, where published successes systematically exclude abandoners and underperformers. Source.
  • "73% of UK SMEs cannot demonstrate financial AI ROI within twelve months" practitioner figure: methodology rarely disclosed, likely range 50-75%, with higher figures conflating "cannot measure" with "did not deliver".
  • McKinsey & Company (2025). The State of AI Global Survey. 88 percent of organisations now use AI in at least one function but only 39 percent report enterprise-level EBIT impact, the measurement gap that maturity frameworks address. Source.
  • McKinsey & Company (2024). From Promise to Impact, How Companies Can Measure and Realise the Full Value of AI. Five-layer measurement framework spanning technical performance, adoption, operational KPIs, strategic outcomes, financial impact. Source.
  • MIT CISR (Woerner, Sebastian, Weill and Kaganer, 2025). Grow Enterprise AI Maturity for Bottom-Line Impact. Stage 3 enterprises achieve growth 11.3 percentage points and profit 8.7 percentage points above industry average; Stage 1 firms underperform on both. Source.
  • Boston Consulting Group (2025). Are You Generating Value from AI, The Widening Gap. Five percent of future-built firms achieve five times the revenue gains and three times the cost reductions of peers, with 60 percent reporting almost no material value from AI investment. Source.
  • Standish Group, CHAOS Report (2020). Long-running benchmark of IT-project outcomes. 31 percent succeed on contemporary definitions, 50 percent are challenged, 19 percent fail outright, the historical baseline for technology-investment measurement maturity. Source.

Frequently asked questions

Why are vendor AI case studies systematically biased upward?

Five biases stack. Survivorship (only successful customers get published). Opt-in reporting (willing customers are more satisfied than average). Vendor selection (vendors pick the most flattering cases). Measurement bias (vendors report what is easy, not what is meaningful). Timeframe bias (figures are reported over the window that flatters).

What does the actual AI customer outcome distribution look like?

Roughly 1.2x to 2.5x ROI with 50 to 70 percent adoption, with a long left tail of underperformers and abandoners. Case studies showing 3x ROI with 90 percent adoption represent the top decile, not the median. The median customer experience is materially less impressive than any case study a vendor publishes.

What questions should I ask of any vendor case study?

Across all your customers, not just your case studies, what is the median outcome at twelve months and twenty-four months? What is the lower quartile? What proportion of customers abandoned the tool within twelve months? What is the customer-set selection process for the case studies? Vendors who cannot or will not answer have told you something useful.

Should I distrust statistics that suit my sceptical position too?

Yes. The discipline is symmetric. The '73 percent of UK SMEs cannot demonstrate financial AI ROI within twelve months' figure that circulates in industry commentary appears mostly in practitioner writing rather than in academic research with disclosed methodology. The likely range is 50 to 75 percent. Any stat that flatters your position deserves the same scrutiny you apply to vendor figures.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
