Picture a managing partner I’ll call Chris, reviewing a vendor’s case-study deck across the meeting-room table with two of his partners. Three success stories. Each shows 80-plus percent adoption, 2x to 3x ROI in year one, and glowing partner quotes from peer firms in the same sector. Chris has talked to those peers privately. The numbers his peers describe are not the numbers in the deck. He wants to challenge the deck without sounding hostile to the engagement, and he is not yet sure how.
Vendor case studies are not fabricated. They are also not deceptive in any single sentence. They are systematically biased upward by the structure of how case studies get selected, reported, and published. Five biases stack. None of them requires dishonesty on the vendor’s part; together they make the published numbers a poor guide to the typical customer experience.
What is the survivorship bias problem?
Survivorship bias is the dominant filter. Vendors publish case studies from customers who succeeded with the tool, so the case-study set is drawn entirely from the top of the performance distribution. Customers who tried the tool and abandoned it are absent. Customers who used it modestly and saw modest outcomes are absent. Only the visible successes make it into the deck.
Survivorship bias shows up across many domains, not just in AI vendor decks. The same pattern makes restaurant guidebooks misleading (only successful restaurants survive long enough to be reviewed) and makes investment fund track records misleading (closed underperforming funds vanish from the data). It is the structural result of how publishing decisions get made, and it produces a systematic upward bias in any aggregate of published cases.
For an SME reading a vendor deck, the implication is concrete. The customers in the deck are the visible successes. The customers who would have been the median experience are not visible.
What other biases compound the survivorship problem?
Opt-in reporting is the second filter. Among customers the vendor invites to participate in a case study, only some agree. The ones who agree are more likely to be satisfied with the tool than the average customer. Saying yes to a case study is a positive signal. The case-study population is therefore self-selected to be above-average satisfied, on top of being already top-of-distribution.
Vendor selection is the third filter. Among the willing satisfied customers, vendors choose the ones whose numbers look most flattering. A customer with 1.6x ROI may be perfectly happy and willing to participate, but the vendor will pick the customer with 2.4x ROI instead. The case-study set is therefore the top of the top of the willing.
Measurement bias is the fourth filter. Vendors report what is easy to measure (productivity gains, hours saved) rather than what is most meaningful to the firm (margin impact, quality cost, reallocation outcomes). The case-study figures often miss the parts of ROI that matter most.
Timeframe bias is the fifth filter. Vendors report figures over whichever window flatters the tool. If first-year ROI is disappointing, three-year ROI is reported. If three-year ROI is mixed, a “leading indicator” that is currently strong is reported instead. The chosen timeframe is rarely the one that would best serve the reader.
The five biases stack multiplicatively, not additively. A case study that survives all five is sitting in the top single-digit percentage of the customer distribution.
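To see why the stacking matters, here is a minimal back-of-the-envelope sketch in Python. The filter fractions are assumptions chosen for illustration, not measured figures; the point is how quickly the compounding shrinks the pool.

```python
# Illustrative filter fractions -- assumptions for this sketch, not measured data.
filters = [
    ("strong outcome (survivorship)", 0.30),    # share of all customers with a clearly good result
    ("agrees to a case study (opt-in)", 0.50),  # share of those willing to be featured
    ("chosen by the vendor (selection)", 0.40), # share of the willing the vendor actually showcases
]

remaining = 1.0
for name, fraction in filters:
    remaining *= fraction
    print(f"after {name}: {remaining:.0%} of all customers remain")

# With these assumed fractions, roughly 6% of customers could ever appear in a deck,
# before measurement and timeframe choices flatter the reported figures further.
```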
What does the actual customer distribution look like?
The true distribution for comparable customers is roughly 1.2x to 2.5x ROI with 50 to 70 percent adoption, with a long left tail of underperformers and abandoners. Case studies showing 3x ROI with 90 percent adoption represent the top decile of customer outcomes, not the median.
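As a rough illustration of the top-decile-versus-median gap, here is a toy simulation assuming a distribution loosely shaped like the one described above; the parameters are invented for the sketch, not drawn from any dataset.

```python
import random
import statistics

random.seed(1)
# Assumed shape: a share of abandoners at zero ROI plus a skewed spread of positive outcomes.
outcomes = [
    0.0 if random.random() < 0.15          # assumed abandonment share
    else random.lognormvariate(0.5, 0.4)   # assumed skewed positive ROI multiples
    for _ in range(10_000)
]

median = statistics.median(outcomes)
top_decile = statistics.quantiles(outcomes, n=10)[-1]  # 90th percentile
print(f"median ROI: {median:.1f}x | top-decile ROI: {top_decile:.1f}x")

# A deck built from the best customers reports figures near the top-decile number;
# a firm planning its own rollout should anchor on the median.
```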
This is not a small gap. A firm that benchmarks its expectations against case-study figures is planning for the top decile and will be disappointed by anything closer to typical. That disappointment is the source of much of the AI ROI scepticism currently circulating in SME circles: the marketed numbers are top-decile, the lived experience is median, and the gap between the two is where the disappointment lives.
The MIT NANDA failure analysis describes the same pattern from the failure side: roughly 60 to 70 percent of technology projects fail to deliver expected returns, with “failure” defined as delivering less than 70 percent of projection. The distribution is heavily left-tailed.
What questions should you ask of any case study?
Four questions cut through. How many customers do these case studies represent, and how were they selected? Across all your customers, not just your case-study customers, what is the median ROI at twelve months and at twenty-four months? What is the lower quartile? And what proportion of customers abandoned the tool within twelve months?
A vendor who cannot answer these questions has told you something useful. A vendor who can answer them but whose answers diverge sharply from the case-study figures has also told you something useful. The case studies are not lying; they are showing you the upper tail of the distribution. The aggregate numbers, if available, show you the median.
The reframe for the firm reading the deck is concrete. A case study is useful as an existence proof: “this outcome is achievable in this context with these conditions.” It is not useful as an expectation: “this is what we should plan for.” A firm that benchmarks against case studies plans for the top decile and is disappointed by the median.
What about statistics that flatter your sceptical position?
Symmetric scepticism matters. The “73 percent of UK SMEs cannot demonstrate financial AI ROI within twelve months” figure that circulates in industry commentary deserves the same scrutiny as a vendor’s claim. It appears mostly in practitioner writing rather than in academic research with disclosed methodology. The likely source is aggregated survey data from consulting firms or analyst organisations, though the exact methodology is rarely disclosed.
Treated honestly, the figure indicates a tendency, not a precise finding. The likely range is 50 to 75 percent, with the higher figures coming from surveys that count “cannot measure” as equivalent to “did not deliver.”
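A quick worked example shows how much that definition moves the headline. The survey split below is hypothetical, invented purely to illustrate the mechanism.

```python
# Hypothetical survey of 1,000 firms -- the split is invented for illustration.
demonstrated_roi = 270   # firms that can show a financial return within twelve months
cannot_measure = 230     # firms with no measurement either way
did_not_deliver = 500    # firms that measured and saw a shortfall
total = demonstrated_roi + cannot_measure + did_not_deliver

strict = did_not_deliver / total                     # only confirmed shortfalls count as failure
broad = (did_not_deliver + cannot_measure) / total   # "cannot measure" also counted as failure
print(f"strict reading: {strict:.0%} | broad reading: {broad:.0%}")

# The same responses yield a 50% headline or a 73% headline,
# depending entirely on how "cannot demonstrate ROI" is defined.
```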
The principle applies to any statistic that suits the reader’s position. The discipline of asking “where does this number come from?” is the same whether the number flatters the vendor or flatters the sceptic. Numbers earn their place by methodology, not by which side of the argument they support.
If you are reading a vendor deck and trying to work out what to believe, the questions in this post are the place to start. If you’d like to talk through a specific deck or proposal you are evaluating, book a conversation.