She has the budget. The vendor has been encouraging. The in-house enthusiast wants to move. The pitch deck she has been handed says the scaling decisions can be made later, once the pilot is up and running and the team is comfortable with the tool. She is sitting in front of the proposal, and the only thing she is genuinely unsure about is whether to commit to the bigger rollout now or hold the line on a narrower test.
That is the moment the McDonald’s IBM drive-thru story is for.
Most of the coverage treats the McDonald’s pilot as a punchline. Nine sweet teas, bacon on ice cream, 260 Chicken McNuggets piling up in a viral video. It is funny, and the comedy is real, but the comedy is not the lesson. The lesson is what McDonald’s did once the numbers came back.
What actually happened with the McDonald’s IBM drive-thru pilot
McDonald’s tested IBM’s Automated Order Taker, a voice-AI system that handles drive-thru ordering without a human, across more than 100 US restaurants. The IBM partnership ran from 2021 to mid-2024, after earlier voice-ordering trials that began in 2019. That is a slice of the roughly 40,000 sites in the global estate. Reported order-fulfilment accuracy sat in the low-to-mid 80% range, against an estimated 95% needed to justify replacing human staff. In July 2024 McDonald’s ended the partnership.
That is the headline arc. A roughly three-year test with IBM at about a quarter of one per cent of locations. An honest read of the numbers. A clean decision to stop, with a separate statement that voice ordering would still be explored, including with Google Cloud.
The viral order errors got the airtime, but the franchise economics did the actual deciding. At low-to-mid 80s accuracy you are paying for the technology and still paying staff to catch the mistakes. The case for the tool, as priced and built, did not close.
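The arithmetic behind that is simple enough to sketch. The Python below is a minimal illustration under stated assumptions, not McDonald’s or IBM’s actual model:

- Every inaccurate order is assumed to need one human correction.
- The batch of 1,000 orders and the helper name are hypothetical.
- The accuracy levels are the figures reported above.

It leaves out the cost of each correction, which is where the real franchise economics sit.

```python
# Hypothetical illustration: how many orders per batch a human would still
# have to catch and fix, assuming each inaccurate order needs one correction.
ORDERS = 1_000  # hypothetical batch size


def orders_needing_correction(accuracy: float, orders: int = ORDERS) -> int:
    """Expected number of orders the system gets wrong, rounded to a whole order."""
    return round(orders * (1 - accuracy))


# Accuracy levels: the reported low-to-mid 80s range versus the ~95% bar.
for label, accuracy in [("low 80s", 0.80), ("mid 80s", 0.85), ("threshold", 0.95)]:
    print(f"{label:>9}: {orders_needing_correction(accuracy)} of {ORDERS} orders need a human fix")
```

At 80% accuracy that is 200 corrections per 1,000 orders, and at 85% it is 150. At the 95% bar it is 50. That is three to four times as much human clean-up, paid for on top of the tool itself.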
Why the pilot discipline is the real story, not the technology choice
The technology choice is not what an SME owner should read McDonald’s for; the structure of the test is. Voice AI in noisy drive-thru lanes was always going to be hard, and other operators are running narrower versions of the same idea well. What McDonald’s got right was three things, done in sequence, that many firms get wrong. Doing them kept a difficult voice-AI test from becoming a much larger rollout failure.
They tested at small enough scale that stopping cost them learning, not the business. They measured against an explicit, economically grounded threshold, accuracy in the mid-90s, not a vague sense of “is this working.” And they had a decision point with a date attached, so the question of go or stop did not drift through another quarter of vendor optimism. The willingness to walk away in July 2024, three years and a public narrative in, is the move. Rolling the technology out to all 40,000 locations on the strength of vendor confidence is the failure that did not happen, because the pilot was designed to make stopping possible.
Where this lands for an owner-managed firm
For an owner-managed firm running its first AI pilot, the parallel is direct even though the scale is not. MIT NANDA’s 2025 report on generative AI in business found that about 95% of enterprise GenAI pilots stalled. The failures were rooted in integration, governance and workflow fit rather than model quality. McKinsey’s 2025 State of AI survey found roughly two-thirds of AI-using organisations still piloting rather than at scale. Stratify Insights’ 2026 benchmark estimated that 60 to 80% of AI projects fail to reach production. The pattern is consistent. The technology often works. The pilots frequently do not.
What the McDonald’s case adds is the discipline that prevents a stuck pilot from becoming a stuck rollout. Pilot scope small enough that stopping does not hurt the firm. Success metrics written down before the pilot starts, with the threshold for go and the threshold for stop on the same page. A decision point with a date. A default, if the numbers miss the threshold, of “stop and learn” rather than “extend the pilot another quarter and hope it matures.” Owner-managed firms over-extend pilots far more often than they over-extend rollouts, because at the time the over-extension feels like prudence.
The harder lesson: why stopping is politically expensive and financially cheap
Stopping adds little to the balance sheet and costs a great deal in the room. You have to tell the vendor it did not work. You have to write off whatever you have already spent. You have to redirect the in-house enthusiast. You have to sit with the fact that the optimism in the original pitch deck did not survive contact with the numbers. That discomfort is why pilots so often run too long.
The asymmetry is the bit to hold on to. The political cost of stopping is local, immediate and visible. The financial cost of not stopping is downstream, larger, and easier to defer. Sinch’s customer-communications research, reported in 2025, found that around 74% of firms running AI communication agents in production had already rolled back or shut down at least one on governance grounds, and the firms with stronger oversight were the ones doing it more often. The willingness to stop is a maturity signal, not a failure signal. McDonald’s took the politically expensive move in July 2024 and avoided a much larger financial one. Walmart did something similar later when its OpenAI Instant Checkout test came in at roughly a third of website conversion, and OpenAI rolled the feature back. The discipline travels.
What the case does not say, and what to do with it before you commit
The McDonald’s case does not say voice AI does not work. It does not say IBM’s technology was bad. It says the fit between this tool, this environment and this economic model was not viable at scale, and the right response to a wrong fit was stopping. The case is not an argument for caution as a default; it is an argument for clarity about when you will stop.
Apply the same discipline to the AI proposal sitting on your desk. Write down, before the pilot starts, what success looks like in numbers you can measure. Write down the threshold for stop, not just the threshold for go. Put a date on the decision point. Then ask the question the McDonald’s leadership team had to answer in 2024: what would have to be true at the decision point for us to walk away, and is the team in this room willing to do it if that condition is met? If the answer is no, the pilot is already too big.
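If it helps to see that written down as a rule rather than a slide, here is a minimal Python sketch. The class, field names, thresholds and date are all hypothetical. One choice is an assumption made for illustration, not something the McDonald’s case specifies: the stop threshold acts as a floor that ends the pilot at any check, and missing the go threshold on the decision date ends it too.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PilotGate:
    """Hypothetical decision rule for an AI pilot, agreed before the pilot starts."""
    metric: str             # what is measured, e.g. order accuracy
    go_threshold: float     # must be reached by the decision date to proceed
    stop_threshold: float   # assumption: falling below this ends the pilot at any check
    decision_date: date     # the date the go/stop call is made

    def __post_init__(self) -> None:
        # The stop floor cannot sit above the go bar.
        if self.stop_threshold > self.go_threshold:
            raise ValueError("stop threshold must not exceed go threshold")

    def decide(self, measured: float, today: date) -> str:
        # A result below the stop floor ends the pilot, whatever the date.
        if measured < self.stop_threshold:
            return "stop"
        # Before the agreed date, keep measuring; no go/stop call yet.
        if today < self.decision_date:
            return "continue to decision date"
        # On or after the date, proceed only if the go threshold is met.
        if measured >= self.go_threshold:
            return "go"
        # Default when the go threshold is missed: stop, never "extend another quarter".
        return "stop and learn"


# Hypothetical values, written down before the pilot starts.
gate = PilotGate(
    metric="order accuracy",
    go_threshold=0.95,
    stop_threshold=0.80,
    decision_date=date(2025, 6, 30),
)
print(gate.decide(0.84, date(2025, 6, 30)))
```

The part worth copying is the last branch. When the go threshold is missed on the decision date, the rule returns stop, and no branch quietly extends the pilot. With the values shown, a measured 84% on the decision date comes back as "stop and learn".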
If you want to test that discipline on a live AI decision your firm is sitting on, book a conversation.



