The pilot-to-scale valley of death in AI

The pilot worked. Everyone who saw it said so. The output was cleaner, the turnaround faster, and the team running it was visibly proud of what they’d built. Then someone asked the obvious question: when does the rest of the business get this?

Six months later, you’re still answering it. The pilot sits exactly where it landed, a single pocket of capability that hasn’t moved, despite clear proof that the approach works. The founder is asking for an update. The board wants to see traction. You’re holding the space between “we proved it works” and “we actually did it.”

That space has a name.

What is the pilot-to-scale valley of death?

The pilot-to-scale valley of death is the gap between a proof of concept that works in a controlled setting and a capability that is embedded across the business. BCG’s research on AI adoption finds roughly half of companies are stuck in this position, able to demonstrate value in isolated tests but unable to move it into standard practice. The valley is a recognised stage, not an anomaly.

The term comes from product development, where it describes the gap between early-stage funding and commercial viability, the point at which promising ideas exhaust their momentum before becoming products. Applied to AI rollouts, the equivalent is operational. The pilot team has built something that works for them, integrating it into their habits and refining the output over weeks of real use. Everyone else is still working the way they always have.

The integration work the pilot deliberately avoided is what scaling requires. That means a documented process, a named owner, and people who weren’t involved in the original test brought in and given a genuine stake in what happens next. In many pilot plans, none of that is budgeted for.

Why does this matter when you’re holding an AI mandate?

Your mandate was to get AI working across the business. A pilot that stays a pilot does not fulfil that mandate. MIT’s GenAI research finds roughly 95% of generative AI pilots show no measurable revenue impact; workflow integration, not model quality, is consistently identified as the primary bottleneck. The valley is where AI mandates go to stall, and where the person holding the mandate absorbs the blame.

The pressure builds quickly. Sponsors who backed the initiative see a demonstration, see it work, then watch months pass without visible change in how the business actually operates. The founder or board starts asking when the rest of the organisation gets the same capability. Your window to maintain momentum is narrower than it looks.

There is also a confidence effect inside the team. People who were not part of the pilot are watching to see whether this is real change or another initiative that will fade. A slow transition to scale confirms their scepticism before you’ve had a chance to address it, and that is harder to reverse than a technical problem.

Turning a proof of concept into a repeatable operation is what you were hired to do. The pilot proved it was possible. Scaling has to prove it was worth it.

Where does the wall actually appear?

The wall appears at the handoff. When the pilot team completes its work and responsibility moves to a broader group, three things typically break down. The incoming team was not part of building the workflow. No one owns the ongoing quality check. And the new users have no obvious reason to change habits that were already working for them.

Behind this is an alignment gap. Kyndryl’s 2024 research finds around 70% of leaders say their workforce is not ready for AI change. The gap is rarely about technical skill; the people doing the work have often not been helped to understand why the change is happening, or to feel any ownership over it.

A pilot sidesteps this by design. You recruit willing participants, work in a contained space, and get clean results precisely because the conditions are controlled. Moving that to a wider team with different pressures and no stake in what was built strips those conditions away.

Ownership is the crux. Someone in the broader rollout needs to care enough about the workflow to maintain it, adapt it when the output drifts, and advocate for it when something goes wrong. Pilot teams generate that ownership naturally because they built the thing. The rest of the business needs it designed in before the handoff, not rebuilt after the stall has already started.

When is a stall a signal to slow down, and when to push through?

A stall means one of two things. Either the rollout is missing the integration and ownership work that scaling requires, which is a solvable problem if you address it directly, or the pilot solved a problem that matters to a small team but carries no weight for the broader business. In the second case, the right answer is to scope more tightly rather than scale more widely.

The diagnostic is whether the scaling work was done or skipped. Have the people who will use the new workflow been involved in adapting it to their reality? Is there a named owner who will keep it running after the transition? Has the process been documented well enough that someone outside the pilot team can follow it reliably?

Clear yes answers across those questions, combined with a rollout that is still stalling, suggest the issue lies in the problem the pilot chose to solve rather than in how it was rolled out. Not every successful pilot deserves to become standard practice. Some work beautifully in a small test precisely because the test was controlled and the participants were self-selected.

The honest answer usually sits in the middle. The pilot proved value in the right place, but scaling was treated as a delivery task rather than as a project in its own right, one that needs the same preparation the pilot had. Recognising that distinction early saves a significant amount of recovery work later.
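For readers who prefer to see the diagnostic written down as a checklist, the three questions above can be sketched in a few lines of Python. This is an illustrative sketch only: the names `RolloutCheck` and `diagnose_stall` and the wording of the returned messages are assumptions made for the example, not part of any established framework.

```python
from dataclasses import dataclass


@dataclass
class RolloutCheck:
    # Hypothetical field names, one per diagnostic question in the text.
    users_involved_in_adapting: bool  # were the new users involved in adapting the workflow?
    named_owner: bool                 # is there a named owner after the transition?
    process_documented: bool          # can someone outside the pilot team follow the process?


def diagnose_stall(check: RolloutCheck) -> str:
    """Suggest a next step for a rollout that has stalled."""
    # Collect the scaling work that was skipped, in question order.
    gaps = [name for name, done in vars(check).items() if not done]
    if gaps:
        # Scaling work is missing: a solvable problem if addressed directly.
        return "do the scaling work: " + ", ".join(gaps)
    # All three answers are yes and the rollout still stalls:
    # the pilot's problem may not carry weight for the wider business.
    return "rescope: the problem may not matter beyond the pilot team"


print(diagnose_stall(RolloutCheck(True, False, True)))
```

The sketch is deliberately binary. In practice the answers are usually partial, which is why most stalls land in the middle ground described above.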

What sits alongside the valley, and what to sort first?

The valley does not appear in isolation. Its depth, and whether you cross it, depends on three things that sit outside the technology itself. Active executive sponsorship is what makes hesitation cost something for the team. Clear decision rights are what stop the handoff from becoming a power vacuum. And workflow readiness is what determines whether the output holds once the AI is running outside the pilot team.

Visible, sustained C-suite sponsorship is among the strongest predictors of adoption across change management research. BrainStorm’s analysis of enterprise technology rollouts found organisations with active senior sponsorship consistently reaching higher activation rates within ninety days than those managing rollouts without it. When the most senior person has made it clear that this is how the business works now, hesitation carries a cost. Without that signal, waiting costs nothing.

Decision rights are closely related. The pilot-to-scale problem often surfaces as a power vacuum, where the pilot team has finished, the wider team is waiting for direction, and no one has been explicitly named as the person responsible for what comes next. Naming that owner at the start of the rollout, rather than after the stall begins, closes one of the most predictable failure points in an AI programme.

Workflow readiness is the third factor and the least discussed. AI does not straighten a crooked process. If the work the tool sits inside is informal, inconsistent, or undocumented, the output will be inconsistent too. Sorting the underlying process before layering AI on top of it is slower at the start and faster everywhere after.

