Building your AI delegation stack: who does what

TL;DR

A personal AI delegation stack works best with three named workers, not eight: a generalist drafter for the writing and admin, a researcher for live information, and a thinking partner for high-stakes decisions. Each one needs a remit, a check-back protocol, and a kill-switch. Public stack disclosures from Wade Foster at Zapier, Aaron Levie at Box, and Tobias Lütke at Shopify all rhyme with the same shape.

Key takeaways

- The three-worker pattern is generalist drafter, researcher, thinking partner. Three is enough specialisation to matter and few enough that switching cost does not eat the gain. Tobias Lütke's internal Shopify approach, an LLM proxy with model routing per task, is the same idea at company scale.
- Each worker needs a one-paragraph remit: what it is for, what it is not, and how its output is reviewed. Without a remit, the same tool drafts client emails and runs strategy at the same prompt quality, and you cannot tell which one slipped.
- The check-back protocol decides when you compare workers and when you do not. For a routine client follow-up, do not. For a partnership decision, run the same brief through two and compare. The cost is twenty minutes; the value is catching a single bad call a quarter.
- The kill-switch matters because models do change. Anthropic's April 2026 postmortem on three Claude regressions, all reverted within a fortnight, is the cleanest public evidence. If you do not have a way to notice quality slipping, you find out from a customer.
- Operators publishing their stacks tend to run a handful of tools rather than ten. Wade Foster, Aaron Levie, and Pedro Franceschi at Brex all describe small rotations with clear roles. The "every new model on launch day" pattern is the productivity trap that a disciplined stack is built to avoid.

A founder I spoke to in April was paying for ChatGPT Pro, Claude Pro, Perplexity Pro, and Gemini Advanced. Roughly £80 a month across four subscriptions. She used them more or less interchangeably, asking whichever one was open in a tab. She felt guilty that she was under-using all four. She asked me which one she should drop.

The right answer was the one she did not expect. Drop none of them yet, but stop pretending they do the same job. A working AI delegation stack has three named workers, each with a defined remit. Pick which subscription plays which role, give it a one-paragraph job description, and live with that for a fortnight before changing anything. The relief comes from stopping the daily decision about which subscription to open, not from the stack you eventually settle on.

This post is part of the AI for Your Own Work cluster. The framework piece on how AI changes the delegation maths sits upstream of it. The three sibling pieces, AI vs person delegation, when AI replaces a VA, and briefing AI like a contractor, each go deep on one of the harder calls.

What are the three workers, and why three?

Three workers cover the actual shape of founder work without spilling into tool sprawl. The generalist drafter handles writing and admin, the researcher handles information that the model does not already know, and the thinking partner handles judgement calls where you want a second view before committing. Three is enough specialisation that each role is genuinely better than a generalist, and few enough that the switching cost does not eat the gain.

Tobias Lütke at Shopify built the same shape at company scale. His public description, written up by First Round Review, is an internal LLM proxy that lets engineers choose the right model per task, plus a library of specialised agents built once and reused. One interface, multiple models, role per job. The named-CEO disclosures from Wade Foster at Zapier, Aaron Levie at Box, and Pedro Franceschi at Brex describe smaller versions of the same pattern. Foster’s open stack is closer to seven. Levie’s “every agent needs a box” principle is the architectural argument behind it: bounded autonomy beats unbounded chat.

The choice-overload research from Sheena Iyengar and Mark Lepper, the canonical jam-study paper, is the empirical foil. Six options outperform twenty-four on completion rate, satisfaction, and perceived quality. A stack of three workers sits comfortably below the threshold where decision fatigue sets in. A stack of eight is past it before Tuesday lunchtime.

What is each worker actually for?

Each worker has a single defined remit. The drafter handles writing and admin, the researcher handles current information, and the thinking partner handles judgement calls. None of them does the others’ job. The remit is the thing that lets you tell which worker slipped when output goes wrong, because you can point at the brief it was given and the output it produced.

The generalist drafter is the highest-volume worker. It handles email triage, first-pass document drafts, status updates, meeting notes, and the standing recipes you call from a Monday morning. As of May 2026 the strongest model for this role on public arenas is Claude Opus, which holds the top position across writing, code, and search on the LMSys leaderboards. Vendor leapfrog is real and the gap is months not years. Pick the current leader and stay with it for a fortnight before re-shopping.

The researcher handles anything that needs information from after the model’s training cutoff. Competitive intelligence, regulatory updates, current pricing, what a named contact has said publicly in the last quarter. Perplexity Pro with Deep Research is the cleanest tool for this in May 2026, scoring 93.9 percent accuracy on SimpleQA, the factual-recall benchmark, against 72 percent for the strongest non-search model. Crossing the streams into drafting is where founders end up with confidently fabricated quotes.

The thinking partner is the lowest-volume and highest-stakes worker. You call it for partnership decisions, pricing changes, hiring calls, any moment where a second view earns its keep before you commit. The remit is structured input, structured output, judgement work only, never a replacement for the conversation with the actual humans involved.
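The remits above are short enough to write down as data rather than prose, which also gives you somewhere unambiguous to point when output goes wrong. A minimal sketch in Python; the worker names, field choices, and routing rule are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class Worker:
    """One named worker in the stack, with its one-paragraph remit."""
    name: str
    remit: str          # what it is for
    out_of_scope: str   # what it is not for
    review: str         # how its output is checked

STACK = [
    Worker(
        name="drafter",
        remit="Email triage, first-pass drafts, status updates, meeting notes.",
        out_of_scope="Live research; strategy calls.",
        review="Batch review every Monday morning.",
    ),
    Worker(
        name="researcher",
        remit="Anything needing information from after the training cutoff.",
        out_of_scope="Drafting client-facing copy.",
        review="Spot-check every cited source before it leaves the desk.",
    ),
    Worker(
        name="thinking_partner",
        remit="Structured second view on partnership, pricing, hiring calls.",
        out_of_scope="Routine drafting; replacing the human conversation.",
        review="Compared against a second worker on high-stakes calls.",
    ),
]

def worker_for(task: str) -> Worker:
    """Route a task to a named worker; unrecognised work defaults to the drafter."""
    by_name = {w.name: w for w in STACK}
    return by_name.get(task, by_name["drafter"])
```

The default-to-drafter rule mirrors the volume argument: most founder work is drafting, so that is where unlabelled tasks should land rather than leaking into the thinking partner.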

How does the check-back protocol work in practice?

The check-back protocol is the rule for when you compare workers and when you do not. For routine drafting you do not. The drafter handles the inbox triage, you review in a Monday batch, you move on. For a high-stakes call you run the same brief through two workers and compare. The cost is twenty minutes. The value is catching one bad call a quarter.

The discipline that matters is structured input. State the decision clearly, name the constraints, ask for both the recommendation and the failure case, request comparison to a historical analogue. Dan Shipper at Every wrote the cleanest public version of this protocol on the Beyond the Prompt podcast. The same input shape feeds both workers. You read both outputs side by side. Where they agree, you move. Where they disagree, the disagreement itself is the information.

The temptation is to run every decision through two workers. Resist it. The bulk of decisions a founder makes in a week are routine and want one drafter, fast. Reserving the comparison protocol for genuinely high-stakes calls keeps the cost of using it low and the signal high. A protocol you use for everything is a protocol you stop using by Friday.
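The structured-input discipline above can be sketched as a single brief template rendered once and sent to every worker being consulted. The field names and the comparison loop are illustrative, and the model calls are left as plain callables because the actual API depends on which subscription plays which role:

```python
BRIEF_TEMPLATE = """Decision: {decision}
Constraints: {constraints}
Asked for: a recommendation, the failure case for that recommendation,
and a comparison to the closest historical analogue you know of."""

def build_brief(decision: str, constraints: str) -> str:
    """Render the same structured brief for every worker consulted."""
    return BRIEF_TEMPLATE.format(decision=decision, constraints=constraints)

def check_back(decision: str, constraints: str, workers: dict) -> dict:
    """Run one brief through each worker and return outputs side by side.

    `workers` maps a worker name to a callable that takes the brief text
    and returns its answer; in practice each callable wraps one vendor API.
    """
    brief = build_brief(decision, constraints)
    return {name: ask(brief) for name, ask in workers.items()}
```

The point of the shared template is that any disagreement between the two outputs is about the decision, not about how each worker was asked.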

What is the kill-switch and how do you actually run it?

The kill-switch is your way of noticing when a worker’s quality has dropped, because models do change without notice. Anthropic’s April 2026 postmortem walked through three separate Claude regressions in the previous six weeks: a shift in reasoning effort, a bug that cleared session thinking, and a verbosity change. All three were reverted within a fortnight. The discipline matters because even a vendor with strong remediation ships quality changes that take days to surface.

The mechanic is a monthly five-minute benchmark. Pick one task per worker, save the output, and re-run it next month with the same prompt. Compare the two and ask whether the answer would still be useful. If quality has clearly moved, switch. The UK Information Commissioner's Office guidance on AI and data protection makes the same point on lawfulness, fairness, and transparency: the operator has to know whether the system is still doing what it claimed.
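The save-and-compare step is small enough to automate. A minimal sketch, assuming a local folder of saved baselines and a crude text-similarity score; the folder name and the use of `difflib` as the drift measure are illustrative choices, and the final "would this answer still be useful?" judgement stays human:

```python
import difflib
from pathlib import Path

BENCHMARK_DIR = Path("ai-benchmarks")  # illustrative location for saved baselines

def drift_score(baseline: str, current: str) -> float:
    """Similarity ratio in [0, 1] between last month's and this month's output."""
    return difflib.SequenceMatcher(None, baseline, current).ratio()

def run_monthly_check(worker: str, prompt: str, ask) -> float:
    """Re-run one worker's saved benchmark prompt, score the drift, and save
    this month's output as next month's baseline.

    `ask` is a callable wrapping whichever API serves this worker.
    Returns 1.0 on the first run, when there is nothing to compare against.
    """
    BENCHMARK_DIR.mkdir(exist_ok=True)
    baseline_file = BENCHMARK_DIR / f"{worker}.txt"
    current = ask(prompt)
    score = 1.0
    if baseline_file.exists():
        score = drift_score(baseline_file.read_text(), current)
    baseline_file.write_text(current)
    return score
```

A low score is a prompt to read both outputs side by side, not an automatic trigger to switch vendors; two good answers phrased differently will also score low.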

Without a kill-switch, you find out about a quality drop from a client. With one, you find out from a five-minute test on the second Monday of the month. The cost is trivial. The cost of the alternative, hearing it from a customer, is not.

When should you actually consolidate, and when should you wait?

Resist the urge to consolidate immediately. Run the named stack for a fortnight first. The founder I started with needed permission to keep her four subscriptions and put names on three of them: drafter, researcher, thinking partner, with the fourth as a deliberate alternate for when the kill-switch fires. After two weeks the consolidation question answers itself, because the subscription doing no defined work becomes obvious.

The National Bureau of Economic Research surveyed 6,000 CEOs and CFOs in early 2026 and found nearly 90 percent reported no productivity impact from AI despite 67 percent using it. The gap is structural rather than technical, sitting in the absence of remits, check-backs, and kill-switches around the tools. A founder running three workers with remits, check-backs, and a kill-switch is operating in the small minority who actually capture the gain. A founder running four interchangeable subscriptions with no roles is statistically indistinguishable from a founder using none of them.

If you want to think this through with someone who has run the experiment on his own desk, book a conversation.

Sources

- Foster, Wade at Zapier (2025). "How I AI" episode on Lenny's Newsletter, full personal stack disclosure including meeting transcripts, Zapier agents, and Grok for culture analysis. Cited as the named-CEO peer reference for a small disciplined rotation. https://www.lennysnewsletter.com/p/zapiers-ceo-shares-his-personal-ai-stack
- Levie, Aaron at Box (2025). Latent Space interview on his daily AI rotation and the "every agent needs a box" principle of bounded autonomy. Cited as the architectural-specialisation evidence behind the three-worker model. https://www.latent.space/p/box
- Lütke, Tobias at Shopify (2025). First Round Review long-form on Shopify's internal LLM proxy, model-routing approach, and the impact correlation between AI tool use and performance. Cited as the model-routing-not-monolith argument. https://www.firstround.com/ai/shopify
- Anthropic (2026). April 23 postmortem detailing three separate Claude regressions in March and April 2026 and their reversion. Cited as the kill-switch evidence that frontier models do change. https://www.anthropic.com/engineering/april-23-postmortem
- Perplexity (2025). Introducing Perplexity Deep Research, including 21.1 percent score on Humanity's Last Exam and 93.9 percent accuracy on SimpleQA. Cited as the research-worker accuracy benchmark. https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research
- Iyengar, Sheena and Lepper, Mark (2000). "When choice is demotivating", the canonical jam-study paper on choice overload showing six options outperforming twenty-four. Cited as the rationale for keeping the stack small. https://faculty.washington.edu/jdb/345/345%20Articles/Iyengar%20&%20Lepper%20(2000).pdf
- Karpathy, Andrej (2025). Year-in-review essay on LLM training shifts including reinforcement learning from verifiable rewards. Cited as the reason different models genuinely excel at different tasks rather than being interchangeable. https://karpathy.bearblog.dev/year-in-review-2025/
- Information Commissioner's Office (2025). UK guidance on AI and data protection covering lawfulness, fairness, and transparency for personal data processed by AI tools. Cited as the UK compliance baseline for stack design. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/
- National Bureau of Economic Research, reported in Fortune (2026). Survey of 6,000 CEOs and CFOs finding nearly 90 percent reported no productivity impact from AI despite 67 percent usage. Cited as the productivity-paradox anchor for why stack discipline beats stack breadth. https://fortune.com/article/why-do-thousands-of-ceos-believe-ai-not-having-impact-productivity-employment-study/
- Anthropic (2024). Claude 3.5 Sonnet release notes detailing measured improvements in instruction-following and writing quality. Cited as the writing-worker benchmark anchor. https://www.anthropic.com/news/claude-3-5-sonnet

Frequently asked questions

Why three workers and not five?

Three covers the actual shape of founder work: communication, information, and judgement, with one tool optimised for each. Adding a fourth or fifth introduces switching cost and decision fatigue without obvious gain. Sheena Iyengar's choice-overload research is the cleanest evidence: six options outperform twenty-four on every metric that matters. Foster, Levie, and Lütke each describe small disciplined rotations, which is the same shape three workers gives a solo founder.

Does it matter which model I use for which role?

Less than the discipline of having a role. Claude Opus is the strongest current writing-and-reasoning model on the public arenas as of May 2026, and Perplexity's Deep Research is the strongest research-grounded tool, but the gap is months not years. Pick a worker per role and stay with it for a fortnight before switching. Vendor leapfrog is real and the worst stack is the one re-shopped every week.

What does the kill-switch actually look like?

A monthly five-minute check. Re-run a benchmark task you ran a month ago, compare the output, and ask whether the quality moved. Anthropic published a detailed postmortem in April 2026 explaining three separate Claude regressions and the fixes. If a vendor of that maturity can ship three quality changes in six weeks, an unobservant founder is the one who hears about it from a client.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
