Capacity planning when AI is your second-biggest variable cost

TL;DR

In many owner-managed service firms AI tooling has quietly become the second variable cost line after staff. Planning capacity on headcount alone now misses where the bottleneck actually sits. The four-line capacity model tracks staff hours by capability, AI tooling capacity by use case, variable cost per unit of work for each, and sensitivity to mix shifts, so you can see capacity constraints coming before the forecast goes wrong.

Key takeaways

- AI tooling has shifted from experimental spend to a material variable cost line, with enterprise surveys reporting AI rising from 0.8 per cent of revenue in 2025 to 1.7 per cent in 2026 and 25 to 50 per cent of IT budgets within a few years
- AI capacity scales differently from staff: instant scale-up, no notice period, no holiday, but with vendor rate limits, concurrency caps, fixed monthly tiers, and a real prompt-and-workflow maintenance cost that is rarely sized
- The four-line capacity model owner-operators should run is staff hours by capability, AI tooling capacity by use case, variable cost per unit of work for each, and sensitivity to mix shifts as work moves between human and AI delivery
- The three early warning signs of capacity mismatch are AI bills growing faster than revenue, staff time on AI oversight growing faster than AI tooling cost, and recurring small errors that capacity slack would once have caught
- Around 80 to 85 per cent of enterprises miss their AI infrastructure forecasts by more than 25 per cent; the gap is usually unsized token growth, context-window expansion, and orchestration layers nobody costed at the start

An owner I sat with last month put her last year of AI invoices on the table in date order. £200 a month at the start, £4,000 by the end. She had not consciously decided to grow the bill. The tools had simply earned their way in, one workflow at a time. Her capacity plan for the next quarter was a staffing sheet and a holiday rota. The AI line did not appear anywhere on it. Her forecasts had started to miss by 10 to 15 per cent and she could not work out why.

She is not unusual. AI tooling has crept into the second variable cost line in many owner-managed services firms, behind staff and ahead of premises and software combined. Mavvrik’s 2026 cost-statistics analysis reports that companies plan to spend an average of 1.7 per cent of revenue on AI in 2026, against 0.8 per cent in 2025, with AI expected to consume 25 to 50 per cent of IT budgets within a few years. At a £1m firm, a £4,000 monthly AI bill is already 4.8 per cent of revenue, about three times that cross-sector average.
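The arithmetic behind that comparison can be sketched in a few lines. This is a minimal illustration using only the figures quoted above; the function name `ai_share_of_revenue` is hypothetical and not taken from any source.

```python
def ai_share_of_revenue(monthly_ai_bill: float, annual_revenue: float) -> float:
    """Annualised AI spend as a percentage of annual revenue."""
    if annual_revenue <= 0:
        raise ValueError("annual_revenue must be positive")
    return monthly_ai_bill * 12 / annual_revenue * 100


# Figures from the text: a £4,000 monthly bill at a £1m firm,
# compared with the reported 1.7 per cent average for 2026.
share = ai_share_of_revenue(4_000, 1_000_000)
multiple = share / 1.7

print(f"{share:.1f} per cent of revenue, {multiple:.1f}x the reported average")
```

Running the same calculation on your own trailing twelve months of invoices is usually the quickest way to see whether the AI line has become material.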

The capacity question is no longer “how many people,” it is “how many people plus how much AI tooling.” The firms that build the combined model see capacity coming. The firms that do not run on lag.

What is capacity planning when AI is a variable cost?

It is capacity planning that treats AI tooling as a delivery resource with its own throughput, its own constraints, and its own variable cost per unit of work, rather than as overhead software. AI is no longer a fixed subscription you can ignore. It is metered consumption that scales with revenue-generating activity, so its capacity and bottlenecks need to be planned alongside staff hours.

Deloitte’s CFO guide to AI token economics argues that tokens, not users or time, are now the fundamental unit of consumption, and that finance functions need to track AI cost per workflow, per client, and per active user rather than per licence. That reclassification is the underlying move. Once AI sits in cost of goods sold rather than overhead, it has to be planned as capacity.

Why does this matter for your business?

It matters because the bottleneck in your firm has probably shifted, and the staffing-only plan no longer tells you where it is. The clearest tell is forecasts that miss by amounts headcount does not explain. Glean’s AI total cost of ownership work reports first-year budget overruns of 30 to 40 per cent when AI is not modelled explicitly, and Everest Group describes a related “token cost illusion”, where falling per-token prices mask rising total cost of ownership.

For an owner-managed firm the second risk is gross margin erosion that the management accounts do not flag. The SaaS CFO’s analysis shows that embedding AI features into a service without revisiting pricing and cost classification can pull gross margin from 80 per cent to 65 per cent: fifteen points of margin lost as cost moves silently from overhead into variable COGS. The capacity plan that treats AI as a delivery resource is what makes that movement visible early enough to act on.

Where AI capacity is the same as staff capacity, and where it is not

AI capacity differs from staff in three ways that matter to a capacity plan. It scales instantly: when you need more, you add seats, tokens or concurrent runs rather than recruiting. It has no notice period, holiday calendar, sickness, or Monday-morning ramp-up. And it can run overnight and at weekends without overtime, which is useful when the time window is the constraint.

AI capacity is the same as staff in four ways owners commonly miss. It still carries a fixed monthly tier you have committed to, paid whether the work shows up or not. It is still constrained at the top: OpenAI’s rate-limit documentation defines requests-per-minute, tokens-per-minute and tokens-per-day caps tied to spend tier, and Red Hat’s TokenRateLimitPolicy shows the same pattern in enterprise gateways, with HTTP 429 errors when daily quotas are exceeded. It still has concurrency caps that limit parallel runs. And it still has an upkeep cost (prompt design, workflow maintenance, evaluation and oversight) that does not appear on the licence invoice but shows up in staff time.

What is the four-line capacity model?

The four-line model is a spreadsheet, not a methodology. Line one is staff hours by capability: total hours available per quarter by role and skill, adjusted for holidays and non-billable commitments. Line two is AI tooling capacity by use case: effective throughput translated from vendor rate limits and concurrency caps into work units such as documents drafted per hour or first-pass research tasks per day.
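One way to do the line-two translation is to work out how many tasks per hour each vendor constraint allows and take the smallest. The sketch below assumes a simple model in which a task uses a fixed number of requests and tokens and occupies one concurrency slot for a fixed time. The class names, field names and every number are hypothetical placeholders, not vendor figures.

```python
from dataclasses import dataclass


@dataclass
class VendorLimits:
    requests_per_minute: int
    tokens_per_minute: int
    max_concurrent_runs: int


@dataclass
class TaskProfile:
    requests_per_task: int
    tokens_per_task: int
    minutes_per_task: float  # wall-clock time one run holds a concurrency slot


def tasks_per_hour(limits: VendorLimits, task: TaskProfile) -> float:
    """Effective throughput is set by whichever constraint binds first."""
    by_requests = limits.requests_per_minute / task.requests_per_task * 60
    by_tokens = limits.tokens_per_minute / task.tokens_per_task * 60
    by_concurrency = limits.max_concurrent_runs / task.minutes_per_task * 60
    return min(by_requests, by_tokens, by_concurrency)


# Hypothetical tier and a hypothetical document-drafting task.
limits = VendorLimits(requests_per_minute=60, tokens_per_minute=90_000,
                      max_concurrent_runs=5)
drafting = TaskProfile(requests_per_task=4, tokens_per_task=30_000,
                       minutes_per_task=2.0)

print(tasks_per_hour(limits, drafting))
```

In this made-up case the concurrency cap, not the token limit, is the binding constraint, which is the kind of detail a headcount-only plan never surfaces.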

Line three is variable cost per unit of work for each capacity type. For staff this is fully-loaded hourly cost divided by units, with rework counted. For AI it is either vendor tokens-per-task multiplied by published price, or total monthly AI spend divided by mediated tasks if your use case sits behind a SaaS wrapper. Line four is sensitivity to mix shifts: what total cost and capacity look like at different AI-versus-staff splits of the same workload.
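The line-three calculations can be sketched as three small functions. The function names, the way rework is modelled (as a fractional uplift on hours) and all the example figures are assumptions for illustration, not published prices or benchmarks.

```python
def staff_cost_per_unit(hourly_cost: float, hours_per_unit: float,
                        rework_rate: float = 0.0) -> float:
    """Fully-loaded staff cost per unit, with rework as a fractional uplift."""
    return hourly_cost * hours_per_unit * (1 + rework_rate)


def ai_cost_per_unit_metered(tokens_per_task: int,
                             price_per_million_tokens: float) -> float:
    """Metered API use: tokens per task multiplied by the published price."""
    return tokens_per_task / 1_000_000 * price_per_million_tokens


def ai_cost_per_unit_wrapped(monthly_spend: float, tasks_per_month: int) -> float:
    """SaaS wrapper: total monthly spend divided by tasks the tool mediated."""
    if tasks_per_month <= 0:
        raise ValueError("tasks_per_month must be positive")
    return monthly_spend / tasks_per_month


# Hypothetical inputs, chosen only to show the shape of the comparison.
staff = staff_cost_per_unit(hourly_cost=60, hours_per_unit=1.5, rework_rate=0.10)
metered = ai_cost_per_unit_metered(tokens_per_task=40_000,
                                   price_per_million_tokens=8.0)
wrapped = ai_cost_per_unit_wrapped(monthly_spend=4_000, tasks_per_month=2_500)

print(f"staff £{staff:.2f}, metered AI £{metered:.2f}, wrapped AI £{wrapped:.2f}")
```

The wrapped figure is often far higher than the metered one for the same work, because the fixed tier is paid whether or not the tasks show up.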

A small worked example makes the model concrete. A consultancy expects 1,000 standard analysis tasks in a quarter. In a hybrid mix, AI handles 70 per cent at low marginal cost, while analysts handle the remaining 30 per cent and oversee the AI-handled work. The four-line view shows required staff hours, required AI throughput against vendor limits, blended cost per task, and what happens if the mix shifts to 50-50 or 90-10. WorkflowMax’s work on professional services capacity forecasting paints the same picture for staff; the four-line model adds the AI side alongside rather than treating it as separate.
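The mix sensitivity in that example can be sketched as follows. The 1,000 tasks and the 50, 70 and 90 per cent AI shares come from the example; the hours per task, oversight time, hourly cost and AI cost per task are hypothetical placeholders you would replace with your own numbers.

```python
from dataclasses import dataclass


@dataclass
class MixResult:
    ai_share_pct: int
    ai_tasks: float
    staff_hours: float
    total_cost: float
    cost_per_task: float


def mix_scenario(total_tasks: int, ai_share_pct: int,
                 manual_hours_per_task: float,
                 oversight_hours_per_ai_task: float,
                 staff_hourly_cost: float,
                 ai_cost_per_task: float) -> MixResult:
    """Staff hours and blended cost for one AI-versus-staff split."""
    ai_tasks = total_tasks * ai_share_pct / 100
    manual_tasks = total_tasks - ai_tasks
    # Analysts do the manual tasks and also oversee the AI-handled ones.
    staff_hours = (manual_tasks * manual_hours_per_task
                   + ai_tasks * oversight_hours_per_ai_task)
    total_cost = staff_hours * staff_hourly_cost + ai_tasks * ai_cost_per_task
    return MixResult(ai_share_pct, ai_tasks, staff_hours, total_cost,
                     total_cost / total_tasks)


# Hypothetical unit figures; only the task count and mixes are from the text.
ASSUMED = dict(manual_hours_per_task=2.0, oversight_hours_per_ai_task=0.25,
               staff_hourly_cost=60.0, ai_cost_per_task=0.50)

scenarios = [mix_scenario(1_000, pct, **ASSUMED) for pct in (50, 70, 90)]
for s in scenarios:
    print(f"{s.ai_share_pct}% AI: {s.staff_hours:.0f} staff hours, "
          f"{s.ai_tasks:.0f} AI tasks, £{s.cost_per_task:.2f} per task")
```

The `ai_tasks` figure in each scenario is the number to check against vendor throughput limits, and `staff_hours` is the number to check against line one.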

What are the early warning signs of capacity mismatch?

There are three signs that capacity and demand have fallen out of alignment, all visible in the management pack if you know where to look. The first is AI tooling bills growing faster than revenue over two or three quarters running. At that point the variable cost is eroding gross margin rather than expanding capacity. The pattern is common: Mavvrik’s data finds 80 to 85 per cent of enterprises miss their AI infrastructure forecasts by more than 25 per cent.

The second is staff time on AI oversight growing faster than AI tooling cost itself. Charter’s work on AI tool bloat describes the diminishing marginal productivity that comes from juggling multiple overlapping tools and the “toggle tax” of context switching. When the oversight hours rise faster than the licence bill, the firm is paying twice, once for the tool and once for the people behind it, and the substitution AI was meant to deliver is not happening. The hybrid AI economics literature suggests that even well-designed systems still require 10 to 20 per cent of the human resources the fully-manual version would have used. Anything materially above that is a flag.

The third sign is recurring small errors that capacity slack would once have caught: mis-filed documents, slightly off data extractions, tone-deaf client emails, mis-prioritised tasks. When firms run staff capacity hot and lean on AI for throughput, there is less human slack to spot and correct minor issues before they reach clients. The fix is often a narrower set of AI use cases and a better oversight loop rather than more headcount, but the trigger is the same: slack has gone and small errors are accumulating.
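The three signs can be checked against quarterly figures from the management pack. The sketch below is one possible reading, not a standard test: it flags a sign only when the pattern holds in every consecutive quarter supplied, and it treats "errors accumulating" as error counts rising each quarter. The function names and all the sample data are hypothetical.

```python
def growth_rates(series: list[float]) -> list[float]:
    """Quarter-on-quarter growth; needs at least two positive values."""
    if len(series) < 2 or any(value <= 0 for value in series):
        raise ValueError("need at least two quarters of positive values")
    return [later / earlier - 1 for earlier, later in zip(series, series[1:])]


def outpaces(a: list[float], b: list[float]) -> bool:
    """True if series a grew faster than series b in every quarter."""
    return all(ga > gb for ga, gb in zip(growth_rates(a), growth_rates(b)))


def warning_signs(revenue, ai_spend, oversight_hours, small_errors) -> dict:
    return {
        "ai_bill_outpacing_revenue": outpaces(ai_spend, revenue),
        "oversight_outpacing_ai_cost": outpaces(oversight_hours, ai_spend),
        "small_errors_accumulating": all(
            g > 0 for g in growth_rates(small_errors)),
    }


# Three hypothetical quarters of data.
signs = warning_signs(
    revenue=[250_000, 255_000, 260_000],
    ai_spend=[6_000, 9_000, 12_000],
    oversight_hours=[40, 50, 60],
    small_errors=[3, 5, 8],
)
print(signs)
```

With these made-up figures the first and third signs fire and the second does not, because oversight hours are growing more slowly than the AI bill.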

Sister posts walk the related ground. The hidden margin tax of AI sizes the oversight time more precisely. Where AI moves margins sorts which use cases are pulling their weight. The growth and profit dashboard owner-operators need pulls these signals into the eight or ten numbers you actually review each month.

If you want to build the four-line capacity model for your own firm and walk through the warning signs against your last two quarters, book a conversation.

Sources

- Mavvrik (2026). AI Cost Statistics 2026, finding that companies plan to spend on average 1.7 per cent of revenue on AI in 2026 against 0.8 per cent in 2025, and that around 80 to 85 per cent of enterprises miss their AI infrastructure forecasts by more than 25 per cent. https://www.mavvrik.ai/blog/ai-cost-statistics-2026/
- Everest Group (2025). The Rising Artificial Intelligence Consumption Costs, From Innovation to Inflation, on the "token cost illusion" where falling per-token prices mask rising total cost of ownership as context windows and orchestration layers expand. https://www.everestgrp.com/blogs/the-rising-artificial-intelligence-ai-consumption-costs-from-innovation-to-inflation/
- Deloitte (2025). The CFO Guide to AI Token Economics, arguing CFOs should treat AI programmes as strategic investments subject to rigorous token-level cost tracking, forecasting and ROI analysis. https://www.deloitte.com/us/en/services/consulting/articles/cfo-guide-ai-token-economics.html
- Glean (2025). How to Budget for the Total Cost of Ownership of AI Solutions, reporting first-year budget overruns of 30 to 40 per cent when AI total cost of ownership is not modelled explicitly across infrastructure, integration and change management. https://www.glean.com/perspectives/how-to-budget-for-the-total-cost-of-ownership-of-ai-solutions
- OpenAI (2026). API Rate Limits documentation, defining requests-per-minute, tokens-per-minute, and tokens-per-day limits at organisation and project level, with usage tiers tied to spend. https://developers.openai.com/api/docs/guides/rate-limits
- Red Hat (2026). Manage AI Resource Use with TokenRateLimitPolicy, on per-user and per-group token quotas and HTTP 429 throttling when daily allocations are exceeded. https://developers.redhat.com/articles/2026/02/18/manage-ai-resource-use-tokenratelimitpolicy
- Avasant (2025). IT Spending as a Percentage of Revenue by Industry, Company Size and Region, showing total IT operational spend ranges from about 4.4 per cent to 11.4 per cent of revenue between the 25th and 75th percentile in financial services. https://avasant.com/report/it-spending-as-a-percentage-of-revenue-by-industry-company-size-and-region/
- WorkflowMax (2025). Using AI to Forecast Staff Capacity, on AI-supported capacity forecasting in professional services, including identifying future bottlenecks, modelling staff utilisation and protecting margins. https://workflowmax.com/blog/using-ai-to-forecast-staff-capacity
- The SaaS CFO (2025). Your AI Feature is Quietly Destroying Your Gross Margin, on how embedded AI costs reclassify from fixed overhead to variable COGS and erode gross margin from around 80 per cent to 65 per cent if pricing and tracking are not updated. https://www.thesaascfo.com/your-ai-feature-is-quietly-destroying-your-gross-margin/
- Charter (2025). The Hidden Cost of AI Tool Bloat and How Managers Can Reduce It, on diminishing marginal productivity as workers juggle overlapping AI tools and the "toggle tax" of context switching. https://www.charterworks.com/the-hidden-cost-of-ai-tool-bloat-and-how-managers-can-reduce-it/

Frequently asked questions

How big does the AI bill have to get before capacity planning needs to change?

As a rough threshold, when AI tooling crosses one per cent of revenue or 30 per cent of your IT spend, plan it as a capacity line, not a discretionary cost. A £4,000 monthly bill at a £1m firm is 4.8 per cent of revenue, about three times the 2026 cross-sector average of 1.7 per cent. At that level the AI line is shaping your delivery economics and needs the same forecasting discipline as staff.
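That rule of thumb is easy to encode as a check. This is a minimal sketch: the function name is hypothetical, the two thresholds are the rough ones stated above, and the IT spend figures in the examples are invented for illustration.

```python
def plan_ai_as_capacity_line(annual_ai_spend: float,
                             annual_revenue: float,
                             annual_it_spend: float,
                             revenue_threshold: float = 0.01,
                             it_threshold: float = 0.30) -> bool:
    """True once AI spend crosses either rough threshold."""
    if annual_revenue <= 0 or annual_it_spend <= 0:
        raise ValueError("revenue and IT spend must be positive")
    share_of_revenue = annual_ai_spend / annual_revenue
    share_of_it = annual_ai_spend / annual_it_spend
    return share_of_revenue >= revenue_threshold or share_of_it >= it_threshold


# £4,000 a month at a £1m firm; the £100,000 IT spend is an assumed figure.
needs_capacity_line = plan_ai_as_capacity_line(4_000 * 12, 1_000_000, 100_000)

# A smaller, hypothetical bill that stays under both thresholds.
still_discretionary = not plan_ai_as_capacity_line(6_000, 1_000_000, 50_000)

print(needs_capacity_line, still_discretionary)
```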

Does AI capacity really behave so differently from staff capacity?

It scales much faster up and down, but it is not infinite. API providers apply requests-per-minute, tokens-per-minute, and tokens-per-day limits that throttle peak throughput, concurrency caps that limit parallel runs, and monthly spend caps that can stall a workflow mid-quarter. On top of that you still pay fixed monthly tiers and prompt-and-workflow maintenance. Treat AI as elastic-with-constraints, not infinite.

What is the simplest first step if I have never modelled AI capacity?

Build the four-line view on one quarter of forecast work. List staff hours by capability, list AI tooling capacity by use case in rate limits and concurrency, calculate variable cost per unit of work for each side, then run two mix scenarios, current mix and an AI-heavy mix. The model usually surfaces one immediate cost surprise and one structural decision worth taking before the next renewal cycle.

Written by Dr Dave Heath, AI consultant and business strategist.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

If any of this sounds familiar, let's talk.

Related reading

Why the time AI saves never reaches the bottom line

Where AI pays back first in a professional services firm

Where AI pays back first on a construction project