A 25-staff accountancy firm signed up for a hybrid AI plan earlier this year. Twenty pounds per seat per month for ChatGPT Business across the team, five hundred pounds fixed, plus API access at standard rates for an integration into their client portal. The integration generated about three hundred reconciliation summaries a day. Month one bill, five hundred pounds base plus forty-two pounds API. Month two, five hundred pounds plus one thousand two hundred and fifty-nine.
The integration team had not realised the API rate was around four times the implicit per-unit cost of the seat plan. By month four the variable line was bigger than the fixed line. The owner booked a meeting with the vendor and discovered two facts that mattered. The published API rate had been adjusted twice during the contract period, and unused monthly allocations did not roll over. The plan was sold as flexibility. In practice it was a cost trap with a flexibility label on it.
That is hybrid pricing. The shape is now the dominant model in the AI vendor landscape, and the shape itself is reasonable. The danger lives in the second axis of the bill, which many owners do not model when they sign.
What is hybrid AI pricing?
Hybrid AI pricing combines two billing mechanisms in one contract. You pay a fixed monthly fee per user, the way traditional software has always been licensed, and you pay separately for any consumption above what the seat plan includes. The seat fee buys access. The overage rate buys whatever the heavy users actually do.
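The two mechanisms can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's billing logic: the function name, the idea of a single "unit" of consumption, and the figures in the usage lines are all assumptions chosen for the example. Money is held in pence to keep the arithmetic exact.

```python
def hybrid_bill(seats, seat_fee_pence, included_units, used_units, overage_pence_per_unit):
    """Return (fixed, variable, total) in pence for one billing month.

    Hypothetical model: one seat fee per user, plus a flat overage rate
    on any consumption above what the plan includes.
    """
    fixed = seats * seat_fee_pence
    overage_units = max(used_units - included_units, 0)  # no credit for under-use
    variable = overage_units * overage_pence_per_unit
    return fixed, variable, fixed + variable


# Illustrative figures only: 25 seats at £20, 1,000 units included, 40p per extra unit.
quiet_month = hybrid_bill(25, 2000, 1000, 800, 40)
busy_month = hybrid_bill(25, 2000, 1000, 1500, 40)
print(quiet_month)  # fixed fee only, nothing on the variable axis
print(busy_month)   # same fixed fee, plus 500 units of overage
```

The `max(..., 0)` line is where the no-rollover behaviour shows up: a quiet month earns nothing back, while a busy month is billed in full.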
Industry research published by Chargebee in 2026 found 43 percent of AI and software vendors had moved to hybrid pricing, with another 8 percent layering outcome-based components on top, and only 16 percent maintaining pure subscription-only models. The shape is the new norm because pure per-seat under-prices heavy users while pure usage-based introduces too much revenue volatility for vendors and budget volatility for buyers. Hybrid splits the difference.
Why does it matter for your business?
It matters because the variable axis can dominate the fixed axis quickly, and because the overage rate is where vendors recover the margin they discount on the seat fee. The pattern is structural rather than accidental. Vendors set base subscriptions to look reasonable on the procurement form, then set overage multipliers at three to five times the implicit per-unit cost of the base plan to recover margin from heavy users.
If your plan costs one thousand pounds a month and includes ten billion tokens, your implicit rate is ten pence per million (one thousand pounds divided by ten thousand million tokens). Schematic’s 2025 analysis of AI credit overages found typical overage rates of thirty to fifty pence per million on plans of that shape, a three-to-five-times multiplier on the implicit base rate. The owner who looks only at the seat fee, signs, and discovers the multiplier on the second invoice is not unusual. They are the central case the model is designed for.
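The implicit-rate arithmetic is worth doing explicitly before signing. The sketch below assumes a one-thousand-pound plan with a ten-billion-token allowance, which is the allowance at which the fee works out at ten pence per million; the function names are illustrative and not taken from any vendor tool.

```python
def implicit_rate_pence_per_million(plan_fee_pounds, included_tokens):
    """Pence per million tokens implied by the base plan's fee and allowance."""
    millions_included = included_tokens / 1_000_000
    return plan_fee_pounds * 100 / millions_included


def overage_multiplier(overage_pence_per_million, implicit_pence_per_million):
    """How many times the implicit base rate the overage rate works out at."""
    return overage_pence_per_million / implicit_pence_per_million


# Assumed plan: £1,000 a month with a ten-billion-token allowance.
base = implicit_rate_pence_per_million(1_000, 10_000_000_000)
low = overage_multiplier(30, base)   # overage quoted at 30p per million
high = overage_multiplier(50, base)  # overage quoted at 50p per million
print(f"implicit rate: {base:.0f}p per million tokens")
print(f"overage multiplier: {low:.1f}x to {high:.1f}x")
```

Running the same two functions against a real quote takes the plan fee, the included allowance and the overage rate from the rate card, which are the three numbers a vendor should be able to supply on request.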
There is a second exposure many owners miss. CIO’s 2026 vendor-pricing report found that 100 percent of surveyed AI vendors had changed pricing rates or terms within the previous month, with several adjusting more than once a week. The contract you signed is not necessarily the contract you are paying against today.
Where will you actually meet it?
You will meet hybrid pricing on every major AI vendor’s published rate card. Microsoft 365 Copilot offers per-seat licences alongside a Pay-As-You-Go billing policy that meters specific Copilot scenarios through Azure, with consumption tracked in real time and billed against the same Azure subscription you already manage. The architecture is genuinely hybrid: you can layer the variable axis on top of existing per-seat licences without renegotiating the base.
OpenAI’s ChatGPT Business sells per-seat at twenty pounds per user per month for direct user access, then bills separately for API access used in your own integrations, at published per-million-token rates that vary by model. Anthropic’s Claude Team sits at twenty-five pounds per user per month for the Standard tier, with Claude API consumption charged on top by model and token type. The team plan and the API plan are different surfaces of the same vendor relationship, and the bill is the sum of both.
GitHub Copilot is the cleanest example of the transition. From June 2026 onwards, GitHub is replacing premium request units with AI Credits tied to subscription value. A Pro subscriber at ten pounds a month receives ten pounds of monthly AI credits, with token usage above that allowance billed at published rates. Per-seat is the floor. Credits are the variable axis stacked on top.
When to ask about it, when to ignore it
Ask hard questions whenever your AI spend is moving past a few hundred pounds a month, whenever an integration is generating significant volume, or whenever you are considering rolling AI out to more than a handful of users. Below those thresholds the modelling work is not worth the time. Above them, the variable axis is exactly where the surprises live, and a procurement conversation that ducks it will cost real money inside two invoice cycles.
Five questions earn their place at the procurement stage. What metrics count as usage when overage is charged? What is the implicit per-unit cost of the base plan, and what is the multiplier on the overage rate? Do unused allocations roll over or expire monthly? Can you set hard spending caps with alerts at 50, 75 and 90 percent? And how often have rates changed in the last twelve months, and with what notice? A vendor who cannot or will not answer those questions is signalling either pricing instability or operational immaturity. Walk away.
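Where a vendor offers no cap controls, the 50, 75 and 90 percent alerts can be approximated on the buyer's side from whatever spend figure the vendor dashboard or invoice export provides. The following is a rough sketch under that assumption; the function names and the idea of comparing two successive spend readings are illustrative, and nothing here calls a real vendor API.

```python
ALERT_PERCENTAGES = (50, 75, 90)


def crossed_alerts(previous_spend, current_spend, cap):
    """Return the alert percentages crossed between two spend readings.

    All three arguments are in the same currency unit (for example whole pounds).
    Integer arithmetic avoids rounding surprises at the threshold itself.
    """
    return [
        pct for pct in ALERT_PERCENTAGES
        if previous_spend * 100 < pct * cap <= current_spend * 100
    ]


def cap_reached(current_spend, cap):
    """True once spend has hit the hard cap and usage should be paused."""
    return current_spend >= cap


# Hypothetical month with a £500 cap on the variable line.
print(crossed_alerts(200, 400, 500))  # spend jumped from 40% to 80% of the cap
print(crossed_alerts(400, 460, 500))  # spend moved from 80% to 92% of the cap
print(cap_reached(510, 500))
```

Comparing against the previous reading means each alert fires once, when the threshold is crossed, rather than on every check after it.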
You can largely ignore the detail when your total AI spend sits below about a hundred pounds a month, or when the entire engagement is one or two seats with no integration on top. The complexity of modelling both axes is not free, and beneath that threshold the worst-case bill is small enough that the simpler decision dominates. The line moves up as a firm scales, and the conversation should be revisited annually rather than once.
Related concepts
Tokens are the units the variable axis is denominated in for language-model APIs, roughly three-quarters of an English word each. Hybrid pricing only makes sense once you can read a vendor invoice in tokens rather than in pounds, because the seat fee converts to an implicit token rate and the overage line is measured in tokens directly.
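The three-quarters rule gives a quick way to turn a workload described in words into a token estimate. The sketch below is an approximation only: real counts depend on the vendor's tokeniser, and the four-hundred-word summary length and thirty-day month are assumptions made for the example.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English text; real tokenisers vary


def estimate_tokens(word_count):
    """Rough token estimate for a given number of English words."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


# Assumed workload: 300 summaries a day, about 400 words each, over a 30-day month.
monthly_words = 300 * 400 * 30
monthly_tokens = estimate_tokens(monthly_words)
print(f"{monthly_words:,} words is roughly {monthly_tokens:,} tokens")
```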
Input and output tokens are billed at different rates, with output typically three to five times the cost of input on the same model. A heavy-summarisation workload will skew towards output tokens and bend the variable axis upwards faster than a heavy-classification workload of the same total volume.
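The effect of the input and output mix is easy to see with two workloads of identical total volume. The rates below are hypothetical (output priced at four times input, inside the three-to-five range) and the token splits are invented for illustration, not measured from any real workload.

```python
def workload_cost_pence(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in pence, with both rates quoted in pence per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


INPUT_RATE = 100   # hypothetical: 100p per million input tokens
OUTPUT_RATE = 400  # hypothetical: 400p per million output tokens (4x input)

# Both workloads total ten million tokens; only the split differs.
classification = workload_cost_pence(9_500_000, 500_000, INPUT_RATE, OUTPUT_RATE)
summarisation = workload_cost_pence(7_000_000, 3_000_000, INPUT_RATE, OUTPUT_RATE)
print(f"classification: {classification:.0f}p, summarisation: {summarisation:.0f}p")
```

Under these assumed figures the summarisation workload costs noticeably more for the same total volume, purely because a larger share of its tokens sits on the output side.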
Prompt caching and the Batch API are the two largest discount levers available on the variable axis. Caching cuts roughly 90 percent off input cost on long stable prefixes, and Batch cuts 50 percent off both input and output where a 24-hour turnaround is acceptable. Both stack with hybrid plans, and both materially change where the overage line lands at the end of the month.
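A rough model of the two levers shows how far they can move the variable line. This sketch assumes cached input is billed at about ten percent of the normal input rate and that batch processing halves the whole bill. It also assumes the two discounts can be combined, which depends on the vendor and should be checked against the rate card. It ignores any cache-write surcharges, and the rates and token volumes are hypothetical.

```python
CACHED_INPUT_PERCENT = 10  # assumption: cached input billed at ~10% of normal
BATCH_PERCENT = 50         # assumption: batch jobs billed at 50% of normal


def discounted_cost_pence(input_tokens, output_tokens, input_rate, output_rate,
                          cached_tokens=0, batch=False):
    """Cost in pence; rates are pence per million tokens.

    cached_tokens is the part of input_tokens served from a prompt cache.
    """
    uncached_tokens = input_tokens - cached_tokens
    input_cost = (uncached_tokens * input_rate
                  + cached_tokens * input_rate * CACHED_INPUT_PERCENT / 100)
    output_cost = output_tokens * output_rate
    total = (input_cost + output_cost) / 1_000_000
    if batch:
        total = total * BATCH_PERCENT / 100
    return total


# Hypothetical month: 7M input tokens (5M of them a stable cached prefix), 3M output.
baseline = discounted_cost_pence(7_000_000, 3_000_000, 100, 400)
with_cache = discounted_cost_pence(7_000_000, 3_000_000, 100, 400, cached_tokens=5_000_000)
with_batch = discounted_cost_pence(7_000_000, 3_000_000, 100, 400, batch=True)
with_both = discounted_cost_pence(7_000_000, 3_000_000, 100, 400,
                                  cached_tokens=5_000_000, batch=True)
print(baseline, with_cache, with_batch, with_both)
```

Caching only touches the input side, so its benefit shrinks on output-heavy workloads, whereas the batch discount applies across both.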
The wider AI subscription stack sits behind hybrid pricing as the operational problem. A typical owner-managed firm now runs between four and seven AI subscriptions, several with their own variable-axis exposure. The bill is rarely one line.
If your AI invoice is creeping up faster than your headcount is, hybrid pricing is the place to look first. Get the overage rate, the rollover terms, and the cap controls in writing before the next renewal. The seat fee is rarely the bill.



