You are looking at last month’s software bills and the AI line has gone up again. Nothing obvious has changed: same tools, roughly the same workload, same team size. But somewhere in the detail there is a reference to token usage, and nobody in the business can give you a clear answer about what that means or what, if anything, to do about it.
This situation is common for owner-managers right now. AI pricing has a clear logic once you understand it, but the people selling these tools rarely slow down to explain it. Here is what actually drives the cost.
What is a token, and what are you actually paying for?
AI billing works on tokens rather than questions or minutes of use. A token is roughly three to four characters of English text, so a short sentence is around 20 tokens and a page of text is 500 to 700 tokens. You pay separately for what you send in (input tokens) and what the model generates back (output tokens), and output is usually priced higher than input. Both sides of the exchange carry a price.
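To make the arithmetic concrete, here is a minimal Python sketch of how a single request might be costed. The four-characters-per-token figure is the rule of thumb above, not an exact tokeniser, and the prices are hypothetical placeholders rather than any provider's real rates.

```python
# Rough token and cost estimate for one request.
# All prices are hypothetical placeholders, not any provider's real rates.

CHARS_PER_TOKEN = 4  # rule of thumb: roughly three to four characters per token


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Approximate token count from character length, rounded up."""
    return -(-len(text) // chars_per_token)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request: input and output tokens are priced separately."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


prompt = "Summarise the attached supplier contract in five bullet points."
tokens_in = estimate_tokens(prompt)
# Hypothetical rates: 3.00 per million input tokens, 15.00 per million output tokens.
cost = request_cost(tokens_in, 200, 3.00, 15.00)
print(f"{tokens_in} input tokens, estimated cost {cost:.6f}")
```

A single request costs a tiny fraction of a penny. The bill comes from volume, which is why the per-million price and the number of tokens per request both matter.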
The gap between models matters more than many buyers realise. OpenAI and Anthropic both publish their per-million-token pricing across their ranges, and the difference between a top-end frontier model and a mid-tier or mini model from the same provider can exceed ten to one. Choosing which AI service to use is also, quietly, choosing a price point per unit of work. As usage grows, that choice compounds quickly.
Many niche AI tools sell access with a monthly credit or word allowance rather than a raw token count. Those credits map back to tokens at a fixed rate behind the scenes, so the unit economics are real even when the pricing page does not show them.
Why are costs rising even when headline prices are falling?
The headline per-token price for many AI services has been falling, which sounds reassuring. But effective spend for many businesses has gone up, and the gap between those two facts is where the confusion lives. Usage patterns are changing faster than prices are dropping: teams are reaching for heavier models, sending longer prompts, and building workflows that call AI repeatedly rather than once.
Running modern AI models demands specialised hardware and power-hungry data centres. UK data centre power demand is projected to more than double between 2023 and 2030, driven largely by AI workloads, and the National Grid has flagged the pressure this places on electricity networks in London and the South East. Those infrastructure costs feed into what providers charge.
Usage behaviour is where owner-managed businesses feel it most directly. Anthropic published guidance in April 2025 showing that average enterprise spend on Claude Code had roughly doubled, with no change in pricing, because developers were using heavier Opus models in real deployments. MindStudio’s analysis of real-world deployments shows that a tenfold increase in users can produce a fifteenfold increase in token costs, because conversations get longer, context windows grow, and additional AI-powered features get switched on. The relationship between users and costs is not linear, and that is what catches businesses off guard.
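The non-linear effect is easy to reproduce with simple multiplication. The sketch below is an illustration only: the user counts, conversation volumes, token figures, and price are all invented, chosen so that a tenfold rise in users combined with conversations that are 50% longer gives the fifteenfold increase described above.

```python
# Illustrative only: every figure here is an assumption, not measured data.

def monthly_token_cost(users: int, conversations_per_user: int,
                       tokens_per_conversation: int,
                       price_per_million: float) -> float:
    """Monthly cost = total tokens consumed x price per million tokens."""
    total_tokens = users * conversations_per_user * tokens_per_conversation
    return total_tokens * price_per_million / 1_000_000


# Before: 20 users, 2,000 tokens per conversation.
before = monthly_token_cost(20, 100, 2_000, 5.00)
# After: 10x the users, and each conversation has grown to 3,000 tokens.
after = monthly_token_cost(200, 100, 3_000, 5.00)

print(f"before: {before:.2f}, after: {after:.2f}, ratio: {after / before:.1f}x")
```

The point of the sketch is that the multipliers stack. Growth in users and growth in tokens per conversation multiply together rather than add.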
Where will you actually meet this in your business?
If your team is using Microsoft Copilot or ChatGPT Team, token costs are largely absorbed into the per-seat licence fee. The provider bundles average expected usage into the subscription price, so you pay a predictable monthly amount regardless of how many prompts your team runs. Token pricing becomes directly visible once you call a provider's API yourself or build AI into your own systems.
You will encounter three pricing shapes. The first is pay-as-you-go per token, used by OpenAI, Anthropic, and Google for API access. The second is the per-user monthly licence: Microsoft Copilot for Microsoft 365, for example, is priced at £24.70 per user per month in the UK as of early 2025. The third is the feature-based SaaS tool that sells a monthly credit or word allowance, which maps back to tokens but keeps the unit economics out of view.
The situation where costs bite hardest is when you integrate an API into your own systems: customer support automation, bulk document drafting, or searching large document libraries. That last scenario, often called retrieval-augmented generation or RAG, can add thousands of context tokens to every single query, because the system pulls in relevant documents and includes them in each prompt before the model processes your question. A workflow that costs little at ten queries a day can become significant at five hundred.
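A rough sketch of the RAG effect follows. The token counts and prices are hypothetical, picked only to show how retrieved context dominates the input side of each query and how the daily total scales with volume.

```python
# Hypothetical figures throughout: token counts and prices are assumptions.

def rag_daily_cost(queries_per_day: int, question_tokens: int,
                   retrieved_tokens: int, answer_tokens: int,
                   input_price_per_million: float,
                   output_price_per_million: float) -> float:
    """Daily cost when every query carries retrieved documents as extra input."""
    input_tokens = question_tokens + retrieved_tokens
    per_query = (input_tokens * input_price_per_million
                 + answer_tokens * output_price_per_million) / 1_000_000
    return queries_per_day * per_query


# A 50-token question, 4,000 tokens of retrieved documents, a 300-token answer.
for volume in (10, 500):
    with_rag = rag_daily_cost(volume, 50, 4_000, 300, 3.00, 15.00)
    without_rag = rag_daily_cost(volume, 50, 0, 300, 3.00, 15.00)
    print(f"{volume} queries/day: {with_rag:.4f} with retrieval, "
          f"{without_rag:.4f} without")
```

Under these assumptions the question itself is almost irrelevant to the cost. The retrieved documents are what you are paying to send, on every query.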
What actually drives your bill up, and what can you control?
Two levers control the bulk of what you pay: which model you choose and how much text you send it. Use a top-end frontier model when a mid-tier version would do the same job, and you could be paying ten times more per task. Send a 20-page contract as context when a two-page summary would serve, and you have multiplied the input cost tenfold.
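The two levers multiply rather than add, which a small comparison makes visible. This sketch assumes 600 tokens per page (the midpoint of the 500 to 700 range mentioned earlier) and uses invented input prices with a ten-to-one gap between tiers. Neither price is a real published rate.

```python
# Assumptions: 600 tokens per page; hypothetical input prices per million tokens.
TOKENS_PER_PAGE = 600
INPUT_PRICE_PER_MILLION = {"frontier": 10.00, "mid_tier": 1.00}


def input_cost(pages: int, model: str) -> float:
    """Input-side cost of sending a document of the given length to a model."""
    tokens = pages * TOKENS_PER_PAGE
    return tokens * INPUT_PRICE_PER_MILLION[model] / 1_000_000


for model in INPUT_PRICE_PER_MILLION:
    for pages in (20, 2):
        print(f"{model:>9}, {pages:>2} pages: {input_cost(pages, model):.4f}")

worst = input_cost(20, "frontier")
best = input_cost(2, "mid_tier")
print(f"worst case is about {worst / best:.0f}x the best case")
```

On these illustrative numbers, sending the full contract to the expensive model costs roughly a hundred times more than sending the summary to the cheaper one, because the tenfold model gap and the tenfold length gap compound.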
A third factor is workflow design. Teams moving from simple question-and-answer to agentic workflows, where the AI plans, retrieves information, drafts, checks, and revises in a loop, often see token consumption increase sharply without anyone noticing until the bill arrives. Switching on background features like auto-translation or sentiment analysis adds hidden model calls that accumulate quietly.
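One reason agentic loops consume so much is that each step typically resends everything produced so far. The sketch below models that under a simplifying assumption: every step sends the full accumulated context and no caching or trimming is applied. The token figures are invented for illustration.

```python
# Simplified model: each step resends the whole history, with no caching.
# Token figures are illustrative assumptions.

def agent_loop_tokens(steps: int, base_prompt_tokens: int,
                      output_tokens_per_step: int) -> tuple[int, int]:
    """Return (total input tokens, total output tokens) across all steps."""
    total_input = 0
    total_output = 0
    context = base_prompt_tokens
    for _ in range(steps):
        total_input += context              # the full history is sent again
        total_output += output_tokens_per_step
        context += output_tokens_per_step   # the new output joins the history
    return total_input, total_output


single = agent_loop_tokens(1, 1_000, 500)
looped = agent_loop_tokens(5, 1_000, 500)
print(f"single call: {single[0]} in, {single[1]} out")
print(f"five-step loop: {looped[0]} in, {looped[1]} out")
```

In this model a five-step loop uses five times the output tokens but ten times the input tokens of a single call, because the context grows at every step.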
The practical controls are available and not especially complicated. Choose models deliberately: a smaller model for classification, routing, and routine summaries; a heavier one only for complex reasoning or high-stakes outputs. Set output-length limits so the model does not produce lengthy responses when a shorter one would do. Use the billing dashboards that providers like OpenAI and Anthropic provide, and track cost per task rather than total monthly spend. The metric that matters is cost per customer query or cost per drafted proposal, not cost per seat.
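Cost per task can be worked out from a simple usage log. The sketch below assumes you can export records containing a task label, a model name, and token counts. The field names, model tiers, and prices are all hypothetical, not the format of any particular provider's dashboard.

```python
from collections import defaultdict

# Hypothetical (input, output) prices per million tokens for two model tiers.
PRICES = {"small": (0.50, 1.50), "large": (5.00, 15.00)}

# Hypothetical usage records; the field names are assumptions for illustration.
usage_log = [
    {"task": "customer_query", "model": "small", "input_tokens": 1_200, "output_tokens": 300},
    {"task": "customer_query", "model": "small", "input_tokens": 800, "output_tokens": 200},
    {"task": "proposal_draft", "model": "large", "input_tokens": 6_000, "output_tokens": 2_000},
]


def cost_per_task(log: list[dict], prices: dict) -> dict[str, float]:
    """Average cost per task type, computed from token counts and model prices."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in log:
        in_price, out_price = prices[record["model"]]
        cost = (record["input_tokens"] * in_price
                + record["output_tokens"] * out_price) / 1_000_000
        totals[record["task"]] += cost
        counts[record["task"]] += 1
    return {task: totals[task] / counts[task] for task in totals}


for task, average in cost_per_task(usage_log, PRICES).items():
    print(f"{task}: {average:.6f} per task")
```

Tracking the figure by task type shows which workflows justify a heavier model and which are quietly expensive for what they deliver.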
What UK regulation adds to the cost picture
Regulation has a quiet but real effect on what AI costs you. UK data protection law, sector-specific rules, and the EU AI Act all shape which data you can process through an AI service and how you must architect that processing. In practice, those constraints affect which models you can use, how much you need to engineer around them, and what safeguards you have to pay for.
The ICO’s guidance on generative AI requires businesses to carry out a Data Protection Impact Assessment before processing personal data through an AI service, to minimise the data they send, and to ensure overseas transfers have proper safeguards in place. That last point matters because the major AI APIs are, by default, largely hosted in the US. For some businesses, the result is a move towards on-premise or virtual private cloud deployments, which carry a different cost structure to public API access.
FCA-regulated firms and solicitors face additional layers. The FCA has confirmed that AI tools fall within existing operational resilience and outsourcing frameworks, while the Solicitors Regulation Authority has published guidance that client-confidential and privileged material requires particular care when generative AI is involved. That typically means carefully engineered summaries rather than feeding whole client files into a cheap third-party API: design effort and fixed costs go up even where raw token volume comes down.
UK businesses that serve clients in the EU should note that the EU AI Act, formally adopted in 2024, places transparency and record-keeping obligations on providers and deployers (the Act's term for organisations using AI systems) of general-purpose AI models in certain contexts. Depending on your use case, compliance may require logging model inputs and outputs, which adds to storage costs and can limit which API services are available to you.
The pricing structure for AI services rewards businesses that take the time to understand it. Know your token unit from your seat licence, choose your model tier deliberately, watch for workflow designs that multiply calls invisibly, and build your regulatory constraints into the architecture before you scale rather than after. Those four disciplines together are what keep AI spending rational.



