Why AI token bills rise and what drives the cost

TL;DR

AI services charge per token, roughly three to four characters of text, for both what you send and what the model generates in response. Bills are rising not because headline prices have gone up but because usage is growing faster than prices are falling: heavier models, longer prompts, and agentic workflows that call AI repeatedly. Choosing the model tier that fits each task and keeping context lean are the main levers an owner-managed business has over its AI spend.

Key takeaways

- Tokens are the billing unit for AI services, roughly three to four characters each. You are billed separately for what you send in (the prompt) and what the model generates back (the response).
- The gap between model tiers is large. A top-end frontier model can cost ten times more per million tokens than a mid-tier model from the same provider, often for the same quality of result on routine tasks.
- Effective AI spend can double or more even when per-token headline prices fall. Usage grows faster than prices drop: heavier models, longer conversations, and agentic workflows that call AI in loops all multiply the actual bill.
- For owner-managed businesses on per-seat licences such as Microsoft Copilot or ChatGPT Team, token costs are largely invisible until you move to API-direct integrations or high-volume document workflows.
- UK data protection rules, FCA and SRA guidance, and the EU AI Act all affect which AI architectures you can use and what you must log, changing both the tools available and the true cost of deployment.

You are looking at last month’s software bills and the AI line has gone up again. Nothing obvious has changed: same tools, roughly the same workload, same team size. But somewhere in the detail there is a reference to token usage, and nobody in the business can give you a clear answer about what that means or what, if anything, to do about it.

This situation is common for owner-managers right now. AI pricing has a clear logic once you understand it, but the people selling these tools rarely slow down to explain it. Here is what actually drives the cost.

What is a token, and what are you actually paying for?

AI billing works on tokens rather than questions or minutes of use. A token is roughly three to four characters of text, so a short sentence is around 20 tokens and a page of text is 500 to 700. You pay separately for what you send in and what the model generates back. Both sides of the exchange carry a price.
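To make the arithmetic concrete, here is a minimal sketch of how a single request might be priced. The two rates are made-up figures for illustration, not any provider's actual prices, and the four-characters-per-token rule is only a rough estimate rather than a real tokeniser.

```python
import math

# Hypothetical rates in US dollars per million tokens; real rates vary by provider and model.
INPUT_RATE_PER_MILLION = 3.00
OUTPUT_RATE_PER_MILLION = 15.00


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (rule of thumb, not a real tokeniser)."""
    return math.ceil(len(text) / chars_per_token)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens are priced separately."""
    return (input_tokens * INPUT_RATE_PER_MILLION
            + output_tokens * OUTPUT_RATE_PER_MILLION) / 1_000_000


prompt = "Summarise the attached supplier contract in five bullet points."
prompt_tokens = estimate_tokens(prompt)
# Assume the model writes about 200 tokens in reply.
cost = request_cost(input_tokens=prompt_tokens, output_tokens=200)
print(f"{prompt_tokens} input tokens, estimated cost ${cost:.6f}")
```

A single request costs a fraction of a penny. The bill comes from how many requests you make and how much text each one carries.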

The gap between models matters more than many buyers realise. OpenAI and Anthropic both publish their per-million-token pricing across their ranges, and the difference between a top-end frontier model and a mid-tier or mini model from the same provider can exceed ten to one. Choosing which AI service to use is also, quietly, choosing a price point per unit of work. As usage grows, that choice compounds quickly.

Many niche AI tools sell access with a monthly credit or word allowance rather than a raw token count. Those credits map back to tokens at a fixed rate behind the scenes, so the unit economics are real even when the pricing page does not show them.

Why are costs rising even when headline prices are falling?

The headline per-token price for many AI services has been falling, which sounds reassuring. But effective spend for many businesses has gone up, and the gap between those two facts is where the confusion lives. Usage patterns are changing faster than prices are dropping: teams are reaching for heavier models, sending longer prompts, and building workflows that call AI repeatedly rather than once.

Running modern AI models demands specialised hardware and power-hungry data centres. UK data centre power demand is projected to more than double between 2023 and 2030, driven largely by AI workloads, and the National Grid has flagged the pressure this places on electricity networks in London and the South East. Those infrastructure costs feed into what providers charge.

Usage behaviour is where owner-managed businesses feel it most directly. Anthropic published guidance in April 2025 showing that average enterprise spend on Claude Code had roughly doubled, with no change in pricing, because developers were using heavier Opus models in real deployments. MindStudio’s analysis of real-world deployments shows that a tenfold increase in users can produce a fifteenfold increase in token costs, because conversations get longer, context windows grow, and additional AI-powered features get switched on. The relationship between users and costs is not linear, and that is what catches businesses off guard.
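The non-linear relationship is easier to see as arithmetic. The sketch below assumes total token cost is simply users multiplied by tokens consumed per user; the 1.5 figure is not from the MindStudio analysis itself, it is just what a fifteenfold cost rise on tenfold user growth implies.

```python
def cost_multiplier(user_growth: float, tokens_per_user_growth: float) -> float:
    """Total token cost scales with the number of users times tokens consumed per user."""
    return user_growth * tokens_per_user_growth


# Ten times the users, each consuming 50% more tokens than before.
print(cost_multiplier(user_growth=10, tokens_per_user_growth=1.5))  # prints 15.0
```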

Where will you actually meet this in your business?

If your team is using Microsoft Copilot or ChatGPT Team, token costs are largely absorbed into the per-seat licence fee. The provider bundles average expected usage into the subscription price, so you pay a predictable monthly amount regardless of how many prompts your team runs. Token pricing becomes directly visible once you move into territory where you are managing the AI yourself.

You will encounter three pricing shapes. The first is pay-as-you-go per token, used by OpenAI, Anthropic, and Google for API access. The second is the per-user monthly licence: Microsoft Copilot for Microsoft 365, for example, is priced at £24.70 per user per month in the UK as of early 2025. The third is the feature-based SaaS tool that sells a monthly credit or word allowance, which maps back to tokens but keeps the unit economics out of view.

The situation where costs bite hardest is when you integrate an API into your own systems: customer support automation, bulk document drafting, or searching large document libraries. That last scenario, often called retrieval-augmented generation or RAG, can add thousands of context tokens to every single query, because the system pulls in relevant documents and includes them in each prompt before the model processes your question. A workflow that costs little at ten queries a day can become significant at five hundred.
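A rough sketch of that effect follows. The rate, the 50-token question, and the 4,000 tokens of retrieved context are all assumptions chosen for illustration, and the calculation covers input tokens only.

```python
# Hypothetical input rate in US dollars per million tokens; check your provider's price list.
INPUT_RATE_PER_MILLION = 3.00


def daily_input_cost(queries_per_day: int, question_tokens: int, context_tokens: int) -> float:
    """Daily input cost when every query carries retrieved context as well as the question."""
    tokens_per_query = question_tokens + context_tokens
    return queries_per_day * tokens_per_query * INPUT_RATE_PER_MILLION / 1_000_000


for queries in (10, 500):
    without_rag = daily_input_cost(queries, question_tokens=50, context_tokens=0)
    with_rag = daily_input_cost(queries, question_tokens=50, context_tokens=4_000)
    print(f"{queries} queries/day: ${without_rag:.4f} without retrieval, "
          f"${with_rag:.4f} with retrieval")
```

The retrieved context, not the question, dominates the input cost, and that cost then scales directly with query volume.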

What actually drives your bill up, and what can you control?

Two levers control the bulk of what you pay: which model you choose and how much text you send it. Use a top-end frontier model when a mid-tier version would do the same job, and you could be paying ten times more per task. Send a 20-page contract as context when a two-page summary would serve, and you have multiplied the input cost tenfold.

A third factor is workflow design. Teams moving from simple question-and-answer to agentic workflows, where the AI plans, retrieves information, drafts, checks, and revises in a loop, often see token consumption increase sharply without anyone noticing until the bill arrives. Switching on background features like auto-translation or sentiment analysis adds hidden model calls that accumulate quietly.
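One reason loops get expensive is that each step usually resends everything produced so far. The sketch below assumes a simple agent where the full context goes back to the model on every call and each step adds a fixed amount of text; real agent frameworks differ in how much history they keep.

```python
def agent_loop_input_tokens(steps: int, base_prompt_tokens: int,
                            tokens_added_per_step: int) -> int:
    """Total input tokens when each step resends the prompt plus everything produced so far."""
    total = 0
    context = base_prompt_tokens
    for _ in range(steps):
        total += context                   # the whole context is sent again on this call
        context += tokens_added_per_step   # this step's output joins the context
    return total


single_call = agent_loop_input_tokens(steps=1, base_prompt_tokens=1_000, tokens_added_per_step=500)
five_steps = agent_loop_input_tokens(steps=5, base_prompt_tokens=1_000, tokens_added_per_step=500)
print(single_call, five_steps)  # prints 1000 10000
```

Under these assumptions, five steps consume ten times the input tokens of a single call, not five times.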

The practical controls are available and not especially complicated. Choose models deliberately: a smaller model for classification, routing, and routine summaries; a heavier one only for complex reasoning or high-stakes outputs. Set output-length limits so the model does not produce lengthy responses when a shorter one would do. Use the billing dashboards that providers like OpenAI and Anthropic publish, and track cost per task rather than total monthly spend. The metric that matters is cost per customer query or cost per drafted proposal, not cost per seat.
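As an illustration of deliberate model choice and per-task tracking, here is a minimal sketch. The tier names, rates, routing table, and usage records are all hypothetical; in practice the token counts would come from your provider's usage reports or API responses.

```python
from collections import defaultdict

# Hypothetical tiers and rates (US dollars per million tokens: input, output).
RATES = {"small": (0.30, 1.20), "large": (3.00, 15.00)}

# Hypothetical routing table: routine work goes to the smaller model.
ROUTES = {
    "classification": "small",
    "routing": "small",
    "summary": "small",
    "complex_reasoning": "large",
}


def choose_model(task_type: str) -> str:
    """Pick a tier for a task type, defaulting to the cheaper model."""
    return ROUTES.get(task_type, "small")


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call at the chosen tier's input and output rates."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def average_cost_per_task(usage_log):
    """Average cost per task type from (task_type, input_tokens, output_tokens) records."""
    costs = defaultdict(list)
    for task_type, input_tokens, output_tokens in usage_log:
        model = choose_model(task_type)
        costs[task_type].append(call_cost(model, input_tokens, output_tokens))
    return {task_type: sum(values) / len(values) for task_type, values in costs.items()}


# Made-up usage records for illustration.
usage_log = [
    ("classification", 400, 10),
    ("summary", 3_000, 300),
    ("complex_reasoning", 6_000, 1_200),
]
for task_type, cost in average_cost_per_task(usage_log).items():
    print(f"{task_type}: ${cost:.5f} per task")
```

Defaulting unknown task types to the smaller model is a design choice made here for cost control; a business handling high-stakes outputs might prefer the opposite default.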

What UK regulation adds to the cost picture

Regulation has a quiet but real effect on what AI costs you. UK data protection law, sector-specific rules, and the EU AI Act all shape which data you can process through an AI service and how you must architect that processing. In practice, those constraints affect which models you can use, how much you need to engineer around them, and what safeguards you have to pay for.

The ICO’s guidance on generative AI requires businesses to carry out a Data Protection Impact Assessment before processing personal data through an AI service, to minimise the data they send, and to ensure overseas transfers have proper safeguards in place. That last point matters because the major AI APIs are largely hosted in the US. For some businesses, the result is a move towards on-premise or virtual private cloud deployments, which carry a different cost structure from public API access.

FCA-regulated firms and solicitors face additional layers. The FCA has confirmed that AI tools fall within existing operational resilience and outsourcing frameworks, while the Solicitors Regulation Authority has published guidance that client-confidential and privileged material requires particular care when generative AI is involved. That typically means carefully engineered summaries rather than feeding whole client files into a cheap third-party API: design effort and fixed costs go up even where raw token volume comes down.

UK businesses that serve clients in the EU should note that the EU AI Act, formally adopted in 2024, places transparency and record-keeping obligations on providers and users of general-purpose AI models in certain contexts. Depending on your use case, compliance may require logging model inputs and outputs, which adds to storage costs and can limit which API services are available to you.

The pricing structure for AI services rewards businesses that take the time to understand it. Know your token unit from your seat licence, choose your model tier deliberately, watch for workflow designs that multiply calls invisibly, and build your regulatory constraints into the architecture before you scale rather than after. Those four disciplines together are what keep AI spending rational.

Sources

- OpenAI (2025). Pricing for GPT models. Publishes per-token rates for input and output across model tiers, illustrating the price gap between frontier and mid-tier models. https://openai.com/pricing
- Anthropic (2025). Pricing for Claude API. Model-specific token rates showing the cost differential between frontier and mid-tier models. https://www.anthropic.com/pricing
- OpenAI (2025). Text generation: tokenisation. Explains how text is split into tokens and why token counts vary by content and formatting. https://platform.openai.com/docs/guides/text-generation/tokens
- National Grid ESO (2023). Future Energy Scenarios 2023. Projects UK data centre power demand more than doubling between 2023 and 2030 due to AI workloads, with specific pressure on London and South East electricity networks. https://www.nationalgrideso.com/document/283271/download
- ICO (2024). Generative AI and data protection guidance. Sets out ICO expectations including DPIA requirements and data minimisation when personal data is processed by generative AI services. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/generative-ai/
- FCA (2022). Artificial intelligence and machine learning: discussion paper DP5/22. Confirms that AI tools fall within existing operational resilience and outsourcing frameworks for FCA-regulated firms. https://www.fca.org.uk/publications/discussion-papers/dp5-22-artificial-intelligence-and-machine-learning
- Solicitors Regulation Authority (2024). Guidance on the use of artificial intelligence. Notes that client-confidential and privileged material requires particular care when generative AI tools are used in legal practice. https://www.sra.org.uk/solicitors/guidance/ethics-guidance/artificial-intelligence/
- MindStudio (2025). What is token-based pricing for AI models? Documents real-world cases where a tenfold increase in users produced a fifteenfold increase in token costs due to longer conversations, larger context windows, and additional AI-powered features. https://www.mindstudio.ai/blog/token-based-pricing/
- European Parliament (2024). Regulation (EU) 2024/1689 on artificial intelligence (EU AI Act). Sets out transparency and record-keeping obligations for providers and certain users of AI systems, with staggered application dates from 2024 to 2026. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689

Frequently asked questions

What is a token in AI pricing?

A token is roughly three to four characters of text. AI providers use tokens as their billing unit rather than words, questions, or minutes. A short email prompt is around 100 tokens; a page of text is roughly 500 to 700. You are billed separately for the tokens you send in and the tokens the model generates back, and rates differ considerably between model tiers.

Why has my AI spend gone up when the pricing looks lower?

Headline per-token prices have fallen for many AI services, but effective costs have risen for many businesses because usage patterns are changing. Teams are reaching for heavier frontier models rather than cheaper mid-tier options, sending longer prompts with more context, and building agentic workflows that call AI repeatedly in a single task. Anthropic's April 2025 guidance update indicated that average enterprise spend on Claude Code had roughly doubled with no change in pricing, simply because developers were running heavier models more frequently.

Does UK regulation affect what I pay for AI?

Yes, indirectly. The ICO requires a Data Protection Impact Assessment before you process personal data through a generative AI service, and the FCA expects regulated firms to treat AI tools within their operational resilience frameworks. The SRA has issued guidance that client-confidential and privileged material requires particular care. These constraints can rule out cheap API-direct approaches and require more engineered architectures, adding fixed costs even where they reduce raw token volume.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.
