The invoice came through on a Tuesday afternoon. An owner of a professional services firm in Leeds had been using an AI writing tool for six weeks, mostly for drafting client reports and proposal outlines. She had estimated the monthly cost at around £30, based on a rough calculation her developer had done at the start. The actual figure was four times that. The line item said “API usage, 1.2 million tokens.” The word “tokens” appeared nowhere in the original briefing.
The gap between estimate and reality almost always comes down to the same thing. The cost of an AI response is shaped more by what the model sends back than by what you send in.
What is an output token?
When you send a request to an AI model, the system converts your text into small units called tokens. Your request makes up the input tokens. The model’s response is measured the same way, and every token it generates is an output token. API providers meter and bill the two sides separately, and the two rates are not equal. Understanding this split is the foundation of understanding your AI costs.
A token is roughly three to four characters of English text: part of a word, a punctuation mark, or a short common word on its own. A 200-word document converts to around 250 to 280 tokens. Your instructions, any documents you paste in, and previous messages in the thread all count as input tokens. The full text of the reply is output. MindStudio’s documentation illustrates this with a worked example: a 500-token prompt plus a 200-token response in GPT-4, at $0.01 per thousand input tokens and $0.03 per thousand output tokens, comes to $0.011 for that single exchange. Scaled across hundreds of daily requests, those fractions accumulate quickly.
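The arithmetic in that worked example can be reproduced in a few lines. This is a minimal sketch: the function name is illustrative, and the request volume of 300 a day is a hypothetical figure chosen only to show how the fractions scale.

```python
def exchange_cost(input_tokens, output_tokens, input_rate_per_1k, output_rate_per_1k):
    """Return the dollar cost of one request/response exchange."""
    input_cost = input_tokens / 1000 * input_rate_per_1k
    output_cost = output_tokens / 1000 * output_rate_per_1k
    return input_cost + output_cost


# Figures from the worked example above.
single = exchange_cost(500, 200, input_rate_per_1k=0.01, output_rate_per_1k=0.03)
print(f"One exchange: ${single:.3f}")

# Hypothetical volume (an assumption, not a figure from the article):
# 300 requests a day over a 30-day month.
REQUESTS_PER_DAY = 300
monthly = single * REQUESTS_PER_DAY * 30
print(f"One month at that volume: ${monthly:.2f}")
```

At that assumed volume the $0.011 exchange becomes roughly $99 a month, with the output side contributing slightly more than half even though the response is less than half the length of the prompt.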
Why does the output side cost more than the input?
Generating each output token requires more computation than reading an input token. The model can process the whole of your input together, but it produces the reply one token at a time, running a full forward pass through its parameters for each one. That extra work is reflected directly in the price. Typical API pricing in early 2026 put input tokens at $0.15 to $5.00 per million, and output tokens at $0.60 to $25.00 per million, a roughly three-to-five times differential that holds across budget, mid-range, and premium model tiers.
The practical consequence for owner-managed businesses is straightforward. Drafting a client report, writing a proposal, or summarising a set of meeting notes involves a short input and a long reply. In those workloads, the output side accounts for the majority of the cost on every call, regardless of how concisely the prompt is written. The differential is consistent across providers, built into the economics of how these APIs are structured. A two-word change to your prompt is unlikely to shift the bill significantly. A decision to cap response length at 150 words rather than letting the model run free almost certainly will.
Where do output tokens actually add up for a service firm?
Content creation is where output token costs land hardest for service businesses. A prompt asking for a proposal draft might run to 200 tokens. The resulting draft might run to 800 or 1,000 tokens. Run that pattern across a team producing several documents a day and the monthly total climbs quickly, often to two to four times whatever estimate was put together at the start.
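To see how lopsided a drafting call is, the proposal pattern above (about 200 tokens in, 1,000 out) can be priced at both ends of the per-million ranges quoted earlier. This is a rough sketch; the function name is illustrative and real workloads will vary.

```python
def output_share(input_tokens, output_tokens, input_rate, output_rate):
    """Fraction of one call's cost that comes from output tokens.

    Rates are dollars per million tokens; the per-million divisor
    cancels out, so it is omitted.
    """
    input_cost = input_tokens * input_rate
    output_cost = output_tokens * output_rate
    return output_cost / (input_cost + output_cost)


# Proposal draft from the paragraph above: 200 tokens in, 1,000 tokens out.
budget_share = output_share(200, 1000, input_rate=0.15, output_rate=0.60)
premium_share = output_share(200, 1000, input_rate=5.00, output_rate=25.00)

print(f"Budget tier:  {budget_share:.1%} of the cost is output")
print(f"Premium tier: {premium_share:.1%} of the cost is output")
```

At either end of the range, about 95 per cent of the cost of the call comes from the reply, which is why trimming the prompt barely moves the total.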
There are also sources that are less obvious. Conversation history is resent in full with every new message in many AI tools, growing with each exchange. System instructions, which configure how the AI behaves, add to every request without appearing in what you type. Tool definitions, which describe functions the model can call, contribute another layer. Practitioner guides and provider documentation suggest these hidden inputs add 20 to 40 per cent to the actual token count in production use, without appearing anywhere in the usage report you see.
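The effect of resending history can be sketched with a simple model. All the numbers here are hypothetical: a 300-token system instruction, 100-token user messages, and 400-token replies over a ten-turn conversation.

```python
def input_tokens_per_turn(turns, system_tokens, user_tokens, reply_tokens):
    """Input tokens sent on each turn when the full history is resent."""
    per_turn = []
    history = 0  # tokens of earlier messages and replies carried forward
    for _ in range(turns):
        per_turn.append(system_tokens + history + user_tokens)
        history += user_tokens + reply_tokens
    return per_turn


# Hypothetical conversation: the sizes below are assumptions for illustration.
sent = input_tokens_per_turn(turns=10, system_tokens=300, user_tokens=100, reply_tokens=400)
typed = 10 * 100  # what the user actually typed across the conversation

print("Input tokens per turn:", sent)
print("Total input tokens billed:", sum(sent))
print("Tokens the user typed:", typed)
```

Under these assumptions the user types 1,000 tokens but is billed for 26,500 input tokens, because each turn carries everything that came before it.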
The CMA’s work on AI foundation models reinforces the importance of understanding pricing structures before you commit. Token-based billing sounds transparent, but different providers use different tokenisers. The same block of text can produce different token counts in OpenAI’s system, Anthropic’s system, and Google’s system. A headline price per million tokens is therefore not a direct comparison across providers without running a test on your own content.
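A small comparison shows why the headline rate alone can mislead. The provider names, token counts, and prices below are invented for illustration; in practice the counts would come from running your own sample document through each provider's tokeniser.

```python
def cost_per_document(token_count, price_per_million):
    """Dollar cost of processing one document at a per-million-token rate."""
    return token_count * price_per_million / 1_000_000


# Hypothetical figures: the same document tokenised by two providers.
quotes = {
    "provider_a": {"tokens": 1000, "price_per_million": 3.00},
    "provider_b": {"tokens": 1200, "price_per_million": 2.75},
}

costs = {
    name: cost_per_document(q["tokens"], q["price_per_million"])
    for name, q in quotes.items()
}

cheapest_headline = min(quotes, key=lambda name: quotes[name]["price_per_million"])
cheapest_actual = min(costs, key=costs.get)

print("Lowest headline price:", cheapest_headline)
print("Lowest cost for this document:", cheapest_actual)
```

In this made-up case the provider with the lower headline rate is the more expensive one for the document, because its tokeniser splits the same text into more tokens.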
When does the output token split not apply to your costs?
If your AI tools are priced as a flat monthly subscription rather than on usage, the token economics are absorbed by the vendor and do not appear as a separate cost for you. Short classification tasks, such as labelling a customer query or running a sentiment check on an email, produce very few output tokens, so the output differential has a negligible effect on your bill.
Embedding-only workflows, where you use AI to build a searchable index from a document library, often have no output text at all. The pricing in those cases is input-only. If you have an AI feature that returns a single word or a short label rather than a paragraph, the same logic applies: input tokens will account for most of the cost. The question to ask before worrying about output optimisation is simple: does the model produce substantial amounts of text in this use case? If yes, the output token differential matters. If not, the differential has little effect on your bill, and under a flat subscription it is the vendor’s concern rather than a line item on your invoice.
What else do you need to understand to manage token costs?
The most effective starting move is to set explicit output length caps in every production prompt. Instructing the model to reply in under 150 words, or to produce a bullet list capped at six items, directly reduces the token count on every call. Pair this with a request to your vendor or technical contact for a per-workflow breakdown of input and output tokens.
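The saving from a cap is easy to estimate. This sketch converts the 150-word cap to tokens using the upper-end ratio given earlier (200 words to about 280 tokens) and compares it with an uncapped 1,000-token draft. The $25.00 output rate is the top of the range quoted earlier, and the function names are illustrative.

```python
def estimate_tokens(words):
    """Rough token estimate using the ratio of 280 tokens per 200 words."""
    return words * 280 // 200


def output_cost(tokens, rate_per_million):
    """Dollar cost of the output side of one call."""
    return tokens * rate_per_million / 1_000_000


OUTPUT_RATE = 25.00      # dollars per million output tokens (top of quoted range)
UNCAPPED_TOKENS = 1000   # a draft left to run free, per the earlier example

capped_tokens = estimate_tokens(150)
uncapped = output_cost(UNCAPPED_TOKENS, OUTPUT_RATE)
capped = output_cost(capped_tokens, OUTPUT_RATE)
saving = 1 - capped / uncapped

print(f"Capped reply: about {capped_tokens} tokens")
print(f"Output cost per call: ${uncapped:.4f} uncapped, ${capped:.4f} capped")
print(f"Reduction in output cost: {saving:.0%}")
```

A word limit in the prompt is an instruction rather than a guarantee, so the model may overshoot it. Treat the result as an estimate and check it against the per-workflow breakdown.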
Model selection is the second lever. Every AI provider offers model tiers at significantly different price points, from under a dollar per million tokens to $75 or more. A premium model is appropriate for complex reasoning tasks, but running routine drafting or summarisation through that same tier is expensive by default. Routing simpler tasks to a more affordable model can cut per-request costs substantially without a change in output quality your team would notice. Combining prompt length controls, history summarisation, and model routing can reduce token spend by 30 to 70 per cent on like-for-like tasks, according to practitioners including 10Clouds.
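Routing can be sketched as a simple lookup from task type to tier. The tier names, task labels, and workload below are hypothetical, and the rates are the two ends of the typical range quoted earlier. A real router would need rules suited to your own tasks.

```python
# Dollars per million tokens as (input rate, output rate).
TIERS = {
    "budget": (0.15, 0.60),
    "premium": (5.00, 25.00),
}

# Assumption: these task types are simple enough for the budget tier.
ROUTINE_TASKS = {"draft", "summary"}


def choose_tier(task_type):
    """Send routine work to the budget tier and everything else to premium."""
    return "budget" if task_type in ROUTINE_TASKS else "premium"


def call_cost(tier, input_tokens, output_tokens):
    """Dollar cost of one call on the given tier."""
    input_rate, output_rate = TIERS[tier]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical workload: (task type, input tokens, output tokens).
workload = [
    ("draft", 200, 1000),
    ("summary", 1500, 300),
    ("analysis", 2000, 1200),
]

all_premium = sum(call_cost("premium", i, o) for _, i, o in workload)
routed = sum(call_cost(choose_tier(t), i, o) for t, i, o in workload)

print(f"Everything on premium: ${all_premium:.4f}")
print(f"With routing:          ${routed:.4f}")
print(f"Reduction:             {1 - routed / all_premium:.0%}")
```

With this invented workload the routed total is about half the all-premium figure. The size of the saving depends entirely on how much of your work is routine.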
Managing conversation history is the third practical lever. Rather than allowing AI tools to resend full conversation threads with every new message, configure the tool or instruct the model to work from a short summary of earlier exchanges. This applies to any tool that supports multi-turn conversations.
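A rough comparison of the two approaches, using hypothetical sizes (a 300-token system instruction, 100-token messages, 400-token replies, ten turns) and assuming the summary is held to 150 tokens:

```python
def total_input_tokens(turns, system_tokens, user_tokens, reply_tokens, summary_cap=None):
    """Total input tokens over a conversation.

    With summary_cap=None the full history is resent on every turn.
    Otherwise the carried-forward history is limited to summary_cap tokens.
    """
    total = 0
    history = 0  # tokens of earlier messages and replies
    for _ in range(turns):
        carried = history if summary_cap is None else min(history, summary_cap)
        total += system_tokens + carried + user_tokens
        history += user_tokens + reply_tokens
    return total


# All sizes are assumptions for illustration.
full = total_input_tokens(10, system_tokens=300, user_tokens=100, reply_tokens=400)
summarised = total_input_tokens(
    10, system_tokens=300, user_tokens=100, reply_tokens=400, summary_cap=150
)

print("Full history resent:", full)
print("Short summary instead:", summarised)
```

The sketch ignores the cost of producing the summary itself, which uses some output tokens, so the real saving will be somewhat smaller than the gap shown.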
There are also regulatory dimensions worth registering alongside the cost ones. The ICO’s guidance on AI and data protection notes that outputs containing personal data, such as AI-drafted HR notes or client-specific reports, fall under UK GDPR’s data minimisation principle. Keeping outputs to what is genuinely needed serves both your cost controls and your data handling obligations. The NCSC’s guidance on using public generative AI safely makes a related point: managing what the model sends back is part of managing your organisation’s exposure, not just its invoice. For regulated firms, the FCA’s guidance on outsourcing and third-party risk management is also relevant: cost spikes from uncontrolled AI output are an operational risk, not simply a budget inconvenience.
If you want a concrete starting point, ask your technical contact or vendor to pull the average input and output token count for each workflow you run. Those two figures per workflow will tell you more about where your AI spend is going than any amount of headline pricing comparison.
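If your vendor can export raw usage records, the averages are simple to compute. The record layout and workflow names below are hypothetical; a real export will have its own field names.

```python
from collections import defaultdict

# Hypothetical usage export: one record per API call.
usage_log = [
    {"workflow": "proposals", "input_tokens": 210, "output_tokens": 950},
    {"workflow": "proposals", "input_tokens": 190, "output_tokens": 1050},
    {"workflow": "email_triage", "input_tokens": 120, "output_tokens": 8},
    {"workflow": "email_triage", "input_tokens": 80, "output_tokens": 12},
]


def average_tokens_by_workflow(records):
    """Return {workflow: (average input tokens, average output tokens)}."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["workflow"]].append(
            (record["input_tokens"], record["output_tokens"])
        )
    averages = {}
    for workflow, calls in grouped.items():
        avg_input = sum(i for i, _ in calls) / len(calls)
        avg_output = sum(o for _, o in calls) / len(calls)
        averages[workflow] = (avg_input, avg_output)
    return averages


averages = average_tokens_by_workflow(usage_log)
for workflow, (avg_in, avg_out) in averages.items():
    print(f"{workflow}: {avg_in:.0f} tokens in, {avg_out:.0f} tokens out on average")
```

In this invented log the proposal workflow is output-heavy and the triage workflow is input-heavy, which is the distinction that tells you where length caps are worth the effort.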



