How model routers choose where each AI request goes

TL;DR

An AI model router is software that directs each AI request to the most appropriate model based on complexity, cost, and speed, instead of sending every task to a single expensive model. It matters most for firms with high query volume and varied task types, such as a customer-facing chatbot. For small firms with modest internal AI use, a single well-chosen model is usually the simpler and safer starting point.

Key takeaways

- An AI model router sits between your application and multiple AI models, picking the right one for each request based on complexity and cost.
- Routing can reduce API costs significantly by sending simple queries to cheap models and escalating only when complexity warrants it, though published savings figures come from vendor sources rather than independent benchmarks.
- Azure and AWS Bedrock both offer routing as a managed feature, with configurable modes for cost-priority, quality-priority, or balanced routing.
- Routing suits high-volume, varied-query applications; for small firms with modest internal AI use, a single-model setup is typically the simpler and safer starting point.
- UK firms in regulated sectors need to ensure every endpoint in a routing chain meets UK GDPR and FCA oversight requirements, with clear data-flow documentation for each provider.

A consultant recommends setting up a “model router” for your new customer chat system. You nod. You write it down. Later, alone, you search for what it actually means, and many explanations assume you already know what an API call costs, why you’d run more than one model, and what throughput means in practice. This post assumes none of that. It explains what a model router is, how it works, and when a firm like yours genuinely needs one.

What is an AI model router?

An AI model router is software that sits between your application and two or more AI models, directing each incoming request to whichever model makes sense for that task. A short, factual question might go to a small, cheap model. A long document requiring careful reasoning goes to a more capable one. The router makes that choice automatically, in milliseconds.

The analogy Microsoft uses is a smart switchboard. Instead of one phone line to “the AI”, you have several, and the switchboard decides in real time which line handles each call. Azure’s model router works this way for OpenAI’s GPT family, automatically distributing requests based on how complex each prompt appears to be.

Amazon’s Bedrock platform offers a similar pattern with Anthropic’s models: a router predicts which model in a pool will give the best outcome for cost and quality, then routes accordingly. The user and the application see a single endpoint. The routing happens behind the scenes.

The economics behind this matter. Public pricing on AI APIs spans a roughly 300-fold range, from around US$0.10 per million tokens at the cheap end to US$30 or more at the high end. A router that sends straightforward requests to cheap models and complex ones to expensive models can, in principle, keep costs down without degrading quality for the tasks that genuinely need it. MindStudio, a commercial routing provider, reports organisations seeing 30 to 70 per cent cost reductions using this approach, though those figures come from vendor marketing materials rather than independent benchmarks.
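To make the arithmetic concrete, here is a minimal sketch in Python. The two prices are the rough ends of the range quoted above; the 50 per cent share of traffic sent to the cheap model is purely an assumption for illustration, and the function name is made up for this example.

```python
def blended_cost_per_million(cheap_price, premium_price, cheap_share):
    """Average price per million tokens when cheap_share of traffic
    goes to the cheap model and the rest goes to the premium model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_price + (1.0 - cheap_share) * premium_price


CHEAP_PRICE = 0.10    # US$ per million tokens, low end of the quoted range
PREMIUM_PRICE = 30.0  # US$ per million tokens, high end of the quoted range

# Everything on the premium model versus an assumed 50/50 split.
single_model = blended_cost_per_million(CHEAP_PRICE, PREMIUM_PRICE, cheap_share=0.0)
routed = blended_cost_per_million(CHEAP_PRICE, PREMIUM_PRICE, cheap_share=0.5)
saving = 1.0 - routed / single_model

print(f"Routed: ${routed:.2f} per million tokens, saving {saving:.0%}")
```

Under those assumptions the blended price is US$15.05 per million tokens, a saving of roughly half. Your own split between simple and complex requests will differ, which is why the real figure is best measured rather than assumed.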

How does a router decide where each request goes?

The router analyses the prompt, estimates how complex it is, then picks a model from a pre-configured pool. Azure’s router considers the full request including conversation history, then scores it for likely difficulty. AWS describes two routing strategies: static rules that assign request types to fixed models, and dynamic routing that uses machine learning to predict which model will perform best.
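The two strategies can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Azure or AWS implement routing: the model names are hypothetical, and the word-count heuristic is a crude stand-in for the machine-learning predictor a real dynamic router would use.

```python
# Hypothetical model names, for illustration only.
SMALL_MODEL = "small-cheap-model"
LARGE_MODEL = "large-capable-model"

# Static rules: a known request type is pinned to a fixed model.
STATIC_RULES = {
    "faq": SMALL_MODEL,
    "booking": SMALL_MODEL,
    "legal_review": LARGE_MODEL,
}


def estimate_complexity(prompt: str, history: list[str]) -> float:
    """Crude stand-in for a learned difficulty score, from 0.0 to 1.0.
    Looks at total length (prompt plus history) and a few keywords."""
    words = len(prompt.split()) + sum(len(turn.split()) for turn in history)
    length_score = min(words / 500, 1.0)
    reasoning_words = ("why", "compare", "explain", "analyse", "draft")
    needs_reasoning = any(word in prompt.lower() for word in reasoning_words)
    return min(length_score + (0.5 if needs_reasoning else 0.0), 1.0)


def choose_model(prompt, history, request_type=None, threshold=0.5):
    """Apply a static rule if one matches; otherwise route on the score."""
    if request_type in STATIC_RULES:
        return STATIC_RULES[request_type]
    score = estimate_complexity(prompt, history)
    return LARGE_MODEL if score >= threshold else SMALL_MODEL


print(choose_model("What time do you open?", [], request_type="faq"))
print(choose_model("Explain why clause 4 conflicts with clause 9", []))
```

The first call is settled by a static rule and goes to the small model. The second has no rule, so the score decides, and the request for an explanation pushes it to the larger model.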

Azure exposes three routing modes. “Balanced” aims for the best overall cost-quality mix. “Cost” aggressively prefers cheaper models and only escalates when the prompt seems too hard for them. “Quality” pushes everything to the highest-capability models regardless of price. You configure the mode once, and the router applies it to every request.
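One way to picture the three modes is as different thresholds for escalating to a premium model. The sketch below is an assumption about the general idea, not Azure's actual logic: the threshold values, the tier names, and the difficulty score from 0.0 to 1.0 are all invented for illustration.

```python
from enum import Enum


class RoutingMode(Enum):
    BALANCED = "balanced"
    COST = "cost"
    QUALITY = "quality"


# Assumed thresholds: the difficulty score at or above which a request
# is escalated to the premium tier. Real routers do not publish these.
ESCALATION_THRESHOLD = {
    RoutingMode.COST: 0.8,      # escalate only when the prompt looks hard
    RoutingMode.BALANCED: 0.5,  # middle ground
    RoutingMode.QUALITY: 0.0,   # everything goes to the premium tier
}


def pick_tier(difficulty: float, mode: RoutingMode) -> str:
    """Return 'premium' or 'economy' for a difficulty score under a mode."""
    return "premium" if difficulty >= ESCALATION_THRESHOLD[mode] else "economy"


# The same moderately hard request lands differently under each mode.
for mode in RoutingMode:
    print(mode.value, "->", pick_tier(0.6, mode))
```

The point of the sketch is that the mode changes where the line is drawn, not how the request is scored. A request scoring 0.6 stays on the economy tier in cost mode but is escalated in balanced and quality modes.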

After selecting a model, the router forwards the request and returns the response. Microsoft’s documentation notes that the router itself adds only a negligible fraction to the total processing time.

Routers also handle failure gracefully. If a model is unavailable, hits its rate limit, or returns a low-confidence response, a well-built router will retry on a different model from the pool rather than simply failing. Azure shows the model name it used in each API response, so you can audit which model actually served which request. That audit trail matters for regulated firms.
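The fallback-and-audit behaviour can be sketched as follows. Everything here is hypothetical: the exception, the model names, and the fake_call stand-in replace what would be real API calls in practice, and managed routers handle this for you internally.

```python
class ModelUnavailable(Exception):
    """Hypothetical error for a model that is down or rate-limited."""


def call_with_fallback(prompt, pool, call_model):
    """Try each model in the pool in order. Return (model_name, response)
    from the first one that succeeds, so the caller can log who served it."""
    errors = []
    for model_name in pool:
        try:
            return model_name, call_model(model_name, prompt)
        except ModelUnavailable as exc:
            errors.append(f"{model_name}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


def fake_call(model_name, prompt):
    """Stand-in for a real API call: the small model is rate-limited."""
    if model_name == "small-cheap-model":
        raise ModelUnavailable("rate limit reached")
    return f"answer from {model_name}"


pool = ["small-cheap-model", "large-capable-model"]
served_by, reply = call_with_fallback("Where is my order?", pool, fake_call)

# Minimal audit record: which model actually served the request.
audit_log = [{"prompt": "Where is my order?", "served_by": served_by}]
print(audit_log)
```

Returning the model name alongside the response is the design choice that matters here. It is what lets you answer, after the fact, which provider handled a given customer's data.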

Where will you actually meet model routing?

For a small UK services firm, you will most likely encounter model routing through the managed cloud platforms you already use or are evaluating, rather than by building your own routing layer. Azure, AWS Bedrock, and commercial platforms like MindStudio all offer routing as a built-in feature. You configure it; you don’t engineer it from the ground up.

The most common scenario for a firm of five to fifty people is a customer-facing chatbot with a mix of query types. Appointment booking, FAQ responses, and simple status enquiries can go to cheaper, faster models. More complex queries (anything requiring detailed reasoning, legal interpretation, or nuanced advice drafting) go to a more capable model. The routing is invisible to the customer; it happens inside the platform.

A second scenario is internal tooling. If your staff use AI for a mix of tasks, such as quick email drafts, document summaries, and first-pass reports, a router can ensure the cheap model handles the quick tasks and the expensive model handles the ones that genuinely need it.

You also meet routing indirectly when cloud providers manage it for you by default. Azure’s router is presented as a first option for general-purpose OpenAI workloads. You don’t always need to know it is happening.

When does routing make sense, and when should you ignore it?

Model routing is worth considering when you have high query volume, mixed task complexity, and meaningful AI spend. Microsoft specifically recommends it for user-facing applications like customer support chatbots where latency matters and many requests are simple. For a firm with modest internal AI use, a straightforward setup with one well-chosen model is usually more practical.

The business case strengthens as volume grows. If you send only a few dozen AI requests a day, the architectural overhead of a router (another service to configure, monitor, and secure) will cost more in time than it saves in API fees.
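A rough back-of-envelope check shows why volume matters. Every number below is an assumption chosen for illustration: 1,000 tokens per request, a premium price of US$30 per million tokens, and a 50 per cent saving from routing. Swap in your own figures before drawing any conclusion.

```python
def monthly_api_saving(requests_per_day, tokens_per_request,
                       price_per_million, saving_rate, days=30):
    """Estimated monthly saving in US$ if routing cuts API spend by saving_rate."""
    monthly_tokens = requests_per_day * tokens_per_request * days
    monthly_spend = monthly_tokens / 1_000_000 * price_per_million
    return monthly_spend * saving_rate


# Assumed figures: a few dozen requests a day versus a busy chatbot.
small_firm = monthly_api_saving(50, 1_000, 30.0, 0.5)
busy_chatbot = monthly_api_saving(20_000, 1_000, 30.0, 0.5)

print(f"Small internal use: ${small_firm:,.2f} saved per month")
print(f"Busy chatbot:       ${busy_chatbot:,.2f} saved per month")
```

Under these assumptions the small firm saves about US$22.50 a month, which is unlikely to cover even an hour of someone's time spent configuring and monitoring a router. The busy chatbot saves about US$9,000 a month, which is a different conversation.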

Routing also gets complicated in regulated environments. If you are a firm in financial services or working with NHS contracts, adding multiple AI providers means multiple data-processing chains to document, monitor, and take responsibility for. The FCA’s expectations on third-party AI are clear: regulated firms remain accountable for outcomes even when using external models. Azure itself recommends direct, single-model deployments, not routing, for specialised or compliance-sensitive workloads. Adding routing complexity to those situations increases audit work rather than reducing it.

The NCSC has also noted that routing introduces an additional surface area to secure. Authentication, logging, and rate-limit management all need attention. For a small firm without a technical team or a technical partner, this is a real consideration.

Model routing sits in a wider set of ideas around how organisations deploy and manage AI models at any kind of scale. If someone in a supplier conversation mentions prompt routers, AI gateways, multi-agent architectures, or inference proxies, they are pointing at adjacent patterns with overlapping goals. Understanding the rough shape of each helps you ask better questions rather than defer to the jargon.

A prompt router is the closest relative. Where a model router chooses between AI models, a prompt router may also direct requests to non-AI tools, databases, or code functions. Some platforms use the terms interchangeably.

An AI gateway sits at a different layer. It manages authentication, rate limiting, logging, and cost controls across all AI API calls from your organisation, regardless of which model is used. Think of it as the security and billing layer, with the router sitting inside it or alongside it.

Multi-agent architectures take routing a step further. Instead of choosing between models for a single request, they chain multiple AI calls together, with each agent handling part of a task. Routing logic becomes part of the coordination between agents.

For a small firm at the early stages of AI adoption, the practical value in knowing about these patterns is mainly conversational: you can follow the discussion and ask informed questions when a supplier or consultant raises them.

The terminology around AI infrastructure is expanding quickly, and model routing is one of the more useful concepts to have in your vocabulary before you sit down with a supplier. You don’t need to build it. You may need to configure it. Knowing what it is, and when it helps versus when it adds friction, puts you in a much stronger position than nodding along.

Sources

- Microsoft Azure (2024). How model router works in Microsoft Foundry. Documents how Azure's proprietary router distributes requests across GPT-family models using Balanced, Cost, and Quality modes, and describes the audit trail each API response contains. https://learn.microsoft.com/en-us/azure/foundry/openai/concepts/model-router-how-it-works
- AWS Machine Learning Blog (2024). Multi-LLM routing strategies for generative AI applications on AWS. Describes static and dynamic routing strategies and the use of Anthropic's prompt router on Bedrock to select the best model per request. https://aws.amazon.com/blogs/machine-learning/multi-llm-routing-strategies-for-generative-ai-applications-on-aws/
- MindStudio (2024). What is an AI model router? Optimise cost across LLM providers. Reports 30 to 70 per cent cost reductions and describes the roughly 300x pricing range across current LLM APIs. https://www.mindstudio.ai/blog/what-is-ai-model-router-optimize-cost-llm-providers/
- LogRocket (2024). LLM routing in production: choosing the right model for every request. Practitioner guide covering complexity signals, routing architecture patterns, and when routing adds overhead rather than value. https://blog.logrocket.com/llm-routing-right-model-for-requests/
- ICO (2024). Guidance on AI and data protection. Sets out UK GDPR obligations for organisations processing personal data through AI systems, including third-party and multi-provider setups. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/ai-and-data-protection/
- FCA (2023). Artificial intelligence and machine learning in financial services. Confirms that regulated firms remain accountable for third-party AI outcomes and must manage concentration and supply-chain risk. https://www.fca.org.uk/news/speeches/artificial-intelligence-and-machine-learning-financial-services
- NCSC (2024). Guidance on integrating generative AI into your organisation. Advises organisations to minimise data sent to external models, log AI usage, and understand where data is processed. https://www.ncsc.gov.uk/collection/guidance-on-integrating-generative-ai-into-your-organisation
- CMA (2023). Initial review of foundation models. Identifies competition concerns around provider concentration and stresses the value of interoperability in AI infrastructure decisions. https://www.gov.uk/government/publications/ai-foundation-models-initial-report
- Official Journal of the EU (2024). Regulation (EU) on Artificial Intelligence (AI Act). Sets risk-based obligations for high-risk AI applications, with relevance to UK firms serving EU markets via routed AI functions. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689

Frequently asked questions

Does using a model router mean I lose control over which AI model handles my data?

You retain control. When using a managed router such as Azure's or AWS Bedrock's, you define which models are in the pool and, in Azure's case, which data zone the requests stay within. The router picks among your approved options, not the full market. You also get a log of which model served each request, which matters for data protection compliance.

How much technical knowledge do I need to set up model routing?

For a managed platform like Azure or AWS Bedrock, you need enough technical understanding to configure the routing mode and the model pool, which is comparable to setting up other cloud services. You don't need to write the routing algorithm yourself. For custom or open-source setups, the complexity increases significantly, and that is where a technical partner or in-house capability becomes necessary.

Is model routing covered by UK AI regulations coming into effect?

Model routing itself is not a regulated category, but the data flows and decisions it enables are. The ICO's guidance on AI and data protection applies whenever you send personal data through any AI processing chain, including routed ones. For high-risk uses, such as automated decisions about individuals, you may need a data protection impact assessment regardless of whether routing is involved. The EU AI Act also applies if you serve markets in the EU.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.
