Small language models: a plain-English guide

A vendor in a product demo tells you their AI tool runs on a “small language model fine-tuned for professional services.” It sounds credible. You nod and move on. Later, you wonder whether that was a meaningful technical point or just reassuring language. This guide explains what small language models actually are, where you are likely to encounter one, and how to work out whether one suits a specific job in your business.

What is a small language model?

A small language model is an AI system typically built, or fine-tuned, for a narrower range of language tasks, such as drafting email replies, classifying support tickets, or summarising call recordings. The “small” describes its scale relative to the large general-purpose models behind tools like ChatGPT. Oracle describes SLMs as, in many cases, 100 to 1,000 times smaller than those models, and notes that many can run on a local device without a cloud connection.

The scale difference has practical consequences. A large language model might have hundreds of billions of parameters (the numerical values a model learns during training), trained on a vast and varied dataset. An SLM might have one to seven billion parameters, trained on a narrower dataset and optimised to produce a specific type of output. That narrowness is both the limitation and the appeal: with fewer parameters to compute for every word it generates, the model needs less hardware and is cheaper to run.
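
A rough way to see why parameter count decides where a model can run: the memory needed just to hold a model’s weights is approximately the number of parameters multiplied by the bytes stored per parameter. The Python sketch below assumes three common numeric precisions (16-bit, 8-bit and 4-bit) and uses 200 billion as a hypothetical stand-in for “hundreds of billions”. It ignores the extra working memory a model needs while running, so treat every figure as a lower bound.

```python
# Back-of-envelope memory estimate for a model's weights only.
# Assumption: activations and other runtime memory are ignored,
# so every figure here is a lower bound.

BYTES_PER_PARAMETER = {
    "16-bit": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}

def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Billions of parameters x bytes per parameter = gigabytes (10**9 bytes)."""
    return params_billions * BYTES_PER_PARAMETER[precision]

# 1B and 7B match the SLM range in the text; 200B is a hypothetical large model.
for size in (1, 7, 200):
    estimates = ", ".join(
        f"{precision}: {weight_memory_gb(size, precision):.1f} GB"
        for precision in BYTES_PER_PARAMETER
    )
    print(f"{size}B parameters -> {estimates}")
```

On these assumptions, a 7-billion-parameter model stored at 4-bit precision needs roughly 3.5 GB for its weights, within reach of a well-specified laptop or small server, while a 200-billion-parameter model at 16-bit needs around 400 GB, which calls for data-centre hardware.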

When a vendor describes their product as running on a small language model, they are usually signalling one or more of three things. The model may run locally rather than in the cloud. It may have been trained on your industry or use case rather than on general internet text. Or it may cost less to operate than a large general-purpose alternative. Each is worth probing with follow-up questions.

Why does it matter for a smaller business?

The main appeal for a smaller business is cost and control. A narrower model does fewer things, but it does them more cheaply and, when hosted locally, with less data flowing to external servers. Machine Learning Mastery cites production use cases where a model fine-tuned on a single task costs around 95% less to run than a comparable large cloud model. For a firm handling customer enquiries or document processing at volume, a gap of that size is worth understanding.

For service businesses, the practical implications fall into three areas. On operating cost: if an AI tool runs thousands of queries a month, a local SLM can cut spend significantly compared with paying per API call to a large cloud model, because its costs are largely fixed rather than rising with every query. On data residency: if the model runs on your own server or a local device, your customers’ data stays within your environment, which matters for UK GDPR obligations. On latency: a local model avoids the round trip to a remote server, so it can respond faster, which affects real-time customer interactions.
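
The operating-cost point is essentially arithmetic: pay-per-call cloud pricing grows with volume, while a locally hosted model is mostly a fixed monthly cost. The Python sketch below compares the two shapes and finds the break-even volume. Every price in it (£0.02 per cloud call, a £3,000 server spread over 36 months, £50 a month in running costs) is a hypothetical assumption for illustration, not a figure from any vendor.

```python
import math

# Illustrative comparison of two cost shapes. All prices are
# hypothetical assumptions, not figures from any vendor.

def cloud_monthly_cost(queries: int, price_per_query: float) -> float:
    """Pay-per-call pricing: cost rises in line with volume."""
    return queries * price_per_query

def local_monthly_cost(hardware_cost: float, lifetime_months: int,
                       running_cost_per_month: float) -> float:
    """Mostly fixed: hardware spread over its useful life, plus running costs."""
    return hardware_cost / lifetime_months + running_cost_per_month

def break_even_queries(price_per_query: float, local_cost: float) -> int:
    """Smallest monthly volume at which the cloud bill reaches or exceeds the local cost."""
    return math.ceil(local_cost / price_per_query)

PRICE_PER_QUERY = 0.02                     # hypothetical £ per cloud call
LOCAL = local_monthly_cost(3000, 36, 50)   # hypothetical £3,000 server, 3 years, £50/month

for volume in (2_000, 5_000, 20_000):
    cloud = cloud_monthly_cost(volume, PRICE_PER_QUERY)
    print(f"{volume:>6} queries: cloud £{cloud:,.2f} vs local £{LOCAL:,.2f}")
print("Break-even volume:", break_even_queries(PRICE_PER_QUERY, LOCAL))
```

With these made-up numbers, the local option only pays off above roughly 6,700 queries a month; below that, pay-per-call is cheaper. The sketch also leaves out set-up and maintenance effort, which can be significant for a small team.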

Those advantages only hold when you are comparing like for like. An SLM is cheaper than a large model for the specific task it was trained to handle. Push it outside that task and quality drops quickly.

Where will you actually run into one?

You are most likely to encounter small language models inside tools you may already be trialling. Customer support platforms that suggest draft replies, email tools that classify or prioritise incoming messages, document systems that extract structured data from forms, and meeting tools that summarise recordings may all use SLMs under the hood. You may already be running one without knowing it by that name.

Beyond packaged software, you’ll also encounter the concept in vendor pitches and procurement conversations. A supplier saying “we use a small language model” is signalling something specific about how their product is built, and it’s worth probing what that means for your data, your integration options, and how the model gets updated when your business needs change.

Common areas in UK service businesses where SLMs appear include customer support, where the model suggests or generates first responses to routine queries; email triage, where incoming messages are classified by type or urgency; document processing, where forms or contracts are parsed to extract key fields; and internal knowledge tools, where staff questions are matched to relevant articles or procedures.

When is an SLM worth asking about, and when should you ignore it?

An SLM is worth exploring when you have one clearly defined workflow with a measurable output. Good examples include classifying inbound emails by type, generating first-draft replies to routine enquiries, or pulling structured data from standard documents. The British Business Bank found that just 25% of UK smaller businesses were using AI in any form in 2024, which suggests many firms are still at the “pick one workflow” stage rather than the “choose between model types” stage.

There are also clear situations where an SLM is the wrong starting point. If you cannot define what a good output looks like, the model cannot define it for you, and you will have no way to judge whether it is working. If the task requires reasoning across many different topics, broad world knowledge, or frequent exception handling, a narrower model will hit its ceiling quickly. Thoughtworks describes SLMs as suited to targeted, domain-specific tasks rather than general-purpose assistant roles; if you need a general-purpose assistant, a large model is likely to be the better fit.

Two further scenarios make model choice secondary. If your bigger problem is a poorly defined or broken process, a model won’t fix the underlying chaos. And if you cannot monitor outputs regularly, even a focused model will make mistakes, and at volume those mistakes compound without oversight.
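
Monitoring does not need to be elaborate. One simple approach is a regular spot-check: a person labels a small sample of items the model has already handled, and you measure how often the two agree. The Python sketch below shows that calculation; the ten email labels, the category names and the 90% alert threshold are all hypothetical, and the right threshold depends on what an error costs in your workflow.

```python
from collections import Counter

def agreement_rate(model_labels: list[str], human_labels: list[str]) -> float:
    """Share of sampled items where the model's label matches a human reviewer's."""
    if not model_labels or len(model_labels) != len(human_labels):
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)

def disagreements_by_category(model_labels: list[str], human_labels: list[str]) -> Counter:
    """Count mismatches by the human (correct) label, to show where errors cluster."""
    return Counter(h for m, h in zip(model_labels, human_labels) if m != h)

# Hypothetical weekly spot-check of ten triaged emails.
model = ["billing", "complaint", "billing", "general", "complaint",
         "billing", "general", "general", "complaint", "billing"]
human = ["billing", "complaint", "general", "general", "complaint",
         "billing", "general", "complaint", "complaint", "billing"]

ALERT_THRESHOLD = 0.9  # assumption: set this to your own acceptable level
rate = agreement_rate(model, human)
print(f"Agreement: {rate:.0%}")
if rate < ALERT_THRESHOLD:
    print("Below threshold; review:", dict(disagreements_by_category(model, human)))
```

Looking at where disagreements cluster, not just the overall rate, tells you which part of the workflow needs attention first.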

If the task touches personal data, regulated decisions, or anything with a legal or financial consequence for customers, the question shifts from model type to governance. The ICO is clear that organisations must have a lawful basis and ensure transparency when using AI with personal data. The FCA expects firms to remain accountable for AI-driven outcomes regardless of whether the model is large or small, cloud-hosted or local.

What related AI terms come up alongside small language models?

Small language models appear in vendor conversations alongside a handful of other terms worth understanding. Fine-tuning, retrieval-augmented generation, and on-device inference each describe a different aspect of how a model is built, deployed, or made more accurate. Knowing what these mean helps you understand what a supplier is actually offering and whether the approach suits your use case.

Fine-tuning means taking an existing model and training it further on a specific dataset, so it performs better on a narrow set of outputs. A vendor who says their SLM has been fine-tuned on insurance policy language has taken a base model and trained it on that domain. The output should outperform a general model on that specific task, though the improvement depends heavily on the quality and volume of training data used.

Retrieval-augmented generation, often shortened to RAG, means the model doesn’t rely solely on what it learned during training. When it receives a query, it first searches a connected knowledge base, such as a product catalogue or policy library, for relevant context, then generates its response using that context. This can improve accuracy significantly for tasks where current or firm-specific information matters.
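
The two RAG steps, retrieve then generate, can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions: the three-item knowledge base is hypothetical, ranking by shared words is a crude stand-in for the embedding search production systems typically use, and the final call to a language model is omitted, so the output is the prompt that would be sent to it.

```python
import re

# A tiny, hypothetical knowledge base; in practice this would be your
# policy library, product catalogue or help-centre articles.
KNOWLEDGE_BASE = [
    "Refunds are available within 30 days of purchase with proof of payment.",
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "Annual plans can be cancelled at renewal without a fee.",
]

# Common words ignored when comparing text (an illustrative list).
STOPWORDS = {"a", "an", "the", "is", "are", "be", "to", "of", "with", "at",
             "can", "i", "do", "my", "and", "or", "in", "on", "for"}

def tokenise(text: str) -> set[str]:
    """Lower-case content words; a crude stand-in for real text embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Step 1: return the k documents sharing the most words with the query."""
    q = tokenise(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenise(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Step 2: place the retrieved context in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

print(build_prompt("How do refunds work after purchase?", KNOWLEDGE_BASE))
```

Because the firm-specific facts live in the knowledge base rather than inside the model, updating a policy means editing a document, not retraining the model.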

On-device inference means the model runs on a local device or on-premises server rather than sending queries to a remote cloud service. Running locally changes where data is processed, but it does not change your regulatory position. The EU AI Act, which entered into force on 1 August 2024, creates risk-based obligations regardless of model size or hosting location: if your firm’s AI outputs affect EU-based users, the framework applies whether the model is cloud-hosted or local. The NCSC also recommends treating AI suppliers and their connected systems as part of your organisation’s attack surface, however the model is deployed.

