A plain-English guide to small language models

TL;DR

Small language models are AI systems trained for one specific task rather than general conversation. They cost less to run than large models, can often be deployed on local devices, and suit narrow, repeatable business workflows such as classifying emails, summarising documents, or drafting routine replies. UK regulatory obligations apply based on what the AI does, not on the size of the underlying model.

Key takeaways

- A small language model is an AI system trained for one narrow task, such as classifying emails or summarising documents, rather than general-purpose conversation.
- SLMs can be 100 to 1,000 times smaller than large general-purpose models, making them cheaper to run and easier to keep on-premises.
- Common business uses include customer support automation, email classification, document summarisation, and knowledge-base article suggestions.
- The right question before adopting an SLM is whether your task is narrow enough to have a clear, measurable success condition.
- UK regulatory obligations around AI apply based on what the AI does and whose data it processes, not on whether the underlying model is large or small.

A vendor in a product demo tells you their AI tool runs on a “small language model fine-tuned for professional services.” It sounds credible. You nod and move on. Later, you wonder whether that was a meaningful technical point or just reassuring language. This guide explains what small language models actually are, where you are likely to encounter one, and how to work out whether one suits a specific job in your business.

What is a small language model?

A small language model is an AI system trained for one narrow category of language task, such as drafting email replies, classifying support tickets, or summarising call recordings. The word “small” describes its scale relative to large general-purpose models like those behind ChatGPT. Oracle puts the size difference at 100 to 1,000 times smaller in many cases, and notes that many SLMs can run on a local device, offline, without a cloud connection.

The scale difference matters in a practical sense. A large language model might have hundreds of billions of parameters, trained across a vast and varied dataset. An SLM might have one to seven billion parameters, trained on a narrower dataset and optimised to produce a specific type of output. That narrowness is both the limitation and the appeal: the model is cheaper to run precisely because it is doing less.

When a vendor describes their product as running on a small language model, they are usually signalling one of three things: the model operates locally rather than in the cloud, it was trained on your industry or use case rather than on general internet text, or it costs less to operate than a large general-purpose alternative. All three are worth probing in follow-up questions.

Why does it matter for a smaller business?

The main appeal for a smaller business is cost and control. A narrower model does fewer things but does them more cheaply and with less data flowing to external servers. Machine Learning Mastery cites production use cases where a model fine-tuned on a single task costs around 95% less to run than a comparable large cloud model. For a firm handling customer enquiries or document processing at volume, that gap is worth understanding.

For service businesses, the practical implications sit in three areas. Operating cost: if an AI tool runs thousands of queries a month, a local SLM can cut spend significantly compared with paying per API call to a large cloud model. Data residency: if the model runs on your own server or a local device, your customers’ data stays within your environment, which matters for UK GDPR obligations. Latency: a local model avoids the round trip to a remote server, so it can respond faster on suitable hardware, which affects real-time customer interactions.

Those advantages only hold when you are comparing like for like. An SLM is cheaper than a large model for the specific task it was trained to handle. Push it outside that task and quality drops quickly.
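The operating-cost point can be made concrete with a rough break-even calculation. The sketch below is illustrative only: the per-query price and the fixed monthly cost of running a model locally are hypothetical figures, not quotes from any vendor, and the function names are invented for this example. It also assumes local running costs stay roughly flat as volume grows, which will not hold for every setup.

```python
def monthly_api_cost(queries_per_month: int, cost_per_query: float) -> float:
    """Pay-per-call pricing: cost scales directly with volume."""
    return queries_per_month * cost_per_query


def break_even_queries(fixed_monthly_cost: float, cost_per_query: float) -> float:
    """Monthly volume above which a flat-cost local model becomes cheaper."""
    return fixed_monthly_cost / cost_per_query


# Hypothetical figures for illustration only; replace with real quotes.
COST_PER_QUERY = 0.02     # GBP per API call to a large cloud model (assumed)
LOCAL_FIXED_COST = 150.0  # GBP per month to run a small model locally (assumed)

for volume in (2_000, 10_000, 50_000):
    api = monthly_api_cost(volume, COST_PER_QUERY)
    cheaper = "local" if LOCAL_FIXED_COST < api else "cloud API"
    print(f"{volume:>6} queries: API GBP {api:,.2f} vs local GBP {LOCAL_FIXED_COST:,.2f} -> {cheaper}")

print("Break-even volume:", round(break_even_queries(LOCAL_FIXED_COST, COST_PER_QUERY)))
```

With these assumed numbers the local option only wins at higher volumes, which is why it is worth estimating your monthly query count before comparing suppliers.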

Where will you actually run into one?

You are most likely to encounter small language models inside tools you may already be trialling. Customer support platforms that suggest draft replies, email tools that classify or prioritise incoming messages, document systems that extract structured data from forms, and meeting tools that summarise recordings often use SLMs under the hood. You may already be running one without using that term.

Beyond packaged software, you’ll also encounter the concept in vendor pitches and procurement conversations. A supplier saying “we use a small language model” is signalling something specific about how their product is built, and it’s worth probing what that means for your data, your integration options, and how the model gets updated when your business needs change.

Common areas in UK services businesses where SLMs appear include customer support, where the model suggests or generates first responses to routine queries; email triage, where incoming messages are classified by type or urgency; document processing, where forms or contracts are parsed to extract key fields; and internal knowledge tools, where staff questions are matched to relevant articles or procedures.

When is an SLM worth asking about, and when should you ignore it?

An SLM is worth exploring when you have one clearly defined workflow with a measurable output. Good examples include classifying inbound emails by type, generating first-draft replies to routine enquiries, or pulling structured data from standard documents. The British Business Bank found that 25% of UK smaller businesses were using AI in any form in 2024, suggesting many firms are still at the “pick one workflow” stage rather than the “choose between model types” stage.
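One way to make “a measurable output” concrete for a classification workflow is to compare the tool’s labels against a small hand-labelled sample and check the result against a target you set in advance. The sketch below assumes an email-classification task; the labels, the sample, and the 90% target are all hypothetical.

```python
def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Share of items where the model's label matches the human label."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("Need two non-empty lists of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def meets_target(predicted: list[str], expected: list[str], target: float = 0.9) -> bool:
    """True when accuracy reaches the success condition agreed up front."""
    return accuracy(predicted, expected) >= target


# Hypothetical sample: labels a person assigned vs labels the tool produced.
expected = ["billing", "complaint", "booking", "billing", "booking"]
predicted = ["billing", "complaint", "booking", "booking", "booking"]

score = accuracy(predicted, expected)
print(f"Accuracy: {score:.0%}")
print("Meets 90% target:", meets_target(predicted, expected))

# List the disagreements so a person can review them.
for index, (p, e) in enumerate(zip(predicted, expected)):
    if p != e:
        print(f"Item {index}: tool said '{p}', human said '{e}'")
```

If you cannot write down the expected labels for a sample like this, that is usually a sign the task is not yet narrow enough.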

There are also clear situations where an SLM is the wrong starting point. If you cannot define what a good output looks like, the model won’t tell you. If the task requires reasoning across many different topics, broad world knowledge, or frequent exception handling, a narrower model will hit its ceiling quickly. Thoughtworks describes SLMs as suited to targeted, domain-specific tasks rather than general-purpose assistant roles. If you need the latter, a large model is more appropriate.

Model choice also becomes secondary in two further scenarios: when your bigger problem is a poorly defined or broken process, since a model won’t fix the underlying chaos; and when you cannot monitor outputs regularly, since even a focused model will produce errors at volume and those errors compound without oversight.

If the task touches personal data, regulated decisions, or anything with a legal or financial consequence for customers, the question shifts from model type to governance. The ICO is clear that organisations must have a lawful basis and ensure transparency when using AI with personal data. The FCA expects firms to remain accountable for AI-driven outcomes regardless of whether the model is large or small, cloud-hosted or local.

Related terms

Small language models appear in vendor conversations alongside a handful of other terms worth understanding. Fine-tuning, retrieval-augmented generation, and on-device inference each describe a different aspect of how a model is built, deployed, or made more accurate. Knowing what these mean helps you understand what a supplier is actually offering and whether the approach suits your use case.

Fine-tuning means taking an existing model and training it further on a specific dataset, so it performs better on a narrow set of outputs. A vendor who says their SLM has been fine-tuned on insurance policy language has taken a base model and trained it on that domain. The output should outperform a general model on that specific task, though the improvement depends heavily on the quality and volume of training data used.

Retrieval-augmented generation, often shortened to RAG, means the model doesn’t rely solely on what it learned during training. When it receives a query, it searches a connected knowledge base, a product catalogue, or a policy library for relevant context before generating its response. This can improve accuracy significantly for tasks where current or firm-specific information matters.
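The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration, not how any particular product works: the knowledge base is invented, retrieval uses simple word overlap where real systems typically use a search index or embeddings, and the final step only builds the prompt that would be passed to a model rather than calling one.

```python
import re

# Hypothetical firm-specific knowledge base (illustrative content only).
KNOWLEDGE_BASE = {
    "refunds": "Refunds are issued within 14 days of a cancelled booking.",
    "opening-hours": "The office is open Monday to Friday, 9am to 5pm.",
    "complaints": "Complaints are acknowledged within two working days.",
}


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Return the passages sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        KNOWLEDGE_BASE.values(),
        key=lambda passage: len(query_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str) -> str:
    """Combine retrieved context with the question for the model to answer."""
    context = "\n".join(retrieve(query))
    return f"Use only this context to answer.\nContext: {context}\nQuestion: {query}"


question = "How long do refunds take after a cancelled booking?"
print(build_prompt(question))
```

The design point is that the answer is grounded in text you control, so updating the knowledge base changes the output without retraining the model.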

On-device inference means the model runs on a local device or on-premises server rather than sending queries to a remote cloud. Running a model locally does not, however, change your regulatory or security position. The EU AI Act, which entered into force on 1 August 2024, creates risk-based obligations regardless of model size or hosting location. If your firm’s AI outputs affect EU-based users, the regulatory framework applies whether the model is cloud-hosted or local. The NCSC also recommends treating AI suppliers and their connected systems as part of your organisation’s attack surface, which holds however the model is deployed.

Sources

- Oracle (2024). Small language models. Describes SLM scale relative to large models, local device deployment capability, and typical use cases in customer support and content personalisation. https://www.oracle.com/uk/artificial-intelligence/small-language-models/
- Machine Learning Mastery (2024). Introduction to small language models: the complete guide for 2026. Documents production cost comparisons between fine-tuned narrow models and large cloud models, and criteria for when SLMs outperform general models on specific tasks. https://machinelearningmastery.com/introduction-to-small-language-models-the-complete-guide-for-2026/
- Thoughtworks (2024). Small language models. Guidance on where SLMs suit targeted, domain-specific tasks such as email drafting, versus general-purpose assistant deployments. https://www.thoughtworks.com/en-gb/insights/decoder/s/small-language-models
- Kili Technology (2024). A guide to using small language models. Overview of typical SLM use cases in services businesses, including ticketing, summarisation, and knowledge-base interfaces. https://kili-technology.com/blog/a-guide-to-using-small-language-models
- ICO (2023). Artificial intelligence and data protection. UK regulator guidance on lawful basis, transparency, fairness, data minimisation, and DPIA requirements when using AI with personal data. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- FCA (2024). Artificial intelligence. FCA position that firms remain responsible for AI-driven outcomes and must manage model governance, data quality, and explainability. https://www.fca.org.uk/innovation/ai
- British Business Bank (2024). Artificial intelligence adoption by smaller businesses. Reports that 25% of UK smaller businesses and 44% of medium-sized businesses were using AI in 2024. https://www.british-business-bank.co.uk/blog/artificial-intelligence-adoption-smaller-businesses-2024
- European Commission (2024). EU Artificial Intelligence Act, Regulation 2024/1689. Risk-based AI regulation that entered into force on 1 August 2024, with obligations phased in after that date; relevant for UK firms whose services or AI outputs touch EU users. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
- NCSC (2024). AI and cyber security. Guidance on treating AI suppliers and connected systems as part of an organisation’s attack surface, applicable regardless of model size or hosting arrangement. https://www.ncsc.gov.uk/guidance/ai-and-cyber-security
- UK Government (2023). AI regulation: a pro-innovation approach. Sets out the UK’s risk-based, sector-led approach to AI governance underpinning ICO and FCA obligations for services businesses. https://www.gov.uk/government/publications/ai-regulation-a-pro-innovation-approach/white-paper

Frequently asked questions

What is the difference between a small language model and ChatGPT?

ChatGPT runs on a large general-purpose model built to handle a wide range of questions and tasks. A small language model is narrower, trained for one category of task such as classifying emails or summarising documents. SLMs tend to cost less to run and are often deployed on local devices or servers, but they cannot switch between unrelated tasks the way a large model can.

Do I need technical knowledge to use a small language model in my business?

That depends on how you use it. If a software provider has already built an SLM into their product, you need no technical knowledge to use that product. If you want to deploy a model on your own infrastructure or fine-tune it on your own data, you will need a developer or a managed service. Many owner-managed firms start with the first route and move to the second only when their needs outgrow the packaged product.

Are small language models regulated in the UK?

Regulatory obligations in the UK depend on what the AI does and whose data it processes, not on the size of the model. The ICO requires a lawful basis, transparency, and a Data Protection Impact Assessment where appropriate when AI handles personal data. If your business is in a regulated sector such as financial services, the FCA expects governance, explainability, and human oversight for any AI affecting customer outcomes.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

