What is a small language model, and when should you use one?

TL;DR

Small language models are AI systems with between a few million and 10 billion parameters, designed to run on modest hardware and be tuned to a specific task or document set. For a small UK service firm, they are worth considering when you have a narrow, well-defined workflow, good internal data, and a reason to keep that data off a third-party cloud. They are a poor fit for open-ended queries or firms without the technical capacity to manage them.

Key takeaways

- SLMs have between a few million and 10 billion parameters, small enough to run on a laptop or single GPU without large cloud infrastructure costs.
- They perform best on narrow, well-defined tasks where you have a clean set of internal documents to train or ground them on.
- The main business case for small firms is lower cost per query, tighter control over where data goes, and improved accuracy on specific tasks compared with a general-purpose model.
- Major platforms including Microsoft Azure, IBM watsonx, and Salesforce Einstein 1 offer managed SLM options, meaning you can test without managing your own infrastructure.
- UK data protection obligations from the ICO apply to SLMs just as they do to any other AI system: you still need a lawful basis, a DPIA for high-risk uses, and clear staff notices.

A founder running a small professional services firm told me recently she’d spent weeks trying to get a standard AI assistant to reliably answer staff questions about the firm’s HR policies. The model kept getting details wrong, mixing up general guidance with her firm’s actual rules. She was also uncomfortable putting sensitive staff documents into a public cloud tool. What she was describing, though she didn’t have a name for it, is the gap that small language models are built to fill.

What is a small language model?

A small language model, or SLM, is an AI model with far fewer parameters than headline systems like GPT-4. Where those are widely reported to have hundreds of billions of parameters or more, an SLM typically sits between a few million and 10 billion. That smaller scale lets it run on a good laptop or a single GPU, and means it can be focused on one task rather than trained on the whole internet.

The term “small” is relative. IBM describes SLMs as ranging from a few million to a few billion parameters. Hugging Face places the upper boundary at around 10 billion. Microsoft’s Phi-2 model, at 2.7 billion parameters, performs competitively with much larger models on certain benchmarks, according to Microsoft’s own published testing. The major platform vendors describe SLMs as typically trained on narrower, higher-quality datasets for a defined purpose, whether that’s summarising sales calls, answering product-specific questions, or retrieving company policy details.

The practical upshot for a founder: an SLM is a cut-down, specialist AI model you can tune to your own data, potentially running on modest hardware rather than relying entirely on a general-purpose cloud service.

Why should a firm of your size pay attention to SLMs?

For a firm of 5 to 50 people, the interesting thing about SLMs has less to do with the technology and more to do with what they remove. A general-purpose AI subscription sends your data to a third-party cloud and performs best on open-ended tasks. An SLM can run on hardware you control, stays focused on a specific job, and costs far less per query once it’s up and running.

Three practical advantages stack up for a small service firm. First, cost per query: High Digital reports that SLMs in the 1 to 10 billion parameter range can run on consumer-grade GPUs or even standard CPUs, making infrastructure spend a fraction of what large-model deployments require. Second, data control: a model running on your own server means client data and internal documents don’t leave your network. Third, accuracy on narrow tasks: a well-focused SLM can outperform a general model on a specific domain because the training data is curated for that domain, not averaged across the internet.

The World Economic Forum adds that techniques such as quantisation can reduce model size and memory requirements by up to 75% with limited performance impact, making on-device deployment genuinely accessible to firms without specialist hardware.
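To make the memory arithmetic concrete, here is a rough sketch in Python. It assumes the "up to 75%" figure corresponds to storing each weight in 8 bits instead of 32 bits, which is a common way that number arises but is not something the WEF article spells out. The function names are illustrative, and the estimate covers the model weights only, not the extra working memory a model needs while it runs.

```python
def model_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes).

    Ignores runtime overhead such as activations and caches, so real
    usage will be somewhat higher.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight


def reduction_percent(bits_before: int, bits_after: int) -> float:
    """Percentage saved by moving from one weight precision to another."""
    return (1 - bits_after / bits_before) * 100


if __name__ == "__main__":
    # 2.7 billion parameters is the Phi-2 size quoted earlier in the article.
    params = 2.7
    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit weights: about {model_memory_gb(params, bits):.2f} GB")

    # Assumption: the 75% figure is a 32-bit to 8-bit comparison.
    print(f"32-bit to 8-bit saving: {reduction_percent(32, 8):.0f}%")
```

On these assumptions a 2.7 billion parameter model needs roughly 10.8 GB at 32-bit precision and roughly 2.7 GB at 8-bit, which is the difference between needing a dedicated GPU and fitting in the memory of an ordinary office laptop.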

Where will you actually come across SLMs?

For many service firms, SLMs will appear first through platforms you already use. Microsoft Azure includes its Phi models in its AI catalogue. IBM’s watsonx offers small, task-specific options. Salesforce is building SLM-powered features into its Einstein 1 platform. If you’re not on any of those, the realistic starting point is a managed API that lets you test a small model without managing your own infrastructure.

UK agency High Digital documents several SLM deployments that look genuinely achievable for a firm of this size: an internal helpdesk bot trained on company policies and procedures, a compliance checker reviewing documents against standard criteria, a client-facing Q&A tool embedded in a client portal, and an automated report summariser for client engagements. The common thread is a clean, structured document set and a single well-defined question the model has to answer repeatedly.

All four start with understanding what documents you already have, what question your staff or clients ask most frequently, and whether you have the technical capacity to connect a model to a document store. For many of these use cases, that connection is a straightforward integration, not an engineering project. The hard part is usually curating the documents, not building the model.

When does an SLM make sense, and when should you stay with a standard tool?

SLMs earn their keep when you have a narrow task, a clean body of internal documents, and a genuine reason to want the model running on hardware you control. If you’re answering staff questions about your own HR policies, summarising client meeting notes, or running a FAQ bot trained on your own contracts, a small model can outperform a general-purpose one on accuracy and cost.

Several situations tip the balance towards a standard LLM service instead. If your queries are open-ended and varied, covering anything a client might ask across multiple jurisdictions or domains, a small focused model won’t have the breadth to serve them well. Microsoft Azure notes that SLMs have limited capacity for complex language and lower accuracy on tasks that require broad knowledge. Red Hat points out that SLMs may need to be combined with other tools for sophisticated reasoning.

Two other situations push you back towards a managed service. If your team has no technical capacity to manage even a modest server or cloud instance, a fully managed LLM service like Azure OpenAI or Anthropic’s API is a safer starting point and almost certainly cheaper in setup time. And if you don’t have a body of structured internal documents to ground the model on, the specialisation advantage disappears entirely. A small model tuned on thin data often performs worse than a general model used carefully.

What other ideas connect to SLMs?

A few related concepts come up often alongside SLMs, and understanding them makes conversations with a supplier or technology partner easier. Retrieval-augmented generation, or RAG, pairs any language model with a document store, letting the model pull in relevant text before answering a question. RAG can be used with both large and small models, and for many SMEs it’s a simpler first step than full fine-tuning.
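The retrieval step in RAG can be sketched in a few lines of Python. This is a deliberately simplified illustration: it ranks documents by plain keyword overlap, whereas production systems usually use embedding-based search. The policy snippets, document names, and function names are all hypothetical, and the final call to a language model is left out because it depends on the platform you choose.

```python
import re
from collections import Counter

# Hypothetical policy snippets standing in for a firm's document store.
DOCUMENTS = {
    "annual-leave": "Staff receive 25 days of annual leave plus bank holidays. "
                    "Leave requests go to your line manager.",
    "sick-pay": "Report sickness absence by 9am on the first day. "
                "Statutory sick pay applies after the waiting period.",
    "expenses": "Submit expense claims with receipts within 30 days of the purchase.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, document: str) -> int:
    """Count occurrences in the document of each distinct query word."""
    doc_counts = Counter(tokenize(document))
    return sum(doc_counts[word] for word in set(tokenize(query)))


def retrieve(query: str, documents: dict[str, str], top_k: int = 1) -> list[str]:
    """Return the names of the top_k documents with the highest overlap score."""
    ranked = sorted(documents, key=lambda name: score(query, documents[name]), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, documents: dict[str, str], top_k: int = 1) -> str:
    """Assemble the text that would be sent to the model: retrieved context, then the question."""
    context = "\n".join(f"[{name}] {documents[name]}" for name in retrieve(query, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("How many days of annual leave do I get?", DOCUMENTS))
```

The point of the sketch is the shape of the workflow: the model never needs to have been trained on your policies, because the relevant passage is fetched and placed in front of it at question time. That is why the quality of the document set matters more than the size of the model.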

Fine-tuning means further training a pre-existing model on your specific data to improve its accuracy on particular tasks. It’s a more involved step than RAG, and for a small service firm it usually makes sense to try RAG first before investing in fine-tuning.

Edge AI describes running models locally on devices like tablets or phones, without a cloud connection. Because SLMs are small enough to fit on those devices, they are particularly relevant for field-based teams or regulated environments where data cannot go online.

The UK ICO’s guidance on AI and data protection applies to any of these setups. Running a model locally doesn’t remove your compliance obligations. You still need a lawful basis for processing personal data, a data protection impact assessment for high-risk uses, and clear notices to staff if automated processing affects decisions about them.


If you’re asking whether an SLM could replace your current AI subscription, it probably can’t, at least not entirely. The two serve different purposes: an SLM offers narrow precision on your own content, a general LLM offers broad capability across everything. The more useful question is whether you have one well-defined workflow, with a clean document set behind it, where a focused model would serve you better. That’s where the SLM case starts to become real.

Sources

- ICO (2024). Guidance on AI and data protection. Covers lawful basis, data minimisation, and DPIA requirements for AI use in UK organisations. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- ICO (2024). Data protection impact assessments. Sets out when a DPIA is required for high-risk processing, including HR decision-support tools. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/accountability-and-governance/data-protection-impact-assessments/
- NCSC (2024). Using AI safely and securely. Guidance on securing training data, model APIs, and access controls for AI deployed in UK organisations. https://www.ncsc.gov.uk/collection/using-ai-safely-and-securely
- NCSC and CISA (2023). Guidelines for secure AI system development. Covers vulnerabilities including prompt injection and data poisoning, applicable to SLMs and LLMs alike. https://www.ncsc.gov.uk/collection/guidelines-secure-ai-system-development
- CMA (2023). AI foundation models: initial report. Sets out UK competition and consumer protection concerns around major AI providers, including lock-in risks. https://www.gov.uk/government/publications/ai-foundation-models-initial-report
- EU AI Act (2024). Regulation (EU) 2024/1689 on Artificial Intelligence. Categorises AI systems by risk level; UK firms serving EU customers may face obligations regardless of model size. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689
- World Economic Forum (2025). AI: what are SLMs and are they good for businesses? Reports that quantisation can reduce model size and memory requirements by up to 75% with limited performance impact. https://www.weforum.org/stories/2025/01/ai-small-language-models/
- IBM Think (2024). What are small language models? Explains parameter ranges, use cases, and suitability for edge deployment and offline inference. https://www.ibm.com/think/topics/small-language-models
- Microsoft Azure (2024). What are small language models? Covers faster training, reduced energy consumption, and deployment on resource-constrained devices including the Phi-2 model. https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-are-small-language-models
- High Digital (2024). The rise of small language models in business AI. UK agency case study documenting SLM deployments including helpdesk bots, compliance checkers, and client-facing Q&A tools for SMEs. https://www.highdigital.co.uk/blog/why-small-language-models-are-the-future-of-business-ai/

Frequently asked questions

What is the difference between an SLM and an LLM?

A large language model typically has tens to hundreds of billions of parameters and is trained on a broad slice of the internet, making it versatile but expensive to run and often imprecise on narrow tasks. A small language model typically has between a few million and 10 billion parameters, can be tuned to your own documents, and runs on modest hardware. The trade-off is breadth: an SLM does a few things well, not everything adequately.

Do I need a data scientist to run an SLM?

For the simplest SLM deployments, you do not. Platforms like Microsoft Azure and IBM watsonx offer managed services where the model infrastructure is handled for you. What you do need is someone who can curate and organise your internal documents, set up the integration, and monitor output for errors. For more involved projects such as fine-tuning an open-source model on your own server, you would need at least a part-time technical person with some machine-learning background.

What are my data protection obligations if I deploy an SLM in my UK firm?

The UK ICO's guidance on AI and data protection applies regardless of model size. You need a lawful basis for processing personal data, a data protection impact assessment if your use is high-risk (such as HR decision-support or client profiling), and clear notices to staff or clients about any automated processing. Running an SLM on your own infrastructure can help with data minimisation because data stays on your network, but it does not remove your compliance obligations.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.
