What is a small language model (SLM)? Why it matters for your business

TL;DR

Small language models (SLMs) are generative AI models with typically fewer than 10 billion parameters. They handle specific text tasks at lower cost and with less computing power than large frontier models, and can run on your own server or a modern laptop. That keeps data on your infrastructure, which can simplify the data-residency side of UK GDPR compliance, although the obligations themselves still apply. They fall short on broad, complex tasks. For an owner-managed business, the key question is whether your AI use case needs a powerful generalist or a well-chosen specialist.

Key takeaways

- A small language model (SLM) is a generative AI model with typically fewer than 10 billion parameters, capable of specific text tasks like drafting, summarising, and question-answering at lower cost than large frontier models.
- SLMs can run on your own server or a modern laptop, keeping client and staff data on your infrastructure rather than routing it through a third-party cloud.
- They work best on narrow, repetitive tasks built around your own documents, such as internal knowledge assistants, focused client-facing bots, or CRM co-pilots.
- When you need broad reasoning, complex analysis, or high-quality brand copy, a frontier cloud model will generally outperform a well-tuned SLM.
- UK GDPR obligations apply regardless of model size. Any SLM processing personal data requires a lawful basis, a DPIA, and clear processor agreements.

Your IT consultant mentioned it at the end of a call. Something about running a ‘small language model’ on your own server rather than sending everything to OpenAI. You made a note. Six months on, it’s still there, partly because you weren’t sure what it was, and partly because you weren’t sure it mattered.

It likely matters more than you’d expect. The idea is less complicated than the label suggests.

What is a small language model?

A small language model (SLM) is a generative AI model with far fewer parameters than the large frontier models hosted by OpenAI, Google, and Anthropic. Typically under 10 billion parameters, it can still handle the same kinds of text tasks: drafting emails, summarising documents, answering questions. The difference is that it runs on ordinary hardware rather than needing specialised computing infrastructure.

Think of it as the difference between a specialist and a full-service advisory firm. The specialist knows one domain deeply and costs less per engagement. The firm knows a great deal about everything, which is useful when questions are genuinely complex and broad, but you’re paying for capacity you won’t always use.

The UK Parliament’s POST research note on large language models provides useful context: the ‘large’ in LLM refers to parameter count, and the distinction between large and small is informal and shifts as training techniques improve. Well-known SLMs include Meta’s Llama 3 8B model and Mistral’s 7B model. Both run on a single modern GPU or a powerful laptop, and both are open-weight, meaning the trained model weights can be downloaded and deployed on your own infrastructure.

Why does it matter for your business?

For an owner-managed business running AI daily, cost and data control are the two pressures that accumulate. SLMs run on lighter hardware and typically cost considerably less per task than frontier cloud models. They can also run on your own server, keeping client data on your infrastructure rather than routing it to a third-party cloud. Both your monthly bill and your GDPR exposure change.

Thoughtworks UK’s analysis of small language models notes faster response times, lower costs, and reduced energy consumption compared with larger models. Thoughtworks also describes what they call the ‘specialised worker’ approach: rather than routing every task through one large cloud model, a set of smaller SLMs each handle a specific job, which is often considerably cheaper for high-volume, repetitive workflows.

The ICO’s guidance on generative AI makes clear that organisations must know where their data is stored and processed, and whether international transfers are occurring. An SLM running on a UK server gives you a cleaner answer to that question than a US-hosted cloud model. The CMA’s initial report on AI foundation models adds a separate angle: open-weight SLMs like Llama 3 and Mistral reduce dependency on a small number of large cloud providers, which aligns with the regulator’s concerns about market concentration in AI infrastructure.

Where will you actually meet it?

You’ll encounter SLMs most commonly in three configurations: on-device tools that process data without a network connection (a mobile app that generates site-visit reports before the engineer returns to the office), inside line-of-business software (a CRM that drafts follow-up emails from call notes), and self-hosted internal assistants that answer questions from your own documents, procedures, and case files.

The internal knowledge assistant is where many owner-managed businesses find the most immediate practical return. Rather than staff searching scattered documents or asking colleagues for answers, a well-configured SLM looks up your own policies and procedures before responding. The model works from your documents rather than needing encyclopaedic coverage.

Client-facing applications work well when the service scope is narrow. A firm with a defined offering, such as a fixed-process clinic, a specialist tax adviser, or a managed IT provider, can deploy a focused bot that handles common questions accurately and keeps client data within its own infrastructure. The NCSC’s guidance on security considerations for AI as a service is directly relevant here: your attack surface expands when data flows out to external systems, so on-device or on-premise SLMs reduce that exposure compared with cloud-routed alternatives.

When should you ask about it, and when should you ignore it?

Consider an SLM when you have a specific, repetitive text task running at high frequency, involving client or staff data, with no need for creative breadth or open-ended reasoning. Stick with a frontier cloud model when the task is complex, brand-critical, or infrequent enough that the cost difference doesn’t justify the additional setup. The deciding factor is the task, not the technology.
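The criteria in the paragraph above can be written down as a rough first-pass checklist. The sketch below is illustrative only: `TaskProfile` and `suggest_model` are hypothetical names, and the "either" outcome is an assumption added here for tasks that are high-volume but involve no client or staff data.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of one AI use case."""
    repetitive: bool                  # the same narrow task, over and over
    high_frequency: bool              # runs often enough for cost to add up
    involves_personal_data: bool      # client or staff data is in the text
    needs_open_ended_reasoning: bool  # complex, broad, or creative work
    brand_critical: bool              # output represents the brand directly


def suggest_model(task: TaskProfile) -> str:
    """Return 'frontier', 'slm', or 'either' as a starting point for discussion."""
    # Complex or brand-critical work favours a frontier cloud model.
    if task.needs_open_ended_reasoning or task.brand_critical:
        return "frontier"
    # Infrequent or one-off tasks rarely justify the extra setup of an SLM.
    if not (task.repetitive and task.high_frequency):
        return "frontier"
    # Narrow, high-volume work on client or staff data is the classic SLM case.
    if task.involves_personal_data:
        return "slm"
    # Narrow and high-volume but no personal data: cost alone decides (assumption).
    return "either"


crm_follow_ups = TaskProfile(
    repetitive=True,
    high_frequency=True,
    involves_personal_data=True,
    needs_open_ended_reasoning=False,
    brand_critical=False,
)
print(suggest_model(crm_follow_ups))
```

A checklist like this is a conversation starter, not a substitute for assessing the actual task.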

Thoughtworks UK is candid about where SLMs fall short: limited general knowledge, lower accuracy on complex tasks, and less nuanced language generation than frontier models. If you’re producing long-form marketing content, handling sensitive advisory communications, or asking the model to reason across unfamiliar territory, a larger model will usually serve you better.

There’s a timing dimension worth keeping in mind. Frontier models have been getting cheaper and faster each year, and that trend continues. If the cost gap between large and small models narrows significantly over the next few years, the case for the extra setup behind a self-hosted SLM becomes harder to justify for a small operation.
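The timing point comes down to simple break-even arithmetic: the setup cost of a self-hosted SLM divided by the monthly saving it produces. The sketch below shows that calculation. The function name and every figure in the example are hypothetical placeholders, not benchmarks, so substitute your own quotes and volumes.

```python
from typing import Optional


def months_to_break_even(
    setup_cost: float,
    tasks_per_month: int,
    cloud_cost_per_task: float,
    slm_cost_per_task: float,
    slm_monthly_running_cost: float = 0.0,
) -> Optional[float]:
    """Months until the SLM setup cost is recovered, or None if it never is."""
    per_task_saving = cloud_cost_per_task - slm_cost_per_task
    monthly_saving = tasks_per_month * per_task_saving - slm_monthly_running_cost
    if monthly_saving <= 0:
        return None  # the self-hosted option never pays for itself
    return setup_cost / monthly_saving


# Hypothetical figures in pounds, for illustration only.
today = months_to_break_even(
    setup_cost=6000,
    tasks_per_month=20000,
    cloud_cost_per_task=0.02,
    slm_cost_per_task=0.005,
    slm_monthly_running_cost=100,
)
print(round(today, 1))  # about 30 months with these made-up numbers

# If the cloud price per task fell to 0.008, the saving would disappear.
print(months_to_break_even(6000, 20000, 0.008, 0.005, 100))
```

With these made-up figures the setup cost takes roughly two and a half years to recover, and a fall in cloud pricing removes the saving entirely, which is the risk the paragraph above describes.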

The FCA’s 2023 discussion paper on AI in financial services makes clear that regulated firms remain accountable for any AI system under Consumer Duty and operational resilience requirements. A smaller, more controllable model with clear training data may be easier to audit and explain to a regulator, which is a genuine consideration if you’re in a sector with active oversight.

What else connects to small language models?

SLMs sit inside a wider AI vocabulary you’ll encounter as you build capability in your business. Three terms come up regularly alongside them: RAG, or retrieval-augmented generation (where the model searches your documents before answering rather than relying solely on its training), fine-tuning (adapting the model on your own data to improve accuracy on specific tasks), and agent frameworks (where multiple models coordinate to complete multi-step tasks).

RAG is often the better starting point for an owner-managed business than fine-tuning, because you don’t need to retrain the model at all. You give it a reference library of your own documents and it searches them before responding. This produces practical results for many businesses with less specialist overhead than a fine-tuning programme.
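To make the "search first, then answer" idea concrete, here is a minimal sketch of the retrieval step. It assumes a tiny in-memory set of documents and uses plain word overlap as a stand-in for the embedding-based search a real RAG system would typically use. The document names, contents, and function names are all hypothetical, and no model is called: the output is simply the prompt you would hand to one.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(question: str, documents: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return names of the documents sharing the most words with the question."""
    question_words = tokenize(question)
    scored = [
        (len(question_words & tokenize(body)), name)
        for name, body in documents.items()
    ]
    # Drop documents with no overlap, then rank by overlap (ties by name).
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


def build_prompt(question: str, documents: Dict[str, str], top_k: int = 2) -> str:
    """Assemble a prompt that tells the model to answer from retrieved text only."""
    names = retrieve(question, documents, top_k)
    context = "\n\n".join(f"[{name}]\n{documents[name]}" for name in names)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical internal documents.
docs = {
    "holiday_policy": "Staff accrue 25 days of annual leave per year. "
                      "Leave requests go to your line manager.",
    "expenses_policy": "Submit expenses within 30 days with receipts attached.",
    "it_security": "Report lost laptops to the IT provider immediately.",
}

print(build_prompt("How many days of annual leave do staff get?", docs, top_k=1))
```

Word overlap is crude, since common words count as matches, but the shape is the same at any scale: find the relevant passages, then ask the model to answer from them.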

Agent frameworks are increasingly where SLMs are being deployed as specialised components: a larger orchestrating model coordinates several SLMs, each handling a specific step in a workflow, which is often considerably cheaper than routing the entire chain through a single frontier model.
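The routing idea can be sketched in a few lines. This is an assumption-heavy illustration: a plain lookup table stands in for the orchestrating model, the worker functions are placeholders that only label their input rather than calling a real model, and all names are hypothetical.

```python
from typing import Callable, Dict


def summarise_notes(text: str) -> str:
    # Placeholder: a real worker would call a small summarisation model here.
    return f"[summary-slm] {text}"


def draft_follow_up(text: str) -> str:
    # Placeholder: a real worker would call a small email-drafting model here.
    return f"[email-slm] {text}"


def frontier_fallback(text: str) -> str:
    # Placeholder: anything unrecognised goes to a larger cloud model.
    return f"[frontier] {text}"


# Each known task type maps to one specialised worker.
WORKERS: Dict[str, Callable[[str], str]] = {
    "summarise": summarise_notes,
    "follow_up_email": draft_follow_up,
}


def route(task_type: str, text: str) -> str:
    """Send the text to the matching specialised worker, else to the fallback."""
    worker = WORKERS.get(task_type, frontier_fallback)
    return worker(text)


print(route("summarise", "call notes"))
print(route("contract_review", "new supplier agreement"))
```

The design choice worth noticing is the fallback: routine steps go to cheap specialists, and only work that no specialist covers reaches the more expensive model.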

If your work involves personal data, the ICO’s DPIA guidance and the NCSC and CISA joint guidance on secure AI system development are both practical starting points, regardless of which model size you choose. The principles in that guidance hold across the board: minimise the data sent, encrypt in transit, apply strong access controls, and log what the system does. Model size changes the cost and data-residency picture. It doesn’t change the governance obligations.
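Two of those principles, minimising the data sent and logging what the system does, can be illustrated with a short sketch. It assumes a crude pattern-based redaction of email addresses and UK-style phone numbers before text reaches any model. The patterns, logger name, and function names are hypothetical, and the patterns will miss things such as personal names and postal addresses, so treat this as an illustration rather than a compliance control. Encryption in transit and access controls are not shown.

```python
import logging
import re
from typing import Tuple

logger = logging.getLogger("slm_gateway")  # hypothetical logger name

# Deliberately simple patterns; real deployments need far more careful detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
UK_PHONE = re.compile(r"(?<!\d)(?:\+44\s?|0)\d(?:[\s-]?\d){8,9}(?!\d)")


def redact(text: str) -> Tuple[str, int]:
    """Replace emails and phone numbers with placeholders; return text and count."""
    text, email_count = EMAIL.subn("[EMAIL]", text)
    text, phone_count = UK_PHONE.subn("[PHONE]", text)
    return text, email_count + phone_count


def prepare_prompt(user: str, text: str) -> str:
    """Redact the text and log who sent what volume, without logging the content."""
    cleaned, redactions = redact(text)
    logger.info("user=%s chars=%d redactions=%d", user, len(cleaned), redactions)
    return cleaned


logging.basicConfig(level=logging.INFO)
print(prepare_prompt(
    "alex",
    "Call Jane on 07700 900123 or jane.doe@example.co.uk about the invoice",
))
```

Note that the log records who used the system and how much was redacted, but not the text itself, so the audit trail does not become a second copy of the personal data.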

A small language model is a practical tool for well-defined, high-frequency jobs. If your team is handling volume text tasks and you’re uncomfortable with where that data is going, it’s worth a proper assessment of whether an SLM-based approach fits. Book a conversation to think through which AI tools make sense for your business, and which would be wasted on problems they’re not built for.

Sources

- Information Commissioner's Office (2023). Generative AI and Data Protection. Guidance on lawful basis, DPIAs, data minimisation, and controller responsibilities when deploying generative AI. https://ico.org.uk/for-organisations/guidance-on-ai-and-data-protection/generative-ai/
- Information Commissioner's Office. Data Protection Impact Assessments. Guidance on when and how to conduct a DPIA, including for AI systems that process personal data. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/accountability-and-governance/data-protection-impact-assessments/
- National Cyber Security Centre (2023). Security Considerations for AI as a Service. Guidance on attack surface, data minimisation, and identity controls when integrating external AI services. https://www.ncsc.gov.uk/guidance/security-considerations-for-ai-as-a-service
- NCSC and CISA (2023). Guidelines for Secure AI System Development. Joint guidance on secure-by-design principles for AI deployments, covering prompt injection, data poisoning, and model theft. https://www.ncsc.gov.uk/collection/guidelines-secure-ai-system-development
- Financial Conduct Authority (2023). Regulatory Approach to Artificial Intelligence in Financial Services (DP23/4). Sets out FCA expectations on accountability, Consumer Duty, and operational resilience for AI in regulated firms. https://www.fca.org.uk/publications/discussion-papers/dp23-4-artificial-intelligence
- Competition and Markets Authority (2023). AI Foundation Models: Initial Report. Covers market concentration risks, the importance of open and interoperable models, and implications for businesses dependent on a small number of large providers. https://www.gov.uk/government/publications/ai-foundation-models-initial-report
- Competition and Markets Authority (April 2024). AI Foundation Models: Update Paper. CMA update on risks of embedded market power and ongoing monitoring of the foundation model market. https://www.gov.uk/government/publications/ai-foundation-models-update-paper-april-2024
- Thoughtworks UK (2024). Small Language Models. Analysis of SLM characteristics, use cases, limitations, and deployment considerations. https://www.thoughtworks.com/en-gb/insights/decoder/s/small-language-models
- UK Parliament, Parliamentary Office of Science and Technology (2024). Large Language Models (POSTnote 692). Briefing on parameter scale, LLM capabilities, and the distinction between large and small models. https://post.parliament.uk/research-briefings/post-pn-0692/
- European Parliament (2024). EU Artificial Intelligence Act: Provisional Agreement. Overview of transparency and safety obligations for general-purpose AI models, including which models face stricter requirements. https://www.europarl.europa.eu/news/en/headlines/society/20240313STO19201/artificial-intelligence-act

Frequently asked questions

What is the difference between a small language model and ChatGPT?

ChatGPT runs on large frontier models hosted in OpenAI's cloud. OpenAI does not publish parameter counts for its current models, but they are widely understood to be far larger than a typical SLM. A small language model typically has under 10 billion parameters and can run on your own server or a powerful laptop. The practical result is lower cost per task, tighter data control, and often better performance on narrow, domain-specific jobs when the model is well-matched to the work.

Do I need a developer to run a small language model in my business?

For many owner-managed businesses, yes, at least for initial setup. Running an SLM on your own hardware requires configuration, security controls, and ongoing maintenance. A growing number of cloud providers now offer managed SLMs where you get cost and data benefits without managing the infrastructure yourself, which is worth exploring before committing to an on-premise build.

Does UK GDPR apply differently if I use a small language model instead of a cloud-hosted one?

The ICO's guidance makes clear that model size does not change your obligations. If your SLM processes personal data, you still need a lawful basis, a Data Protection Impact Assessment, and proper processor agreements. Where an on-premise SLM can help is by making it easier to demonstrate that data stays within the UK and is not shared with third-party training pipelines.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
