What a small language model is in simple terms

TL;DR

A small language model (SLM) works on the same principles as ChatGPT but is built to run cheaply on your own server or a private cloud. For a 5-to-50-person UK service firm, SLMs are worth considering when the task is repetitive and the data is sensitive. For broader, creative, or low-sensitivity work, a standard hosted service is simpler and probably the right choice.

Key takeaways

- A small language model (SLM) sits in the range of one million to ten billion parameters, roughly 100 to 1,000 times smaller than leading large models, and can run on standard servers or modest private cloud infrastructure.
- SLMs work best for repetitive, domain-specific tasks involving sensitive data, such as internal document summarisation, staff-facing knowledge search, and FAQ-style client tools.
- The ICO requires a Data Protection Impact Assessment for high-risk AI use in the UK, regardless of model size, and organisations must have a lawful basis for processing personal data through any AI system.
- The NCSC advises against feeding sensitive or proprietary data into public AI services and recommends self-hosted or private cloud models for any work involving confidential information.
- Microsoft's Phi-3 family, Meta's LLaMA 3 8B, and Mistral 7B are the principal open-weight SLMs an SME can realistically self-host, built explicitly for private or small cloud deployments.

Someone at a client lunch mentions they’re piloting an SLM on their own server, handling confidential project notes. You make a mental note, search it later, and land in a sea of benchmarks, parameter counts, and GPU comparisons. The tab closes faster than it opened.

This post is the version of that conversation that skips the technical scaffolding. What a small language model actually is, why a 5-to-50-person service firm might care, where you’re most likely to meet one, and when it’s worth asking questions rather than nodding along.

What is a small language model?

A small language model, or SLM, is an AI text system that works on the same principles as ChatGPT but is built to run cheaply on standard hardware. IBM’s technical overview puts SLMs in the range of roughly one million to ten billion parameters, which makes them around 100 to 1,000 times smaller than the flagship models. That size difference is what changes the economics.

Parameters are the internal settings a model learns during training: the numerical weights that determine how it responds to any input. A large model can have hundreds of billions of them and typically runs on powerful data centre hardware. An SLM has far fewer, which means it can run on a modest cloud server, a standard laptop, or even a phone.
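
To put rough numbers on that: the memory needed just to hold a model's weights is approximately the parameter count multiplied by the bytes used to store each one. The sketch below is a back-of-envelope estimate, not a sizing tool. It assumes 2 bytes per parameter (16-bit precision) and 0.5 bytes for a 4-bit compressed version. The function name is made up for illustration, the 200-billion figure is only a stand-in for "hundreds of billions", and a real deployment needs extra headroom on top for the running workload.

```python
def weight_memory_gb(num_parameters: int, bytes_per_parameter: float = 2.0) -> float:
    """Approximate memory needed to hold the model weights, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same precision. This ignores
    the extra working memory a model needs while it is answering requests.
    """
    return num_parameters * bytes_per_parameter / 1e9


# A 7-billion-parameter SLM at 16-bit precision (2 bytes per parameter).
print(weight_memory_gb(7_000_000_000))        # 14.0

# The same model compressed to 4 bits (0.5 bytes per parameter).
print(weight_memory_gb(7_000_000_000, 0.5))   # 3.5

# An illustrative 200-billion-parameter large model at 16-bit precision.
print(weight_memory_gb(200_000_000_000))      # 400.0
```

On these assumptions the small model fits on a single well-specified machine, while the large one needs the kind of pooled hardware only a data centre provides.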

The analogy that usually lands well: if a large model is a generalist consultant who knows something about everything, an SLM is a specialist assistant trained to be very good at a narrow set of tasks. The questions it handles best are specific and bounded: answering your service FAQs, summarising project notes, or searching internal documents. On those tasks, it can perform just as well as a larger alternative at a fraction of the cost.

Microsoft’s Phi-3 family, Meta’s LLaMA 3 8B, and Mistral 7B are the models you’ll hear named in this space. All three are built explicitly for private, on-device, or small cloud deployments.

Why does it matter for your business?

The practical gap between a large language model and an SLM comes down to cost, data control, and usage predictability. A large hosted model charges per call and processes your prompts on infrastructure you don’t own or control. An SLM running in a private environment changes both of those facts. For UK service firms handling sensitive client data, that distinction matters more than any benchmark score.

In 2023, Samsung employees accidentally fed confidential source code and internal meeting notes into a public ChatGPT session. The leak prompted an internal ban on generative AI tools and a shift to exploring in-house alternatives. It illustrated a structural problem with public AI services: every prompt you send goes somewhere you cannot audit, log, or review.

The NCSC advises UK organisations not to input sensitive or proprietary data into public AI services without adequate controls, and recommends private or self-hosted models for sensitive work. Its 2023 joint guidance with the US CISA reinforces the point: secure AI deployment means protecting the data that reaches the model, not only the outputs it produces.

Technical analysis of the SLM market finds that firms often choose these models for use cases that need access to sensitive internal data, running them entirely within local environments for compliance reasons. For a services firm, that profile includes client matter files, staff records, and project financials.

Where will you actually meet one?

SLMs appear in specific, narrow corners of the AI product market, not as general-purpose assistants but as specialist components. Microsoft’s Phi-3 family is the clearest example, built explicitly as small, cost-efficient models for on-device and private cloud use. Meta’s LLaMA 3 8B and Mistral 7B follow the same pattern. These are what an SME can realistically self-host.

The use cases where SLMs tend to perform well in a services context include helpdesk tools for client queries about your services, pricing, or standard processes; summarisation of meeting transcripts, case notes, or site visit reports; internal knowledge search, helping staff find relevant clauses in contracts, HR policies, or project documentation; and routine drafting of standard reply emails, cover letters, or proposal sections that staff then review and approve.

The key qualifier is that the task is specific and bounded. SLMs are trained on narrower datasets for particular jobs. Where they underperform is on broad, multi-step reasoning across many domains, which is where the large frontier models still hold the advantage. If your use case involves complex analytical work across varied inputs, a large hosted model with strict data handling agreements is the more practical starting point.

When to ask about SLMs vs when to ignore them

A firm with 5 to 50 staff should consider an SLM when the work is repetitive, domain-specific, and involves data you’d rather not send outside your systems. Internal document summarisation, staff-facing knowledge search, and FAQ-style client tools all fit that profile. If the task is broad, creative, or low-sensitivity, a standard hosted service is simpler and probably cheaper.

The UK regulatory picture adds a layer of clarity here. The ICO requires a Data Protection Impact Assessment for high-risk AI use, including any system that processes personal data at scale or profiles individuals. The size of the model is irrelevant to that obligation. A small, privately hosted model still requires the same lawful basis, purpose limitation, and transparency controls as a large external one. The ICO’s AI and data protection risk toolkit is the practical starting point for any UK firm running AI on personal data.

If your firm is FCA-regulated, including financial advice, insurance brokerage, or investment management, the obligation runs further. The FCA’s 2023 feedback statement on AI in financial services confirmed that existing rules on operational resilience, outsourcing, Consumer Duty, and conduct risk apply in full to AI-driven workflows. Using an SLM to pre-draft advice emails or risk summaries counts as an AI-driven workflow, and the governance expectation follows.

Three situations where an SLM is probably not the right call: you have fewer than ten people and no IT support, because the deployment overhead outweighs the benefit; you need frontier-level reasoning or creative output, because smaller models genuinely lag behind on those tasks; and you are working only with public, non-sensitive data, where a standard hosted service is faster and far less effort to maintain.
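
For readers who think in checklists, the criteria in this section can be written down as a small screening function. This is a sketch of the rules of thumb above and nothing more: the class, field names, and wording of the reasons are invented for illustration, and the final fallback (a broad task goes to a hosted service) is one reading of the guidance rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Hypothetical description of one candidate AI use case in a firm."""
    staff_count: int
    has_it_support: bool
    repetitive: bool
    domain_specific: bool
    sensitive_data: bool
    needs_frontier_reasoning: bool


def slm_worth_asking_about(case: UseCase) -> tuple[bool, str]:
    """Return (decision, reason) following the rules of thumb in this post."""
    # The three "probably not" situations are checked first.
    if case.staff_count < 10 and not case.has_it_support:
        return False, "deployment overhead likely outweighs the benefit"
    if case.needs_frontier_reasoning:
        return False, "smaller models lag on frontier reasoning and creative output"
    if not case.sensitive_data:
        return False, "a standard hosted service is faster and less effort"
    # The positive profile: repetitive, domain-specific work on sensitive data.
    if case.repetitive and case.domain_specific:
        return True, "repetitive, domain-specific work on sensitive data"
    return False, "the task is broad, so a hosted service is the simpler start"


notes_summariser = UseCase(
    staff_count=25,
    has_it_support=True,
    repetitive=True,
    domain_specific=True,
    sensitive_data=True,
    needs_frontier_reasoning=False,
)
print(slm_worth_asking_about(notes_summariser))
```

A "yes" here only means the option is worth raising with a vendor or adviser. The regulatory obligations described above apply whichever way the answer falls.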

What sits alongside this in the AI landscape?

Understanding SLMs connects to two broader concepts: how language models are trained, which determines what any model can and can’t do; and the data governance obligations that apply whenever you process personal information in the UK. You don’t need to go deep on either to make a sensible decision, but a basic map of both helps you ask better questions when a vendor pitches one.

On the model side, the terms worth knowing are parameters (the size metric), fine-tuning (adapting a base model on your own documents so it performs better on your specific vocabulary and tasks), and retrieval-augmented generation, or RAG (connecting a model to your document library so it pulls relevant content before responding). SLMs are often fine-tuned or deployed with RAG to compensate for their smaller training base. Both are standard approaches in a competent implementation, not signs of a product that can’t stand on its own.
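
If it helps to see the shape of RAG, the toy sketch below shows the two steps involved: pick the most relevant document for a question, then hand that text to the model alongside the question. Everything in it is an assumption made for illustration. The documents and function names are invented, simple word overlap stands in for the more capable search a real product would use, and no model is actually called; the code stops at the prompt a model would receive.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: dict[str, str], top_k: int = 1) -> list[str]:
    """Return the names of the top_k documents sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda name: len(question_words & tokenize(documents[name])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, documents: dict[str, str], top_k: int = 1) -> str:
    """Assemble the text a model would be sent: retrieved context, then the question."""
    names = retrieve(question, documents, top_k)
    context = "\n".join(f"[{name}] {documents[name]}" for name in names)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical internal documents.
DOCUMENTS = {
    "holiday_policy": "Staff receive 25 days of annual leave plus bank holidays.",
    "expenses_policy": "Expenses must be submitted within 30 days with receipts.",
}

print(build_prompt("How many days of annual leave do staff get?", DOCUMENTS))
```

The point of the pattern is that the model answers from your documents instead of from memory, which is why a small model paired with a good document library can hold its own on narrow questions.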

On the governance side, the ICO’s AI guidance and risk toolkit are the primary UK reference. The CMA’s ongoing review of the foundation model market is worth watching if vendor lock-in concerns you, as it is examining whether concentrated ownership of large AI infrastructure is limiting the availability of alternatives for buyers. For firms with EU clients or operations, the EU AI Act, adopted in 2024, sets additional obligations for providers of AI systems sold into the EU market. If you are running an SLM internally as a tool for your own firm, you are generally a deployer rather than a provider, which carries lighter obligations. That distinction shifts if you package an AI-driven service into a product you sell to others.

If you want to work through what this means for your firm specifically, book a conversation and we can start from your actual use cases rather than the general picture.

Sources

- IBM (2024). What are small language models? Technical overview of SLM definition, parameter ranges, and hardware requirements for deployment. https://www.ibm.com/think/topics/small-language-models
- Microsoft (2024). Advancing small language models with Phi-3. Description of the Phi-3 and Phi-3.5 family as cost-efficient on-device and private cloud SLMs. https://azure.microsoft.com/en-gb/blog/advancing-small-language-models-with-phi-3
- ICO (2023). Guidance on AI and data protection. ICO requirements for lawful basis, transparency, purpose limitation, and data minimisation when deploying AI on personal data. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- ICO (2023). AI and data protection risk toolkit. Practical checklist for UK organisations assessing privacy risks from AI training and deployment. https://ico.org.uk/for-organisations/ai-and-data-protection-risk-toolkit/
- NCSC (2023). Using generative AI safely in your organisation. Guidance advising against inputting sensitive data into public AI services and recommending private or self-hosted models for sensitive work. https://www.ncsc.gov.uk/guidance/using-generative-ai-safely
- NCSC and CISA (2023). Guidelines for secure AI system development. Joint guidance on protecting training data, securing model supply chains, and monitoring for prompt injection and data exfiltration. https://www.ncsc.gov.uk/guidance/guidelines-secure-ai-system-development
- FCA (2023). FS23/4: Artificial intelligence in financial services. Feedback statement confirming that existing rules on operational resilience, outsourcing, Consumer Duty, and conduct risk apply to AI-driven workflows. https://www.fca.org.uk/publications/feedback-statements/fs23-4-artificial-intelligence-financial-services
- CMA (2023). AI foundation models: initial review. CMA examination of market concentration in the foundation model sector and implications for competition and buyer choice. https://www.gov.uk/government/publications/ai-foundation-models-initial-review
- EU (2024). Regulation (EU) 2024/1689: the Artificial Intelligence Act. Risk-based obligations for providers and deployers of AI systems, including rules for general-purpose AI models. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
- BBC News (2023). Samsung workers unwittingly leak secrets via ChatGPT. Report on the incident in which Samsung employees fed confidential code and internal meeting notes into a public AI service. https://www.bbc.co.uk/news/technology-65173048

Frequently asked questions

What is the difference between a large language model and a small language model?

A large language model typically has tens of billions to hundreds of billions of parameters and runs on enterprise-grade data centre hardware. A small language model has roughly one million to ten billion parameters, allowing it to run on standard servers, laptops, or even phones. The tradeoff is that SLMs perform well on specific, narrow tasks but generally lag behind large models on broad reasoning.

Can a small business actually host a small language model itself?

Yes, with the right technical support. Open-weight SLMs such as Mistral 7B or Meta's LLaMA 3 8B can run on a single high-specification server or a modest private cloud instance. A business of five to fifteen people without internal technical capacity would typically need a managed hosting provider or a brief engagement with an AI specialist to set up and maintain the model securely.

Does UK data protection law apply when I run a small language model internally?

Yes. UK GDPR applies to any AI system that processes personal data, regardless of the model's size or where it is hosted. The ICO requires a Data Protection Impact Assessment for high-risk uses, such as profiling clients or monitoring staff. A self-hosted SLM reduces the data transfer risk but does not remove the obligation to have a lawful basis, document your processing, and be transparent with the people whose data is involved.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
