What is transfer learning? A plain-English guide for owners

TL;DR

Transfer learning starts with a model already trained on millions of examples and adapts it to your specific task with hundreds of examples instead of millions. In 2026, fine-tuning a 7B open-source model runs roughly £50 to £500 in compute, with parameter-efficient methods like LoRA cutting that ten- to hundred-fold. The owner question is which adaptation fits the problem: prompt engineering, RAG, or fine-tuning.

Key takeaways

- Transfer learning is the umbrella technique. Fine-tuning is one implementation; feature extraction is another. RAG and prompt engineering are adjacent methods, not transfer learning itself.
- In 2026 the SME economics are workable. A 7B open-source fine-tune runs £50 to £500 in GPU time, and inference on cost-optimised models lands near £37 per 10,000 queries.
- Parameter-efficient fine-tuning (LoRA, QLoRA) cuts cost by ten to a hundred times and trains five to ten times faster, with comparable accuracy to full fine-tuning.
- Pick by data shape. Stable data plus tone or terminology needs fine-tuning. Frequently changing facts want RAG. Fewer than 100 examples want prompt engineering. Most production systems combine all three.
- Fine-tuning on customer or employee data triggers UK GDPR obligations and EU AI Act duties from 2 August 2026. Cost the compliance work in before commissioning the build.

A 40-staff specialist insurance broker has three options on her desk. A London consultancy has quoted £40,000 for a “bespoke AI” build. A smaller technical partner has quoted £6,000 to fine-tune an open-source 7B model on 2,000 historical claims documents, plus £450 a month in inference cost. Her third option is to do nothing and add another senior claims handler. The broker processes around 50,000 claims documents a year, each currently taking a senior handler 4.8 hours to read, classify and route.

The middle option, she has been told, is “transfer learning”. She wants to know whether the £6,000 figure is realistic, what 2,000 labelled documents actually costs to produce, whether the resulting model will hold up against the variability of real claims, and whether she should be choosing between fine-tuning, RAG and prompt engineering or quietly combining all three.

This is the question worth answering, because transfer learning is the technique that explains why a 30-staff UK firm can run a custom AI capability in 2026 without a data science team or a six-figure compute budget. The cost economics shifted in the last eighteen months. The vocabulary has not caught up.

What is transfer learning?

Transfer learning is the technique of taking a model already trained on millions of examples and adapting it to your specific task using a fraction of the data and compute. The pre-trained model has learned the hard parts: how to recognise edges in images, how grammar works in language, how concepts relate. Your job is to teach it the narrow patterns of your business, often with a few hundred examples.

The lower layers of a model hold general knowledge that transfers across tasks. The higher layers capture task-specific patterns, and those are the layers you retrain or replace. Two main implementations exist. Feature extraction freezes the base model and trains a new output head on top. Fine-tuning unfreezes some upper layers and continues training at a slower learning rate. The GeeksforGeeks reference and the Weights and Biases write-up both frame it the same way: transfer learning is the principle, fine-tuning is one implementation of it.
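The distinction between the two implementations is easy to see in code. Here is a toy sketch in PyTorch, assuming nothing about any real pre-trained model: the layer sizes and the five output categories are illustrative stand-ins, and a real project would load the base from a model hub rather than build it fresh.

```python
import torch.nn as nn

# Stand-in for a pre-trained base; in practice this comes from a model hub.
base = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),   # lower layers: general knowledge
    nn.Linear(64, 64), nn.ReLU(),    # upper layers: task-specific patterns
)

# Feature extraction: freeze the whole base, train only a new output head.
# (Fine-tuning would instead unfreeze the upper layers too and continue
# training them at a slower learning rate.)
for p in base.parameters():
    p.requires_grad = False
head = nn.Linear(64, 5)              # 5 = your business's own categories
model = nn.Sequential(base, head)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable:,} of {total:,} parameters")
```

Even in this toy, feature extraction trains 325 of 12,741 parameters, about 2.5 per cent. That ratio is the whole commercial point: the expensive general knowledge is reused, and only the narrow business-specific layer is paid for.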

Why does it matter for your business?

It matters because the cost has collapsed. Training a foundation model from scratch can exceed £400,000 in compute alone and takes eighteen months or more, per the SmartDev reference. Fine-tuning a 7B open-source model like Mistral 7B runs £0.48 to £1 per million training tokens. A typical SME 2,000-record fine-tune lands at £50 to £500 in GPU time, with inference at roughly £37 per 10,000 queries on cost-optimised models.

Precedence Research forecasts the global transfer learning market to grow from £2.93bn in 2025 to £3.61bn in 2026, with a 23 per cent compound annual growth rate through 2035. The figures matter less than the direction. Smaller organisations now find the economics workable, which is why the technique is moving from research labs into the day-to-day procurement decisions of UK SMEs.

The use cases are concrete. A specialist insurance firm fine-tuned a transformer on 2,000 labelled claims and cut classification time from 4.8 hours to 3.2 minutes per document at 94.7 per cent accuracy, per the Artificio case study. Harvey fine-tuned models on 10 billion tokens of case law and now serves 42 per cent of the top 100 US law firms. A small manufacturer can take 200 quality control photos, augment them to 2,000-plus variations, and catch defects that human inspectors miss 5 per cent of the time. A recruitment consultancy can fine-tune on 300 historical applications and cut CV screening time by 60 per cent.

Where will you actually meet it?

You will meet it embedded inside vendor pitches that say “fine-tuned for your business” without naming the technique. Vendors are typically selling one of three things: a full fine-tune of a small open-source model, a parameter-efficient fine-tune (LoRA or QLoRA) of a larger one, or a wrapper around someone else’s API with a custom prompt. The economics differ by an order of magnitude, so the question to ask is which one you are actually buying.

You will also meet it on Hugging Face, which now hosts 2.8 million pre-trained models and 500,000 datasets per its Spring 2026 state-of-the-platform post. AWS SageMaker JumpStart, Google Vertex AI and Azure Machine Learning all offer pre-configured fine-tuning templates. Open-source platforms like SiliconFlow, LLaMA-Factory and Unsloth lower the bar further. Unsloth in particular fine-tunes models on just 3GB of RAM in a free Colab notebook. None of this means you should run the project yourself. It means the supply side is no longer the bottleneck.

The interesting cost lever is parameter-efficient fine-tuning. Full fine-tuning retrains every parameter in a model; for a 7B model that means updating seven billion numbers, which is expensive and slow. LoRA adds small adapter matrices alongside the frozen base and trains only the adapters, typically well under one per cent of the original parameter count. QLoRA quantises the base model to 4-bit precision and shrinks the memory footprint by 75 per cent. The combination is fine-tuning that costs ten to a hundred times less and trains five to ten times faster, with comparable accuracy.
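The arithmetic behind that saving can be checked directly. Below is a minimal sketch of the LoRA idea only, not a production implementation: the 4,096-dimensional layer and the rank of 8 are illustrative assumptions, and real projects would use a library such as Hugging Face PEFT rather than hand-rolling adapters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: the core LoRA idea."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False      # the pre-trained weights never change
        # Two small adapter matrices whose product stands in for a full weight update.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
full = 4096 * 4096
adapters = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"{adapters:,} adapter parameters versus {full:,} in the full layer")
```

At rank 8 the adapters are 256 times smaller than the layer they sit beside, which is where the ten-to-hundred-fold cost reduction comes from once every layer in the model is treated the same way.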

When to ask for it versus when to ignore it

Ask for transfer learning when your data is stable, when you need consistent tone, terminology or reasoning style, and when you have at least 200 to 500 labelled examples in the domain. Customer support classification, document routing, contract drafting in a house style, claims triage: all good fits.

Ignore it and reach for retrieval-augmented generation when your data changes frequently, when you need to cite the source of an answer, or when the knowledge base is large. Product catalogues, policy documents, pricing sheets and case law libraries all want RAG, because the model never has to learn the content. The IBM RAG-vs-fine-tuning explainer is the cleanest reference if you want a longer treatment.

Ignore it and reach for prompt engineering when you are prototyping, when you have fewer than 100 examples, or when you need to deploy in hours, not weeks. Many production systems eventually combine all three: a fine-tune for tone and reasoning, RAG for current facts, and prompt engineering on top to handle edge cases.

Two boundaries worth flagging. Source and target tasks must be related. A vision model transfers to other vision tasks; a language model transfers to other language tasks. Cross-modal transfer is possible but expensive. And fine-tuning on customer or employee data triggers UK GDPR obligations and EU AI Act duties, fully applicable from 2 August 2026 per the European Commission’s regulatory framework. The ICO guidance on AI and data protection is the right starting point. SMEs under 250 staff or £50m turnover get simplified pathways, but compliance still has to be costed in.

Fine-tuning is the most common implementation of transfer learning. Foundation models are the things you transfer-learn from. Retrieval-augmented generation is the adjacent adaptation method that suits frequently changing data. Prompt engineering is the lightest-touch option, and the prompt engineering versus fine-tuning decision guide walks through the trade-off in more depth. Machine learning is the parent discipline, since transfer learning is fundamentally a machine learning method.

The broker’s question, in the end, was the right one. The £6,000 figure is realistic for a 7B fine-tune on 2,000 labelled records if the labelling work is in hand. The 2,000 documents are the constraint, not the compute. And she does not have to choose between fine-tuning, RAG and prompt engineering. She picks the lightest option that solves the problem in front of her, then layers the others as the use case earns them.

Sources

- GeeksforGeeks, Introduction to Transfer Learning. https://www.geeksforgeeks.org/machine-learning/ml-introduction-to-transfer-learning/
- Weights and Biases, Transfer Learning Versus Fine-Tuning. https://wandb.ai/wandb_fc/genai-research/reports/Transfer-learning-versus-fine-tuning--VmlldzoxNDQxOTM3OQ
- SmartDev, Transfer Learning vs Training From Scratch. https://smartdev.com/transfer-learning-vs-training-from-scratch/
- Price Per Token, Fine-Tuning Cost Reference. https://pricepertoken.com/fine-tuning
- IBM Think, RAG vs Fine-Tuning. https://www.ibm.com/think/topics/rag-vs-fine-tuning
- IBM Watsonx Documentation, LoRA Fine-Tuning. https://www.ibm.com/docs/en/watsonx/w-and-w/2.1.0?topic=tuning-lora-fine
- Hugging Face, State of Open-Source Spring 2026. https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026
- Artificio AI, AI Document Classification Case Study. https://artificio.ai/blog/boosting-roi-with-ai-document-classification-a-case-study
- Precedence Research, Transfer Learning Market 2025-2035. https://www.precedenceresearch.com/transfer-learning-market
- ICO, Guidance on AI and Data Protection. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/
- EU AI Act, Small Businesses Guide. https://artificialintelligenceact.eu/small-businesses-guide-to-the-ai-act/

Frequently asked questions

How is transfer learning different from fine-tuning?

Transfer learning is the principle of reusing knowledge a model has already learned from millions of examples and adapting it to your task. Fine-tuning is one way to do that, where you continue training the upper layers on your data. Feature extraction is another, where you freeze the base model and train only a new output layer. The Weights and Biases reference frames it as principle versus implementation. Most owners hear the two terms used interchangeably, but the distinction matters when you are pricing a project.

How much labelled data do I actually need to fine-tune a model?

For most SME tasks, between 200 and 2,000 examples is enough. The Artificio document classification case study reached 94.7 per cent accuracy on insurance claims with 2,000 labelled records. A small manufacturer can take 200 photos and use data augmentation to generate 2,000-plus variations. Below 100 examples, prompt engineering or retrieval-augmented generation tends to outperform a fine-tune. The harder question is data quality. Clean, consistent labels matter more than volume.

When should I use RAG instead of transfer learning?

Use retrieval-augmented generation when your data changes weekly, when you need to cite the source of an answer, or when the knowledge base is large. Product catalogues, policy documents, and pricing sheets are classic RAG cases because the model never has to learn the content; it pulls fresh information at query time. Fine-tuning is the right call when you need consistent tone, domain terminology, or reasoning style on stable data. Many production systems combine the two.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
