What is a foundation model? Why it matters for your business

TL;DR

A foundation model is a large pre-trained AI system that vendors adapt to build specific products on top of. GPT-5, Claude Opus, Gemini, Llama and Mistral are all foundation models. Every LLM is a foundation model, but foundation models also include vision and multimodal systems. The business question is not whether your tool is built on one but which one, who owns it, and what happens when that model is retired.

Key takeaways

- A foundation model is a large pre-trained AI system that other tools are built on top of.
- Every LLM is a foundation model. Not every foundation model is an LLM. Vision and multimodal systems also qualify.
- Vendors choose a foundation model and wrap it. Their product inherits the model's strengths, costs and lifecycle.
- "Model-agnostic" is mostly marketing. True model-agnostic platforms use a gateway and a unified API.
- Foundation models are deprecated on the provider's schedule, not yours. Plan for migration before you need it.

A founder I work with watched a salesperson move from “powered by AI” on slide one to “built on a leading foundation model” on slide two without naming the model. He asked me afterwards whether that was a meaningful claim or a way of avoiding the question. It was the latter.

By 2026 nearly every business AI tool is built on top of a foundation model, and the vendor who will not name it is the vendor whose pricing, continuity and capability are at someone else’s mercy. The plain-English version of the term tells you when the answer matters and when it does not.

What is a foundation model?

A foundation model is a large AI system, pre-trained at enormous cost on huge volumes of data, that other tools are built on top of. The term was coined by Stanford researchers in 2021 to describe a paradigm shift in how AI is created. Before foundation models, machine learning teams built specialist systems for each narrow task. Foundation models invert that. One model is trained once on vast, diverse data, and then adapted by downstream users to do many different things.

GPT-5 from OpenAI, Claude Opus from Anthropic, Gemini from Google, Llama from Meta, Mistral from Mistral AI and Grok from xAI are all foundation models. They arrive pre-trained, and your business or your vendor adapts them by adding instructions, by feeding in your documents at query time, or by fine-tuning them on your examples.

The category is broader than LLMs. Every LLM is a foundation model, but foundation models also include vision systems like CLIP and SAM, and multimodal systems like GPT-4o and Gemini. When a vendor says “built on an LLM” they mean text. When they say “built on a foundation model” they may mean text, vision, audio or all three. Worth asking which.

The economics matter too. Pre-training a foundation model from scratch costs tens of millions of pounds and is the preserve of a handful of labs. Adapting one is cheap and fast. That is why nearly every SME-facing AI tool in 2026 wraps an existing foundation model rather than training a proprietary one. The vendor’s value is in the wrapping, the data, the workflow integration and the user interface. The capability ceiling is set by the foundation model underneath.

Why it matters for your business

The first thing it changes is cost transparency. When a vendor charges a flat per-seat fee, they absorb the per-token API cost their foundation model provider charges them. OpenAI raised input token pricing on GPT-5.5 by 100% in May 2026, and costs for long-context workloads rose by between 49% and 92%. Your vendor either ate that increase, throttled you to a cheaper model, or passed it on. Knowing the foundation model lets you anticipate the next move.
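To make the pass-through concrete, here is a back-of-envelope sketch. All prices, query volumes and token counts below are hypothetical, invented purely for illustration; they are not GPT-5.5's actual rates. The point is the shape of the arithmetic: when input tokens are half the bill, doubling the input price raises the total by 50%, not 100%.

```python
# Back-of-envelope sketch of how a provider's per-token price rise flows
# through to a vendor's monthly bill. All numbers here are hypothetical.

def monthly_cost(queries: int, input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Monthly cost in pounds, given per-million-token prices."""
    per_query = (input_tokens * input_price_per_m
                 + output_tokens * output_price_per_m) / 1_000_000
    return queries * per_query

# 50,000 queries/month, 2,000 input and 500 output tokens per query.
before = monthly_cost(50_000, 2_000, 500,
                      input_price_per_m=2.0, output_price_per_m=8.0)
after = monthly_cost(50_000, 2_000, 500,
                     input_price_per_m=4.0, output_price_per_m=8.0)  # input price doubled

print(round(before, 2), round(after, 2))  # before ≈ £400, after ≈ £600
```

With these made-up figures, a 100% rise in the input price lifts the vendor's total cost by 50%, because output tokens were the other half of the bill. The vendor's real exposure depends on their input/output mix, which is one more reason to ask which model sits underneath.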

The second is continuity. OpenAI announced GPT-4.5 in February 2025 and deprecated it within two months because the inference cost was uneconomic. Tools anchored to that model had to migrate. Prompts that worked on the old model behaved differently on the replacement. By 2026 this cycle is normal. Foundation models are retired roughly every twelve to eighteen months, and the lifecycle is set by the provider, not by you or your vendor.

The third is capability inheritance. Your product cannot do anything the underlying model cannot do, regardless of marketing. If the model hallucinates, the product hallucinates. If the model has a knowledge cut-off, the product does too unless retrieval is bolted on. The UK National Cyber Security Centre is clear that hallucination, bias and prompt injection are intrinsic to how the technology works, not vendor flaws to be fixed.

Where you will meet it

You will meet “foundation model” in vendor pitches where the salesperson does not want to commit to a specific name. “Built on a leading foundation model” or “built on a state-of-the-art foundation model” both translate to “I would rather not tell you exactly which one.” Sometimes that reflects a genuine model-routing setup behind the scenes. More often it is hedging.

You will meet “model-agnostic” in pitches where the vendor wants to neutralise the lock-in concern. The claim is that their platform can swap one foundation model for another without breaking your application. In a few cases this is true and the vendor has built a real abstraction layer, often called a gateway, that translates between your business logic and whichever model is underneath. In most cases the claim means they have built separate integrations to several models, which is fragmentation rather than agnosticism. The test question is “can I switch from GPT to Claude without changing my prompts or my code?” A truthful answer of “configuration change only” indicates the real thing.
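The shape of a genuine gateway can be sketched in a few lines. Everything below is illustrative: the `Gateway` class, the adapter functions and the provider names are invented to show the abstraction, not any vendor's real code. The key property is that business logic calls one unified interface, and swapping the underlying model is a configuration change rather than a rewrite.

```python
# Illustrative sketch of a model gateway. Business logic calls one unified
# interface; the underlying model is selected by configuration alone.
# Provider names and adapters are invented stubs, not real SDK calls.

from dataclasses import dataclass
from typing import Callable, Dict

# Each adapter translates the unified request into one provider's API.
# A real adapter would call that provider's SDK; these are stand-ins.
def _call_gpt(prompt: str) -> str:
    return f"[gpt] {prompt}"

def _call_claude(prompt: str) -> str:
    return f"[claude] {prompt}"

ADAPTERS: Dict[str, Callable[[str], str]] = {
    "gpt": _call_gpt,
    "claude": _call_claude,
}

@dataclass
class Gateway:
    provider: str  # the only thing that changes when you switch models

    def complete(self, prompt: str) -> str:
        return ADAPTERS[self.provider](prompt)

# Switching from GPT to Claude is a configuration change only; the
# application code and prompts stay the same.
gw = Gateway(provider="gpt")
print(gw.complete("Summarise this meeting"))
gw = Gateway(provider="claude")
print(gw.complete("Summarise this meeting"))
```

A vendor with separate, hand-built integrations to each model cannot do this; their prompts and workflows are entangled with one provider's API, which is the fragmentation the test question is designed to expose.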

You will also meet foundation model language in regulated contexts. Under the EU AI Act, providers of general-purpose AI models placed on the market from 2 August 2025 must publish a summary of training data and disclose how they have evaluated systemic risks. If you operate in or serve the EU, the foundation model your tool uses is part of your compliance picture, and the provider’s documentation is what you point at when an auditor asks.

When to ask about it, when to ignore it

Ask hard questions when your business outcomes depend directly on the model’s capability, cost or availability. A customer service product where the foundation model is the engine of every response deserves a specific name and a specific version. A bid-writing tool used by your team daily deserves the same. Ask the vendor four questions in sequence:

- Which foundation model and version do you use?
- What is your migration plan when that model is deprecated?
- What is your data handling and residency policy?
- How much work would it take for me to switch you out?

Ask hard questions when you are in a regulated industry. Healthcare, finance and legal services all carry obligations around the provenance and explainability of automated decisions. The FCA’s published approach to AI is explicit about validation and explainability. The ICO’s guidance is explicit about lawful basis and Data Processing Addenda. Both presuppose you know which foundation model is doing the work.

Ignore the term when you are using a low-stakes tool in a one-off way. The model behind a meeting summariser that drafts your team’s notes is, at this point, an implementation detail. Whether it is GPT, Claude or Llama matters less than whether the summaries are good enough that the team uses them. The product is the user experience, not the architecture diagram.

Ignore “model-agnostic” claims that come without an architecture answer. If the vendor cannot describe the abstraction layer or the gateway, the claim is marketing. Do not buy on it.

Related terms

An LLM, or large language model, is a foundation model specialised in text. Every LLM is a foundation model. Not every foundation model is an LLM. The distinction matters when the product handles images, audio or video as well as text.

Base model is a near-synonym for foundation model in most vendor language. When a vendor says “built on the Llama base model” they mean the same as “built on the Llama foundation model”.

Fine-tuning is a way of adapting a foundation model to a specific task by adjusting the model’s weights using your own data. More expensive and slower than prompting or retrieval, but the right call when you need consistent behaviour the base model cannot deliver from instruction alone. Fine-tuning has its own explainer in this series.

Frontier model is a label, not a category, for the most capable models available at any given time. In May 2026 the frontier set includes GPT-5, Claude Opus 4.6 and Gemini 3 Pro. Frontier models are powerful but expensive and tend to be revised on shorter cycles. The UK AI Security Institute uses the term in its evaluations of model risk.

Open-weight model is a foundation model whose parameters have been published openly so that anyone can download and run them. Llama, Mistral and DeepSeek’s variants are open-weight. They cost less per query at high volume and give you full control, in exchange for running the infrastructure yourself.

The point of the vocabulary is not to make you a model expert. It is to give you enough purchase that the next time a vendor says “built on a foundation model” without naming one, you can ask the question that turns the marketing into a contract conversation.

Frequently asked questions

Is a foundation model the same thing as an LLM?

An LLM is a type of foundation model. Every LLM is a foundation model, but not every foundation model is an LLM. Vision models like CLIP and SAM, and multimodal models like GPT-4o and Gemini, are also foundation models. The distinction matters when a vendor says "built on a foundation model" without naming the specific model or the modalities it covers.

Should I worry about which foundation model my vendor uses?

Yes, in two scenarios. First, when the vendor's pricing depends on the underlying model and a price change at the model provider will pass through to you. Second, when the vendor's continuity depends on the model staying available. OpenAI deprecated GPT-4.5 within two months of release in 2025 and raised input pricing on GPT-5.5 by 100% in May 2026. Both events ripple through tools built on those models.

Is "model-agnostic" a real thing or marketing?

Both. Genuinely model-agnostic platforms use an abstraction layer (often called a gateway) so the same application code can call any model with a configuration change. Most vendors who claim agnosticism have built separate integrations to two or three models, which is fragmentation rather than agnosticism. The test question is whether you can switch from GPT to Claude without re-engineering your prompts and workflows.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
