A founder I spoke with last week was choosing the model behind a new automation product for his 35-person customer-support firm. Two paths sat on his desk. Anthropic’s Claude API, integrated in a fortnight, billed per token at roughly £2,500 a month at projected volume. Or a self-hosted Llama 4 deployment on a managed European GPU host, infrastructure at around £1,500 monthly plus another £1,000 of internal engineering time, and eight to ten weeks before it was production-ready.
He wasn’t picking a religion. He was picking a deployment mode that fit his customers, his regulatory exposure and his team’s actual capability. The right answer turned out to be both. Closed-source for the prototype. Open-weight by month twelve. The architecture had to be designed for that from day one.
What is open-source vs closed-source AI?
Closed-source AI keeps the model’s parameters inside the vendor’s infrastructure and you reach the model through a paid API. OpenAI’s GPT-5, Anthropic’s Claude Opus 4.6 and Google’s Gemini 3.1 Pro are the dominant closed-source frontier models in 2026. Open-weight AI publishes the trained parameters as downloadable files. Meta’s Llama 4, Mistral Large 3, DeepSeek V3 and Alibaba’s Qwen are the leading open-weight families. You run them on your own GPU hardware or a third-party host.
The terminology is imprecise. The Open Source Initiative draws a line between true open-source AI, which would also publish the training code and the data, and open-weight, which only publishes the trained parameters. Almost every model the press calls “open-source” today is technically open-weight. The distinction matters when you read the licence, because permissions on the weights do not always extend to the training pipeline or to derivative-model training.
The 2026 capability and cost reality
The capability gap has narrowed sharply. MIT Sloan’s 2026 analysis puts open-weight performance at roughly 89.6% of closed-source flagship benchmarks at launch, closing to parity within around thirteen weeks. For coding, customer service, content generation and the bulk of business automation, the open-weight options are competitive on quality and far cheaper. For frontier reasoning benchmarks the closed-source labs still lead by three to eight points, and that gap matters for a small share of workloads.
The cost picture is where the procurement conversation usually lands. OpenAI’s GPT-5.2 is around $1.75 per million input tokens and $14.00 per million output tokens. Claude Opus 4.6 is $5.00 in and $25.00 out. Llama 4 Maverick on a serverless host such as DeepInfra runs around $0.17 in and $0.60 out. Against Claude Opus 4.6, that is roughly 29 to 41 times cheaper per token at the API level. Self-hosted infrastructure is largely a fixed cost, while API costs scale linearly with usage. The crossover point for a typical UK services firm sits somewhere between £1,500 and £2,500 of monthly API spend, once you cost in the engineering time to run a GPU.
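The per-token ratios above can be sanity-checked with a few lines of arithmetic. A hedged sketch: the prices are the figures quoted in this post, and the monthly token volumes are illustrative assumptions, not a benchmark of any real workload.

```python
# Break-even sketch using the per-million-token prices quoted above.
# Token volumes (100M in, 25M out per month) are illustrative assumptions.

def monthly_api_cost(in_m, out_m, price_in, price_out):
    """API spend scales linearly with volume (prices are per million tokens)."""
    return in_m * price_in + out_m * price_out

# Claude Opus 4.6 vs Llama 4 Maverick on a serverless host (article figures)
claude = monthly_api_cost(in_m=100, out_m=25, price_in=5.00, price_out=25.00)
llama = monthly_api_cost(in_m=100, out_m=25, price_in=0.17, price_out=0.60)

print(f"Claude: ${claude:,.2f}/month, Llama: ${llama:,.2f}/month")
print(f"Input-price ratio:  {5.00 / 0.17:.1f}x")   # ≈ 29.4x
print(f"Output-price ratio: {25.00 / 0.60:.1f}x")  # ≈ 41.7x
```

The 29-to-41x range is a per-token price ratio, not a total-cost ratio: once you add the fixed infrastructure and engineering cost of self-hosting, the realised saving depends on volume, which is why the crossover sits around £1,500 to £2,500 of monthly API spend rather than at zero.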
Where does data sovereignty change the procurement calculus?
Data residency is the underrated procurement driver. Under GDPR Article 48, transferring personal data to a non-adequate country without specific safeguards is a breach, with fines of up to 4% of global annual turnover. The US CLOUD Act lets US law enforcement compel American companies to hand over data regardless of where it is stored. Together these create a sovereignty gap that closed-source US-headquartered providers cannot fully close with contract clauses.
In practice that means AWS Bedrock, Azure OpenAI and Google Vertex AI in EU regions are operating in a legal grey zone for regulated personal data. Self-hosting Llama 4 or Mistral on UK or EU infrastructure resolves the question at the infrastructure layer, not in a contract footnote. The UK is accelerating sovereign-AI plans precisely because of that dependence, and government and regulated-sector RFPs increasingly carry “strategic autonomy” language. For a firm tendering into healthcare, financial services or government work, an EU-resident open-weight deployment can shift from “nice to have” to a procurement requirement.
When to default to closed, when to self-host, when to go hybrid
Default to closed-source when you are prototyping, your team has no GPU capability, your monthly AI spend is under £1,500 and your data is not regulated. The vendor maintains guardrails, abuse monitoring and incident response, and you trade a higher per-token price for speed to market. Lloyds Banking Group’s 2026 survey found the typical UK SME spends under £25,000 a year on AI, consistent with closed-source APIs being the right default for many firms.
Default to open-weight self-hosting when data residency is mandatory under GDPR or sector rules, your monthly API spend is above £2,500 and predictable, your competitive edge depends on fine-tuning a model on your own data, or your customer base includes government and “strategic autonomy” procurement criteria. The UK AI Security Institute’s analysis is honest about the trade-off. Open-weight safeguards can be removed by adversarial fine-tuning, so when you self-host you own runtime monitoring, content filtering and incident response in a way the API providers do not require of you.
Many growing services firms end up hybrid. Closed-source for frontier reasoning and prototyping, open-weight for cost-sensitive volume work and compliance-sensitive deployments. The architecture pattern that holds this together is a model abstraction layer such as LiteLLM or Ollama. With one in place, switching a workload between vendors is a configuration change rather than a rewrite. Re-evaluate the architecture quarterly. The worst decision is locking the business into one vendor’s roadmap and discovering eighteen months in that switching has become ruinous.
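The abstraction-layer pattern is simple enough to sketch. This is a minimal illustration of config-driven routing, not LiteLLM’s actual API: the backends are hypothetical stubs standing in for real dispatch, and the route names and model identifiers are assumptions for the example.

```python
# Minimal sketch of a model abstraction layer. Workload-to-model routing
# lives in config, so moving a workload between a closed-source API and a
# self-hosted open-weight model is a one-line change, not a rewrite.
# In production you would delegate dispatch to a library such as LiteLLM;
# here the call is a stub so the routing pattern is visible on its own.

ROUTES = {
    "frontier_reasoning": "anthropic/claude-opus",  # closed-source API
    "support_triage": "selfhosted/llama-4",         # open-weight, EU-resident
    "bulk_summaries": "selfhosted/llama-4",
}

def call_model(backend: str, prompt: str) -> str:
    """Stand-in for real dispatch (e.g. litellm.completion)."""
    provider, model = backend.split("/", 1)
    return f"[{provider}:{model}] response to: {prompt}"

def run_workload(workload: str, prompt: str) -> str:
    """Look up the configured backend for a workload and dispatch to it."""
    return call_model(ROUTES[workload], prompt)

print(run_workload("support_triage", "Summarise this ticket"))
```

The design point is that application code only ever names a workload, never a vendor; the quarterly re-evaluation the paragraph above recommends then touches one routing table rather than every call site.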
Related concepts
Foundation model is the broader category that both closed-source and open-weight systems sit inside. Every LLM is a foundation model: Llama, Claude, GPT-5 and Gemini all qualify. Whether the weights ship as a downloadable file or stay inside the vendor’s infrastructure is the open-vs-closed question.
Fine-tuning is the customisation lever that open-weight models unlock most fully. Closed-source providers offer API-level fine-tuning that adjusts behaviour without touching the underlying weights. Open-weight self-hosting allows weight-level fine-tuning on your own data, which is the stronger option when domain-specific performance is part of your competitive edge.
SaaS AI vs self-hosted AI is the deployment-mode decision guide that sits next to this post. The open-vs-closed question is upstream: decide which kind of model fits the business first, then the deployment-mode guide covers how to run it.
The EU AI Act sits in the background of any sovereign-AI conversation, particularly for general-purpose AI providers and high-risk deployments. The 12 questions to ask an AI vendor is the procurement checklist that closes the loop on whichever model you end up running, and the one-page AI risk register is the governance step that follows once the deployment-mode decision is made. If the next step is mapping which workloads belong on which model, book a conversation.