What is model poisoning? Why it matters for your business

TL;DR

Model poisoning is what happens when an attacker corrupts the training data, the model weights, or the documents an AI system retrieves at query time, so the system produces unreliable or hidden malicious output. For a UK SME the realistic exposure is on the consuming side, not the training side: open-weight models pulled from public hubs, RAG corpora that ingest external content, fine-tuning on outside data, and AI agents that fetch and run models without human approval. The owner's job is to map which of those four shapes actually exist in the firm's stack and put proportionate controls around the ones that do.

Key takeaways

- Model poisoning has four practical shapes for an SME: open-weight model supply chain, RAG corpus contamination, adversarial fine-tuning data, and AI agents fetching models autonomously.
- The realistic SME risk is not someone targeting your training pipeline; it is the contaminated component you consume without noticing.
- Real incidents already exist. JFrog Security found over 100 malicious models live on Hugging Face in 2024, and the PoisonGPT demonstration showed the attack pattern works on a near-identical model name.
- Vendor questionnaires and cyber-insurance renewals now ask how you verify third-party AI models. NCSC, DSIT, OWASP and NIST are the standards being cited.
- The owner's job is a four-shape audit of the firm's actual stack, then a one-page articulated answer about the proportionate controls in place.

An owner I spoke with last month had two emails open on her laptop. One was a cyber-insurance renewal asking how she verified that third-party AI models were not compromised. The other was a thread with her head of operations, who had cheerfully downloaded an open-weight model from Hugging Face the previous week to run an internal tool. She was not panicked; in her words, she was “trying to work out what the right amount of worried looks like.”

That is roughly where many owner-managed firms now sit on model poisoning. The questions have started to arrive, in procurement questionnaires and insurance paperwork, before the language is settled. This post is the plain-English version of what the term means, where the realistic exposure sits for a small business, and the kind of articulated answer that gets you through the next questionnaire.

What is model poisoning?

Model poisoning is when an attacker corrupts something the AI system learns from, so the system produces wrong or hidden malicious output. The corrupted input can be the training data, the model weights as distributed, or the documents the model retrieves at query time. The model is shaped by what it learns from, so if part of that input is controlled by an attacker, the model can be made to behave wrongly.

For a UK SME the term has four practical shapes, and they matter very differently. Classic data poisoning during training is rare because almost no SMEs train their own foundation model. Supply-chain poisoning of open-weight models pulled from public hubs is real. RAG corpus poisoning, where untrusted documents enter the retrieval store, is the most common live exposure. Adversarial fine-tuning data matters only if you fine-tune on outside content. A modern wrinkle is AI agents that download and execute models without a human approval gate.

Why does it happen, and where does it come from?

Poisoning happens because there is now a distribution layer between the people who build models and the firms that use them, and that layer is not fully policed. Open-weight models sit on public hubs. RAG corpora ingest content from the open internet, customer uploads, and third-party feeds. AI agents fetch and execute code with less human oversight. Each pipe is a place corrupted material can travel from a stranger into your stack.

The incidents already on the record are not theoretical. JFrog Security Research published findings in 2024 that over 100 malicious models were live on Hugging Face at the time of the scan, some designed to exfiltrate data, some to inject cryptomining code into systems that loaded them. Mithril Security’s PoisonGPT demonstration in 2023 showed an attacker could fine-tune a legitimate-looking model with poisoned content and host it under a near-identical name. The PyTorch torchtriton dependency-confusion attack in December 2022 is the AI-tooling analogue. Anthropic’s Sleeper Agents research (January 2024) is the academic anchor for hidden objectives that survive standard safety testing.

Where this actually bites a small business

Where poisoning bites depends on which of the four shapes apply to your stack. A firm running on closed top-tier APIs (OpenAI, Anthropic, Google) has very low base-model exposure because those providers carry the legal risk and run the controls. The moment that firm ingests external documents into a RAG system, or wires up a custom GPT over customer uploads, the corpus exposure becomes its own.

Open-weight models from Hugging Face or similar repositories are the second live exposure. The vast majority of open-weight models are fine. The risk sits with niche or unfamiliar models, near-identical lookalikes of popular ones, and the speed at which a curious team member can move a downloaded model from a personal laptop into a production tool without a check gate. If you fine-tune any model on data you bought, scraped, or accepted from a customer, you have the third exposure. Fine-tuning on outside data is rarer in SMEs than vendors make it sound, but where it exists the exposure is real.

The fourth exposure is the newest. AI agents that fetch and run models autonomously, without a human in the loop at the moment of deployment, turn supply-chain poisoning from a slow human-paced risk into a fast machine-paced one. An agent told to “find the best model for this task and run it” can pull a poisoned model into production faster than anyone can review what just happened.
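The practical control for this shape is structural: route any agent-initiated model deployment through a pending-approval step that only a named human can clear. Below is a minimal sketch of that shape; `request_model_deployment`, `approve`, and `can_deploy` are illustrative names, not part of any real agent framework.

```python
# Hypothetical sketch of a human approval gate for agent-fetched models.
# In a real system PENDING would be a ticket queue or database table,
# not a module-level list.

PENDING: list[dict] = []


def request_model_deployment(model_id: str, requested_by: str) -> dict:
    """An agent calls this after fetching a model; nothing is deployed yet."""
    ticket = {"model_id": model_id, "requested_by": requested_by, "approved": False}
    PENDING.append(ticket)
    return ticket


def approve(ticket: dict, reviewer: str) -> None:
    """Only a named human reviewer flips the flag, after checking provenance."""
    ticket["approved"] = True
    ticket["reviewer"] = reviewer


def can_deploy(ticket: dict) -> bool:
    """Deployment code checks this gate before the model touches production."""
    return ticket["approved"]
```

The point is not the code, it is the shape: the agent's fetch and the production deployment are separated by a step a human has to clear.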

Where you will meet it in paperwork

You will meet model poisoning in three places long before any operational incident. The first is a customer procurement questionnaire, especially from a larger client with its own security team, in a form like “describe your controls over third-party AI model integrity”. The second is a cyber-insurance renewal where AI-specific schedules now include similar questions. The third is any sector-specific regulator review, where AI governance is increasingly in scope.

The standards layer behind those questions is converging. The NCSC’s Guidelines for secure AI system development, published November 2023 with CISA and 20 partner agencies, treats supply-chain integrity and training-data provenance as foundational. DSIT published the UK AI Cyber Security Code of Practice in 2025, voluntary but increasingly cited in procurement. The OWASP Top 10 for Large Language Model Applications names Training Data Poisoning (LLM03) and Supply Chain Vulnerabilities (LLM05) as critical risks. NIST’s AI 600-1 Generative AI Profile (July 2024) covers data provenance and supply-chain controls in detail. MITRE ATLAS provides the standardised taxonomy of attacks. If you trade into the EU, the AI Act adds supply-chain provenance obligations on high-risk systems.

The owner’s job in this paperwork is to give an articulated, honest answer that shows proportionate controls, not to write a perfect security policy. Customers and insurers are testing whether you have thought about it, not whether you have a Fortune 500 control framework hanging off a 25-person firm.

What to do about it

The proportionate controls map cleanly onto the four exposure shapes, and none of them are exotic. For closed top-tier API use, you rely on the vendor’s published security practice and focus your effort on the corpus side. For open-weight models, the controls are provenance checks (signed weights or file hashes where available, model card review, isolation-test before production) and a preference for popular, well-maintained models over niche ones.
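Where a publisher lists file hashes for its weights, the provenance check can be as simple as recomputing the hash locally before the file goes anywhere near production. A minimal sketch in Python, assuming you have the publisher's SHA-256 value to hand (`weights_match_published_hash` is an illustrative name, not a library function):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte weight files
    never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_match_published_hash(path: Path, published_hash: str) -> bool:
    """Compare the local download against the hash the publisher lists.
    A mismatch means the file is not the one that was published."""
    return sha256_of(path) == published_hash.strip().lower()
```

A matching hash does not prove the model is benign, only that you received the file the publisher actually released, which is exactly the supply-chain question the lookalike-model attacks exploit.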

For RAG corpora, you want source allowlists, file-type filtering on anything customer-uploaded, provenance metadata on every ingested document, and a human review gate for externally sourced content of unclear origin. For fine-tuning on outside data, the controls are contractual attestation from the data vendor about how they collect and validate it, plus statistical validation of the dataset before training. For AI agents, the highest-impact control is a human approval gate before any auto-fetched model reaches production. These are normal data-governance practices applied to AI inputs.
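For the RAG side, those controls can be expressed as a small ingestion gate in front of the retrieval store. The sketch below is illustrative only; the allowlisted sources and file types are placeholder values you would replace with your own, and the review flag stands in for whatever human-review step your team already runs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Placeholder values -- substitute your own trusted sources and types.
ALLOWED_SOURCES = {"internal-wiki", "customer-upload", "vendor-feed"}
ALLOWED_TYPES = {".pdf", ".docx", ".txt", ".md"}


@dataclass
class IngestDecision:
    accepted: bool
    reason: str
    provenance: dict = field(default_factory=dict)


def gate_document(source: str, filename: str, needs_review: bool = False) -> IngestDecision:
    """Apply the source allowlist and file-type filter, and stamp
    provenance metadata onto anything that gets through."""
    suffix = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if source not in ALLOWED_SOURCES:
        return IngestDecision(False, f"source '{source}' not on allowlist")
    if suffix not in ALLOWED_TYPES:
        return IngestDecision(False, f"file type '{suffix}' not permitted")
    provenance = {
        "source": source,
        "filename": filename,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        # Externally sourced content of unclear origin waits for a human.
        "human_reviewed": not needs_review,
    }
    return IngestDecision(True, "accepted", provenance)
```

Every document in the store then carries a record of where it came from and whether a person looked at it, which is the information you need when a questionnaire asks about corpus integrity.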

What you actually owe yourself this quarter is the four-shape audit and a one-page articulated answer. Walk through your stack and write down which shapes apply, which do not, and what control you have or will add for each. That one page is the credible answer to the next questionnaire, and it is the input to the bigger conversation with your qualified cyber consultant or insurance broker. Treat any vendor claim of “we are immune to poisoning” with the same scepticism as “we are hallucination-free”. This post is the plain-English frame, not a substitute for a real assessment. The forward-links from here are the 12-question vendor due diligence guide and the one-page AI risk register, where the four-shape audit becomes a row you can actually maintain.

Sources

JFrog Security Research (2024). Malicious AI Models: Hundreds of Poisoned ML Models Found on Hugging Face. The 100-plus live malicious models finding, supply-chain anchor. https://jfrog.com/blog/data-scientists-targeted-by-malicious-hugging-face-ml-models-with-silent-backdoor/

Mithril Security (2023). PoisonGPT: How We Hacked Hugging Face to Inject Evil into Open-Source AI. Demonstration of the attack pattern using a near-identical model name. https://blog.mithrilsecurity.io/poisongpt-how-we-hacked-hugging-face-to-inject-evil-into-open-source-ai/

NCSC and CISA, with international partners (November 2023). Guidelines for secure AI system development. UK-facing baseline, supply-chain integrity in scope. https://www.ncsc.gov.uk/collection/guidelines-secure-ai-system-development

UK Department for Science, Innovation and Technology (2025). AI Cyber Security Code of Practice. Voluntary, increasingly cited in procurement. https://www.gov.uk/government/publications/ai-cyber-security-code-of-practice

OWASP Top 10 for Large Language Model Applications. LLM03 Training Data Poisoning and LLM05 Supply Chain Vulnerabilities, the reference framework commonly used by security assessors. https://genai.owasp.org/llm-top-10/

NIST (July 2024). AI 600-1, Generative AI Profile under the AI Risk Management Framework. Section on data provenance and supply-chain controls. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

Gu, Dolan-Gavitt, Garg (2017). BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. Academic anchor for hidden-trigger backdoor research. https://arxiv.org/abs/1708.06733

Anthropic (Hubinger et al., January 2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. Case study on hidden malicious objectives surviving standard testing. https://arxiv.org/abs/2401.05566

MITRE ATLAS (Adversarial Threat Landscape for AI Systems). Standardised taxonomy of AI-specific attacks, including poisoning. https://atlas.mitre.org/

Ars Technica (December 2022). PyTorch torchtriton supply-chain compromise. The AI-tooling supply-chain analogue. https://arstechnica.com/information-technology/2022/12/pytorch-site-targeted-in-novel-supply-chain-attack/

Frequently asked questions

We only use ChatGPT and Microsoft Copilot. Do we still need to worry about poisoning?

Your exposure to base-model poisoning is very low. Top-tier closed-model providers run rigorous controls and carry the reputational and legal risk themselves. Where it gets real for you is the moment you add documents into the picture. If a customer can upload a PDF that ends up in the model's context, or your team feeds external content into a Copilot agent, the corpus side becomes your problem regardless of which underlying model sits behind it.

How do I answer the supplier questionnaire question, "how do you verify third-party AI models are not compromised?"

Start by writing down where models actually come from in your firm. For a typical SME the honest answer is "we use established providers with published security practices, we do not fine-tune on untrusted data, and any open-weight model is tested in isolation before production." That is a credible answer. It is not perfect security, it is proportionate due diligence, which is what the questionnaire is really testing.

My head of operations downloaded a model from Hugging Face for an internal tool. Should I be worried?

Worried is too strong. Aware is the right word. Ask them three things: which model, who maintains it, and whether it is running in isolation or against live data. Popular, well-maintained models with thousands of downloads are usually fine. The risk sits with niche models, near-identical-name lookalikes, and any model wired straight into a system that holds customer data without a test gate in front of it.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
