What is model poisoning? Why it matters for your business

TL;DR

Model poisoning is what happens when an attacker corrupts the training data, the model weights, or the documents an AI system retrieves at query time, so the system produces unreliable or hidden malicious output. For a UK SME the realistic exposure is on the consuming side, not the training side: open-weight models pulled from public hubs, RAG corpora that ingest external content, fine-tuning on outside data, and AI agents that fetch and run models without human approval. The owner's job is to map which of those four shapes actually exist in the firm's stack and put proportionate controls around the ones that do.

Key takeaways

- Model poisoning has four practical shapes for an SME: open-weight model supply chain, RAG corpus contamination, adversarial fine-tuning data, and AI agents fetching models autonomously.
- The realistic SME risk is not someone targeting your training pipeline; it is the contaminated component you consume without noticing.
- Real incidents already exist. JFrog Security found over 100 malicious models live on Hugging Face in 2024, and the PoisonGPT demonstration showed the attack pattern works on a near-identical model name.
- Vendor questionnaires and cyber-insurance renewals now ask how you verify third-party AI models. NCSC, DSIT, OWASP and NIST are the standards being cited.
- The owner's job is a four-shape audit of the firm's actual stack, then a one-page articulated answer about the proportionate controls in place.

An owner I spoke with last month had two emails open on her laptop. One was a cyber-insurance renewal asking how she verified that third-party AI models were not compromised. The other was a thread with her head of operations, who had cheerfully downloaded an open-weight model from Hugging Face the previous week to run an internal tool. She was not panicked; in her words, she was “trying to work out what the right amount of worried looks like.”

That is roughly where many owner-managed firms now sit on model poisoning. The questions have started to arrive, in procurement questionnaires and insurance paperwork, before the language is settled. This post is the plain-English version of what the term means, where the realistic exposure sits for a small business, and the kind of articulated answer that gets you through the next questionnaire.

What is model poisoning?

Model poisoning is when an attacker corrupts something the AI system learns from, so the system produces wrong or hidden malicious output. The corrupted input can be the training data, the model weights as distributed, or the documents the model retrieves at query time. The model is shaped by what it learns from, so if part of that input is controlled by an attacker, the model can be made to behave wrongly.

For a UK SME the term has four practical shapes, and they matter very differently. Classic data poisoning during training is rare because almost no SMEs train their own foundation model. Supply-chain poisoning of open-weight models pulled from public hubs is real. RAG corpus poisoning, where untrusted documents enter the retrieval store, is the most common live exposure. Adversarial fine-tuning data matters only if you fine-tune on outside content. A modern wrinkle is AI agents that download and execute models without a human approval gate.

Why does it happen, and where does it come from?

Poisoning happens because there is now a distribution layer between the people who build models and the firms that use them, and that layer is not fully policed. Open-weight models sit on public hubs. RAG corpora ingest content from the open internet, customer uploads, and third-party feeds. AI agents fetch and execute code with less human oversight. Each pipe is a place corrupted material can travel from a stranger into your stack.

The incidents already on the record are not theoretical. JFrog Security Research published findings in 2024 that over 100 malicious models were live on Hugging Face at the time of the scan, some designed to exfiltrate data, some to inject cryptomining code into systems that loaded them. Mithril Security’s PoisonGPT demonstration in 2023 showed an attacker could fine-tune a legitimate-looking model with poisoned content and host it under a near-identical name. The PyTorch torchtriton dependency-confusion attack in December 2022 is the AI-tooling analogue. Anthropic’s Sleeper Agents research (January 2024) is the academic anchor for hidden objectives that survive standard safety testing.

Where this actually bites a small business

Where poisoning bites depends on which of the four shapes apply to your stack. A firm running on closed top-tier APIs (OpenAI, Anthropic, Google) has very low base-model exposure because those providers carry the legal risk and run the controls. The moment that firm ingests external documents into a RAG system, or wires up a custom GPT over customer uploads, the corpus exposure becomes its own.

Open-weight models from Hugging Face or similar repositories are the second live exposure. The vast majority of open-weight models are fine. The risk sits with niche or unfamiliar models, near-identical lookalikes of popular ones, and the speed at which a curious team member can move a downloaded model from a personal laptop into a production tool without a check gate. If you fine-tune any model on data you bought, scraped, or accepted from a customer, you have the third exposure. Fine-tuning on outside data is rarer in SMEs than vendors make it sound, but where it exists the exposure is real.

The fourth exposure is the newest. AI agents that fetch and run models autonomously, without a human in the loop at the moment of deployment, turn supply-chain poisoning from a slow human-paced risk into a fast machine-paced one. An agent told to “find the best model for this task and run it” can pull a poisoned model into production faster than anyone can review what just happened.
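The practical control for this shape is structural: route any agent-initiated model deployment through a pending-approval step that only a named human can clear. Below is a minimal sketch of that shape; `request_model_deployment`, `approve`, and `can_deploy` are illustrative names, not part of any real agent framework.

```python
# Hypothetical sketch of a human approval gate for agent-fetched models.
# In a real system PENDING would be a ticket queue or database table,
# not a module-level list.

PENDING: list[dict] = []


def request_model_deployment(model_id: str, requested_by: str) -> dict:
    """An agent calls this after fetching a model; nothing is deployed yet."""
    ticket = {"model_id": model_id, "requested_by": requested_by, "approved": False}
    PENDING.append(ticket)
    return ticket


def approve(ticket: dict, reviewer: str) -> None:
    """Only a named human reviewer flips the flag, after checking provenance."""
    ticket["approved"] = True
    ticket["reviewer"] = reviewer


def can_deploy(ticket: dict) -> bool:
    """Deployment code checks this gate before the model touches production."""
    return ticket["approved"]
```

The point is not the code, it is the shape: the agent's fetch and the production deployment are separated by a step a human has to clear.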

Where you will meet it in paperwork

You will meet model poisoning in three places long before any operational incident. The first is a customer procurement questionnaire, especially from a larger client with its own security team, in a form like “describe your controls over third-party AI model integrity”. The second is a cyber-insurance renewal where AI-specific schedules now include similar questions. The third is any sector-specific regulator review, where AI governance is increasingly in scope.

The standards layer behind those questions is converging. The NCSC’s Guidelines for secure AI system development, published November 2023 with CISA and 20 partner agencies, treats supply-chain integrity and training-data provenance as foundational. DSIT published the UK AI Cyber Security Code of Practice in 2025, voluntary but increasingly cited in procurement. The OWASP Top 10 for Large Language Model Applications names Training Data Poisoning (LLM03) and Supply Chain Vulnerabilities (LLM05) as critical risks. NIST’s AI 600-1 Generative AI Profile (July 2024) covers data provenance and supply-chain controls in detail. MITRE ATLAS provides the standardised taxonomy of attacks. If you trade into the EU, the AI Act adds supply-chain provenance obligations on high-risk systems.

The owner’s job in this paperwork is to give an articulated, honest answer that shows proportionate controls, not to write a perfect security policy. Customers and insurers are testing whether you have thought about it, not whether you have a Fortune 500 control framework hanging off a 25-person firm.

What to do about it

The proportionate controls map cleanly onto the four exposure shapes, and none of them are exotic. For closed top-tier API use, you rely on the vendor’s published security practice and focus your effort on the corpus side. For open-weight models, the controls are provenance checks (signed weights or file hashes where available, model card review, isolation-test before production) and a preference for popular, well-maintained models over niche ones.
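Where a publisher lists file hashes for its weights, the provenance check can be as simple as recomputing the hash locally before the file goes anywhere near production. A minimal sketch in Python, assuming you have the publisher's SHA-256 value to hand (`weights_match_published_hash` is an illustrative name, not a library function):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte weight files
    never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_match_published_hash(path: Path, published_hash: str) -> bool:
    """Compare the local download against the hash the publisher lists.
    A mismatch means the file is not the one that was published."""
    return sha256_of(path) == published_hash.strip().lower()
```

A matching hash does not prove the model is benign, only that you received the file the publisher actually released, which is exactly the supply-chain question the lookalike-model attacks exploit.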

For RAG corpora, you want source allowlists, file-type filtering on anything customer-uploaded, provenance metadata on every ingested document, and a human review gate for externally sourced content of unclear origin. For fine-tuning on outside data, the controls are contractual attestation from the data vendor about how they collect and validate it, plus statistical validation of the dataset before training. For AI agents, the highest-impact control is a human approval gate before any auto-fetched model reaches production. These are normal data-governance practices applied to AI inputs.
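For the RAG side, those controls can be expressed as a small ingestion gate in front of the retrieval store. The sketch below is illustrative only; the allowlisted sources and file types are placeholder values you would replace with your own, and the review flag stands in for whatever human-review step your team already runs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Placeholder values -- substitute your own trusted sources and types.
ALLOWED_SOURCES = {"internal-wiki", "customer-upload", "vendor-feed"}
ALLOWED_TYPES = {".pdf", ".docx", ".txt", ".md"}


@dataclass
class IngestDecision:
    accepted: bool
    reason: str
    provenance: dict = field(default_factory=dict)


def gate_document(source: str, filename: str, needs_review: bool = False) -> IngestDecision:
    """Apply the source allowlist and file-type filter, and stamp
    provenance metadata onto anything that gets through."""
    suffix = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if source not in ALLOWED_SOURCES:
        return IngestDecision(False, f"source '{source}' not on allowlist")
    if suffix not in ALLOWED_TYPES:
        return IngestDecision(False, f"file type '{suffix}' not permitted")
    provenance = {
        "source": source,
        "filename": filename,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        # Externally sourced content of unclear origin waits for a human.
        "human_reviewed": not needs_review,
    }
    return IngestDecision(True, "accepted", provenance)
```

Every document in the store then carries a record of where it came from and whether a person looked at it, which is the information you need when a questionnaire asks about corpus integrity.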

What you actually owe yourself this quarter is the four-shape audit and a one-page articulated answer. Walk through your stack and write down which shapes apply, which do not, and what control you have or will add for each. That one page is the credible answer to the next questionnaire, and it is the input to the bigger conversation with your qualified cyber consultant or insurance broker. Treat any vendor claim of “we are immune to poisoning” with the same scepticism as “we are hallucination-free”. This post is the plain-English frame, not a substitute for a real assessment. The forward-links from here are the 12-question vendor due diligence guide and the one-page AI risk register, where the four-shape audit becomes a row you can actually maintain.

Sources

JFrog Security Research (2024). Malicious AI Models: Hundreds of Poisoned ML Models Found on Hugging Face. The 100-plus live malicious models finding, supply-chain anchor. https://jfrog.com/blog/data-scientists-targeted-by-malicious-hugging-face-ml-models-with-silent-backdoor/

Mithril Security (2023). PoisonGPT: How We Hacked Hugging Face to Inject Evil into Open-Source AI. Demonstration of the attack pattern using a near-identical model name. https://blog.mithrilsecurity.io/poisongpt-how-we-hacked-hugging-face-to-inject-evil-into-open-source-ai/

NCSC and CISA, with international partners (November 2023). Guidelines for secure AI system development. UK-facing baseline, supply-chain integrity in scope. https://www.ncsc.gov.uk/collection/guidelines-secure-ai-system-development

UK Department for Science, Innovation and Technology (2025). AI Cyber Security Code of Practice. Voluntary, increasingly cited in procurement. https://www.gov.uk/government/publications/ai-cyber-security-code-of-practice

OWASP Top 10 for Large Language Model Applications. LLM03 Training Data Poisoning and LLM05 Supply Chain Vulnerabilities, the reference framework commonly used by security assessors. https://genai.owasp.org/llm-top-10/

NIST (July 2024). AI 600-1, Generative AI Profile under the AI Risk Management Framework. Section on data provenance and supply-chain controls. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

Gu, Dolan-Gavitt, Garg (2017). BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. Academic anchor for hidden-trigger backdoor research. https://arxiv.org/abs/1708.06733

Anthropic (Hubinger et al., January 2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. Case study on hidden malicious objectives surviving standard testing. https://arxiv.org/abs/2401.05566

MITRE ATLAS (Adversarial Threat Landscape for AI Systems). Standardised taxonomy of AI-specific attacks, including poisoning. https://atlas.mitre.org/

Ars Technica (December 2022). PyTorch torchtriton supply-chain compromise. The AI-tooling supply-chain analogue. https://arstechnica.com/information-technology/2022/12/pytorch-site-targeted-in-novel-supply-chain-attack/

Frequently asked questions

We only use ChatGPT and Microsoft Copilot. Do we still need to worry about poisoning?

Your exposure to base-model poisoning is very low. Top-tier closed-model providers run rigorous controls and carry the reputational and legal risk themselves. Where it gets real for you is the moment you add documents into the picture. If a customer can upload a PDF that ends up in the model's context, or your team feeds external content into a Copilot agent, the corpus side becomes your problem regardless of which underlying model sits behind it.

How do I answer the supplier questionnaire question, "how do you verify third-party AI models are not compromised?"

Start by writing down where models actually come from in your firm. For a typical SME the honest answer is "we use established providers with published security practices, we do not fine-tune on untrusted data, and any open-weight model is tested in isolation before production." That is a credible answer. It is not perfect security, it is proportionate due diligence, which is what the questionnaire is really testing.

My head of operations downloaded a model from Hugging Face for an internal tool. Should I be worried?

Worried is too strong. Aware is the right word. Ask them three things: which model, who maintains it, and whether it is running in isolation or against live data. Popular, well-maintained models with thousands of downloads are usually fine. The risk sits with niche models, near-identical-name lookalikes, and any model wired straight into a system that holds customer data without a test gate in front of it.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
