Copyright risk when AI is trained on protected material

TL;DR

Copyright risk from AI training operates on two levels. The model training level is the subject of active litigation. In November 2025 the UK High Court found that an image model trained on copyrighted photographs was not itself an infringing copy, while US courts remain divided. The more immediate risk for owner-managed businesses sits on the output side and in vendor contracts, where indemnity terms vary significantly and client IP warranties are becoming standard procurement requirements.

Key takeaways

- Training risk and output risk are two distinct categories. UK law currently places training-side liability mostly on the model developer, not on the end user buying an off-the-shelf SaaS tool.
- In Getty Images v Stability AI (November 2025), the UK High Court held that the Stable Diffusion model was not an infringing copy of the images it was trained on. It did find limited trade mark infringement where AI outputs reproduced Getty Images and iStock watermarks.
- Section 29A of the Copyright, Designs and Patents Act 1988 permits text and data mining for non-commercial research only. Commercial AI training in the UK still requires licensing or a defensible fair-dealing argument.
- Major AI providers now offer IP indemnity programmes for enterprise users. Depending on the vendor, these cover copyright claims about training data, model outputs, or both. They do not cover what the customer puts in. Read the terms for the specific plan your firm is on.
- The practical steps: confirm your vendor's IP indemnity, avoid uploading third-party licensed or client content for fine-tuning without documented permission, and require human review of AI output before it reaches clients.

A client sends back a services agreement with a new clause highlighted. They want a warranty that any AI-generated content in the deliverables was produced using tools trained only on licensed material. You read it twice. You use two or three AI tools regularly. You have never checked what they were trained on, and until this moment you have not needed to. This post explains what the risk actually is, so you can decide where to focus.

The risk operates on two levels. First, the training level: an AI model is built by processing vast volumes of content, often scraped from the web. If that content included copyrighted material taken without authorisation, the rights holders can argue infringement. Second, the output level: the model’s generated text or images may reproduce protected material closely enough to infringe copyright or trade marks.

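In November 2025, the UK High Court ruled in Getty Images v Stability AI that the Stable Diffusion image model was not an "infringing copy" of the copyrighted photographs used to train it. Stability therefore did not commit secondary copyright infringement by making the model available in the UK. Getty had dropped its claim about the training itself during the trial, because the training took place outside the UK. The court accepted expert evidence that the trained model learns statistical patterns and does not store copies of the original images. That finding reduces the copyright risk of distributing and using trained models in the UK. It leaves open whether training carried out in the UK would infringe.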

But the case had a twist. Getty succeeded on a limited trade mark claim, because some images generated by earlier versions of the model reproduced the Getty Images and iStock watermarks. The training itself was not the infringement; the output was. That distinction matters because the legal exposure for a business using an AI image tool can sit with what comes out of the model, not with how the model was built.

UK copyright law also has a narrow text and data mining exception under Section 29A of the Copyright, Designs and Patents Act 1988, but it covers non-commercial research only. Commercial AI training in the UK still requires licensing or a defensible fair-dealing argument. The government dropped an earlier plan to extend the exception to commercial use in 2023. Its December 2024 consultation then proposed a commercial exception that rights holders could opt out of, but the law has not yet changed.

Why does this matter for your business?

If you buy and use mainstream AI tools as a service, you sit a step removed from whoever trained the model. That does not put you entirely in the clear. Many vendors’ standard contracts disclaim responsibility for IP infringement in outputs and pass that liability to the customer. If your firm has ever fine-tuned a model on third-party content, you step directly into the training-side risk.

Some major providers have responded with explicit IP indemnity programmes. Microsoft’s Copilot Copyright Commitment, published in 2023, promises to defend and compensate enterprise customers facing copyright claims arising from use of its Copilots, provided they use Microsoft’s built-in filters and safety systems. Similar commitments exist from other large vendors. Depending on the vendor, these indemnities cover claims about the training data, the model’s output, or both; they do not cover what you put in.

The EU AI Act, formally adopted in 2024, adds a further layer for firms selling into or operating across EU markets. General-purpose AI model providers must now publish a sufficiently detailed summary of the content used to train their models, specifically to give copyright holders a route to enforce their rights. If your firm uses a large foundation model deployed on the EU market, you can expect more disclosure about its training provenance, and with it more scrutiny from rights holders.

Where will you actually meet it?

For a services firm, the three most likely encounter points are client contracts, image and design work, and decisions about fine-tuning on specialised content. Client procurement teams at larger organisations now routinely ask for IP warranties on AI-assisted deliverables. The image and design encounter carries the highest tested litigation risk, given the Getty v Stability AI case.

Generating commercial images with an AI tool, whether for marketing materials, client presentations, or website visuals, sits in the most litigated territory. AI image models have been trained on enormous corpora of web images, many of which were copyrighted. The Getty case showed what courts may and may not hold: the trained model was not found to be an infringing copy under UK law, but outputs that reproduced identifiable trade mark elements were found to infringe.

Fine-tuning is the second high-risk scenario. If your firm, or a developer working for you, uploads client documents, licensed database extracts, or third-party content to train or customise a model, you have moved from consumer to producer. That triggers copyright exposure on the training side and, where that content includes personal data, a UK GDPR issue under the ICO’s AI guidance framework.

The third encounter is in proposals, reports, and client deliverables, particularly in marketing, legal, and professional services work. The risk is low if the tool’s terms include an output indemnity and your staff review AI output before it goes out. The risk rises when you use a tool with no indemnity and send out AI-generated text without any editorial review.

When should you ask harder questions, and when can you move on?

For off-the-shelf SaaS tools from large providers, the training-data copyright risk belongs to the vendor. Your practical exposure sits on the output side, and reputable enterprise plans cover that with indemnity commitments. Ask harder questions when fine-tuning on third-party content, when your work involves commercial image or logo generation, or when a client contract includes explicit IP warranties you are being asked to provide.

Three questions worth asking any AI vendor: Does the platform provide an IP indemnity covering copyright claims on outputs, and on what conditions? Does using the service constitute consent to use your inputs for further training? And what happens to any fine-tuned model if the vendor relationship ends?

The last two questions matter most for anyone who has customised a tool on proprietary or client data. Some platforms train on user prompts and inputs, particularly on free or basic tiers. If those inputs include client content, the situation intersects with your confidentiality obligations.

The low-risk baseline looks like this: reputable SaaS tool, enterprise or paid tier with a documented indemnity, human review before AI output goes to a client, and no upload of licensed database content or third-party material for training. US law is still unsettled on fair use for AI training, and UK policy remains open following the government’s 2024 consultation. That does not mean holding back; it means knowing which side of these distinctions you currently sit on.

What else connects to this risk?

Copyright risk from AI training sits alongside three issues owners encounter in the same conversation. Data protection engages when training involves personal data, which overlaps with copyright where client documents contain both. Trade mark law applies separately, as the Getty case showed when AI outputs reproduced brand watermarks. Insurers are now asking about AI IP exposure on proposal forms, and many firms cannot yet answer.

On data protection: the ICO is clear that training AI on personal data requires a lawful basis, proper data minimisation, and a Data Protection Impact Assessment for high-risk uses. If you are fine-tuning a model on client files, HR records, or customer communications, you have a copyright question and a UK GDPR question on your hands simultaneously.

On trade marks: the Getty v Stability AI ruling confirmed that even where a trained model is held not to infringe copyright, outputs reproducing trade marks remain actionable. For a services firm generating marketing materials or client-facing designs with AI, reviewing output for brand identifiers before publishing is basic hygiene.

On insurance: UK legal commentary identifies IP exposure from generative AI as an emerging claims category. Insurers are beginning to include AI-related IP questions on proposal forms. A firm that cannot describe its AI IP controls clearly may face gaps in existing technology errors and omissions or cyber cover. That is worth checking before the next renewal.

The regulatory picture is still developing. The UK government’s consultation on copyright and AI closed in February 2025, and it has not yet legislated. The EU AI Act’s transparency obligations for general-purpose AI models have applied since August 2025. WIPO is running international consultations on AI and IP norms. The direction of travel is towards more disclosure, not less.

The practical discipline is not to wait for the law to settle. Know which data goes through which tools, have IP terms in your vendor and client contracts that reflect your actual practice, and require human review of AI output before anything reaches a client.

Sources

- UK IPO (2025). Government response to the consultation on AI and intellectual property. Sets out the scope of the Section 29A TDM exception and the government's position on extending it to commercial AI training. https://www.gov.uk/government/publications/government-response-to-the-consultation-on-artificial-intelligence-and-intellectual-property
- UK Government (2024). Government sets out plans on the future of UK copyright and AI. The 2024 consultation framework and the existing legal position for AI developers and users. https://www.gov.uk/government/news/government-sets-out-plans-on-future-of-uk-copyright-and-ai
- Journal of Intellectual Property Law and Practice (2024). EU AI Act and copyright obligations. Analysis of the transparency requirements on general-purpose AI model providers under Article 53, enabling rights holders to enforce claims. https://academic.oup.com/jiplp/article/20/3/182/7922541
- Information Commissioner's Office (2024). Guidance on AI and data protection. Covers the UK GDPR obligations arising when AI systems are trained on or process personal data, adjacent to copyright risk where client documents contain both. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- IT Brief (2025). UK court rules AI training on copyrighted works is not infringing. Summary of the Getty Images v Stability AI UK High Court ruling in November 2025 and the watermark trade mark finding. https://itbrief.co.uk/story/uk-court-rules-ai-training-on-copyrighted-works-is-not-infringing
- Stradling Law (2024). Training AI models: what you need to know about copyright risks. US case law review including Thomson Reuters v Ross Intelligence and the split in fair use analysis across district courts. https://www.stradlinglaw.com/news-insights/training-ai-models-heres-what-you-need-to-know-about-copyright-risks.html
- Microsoft (2023). Our Copilot Copyright Commitment. The named indemnity structure for enterprise customers using Microsoft Copilot, covering copyright claims arising from use of Copilots and their outputs, provided built-in guardrails and content filters are used. https://blogs.microsoft.com/blog/2023/09/07/our-copilot-copyright-commitment/
- CMA (2024). AI Foundation Models update paper. The Competition and Markets Authority's assessment of training data access as a key competitive factor in foundation model markets. https://www.gov.uk/government/publications/ai-foundation-models-initial-review/cmas-review-of-foundation-models-update-paper
- WIPO (2024). WIPO consultations on AI and intellectual property. The World Intellectual Property Organization's active international consultation on AI and IP norms, confirming this is a live and evolving regulatory area. https://www.wipo.int/about-ip/en/artificial_intelligence/

Frequently asked questions

Does using ChatGPT or similar tools put my firm at legal risk for copyright infringement?

For standard SaaS use, the training-side risk sits with the model developer. Your exposure is primarily on the output side: AI-generated text or images that closely reproduce a third party's protected work. Enterprise plans from major providers include IP indemnities covering outputs. Human review before output reaches clients is the single most effective control you can apply.

What is the UK text and data mining exception, and does it apply to commercial AI use?

Section 29A of the Copyright, Designs and Patents Act 1988 permits text and data mining for non-commercial research by people who already have lawful access to the material, with sufficient acknowledgement of the source unless that is impractical. It does not cover commercial AI model training for profit. Contract terms that try to prevent the copying it permits are unenforceable, but the exception only helps where you have lawful access in the first place. The UK government consulted in December 2024 on a commercial exception with a rights-holder opt-out, but the law has not yet changed.

A client contract now requires a warranty that our AI tools were trained on licensed material. How do we respond?

This clause is becoming more common as procurement teams catch up with the AI copyright debate. Check your vendor's IP indemnity and anything it publishes about its training data. An indemnity shifts the risk of a claim to the vendor; it does not in itself confirm that the training data was licensed. If your vendor's terms do not give you a documented basis to give the warranty as drafted, the options are to switch to a vendor that does, to narrow the warranty to what you can genuinely confirm, or to disclose that AI-assisted work is human-reviewed before delivery.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
