Copyright claims over AI training data: what the lawsuits mean for your business

TL;DR

Copyright lawsuits over AI training data, including the 2024 UK High Court ruling in Getty Images v Stability AI, are largely a vendor-level problem. For an owner-managed business, the practical exposure concentrates in marketing images and customer-facing content that could reproduce protected works or visible watermarks. The right controls are vendor IP indemnities, a human review step before anything goes public, and clear policies on what your team can use commercially.

Key takeaways

- The 2024 UK High Court ruling in Getty v Stability AI found the AI model itself is not an infringing copy, but trade mark liability arose where watermarks appeared in generated outputs.
- For owner-managed businesses, copyright risk from AI training data concentrates in visual outputs used for marketing and customer-facing content, not in day-to-day internal productivity tools.
- Enterprise AI vendors including Microsoft offer IP indemnities that shift a significant portion of copyright claim risk from the user to the vendor, subject to conditions of use.
- EU AI Act obligations now require providers of general-purpose AI models to publish training data summaries, creating a transparency layer you can ask to see before signing anything.
- A human review step before AI-generated content goes to clients or is published publicly is the proportionate governance response for any owner-managed business operating at normal scale.

Picture a small design agency spending an afternoon using an AI image generator to produce social media graphics for a client launch. The outputs come back broadly clean. Then one arrives with a faint watermark visible in the corner, the ghost of a Getty Images credit that has somehow come through the generation process. Nobody in the agency is quite sure whether to delete it and try again, or whether something more significant needs a conversation.

That scenario sits at the centre of a live legal debate. The lawsuits are real, the UK High Court has ruled on part of the question, and the practical implications for owner-managed businesses are narrower and more manageable than the headlines tend to suggest.

The copyright claims centre on what AI models were trained on, not on what they produce for you. Companies including Getty Images and the New York Times argue that the firms building large AI models scraped millions of their images or articles without permission, then used them to train systems sold commercially. The 2024 UK High Court ruling in Getty Images v Stability AI is the most significant UK test case resolved so far.

In that case, Justice Joanna Smith found that the Stable Diffusion model did not itself constitute an infringing copy under the Copyright, Designs and Patents Act 1988. The reasoning: the model does not store or reproduce the original Getty images in recoverable form. However, the court did find limited trade mark infringement where AI-generated outputs reproduced the Getty watermark visibly. The model cleared one copyright hurdle; certain outputs it produced did not clear the next one.

This distinction matters for anyone following the debate. UK law has not yet decided whether training on copyrighted works within the UK, without a licence, is itself lawful. The UK IPO’s 2021-22 consultation acknowledged the area is “disputed”, and the government stepped back from a proposed broader text-and-data-mining exception after pressure from creative sectors. University of Cambridge researchers have warned that an opt-out approach, where all works can be scraped unless creators proactively object, risks giving “carte blanche” to AI firms at the expense of UK creators.

Why does this matter for your business?

The practical risk for an owner-managed business sits in the outputs, not the training. You didn’t build or train these models. You are deploying outputs from models whose training data is legally contested, and if those outputs closely resemble protected works or reproduce watermarks and logos, a claim can follow. The Getty case found trade mark liability where watermarks appeared in AI-generated images, even as the broader copyright question went the other way.

A 2023 study commissioned by the UK IPO found that 26% of UK creative businesses were “very concerned” about unlicensed use of their works in AI training, with a further 35% “somewhat concerned”. Those numbers signal the direction of future claims. Rights-holders are watching, and active litigation in the US and EU is testing theories of liability that UK courts will eventually have to address.

For an owner-managed business, the exposure concentrates in specific areas. Marketing materials, website imagery, product design, and content created with AI for commercial delivery carry the highest risk. Internal drafting, meeting summaries, and administrative work carry much lower risk. The question the court asked in Getty, whether the output substantially reproduces protected content, is also the question you should ask before anything AI-generated goes to a client or appears publicly.

Where does this risk actually show up?

The highest-exposure area for owner-managed businesses is generative image tools used for marketing. If you use an image generator to create website banners, social media graphics, or product visuals, the output travels directly to customers, often without anyone checking whether it is substantially similar to a protected work. Code generators carry a related risk when you ship the output as part of a commercial product without review.

Text models carry a lower but still real risk in content-heavy businesses. Asking a language model to reproduce sections of a specific article, or to closely mirror a distinctive copyrighted text, is a direct route to a potential claim. The rule of thumb from UK legal commentary: treat any AI output the same way you would treat work from a freelancer. Read it, own it, check it before it leaves the building.

A separate but related issue arises if your team uses AI tools that fine-tune models on your customer data or on data containing personal information. UK GDPR applies the moment personal data is involved, and the ICO’s guidance on AI and data protection is clear that a lawful basis is required, alongside a data protection impact assessment and clear contractual controls on what the vendor does with that data. For many owner-managed businesses, fine-tuning is not something you are doing directly, but knowing whether your vendor does it on your inputs is worth confirming.

When is the risk real, and when can you set it aside?

For typical day-to-day AI use, the copyright risk is low enough to set aside without much analysis. Drafting emails, summarising documents, generating meeting notes, and brainstorming with a language model: none of these produces the kind of output likely to attract a copyright claim. The risk concentrates where output is visual, distinctive, creative, and sent directly to customers or published publicly.

The most practical filter at the moment is whether your vendor offers an IP indemnity. In September 2023, Microsoft announced a Copilot Copyright Commitment, agreeing to defend enterprise customers against copyright claims arising from Copilot outputs, provided those customers use the tool within intended scenarios and respect content filters. Similar commitments are appearing from other enterprise vendors. This shifts a meaningful portion of the exposure from user to vendor.

Where a vendor offers no indemnity, that does not automatically mean serious risk, but it does mean you carry more of the uncertainty. For owner-managed businesses using general-purpose image generators without enterprise agreements, a human review step before any output goes to a client or appears on your site is the proportionate response. That step catches watermarks, obvious copying, and outputs that look too close to known works.

Sector also matters. If your business creates creative work for clients as the core deliverable, such as a design agency, a training provider, or a content studio, the risk profile is higher. Choosing tools that use licensed datasets, or that offer fine-tuning on your own licensed content, is worth the additional sourcing effort.

Several concepts come up repeatedly in the AI training data debate, and knowing what they mean saves time when a vendor or solicitor uses them. IP indemnity has the most immediate practical weight: it is a vendor’s contractual promise to defend you if a third party claims your use of their AI output infringes copyright or trade mark rights. A growing number of enterprise vendors now offer some version of this.

The text-and-data-mining exception is worth understanding. UK copyright law has a narrow exception allowing data mining of works you have lawful access to for non-commercial research. The government considered expanding this in 2022 to cover commercial AI training and stepped back after pushback from creative sectors. The exception is frequently cited in the debate but offers limited practical shelter for commercial AI use, and none for output-level claims.

EU AI Act transparency obligations add a new layer. From August 2025, providers of general-purpose AI models covered by the Act must publish sufficiently detailed summaries of training data, including whether copyright-protected content was used. Many models used by UK owner-managed businesses fall within scope, which means documentation now exists that you can reasonably ask to see before signing any agreement. The CMA’s initial report on AI foundation models noted that opaque training data practices may also raise consumer protection concerns, giving regulators another angle to work from.

The practical close is simple. If your vendor cannot show you a training data summary, does not offer an IP indemnity, and you are creating customer-facing content with their tool, that is an unmanaged risk. Either change the tool or add a human review step before any output goes anywhere public. That step costs very little at an owner-managed business scale, and it closes the gap the current legal uncertainty leaves open.
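For teams that keep a register of the AI tools they use, the rule above can be written down as a simple check. The following is a minimal sketch only, not a legal test: the `ToolUse` record and its field names are hypothetical, chosen here for illustration, and the logic does nothing more than encode the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class ToolUse:
    """One AI tool as used in the business (hypothetical record, for illustration)."""
    has_training_data_summary: bool  # vendor can show a training data summary
    has_ip_indemnity: bool           # vendor offers an IP indemnity
    customer_facing: bool            # output goes to clients or is published
    human_review: bool               # someone checks output before it goes public


def is_unmanaged_risk(use: ToolUse) -> bool:
    """Apply the rule from the text: no summary, no indemnity, and
    customer-facing output, with no human review step added to close the gap."""
    vendor_gap = not use.has_training_data_summary and not use.has_ip_indemnity
    return vendor_gap and use.customer_facing and not use.human_review


# Example: an image generator with no vendor assurances, used for marketing.
image_tool = ToolUse(
    has_training_data_summary=False,
    has_ip_indemnity=False,
    customer_facing=True,
    human_review=False,
)
print(is_unmanaged_risk(image_tool))
```

Setting `human_review=True` for the same tool, or using it only for internal work, makes the check return `False`, which mirrors the two remedies described above. It is a prompt for a conversation, not a substitute for advice on a specific contract.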

Sources

- UK Intellectual Property Office (2022). Copyright and artificial intelligence: government consultation. Confirms UK law on AI training data copyright is "disputed" and explains the decision to step back from a broad text-and-data-mining exception. https://www.gov.uk/government/consultations/copyright-and-artificial-intelligence/copyright-and-artificial-intelligence
- University of Cambridge Research (2023). Forcing UK creatives to opt-out of AI training risks stifling new talent. Academic commentary on opt-out risks and the limits of current copyright protection for creators whose work is scraped for training. https://www.cam.ac.uk/research/news/forcing-uk-creatives-to-opt-out-of-ai-training-risks-stifling-new-talent-cambridge-experts-warn
- Journal of Intellectual Property Law and Practice (2024). Oxford University Press peer-reviewed analysis of EU AI Act training data transparency obligations and their copyright implications for model providers and downstream users. https://academic.oup.com/jiplp/article/20/3/182/7922541
- European Union (2024). Regulation (EU) 2024/1689, EU Artificial Intelligence Act. Primary legislation establishing training data transparency and copyright obligations for general-purpose AI model providers placing models on the EU market. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689
- Information Commissioner's Office (2023). AI and data protection guidance. ICO guidance on lawful basis requirements, DPIAs, and vendor contracts when personal data is involved in AI training or fine-tuning. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- National Cyber Security Centre (2024). Guidelines for secure AI system development. NCSC guidance on data handling, vendor controls, and human review discipline in AI deployments. https://www.ncsc.gov.uk/collection/guidelines-for-secure-ai-system-development
- Competition and Markets Authority (2023). AI foundation models: initial report. CMA analysis of training data concentration, copyright opacity, and consumer protection implications for businesses using foundation models. https://www.gov.uk/government/publications/ai-foundation-models-initial-report/ai-foundation-models-initial-report
- JD Supra (2024). UK court draws a narrow line on AI copyright: analysis of Getty Images v Stability AI. Legal commentary distinguishing model-level copyright from output-level trade mark liability in the 2024 High Court ruling. https://www.jdsupra.com/legalnews/uk-court-draws-a-narrow-line-on-ai-7600328/
- Microsoft (2023). Microsoft's new Copilot Copyright Commitment. Vendor announcement of IP indemnity programme for enterprise Copilot customers, including the conditions and scope of the commitment. https://blogs.microsoft.com/blog/2023/09/07/microsofts-new-copilot-copyright-commitment/
- Waterfront Solicitors (2023). Navigating AI and copyright: what every UK business needs to know. UK legal guidance on downstream liability for AI outputs, human review discipline, and vendor contract clauses for owner-managed businesses. https://waterfront.law/navigating-ai-and-copyright-what-every-uk-business-needs-to-know/

Frequently asked questions

Can my business be sued because an AI model was trained on copyrighted material?

Probably not for the training itself, since you didn't train anything. The risk is more specific: if an AI output you deploy closely resembles a protected work, reproduces a watermark, or copies distinctive content, a trade mark or copyright claim can follow. The Getty v Stability AI ruling showed watermarks in AI outputs created trade mark liability, even when the model itself was cleared on copyright.

Does Microsoft's Copilot Copyright Commitment actually protect me?

It can, subject to conditions. Microsoft's 2023 commitment covers enterprise customers who use Copilot within intended scenarios and respect content filters. If you have an enterprise agreement and stay within those terms, Microsoft has agreed to defend you against copyright claims arising from your use of Copilot outputs. The commitment does not apply to free-tier or consumer accounts, and it requires you not to reproduce specific copyrighted works deliberately.

What does the EU AI Act change for UK businesses using AI tools?

From August 2025, providers of general-purpose AI models covered by the EU AI Act must publish summaries of training data and demonstrate copyright compliance. Many models used by UK businesses are placed on the EU market and fall within scope. This means documentation now exists that you can ask to see. It does not create direct compliance obligations for UK business users, but it changes what you can reasonably demand from your vendors.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

