Copyright, training data, and what business owners actually need to know

[Image: an owner at her kitchen table on a Saturday morning, reading a news article on a tablet, with a printed contract and a notepad of handwritten questions in front of her]
TL;DR

The AI copyright debate is mostly a fight between large rights holders and large AI labs about how the models were trained. Very little of it changes what a 5 to 50 person firm does on Monday morning. Vendor indemnities cover the model and the output, not what you paste in. The durable rule is to use AI tools only on data your business owns or has a clear documented right to use.

Key takeaways

- The headline copyright cases (New York Times v OpenAI, Bartz v Anthropic, Getty v Stability) are mostly about how the model was trained, not about whether you can use the tool. Two of the three central rulings so far have called training on lawfully acquired material fair use.
- The UK government confirmed in March 2026 that it will not introduce a broad copyright exception for AI training. Existing copyright law applies. Section 29A of the CDPA permits text and data mining for non-commercial research only, which means commercial AI training in the UK still needs licensing or a fair-dealing argument.
- The major vendor indemnities (OpenAI, Microsoft, Anthropic, Google, Adobe) cover their training data and, on enterprise plans, the output the model produces. None of them cover what you paste in. Customer input is your responsibility.
- The one durable rule for owner-led firms is simple. Use AI tools only on data and content your business owns outright or has a clear documented right to use. That rule survives whichever way the courts and regulators move.
- Three things to tighten this quarter: a one-line clause in client contracts about AI use, a short internal rule on what never gets pasted into a free-tier tool, and confirmation that any third-party content you put through AI was yours to use in the first place.

The owner of a 22-person services firm has read four news stories about AI and copyright in the last month. One said authors won. One said the AI companies won. One said the UK government had stepped back from a copyright exception. One said the EU was tightening transparency rules. She has not formed a useful opinion and is starting to worry she should have one. Her firm uses ChatGPT and Claude every day. Three of her staff have just asked whether they are still allowed to.

This post is for her. The headline copyright fights are loud, complicated, and mostly happening between large rights holders and large AI labs about how the underlying models were built. Very little of that changes what a 5 to 50 person firm should do on Monday morning. The bits that genuinely apply are narrower and worth knowing precisely, so the owner can give her team a clear answer and get on with the work.

The debate is about whether AI labs needed permission to train their models on copyrighted books, articles, images, lyrics, and code scraped or downloaded from the internet. The New York Times is suing OpenAI on that basis. Authors sued Anthropic. Music publishers sued Anthropic. Getty Images sued Stability AI in the UK. The labs argue training is protected fair use. The rights holders argue the labs took commercial value without paying for it.

The cases so far have produced a split picture. In June 2025, Judge Alsup ruled in Bartz v Anthropic that training on lawfully acquired books was fair use, while keeping pirated copies was not. Anthropic settled the piracy side for $1.5 billion. Judge Chhabria reached the same fair-use conclusion in Kadrey v Meta two days later. In November 2025, the UK High Court ruled in Getty v Stability that the final Stable Diffusion model does not contain copies of the training images and so is not an infringing article for the purposes of secondary infringement. The Thomson Reuters v ROSS Intelligence decision, the outlier, rejected fair use because ROSS was building a direct competitor to the rights holder’s product.

Why does it matter for your business?

It matters less than the noise suggests, in a precise way. None of the active cases is about whether an end user can use a commercially available AI tool. They are about how the model was built. No court anywhere has held a small business liable for using ChatGPT, Claude or Copilot in its normal work, and the major vendors carry enterprise indemnities for output-side claims.

Where it does matter is on the other side of the workflow. The vendor indemnities from OpenAI, Microsoft, Anthropic, Google and Adobe cover claims that the model’s training data or output infringes a third party’s intellectual property right. They do not cover what you paste in. If your firm puts a client’s confidential document into a free-tier consumer AI tool that uses customer content for model improvement, the vendor’s training-data indemnity does nothing for you. The exposure is your contract with the client and the IP rights of whoever owns the material. That distinction (the vendor covers their side, you cover your input) is the load-bearing one for an SME using AI tools every day.

Where will you actually meet it?

You will meet it in four places in the firm. AI-assisted client deliverables, where the question is whether the output resembles existing published work too closely. Third-party content put through AI, where the question is whether you had the right to use the material. Confidential client data pasted into consumer AI tools, where the vendor’s terms may breach your duty of confidence. And contractual disclosure of AI use to clients.

The first two are output-side. The third is input-side and is the most common live exposure in owner-led firms today. The fourth is the cheapest to fix and the most often left undone. A single sentence in the schedule of a services contract usually does the job. For the data-flow side, read across to where your data goes when you paste it into a chatbot; for the ownership question, which sits next to copyright but is a distinct issue, see who owns the work when AI wrote it.

When to ask and when to ignore

Ask when AI is touching client confidential information, regulated material, third-party content you do not own, or work the firm warrants as original. In those situations the answer is to use a paid tier with terms that prohibit training on customer content, get the client’s contractual consent, and have a human review the output before delivery. Ignore the temptation to track every news story and tune the firm’s practice to it.

The single durable rule, which holds whichever way the law moves in the US, UK or EU, is this. Use AI tools only on data and content your business owns outright or has a clear documented right to use. Owned client data goes through a paid tier with the right terms. Third-party content goes through AI only if you have permission to use it. Generated output gets reviewed by a human before delivery. That rule sits underneath every published case so far and does not depend on which way the next one goes. It is also short enough to brief the team on in a five-minute conversation.

This sits inside the IP, ownership and disclosure section of a wider cluster on AI risk, trust and governance for owner-led firms. The neighbouring posts cover the questions that sit next to copyright without being the same question. Ownership of AI-assisted deliverables is a separate matter from whether the underlying model was trained legally, and is worth reading alongside this one.

For the ownership question, see who owns the work when AI wrote it. For the contractual disclosure question, disclosing AI use to customers. For the input-side exposure that vendor indemnity does not solve, where your data goes when you paste into a chatbot. For the wider regulatory picture, the EU AI Act for UK and EU SMEs covers the transparency obligations on general-purpose AI providers, and UK AI regulation after the pro-innovation pivot covers the UK position after the March 2026 report. None of these posts is legal advice. Each one is the proportionate version of a question owner-led firms keep being asked, sized for a firm the owner can see across in a single room.

If you have read four contradictory news stories about AI and copyright and want twenty minutes to talk through what actually applies to your firm, book a conversation.

Sources

- White & Case (2025). Two California District Judges Rule that Using Books to Train AI is Fair Use, the legal summary of Bartz v Anthropic and Kadrey v Meta with the fair-use reasoning. https://www.whitecase.com/insight-alert/two-california-district-judges-rule-using-books-train-ai-fair-use
- Wolters Kluwer (2025). The Bartz v Anthropic Settlement: Understanding America's Largest Copyright Settlement, on the $1.5 billion settlement and the fair-use-for-training versus piracy-for-acquisition split. https://legalblogs.wolterskluwer.com/copyright-blog/the-bartz-v-anthropic-settlement-understanding-americas-largest-copyright-settlement/
- Latham & Watkins (2025). Getty Images v Stability AI: English High Court Rejects Secondary Copyright Claim, the November 2025 UK High Court judgment that the final Stable Diffusion model is not an infringing copy. https://www.lw.com/en/insights/getty-images-v-stability-ai-english-high-court-rejects-secondary-copyright-claim
- UK Government (2026). Report on Copyright and Artificial Intelligence, the March 2026 confirmation that the UK will not introduce a broad copyright exception for AI training. https://www.gov.uk/government/publications/report-and-impact-assessment-on-copyright-and-artificial-intelligence/report-on-copyright-and-artificial-intelligence
- UK Government (2024). Copyright and Artificial Intelligence Consultation, the original consultation that received 11,520 responses and produced the March 2026 report. https://www.gov.uk/government/consultations/copyright-and-artificial-intelligence/copyright-and-artificial-intelligence
- Reed Smith (2025). Court shuts down AI fair use argument in Thomson Reuters v ROSS Intelligence, the Delaware decision rejecting fair use where the AI directly targets a vendor's competitive market. https://www.reedsmith.com/articles/court-ai-fair-use-thomson-reuters-enterprise-gmbh-ross-intelligence/
- European Commission (2025). Template for general-purpose AI model providers to summarise data used to train their models, the EU AI Act Article 53 transparency obligation in force from August 2025. https://digital-strategy.ec.europa.eu/en/news/commission-presents-template-general-purpose-ai-model-providers-summarise-data-used-train-their
- Information Commissioner's Office. Guidance on AI and data protection, the UK reference for what happens when AI processing involves personal data, separate from copyright but adjacent for SMEs. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- Anthropic (2023). Expanded legal protections and improvements to our API, the published commercial indemnity for Claude API and enterprise customers. https://www.anthropic.com/news/expanded-legal-protections-api-improvements
- Google Cloud (2024). Protecting customers with generative AI indemnification, the two-pronged training-data and generated-output indemnity that several other vendors now broadly mirror. https://cloud.google.com/blog/products/ai-machine-learning/protecting-customers-with-generative-ai-indemnification

Frequently asked questions

Am I in legal trouble for using ChatGPT to draft a client report?

Almost certainly not, on the training-data question. Courts in the US have so far called training on lawfully acquired material fair use, the UK Getty v Stability ruling found the final Stable Diffusion model does not contain copies of the training images, and no court anywhere has held an end user liable for using a commercial AI tool that was trained on copyrighted material. The exposure is on the other side of the workflow, in what you paste in. If you pasted a third party's confidential document or copyrighted material into the tool without permission, that is your contract and your IP question, not the AI lab's.

Does the indemnity in my AI vendor's terms cover me if a client sues over an AI-drafted deliverable?

It depends what they sue over. OpenAI, Microsoft, Anthropic, Google and Adobe all offer enterprise indemnities that cover claims the model's training data or output infringes a third party's intellectual property right. They do not cover what you put in. So if the client claim is that the AI-drafted text resembles an existing published work, the vendor indemnity is in play. If the claim is that you breached confidentiality by pasting the client's own data into a consumer AI tool, you are on your own. Read the terms for the specific plan you are on, not the marketing summary.

Should I disclose AI use to clients in my services contract?

Yes, in a short clause that says you may use AI tools to assist delivery, that you will not put client confidential information through AI without consent, and that AI-assisted output is human-reviewed before it goes out. That clause does three things at once. It protects you contractually, it removes the awkward conversation if the client later asks, and it gives you a clean answer when a regulator or a future client's procurement form asks how you handle AI in delivery. It does not need to be long. A paragraph in the schedule is enough.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
