Data and IP clauses in AI contracts, in plain English

TL;DR

AI vendor contracts differ from traditional SaaS in two specific places: data use rights and IP ownership. The data section should answer three questions: can the vendor train on what you put in, how long is it retained, and who else can access it. The IP section answers three more: who owns inputs, who owns outputs, and what indemnity covers infringing outputs. Read both with those six questions in front of you.

Key takeaways

- AI contracts diverge from traditional SaaS in two places: data use rights and IP ownership. Everything else is usually familiar.
- On data, get a contractual answer to three questions: training rights, retention, and sub-processor access. Vendor default policies will not protect you on their own.
- On IP, get a contractual answer to three more: who owns inputs, who owns outputs, and what indemnity applies if an output infringes a third party's rights. Indemnity scope and liability caps matter as much as the headline promise.
- Enterprise tiers from major vendors offer materially stronger protection than standard commercial terms. The split is real, and it is contractual rather than technical.
- Escalate to a commercial solicitor when data rights are unusually broad, when the IP indemnity is absent or capped below the engagement value, or when the language uses terms you genuinely cannot parse.

An owner sat at her kitchen table last week with a vendor contract printed out and a pen in her hand. She had read the data-rights section three times. She vaguely sensed it gave the vendor more latitude than she was comfortable with, but could not point to the line that made her uneasy. The IP section talked about derivative works and she was not sure she had followed it.

That is the right instinct. Almost everything in an AI vendor contract is familiar SaaS language. The parts that are genuinely different are narrow, and they sit in two places. Once you know what those two places are and what to look for inside each one, the contract stops feeling impenetrable and starts feeling like a document with six questions you can answer or six questions you cannot.

This is the reading discipline, not a legal opinion. The sibling posts in this cluster, "Who owns the work when AI wrote it" and "Copyright and training data: what business owners need to know", cover the underlying law. This piece is the layer on top: how that law shows up in the contract you are being asked to sign.

Why these two clause categories matter more than the rest

AI contracts differ from traditional SaaS in two places: data use rights and intellectual property ownership. Payment terms, service levels, termination notice, governing law and indemnity for security breach all look much like the SaaS contracts you have signed before. The two new places are where vendors have written language to handle what SaaS never had to address: what the vendor can do with what you feed in, and who owns what comes out.

The commercial reason is straightforward. With traditional SaaS, your data sat in the vendor’s database. With AI, your data potentially trains the model. A vendor that quietly trained future versions on your client correspondence or pricing rationale would be building a service that competes with you, paid for by your inputs. The data-rights clause is where the contract either closes that door or leaves it open. The IP clause is the equal-and-opposite question on what the AI produces.

What three questions should the data section answer?

The data-rights section should give you a clear answer to three questions. First, can the vendor use what you put in to train or improve its models? Second, how long does the vendor retain your data, and what happens to it when you stop paying? Third, who else can access your data through sub-processors, affiliates or service providers the vendor itself uses?

On training rights, the landscape has split. Enterprise tiers from OpenAI, Anthropic and Google now explicitly commit not to train on customer content without consent. Standard commercial and consumer terms from the same vendors often retain broader rights, sometimes phrased as use for “service improvement”, which is wider than training alone. If your contract is silent or hedged, the vendor’s default policy applies, and the default is usually permissive. Anthropic and Google both reserve the right to extract “learnings” from your data even when they commit not to retain the data itself.

On retention, the OpenAI litigation with the New York Times in 2025 demonstrated something material. A court-ordered preservation hold can override the deletion clause in your contract. Many vendors now write that possibility into their terms explicitly. On sub-processors, GDPR Article 28 requires the vendor to list who else processes your data and to give you notice before that list changes. Narrow the vendor’s discretion in negotiation if the data you are putting in is sensitive.

What three questions should the IP section answer?

The IP section should answer three questions with the same clarity. Who owns the inputs you provide? Who owns the outputs the AI produces? What indemnity does the vendor offer if a third party claims one of those outputs infringes their rights? Many owners skim past the first two and stop at the third without realising that the first two determine what the third actually covers.

Input ownership is usually retained by the customer, but ownership does not prevent vendor use. OpenAI's standard terms let the vendor use inputs for training and improvement even though you technically own them. Anthropic and Google extract learnings without retaining inputs, which sits in a contractual grey zone. Read the input clause alongside the data-rights clause; the two together tell the actual story.

Output ownership varies more than owners expect. OpenAI and Anthropic assign output ownership to the customer in their commercial terms. Some enterprise SaaS vendors grant only a licence to use outputs. A handful of contracts are simply silent, and where the contract is silent, you do not own the output by default. The indemnity question is where contracts get cute. Vendors narrow indemnity in three ways: by limiting it to outputs you have not modified, by capping liability at twelve months of fees paid, or by carving out features built on third-party models. A twelve-month cap sounds reasonable until you do the arithmetic: on a subscription of, say, £400 a month, the cap is £4,800, which may be a small fraction of what defending a single infringement claim would cost. Adobe's Firefly indemnity is the cleanest illustration: feature-specific, and excluding non-Adobe-trained components. Read what is excluded, not just what is included.

What does a healthy contract look like in practice?

A healthy AI contract answers all six questions in plain language. On data, the vendor commits in writing not to train on your content without consent, gives a clear retention and deletion period, and either names sub-processors or commits to notice before adding new ones. On IP, the contract confirms you own inputs and outputs, with an indemnity covering third-party claims and a cap that bears some relationship to what is at stake.

If you are paying enterprise-tier prices, you should expect enterprise-tier language. The major vendors have written it. The question is whether the version of the contract in front of you is the one with that language in it. Salespeople sometimes present standard commercial terms to small customers who would qualify for the enterprise version if they asked. It is worth asking. The Information Commissioner's Office published guidance in 2024 that sets out the controller-versus-processor distinction, audit rights, accuracy KPIs and sub-processor controls in detail, and it is a useful checklist when you are uncertain.

When to escalate to a commercial solicitor

Three situations call for a solicitor before you sign. First, when the data use rights are unusually broad and the vendor will not narrow them, particularly where the contract permits use for any “service improvement” purpose without a carve-out. Second, when the IP indemnity is absent, capped below the engagement value, or excludes outputs used commercially. An indemnity that only covers outputs you never modify protects nothing.

Third, when the contract uses terms you genuinely cannot parse, even with the six questions above in front of you. “Derivative works”, “moral rights”, “background IP”, “foreground IP” and “sub-licensable on a royalty-free basis” all have specific legal meanings that can change the deal materially. If you have read the section twice and still are not sure what it does, that is the signal. It costs less to get a solicitor to read a clause now than to argue about it in eighteen months.

The reading discipline in this post gives you a defensible position on the contracts you are likely to see. Book a conversation if you want a peer view on whether the contract in front of you is one to sign as drafted or one to take to a solicitor first.

Sources

- OpenAI (2025). Service Terms and RoW Terms of Use. Sets out the bifurcated consumer-vs-enterprise approach to input training rights and the 30-day API retention default. https://openai.com/policies/service-terms/
- Anthropic (2024). Commercial Terms of Service. Covers the commitment not to train on customer content from services, plus the sampling carve-out for learnings used to improve model architectures. https://www.anthropic.com/news/expanded-legal-protections-api-improvements
- Google Cloud (2025). Service-specific terms for Vertex AI and Gemini. Sets out the prior-permission rule for training and the model-improvement sampling exception. https://cloud.google.com/terms/service-terms
- Adobe (2025). Firefly product description and IP indemnification scope. Demonstrates feature-level granularity in IP indemnity, including the carve-out for non-Adobe-trained models. https://helpx.adobe.com/legal/product-descriptions/adobe-firefly.html
- Information Commissioner's Office (2024). AI procurement, contracts and third parties guidance. The UK regulatory baseline for what a contract with an AI vendor should contain on controller-processor roles, audit rights and accuracy KPIs. https://ico.org.uk/for-organisations/advice-and-services/audits/data-protection-audit-framework/toolkits/artificial-intelligence/contracts-and-third-parties/
- EU Artificial Intelligence Act (2024). Article 53 obligations on general-purpose AI model providers, including copyright compliance policies and the public training-content summary. https://artificialintelligenceact.eu/article/53/
- GDPR (2018). Article 28 processor obligations and the contractual minimum a processor agreement must contain, including sub-processor authorisation. https://gdpr-info.eu/art-28-gdpr/
- Latham & Watkins (2025). Analysis of Getty Images v Stability AI, High Court of England and Wales judgment, 4 November 2025. Confirms that secondary infringement claims on model weights failed, narrowing but not closing customer-side risk on outputs. https://www.lw.com/en/insights/getty-images-v-stability-ai-english-high-court-rejects-secondary-copyright-claim
- Linklaters (2025). US Copyright Office guidance on copyrightability of AI-generated materials, confirmed February 2025. Explains the human-authorship threshold that determines whether an AI output is copyrightable. https://www.linklaters.com/knowledge/articles/alerts-newsletters-and-guides/2025/february/21/copyrightability-of-ai-generated-materials-and-us-copyright-law
- Morgan Lewis (2025). Key concepts in AI contracting, data rights and restrictions. Practitioner analysis of how AI contracts narrow the definition of customer data relative to SaaS norms. https://www.morganlewis.com/blogs/sourcingatmorganlewis/2025/12/key-concepts-in-ai-contracting-data-rights-and-restrictions

Frequently asked questions

Does my AI vendor have the right to train its model on the data I put in?

It depends entirely on which tier of contract you are on. Standard commercial and consumer terms from major vendors often retain broad rights to use inputs for service improvement, which can include training. Enterprise tiers from OpenAI, Anthropic and Google explicitly commit not to train on customer content without consent. If your contract is silent or vague on this point, assume the vendor's default policy applies, and the default is usually permissive.

Who owns the output when I prompt the AI to generate something for me?

Major vendors typically assign output ownership to the customer in their commercial terms. Ownership and copyrightability are two different things, though. The US Copyright Office confirmed in February 2025 that purely AI-generated work is not eligible for copyright unless a human has made meaningful creative decisions in prompting, editing or arranging it. You own the output as a matter of contract. Whether you can stop a competitor copying it is a separate question.

When should I take an AI contract to a solicitor?

Three triggers. When the data use rights are unusually broad and you cannot narrow them in negotiation. When the IP indemnity is absent, capped below the value of the engagement, or excludes outputs in commercial use. When the contract uses terms you cannot parse with the six questions in this post in front of you. Material contracts in any of those three states need a commercial solicitor before signature.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
