Data and IP clauses in AI contracts, in plain English

TL;DR

AI vendor contracts differ from traditional SaaS in two specific places: data use rights and IP ownership. The data section should answer three questions: can the vendor train on what you put in, how long is it retained, and who else can access it. The IP section answers three more: who owns inputs, who owns outputs, and what indemnity covers infringing outputs. Read both with those six questions in front of you.

Key takeaways

- AI contracts diverge from traditional SaaS in two places: data use rights and IP ownership. Everything else is usually familiar.
- On data, get a contractual answer to three questions: training rights, retention, and sub-processor access. Vendor default policies will not protect you on their own.
- On IP, get a contractual answer to three more: who owns inputs, who owns outputs, and what indemnity applies if an output infringes a third party's rights. Indemnity scope and liability caps matter as much as the headline promise.
- Enterprise tiers from major vendors offer materially stronger protection than standard commercial terms. The split is real, and it is contractual rather than technical.
- Escalate to a commercial solicitor when data rights are unusually broad, when the IP indemnity is absent or capped below the engagement value, or when the language uses terms you genuinely cannot parse.

An owner sat at her kitchen table last week with a vendor contract printed out and a pen in her hand. She had read the data-rights section three times. She vaguely sensed it gave the vendor more latitude than she was comfortable with, but could not point to the line that made her uneasy. The IP section talked about derivative works and she was not sure she had followed it.

That is the right instinct. Almost everything in an AI vendor contract is familiar SaaS language. The parts that are genuinely different are narrow, and they sit in two places. Once you know what those two places are and what to look for inside each one, the contract stops feeling impenetrable and starts feeling like a document with six questions you can answer or six questions you cannot.

This is the reading discipline, not a legal opinion. The sibling posts in this cluster, "Who owns the work when AI wrote it" and "Copyright and training data: what business owners need to know", cover the underlying law. This piece is the layer on top: how that law shows up in the contract you are being asked to sign.

Why these two clause categories matter more than the rest

AI contracts differ from traditional SaaS in two places: data use rights and intellectual property ownership. Payment terms, service levels, termination notice, governing law and indemnity for security breach all look much like the SaaS contracts you have signed before. The two new places are where vendors have written language to handle what SaaS never had to address: what the vendor can do with what you feed in, and who owns what comes out.

The commercial reason is straightforward. With traditional SaaS, your data sat in the vendor’s database. With AI, your data potentially trains the model. A vendor that quietly trained future versions on your client correspondence or pricing rationale would be building a service that competes with you, paid for by your inputs. The data-rights clause is where the contract either closes that door or leaves it open. The IP clause is the equal-and-opposite question on what the AI produces.

What three questions should the data section answer?

The data-rights section should give you a clear answer to three questions. First, can the vendor use what you put in to train or improve its models? Second, how long does the vendor retain your data, and what happens to it when you stop paying? Third, who else can access your data through sub-processors, affiliates or service providers the vendor itself uses?

On training rights, the landscape has split. Enterprise tiers from OpenAI, Anthropic and Google now explicitly commit not to train on customer content without consent. Standard commercial and consumer terms from the same vendors often retain broader rights, sometimes phrased as use for “service improvement”, which is wider than training alone. If your contract is silent or hedged, the vendor’s default policy applies, and the default is usually permissive. Anthropic and Google both reserve the right to extract “learnings” from your data even when they commit not to retain the data itself.

On retention, the OpenAI litigation with the New York Times in 2025 demonstrated something material. A court-ordered preservation hold can override the deletion clause in your contract. Many vendors now write that possibility into their terms explicitly. On sub-processors, GDPR Article 28 requires the vendor to list who else processes your data and to give you notice before that list changes. Narrow the vendor’s discretion in negotiation if the data you are putting in is sensitive.

What three questions should the IP section answer?

The IP section should answer three questions with the same clarity. Who owns the inputs you provide? Who owns the outputs the AI produces? What indemnity does the vendor offer if a third party claims one of those outputs infringes their rights? Many owners skim past the first two and stop at the third without realising that the first two determine what the third actually covers.

Input ownership is usually retained by the customer, but ownership does not prevent vendor use. OpenAI's standard terms let the vendor use inputs for training and improvement even though you technically own them. Anthropic and Google extract learnings without retaining inputs, which sits in a contractual grey zone. Read the input clause alongside the data-rights clause; the two together tell the actual story.

Output ownership varies more than owners expect. OpenAI and Anthropic assign output ownership to the customer in their commercial terms. Some enterprise SaaS vendors grant only a licence to use outputs. A handful of contracts are simply silent, and where the contract is silent, you do not own the output by default. The indemnity question is where contracts get cute. Vendors narrow indemnity in three ways: by limiting it to outputs you have not modified, by capping liability at twelve months of fees paid, or by carving out features built on third-party models. A twelve-month cap sounds reasonable until you do the arithmetic: on a subscription of, say, £400 a month, the cap is £4,800, which may be a small fraction of what defending a single infringement claim would cost. Adobe's Firefly indemnity is the cleanest illustration: feature-specific, and excluding non-Adobe-trained components. Read what is excluded, not just what is included.

What does a healthy contract look like in practice?

A healthy AI contract answers all six questions in plain language. On data, the vendor commits in writing not to train on your content without consent, gives a clear retention and deletion period, and either names sub-processors or commits to notice before adding new ones. On IP, the contract confirms you own inputs and outputs, with an indemnity covering third-party claims and a cap that bears some relationship to what is at stake.

If you are paying enterprise-tier prices, you should expect enterprise-tier language. The major vendors have written it. The question is whether the version of the contract in front of you is the one with that language in it. Salespeople sometimes present standard commercial terms to small customers who would qualify for the enterprise version if they asked. It is worth asking. The Information Commissioner's Office published guidance in 2024 that sets out the controller-versus-processor distinction, audit rights, accuracy KPIs and sub-processor controls in detail, and it is a useful checklist when you are uncertain.

When to escalate to a commercial solicitor

Three situations call for a solicitor before you sign. First, when the data use rights are unusually broad and the vendor will not narrow them, particularly where the contract permits use for any “service improvement” purpose without a carve-out. Second, when the IP indemnity is absent, capped below the engagement value, or excludes outputs used commercially. An indemnity that only covers outputs you never modify protects nothing.

Third, when the contract uses terms you genuinely cannot parse, even with the six questions above in front of you. “Derivative works”, “moral rights”, “background IP”, “foreground IP” and “sub-licensable on a royalty-free basis” all have specific legal meanings that can change the deal materially. If you have read the section twice and still are not sure what it does, that is the signal. It costs less to get a solicitor to read a clause now than to argue about it in eighteen months.

The reading discipline in this post gives you a defensible position on the contracts you are likely to see. Book a conversation if you want a peer view on whether the contract in front of you is one to sign as drafted or one to take to a solicitor first.

Sources

- OpenAI (2025). Service Terms and RoW Terms of Use. Sets out the bifurcated consumer-vs-enterprise approach to input training rights and the 30-day API retention default. https://openai.com/policies/service-terms/
- Anthropic (2024). Commercial Terms of Service. Covers the commitment not to train on customer content from services, plus the sampling carve-out for learnings used to improve model architectures. https://www.anthropic.com/news/expanded-legal-protections-api-improvements
- Google Cloud (2025). Service-specific terms for Vertex AI and Gemini. Sets out the prior-permission rule for training and the model-improvement sampling exception. https://cloud.google.com/terms/service-terms
- Adobe (2025). Firefly product description and IP indemnification scope. Demonstrates feature-level granularity in IP indemnity, including the carve-out for non-Adobe-trained models. https://helpx.adobe.com/legal/product-descriptions/adobe-firefly.html
- Information Commissioner's Office (2024). AI procurement, contracts and third parties guidance. The UK regulatory baseline for what a contract with an AI vendor should contain on controller-processor roles, audit rights and accuracy KPIs. https://ico.org.uk/for-organisations/advice-and-services/audits/data-protection-audit-framework/toolkits/artificial-intelligence/contracts-and-third-parties/
- EU Artificial Intelligence Act (2024). Article 53 obligations on general-purpose AI model providers, including copyright compliance policies and the public training-content summary. https://artificialintelligenceact.eu/article/53/
- GDPR (2018). Article 28 processor obligations and the contractual minimum a processor agreement must contain, including sub-processor authorisation. https://gdpr-info.eu/art-28-gdpr/
- Latham & Watkins (2025). Analysis of Getty Images v Stability AI, High Court of England and Wales judgment, 4 November 2025. Confirms that secondary infringement claims on model weights failed, narrowing but not closing customer-side risk on outputs. https://www.lw.com/en/insights/getty-images-v-stability-ai-english-high-court-rejects-secondary-copyright-claim
- Linklaters (2025). US Copyright Office guidance on copyrightability of AI-generated materials, confirmed February 2025. Explains the human-authorship threshold that determines whether an AI output is copyrightable. https://www.linklaters.com/knowledge/articles/alerts-newsletters-and-guides/2025/february/21/copyrightability-of-ai-generated-materials-and-us-copyright-law
- Morgan Lewis (2025). Key concepts in AI contracting, data rights and restrictions. Practitioner analysis of how AI contracts narrow the definition of customer data relative to SaaS norms. https://www.morganlewis.com/blogs/sourcingatmorganlewis/2025/12/key-concepts-in-ai-contracting-data-rights-and-restrictions

Frequently asked questions

Does my AI vendor have the right to train its model on the data I put in?

It depends entirely on which tier of contract you are on. Standard commercial and consumer terms from major vendors often retain broad rights to use inputs for service improvement, which can include training. Enterprise tiers from OpenAI, Anthropic and Google explicitly commit not to train on customer content without consent. If your contract is silent or vague on this point, assume the vendor's default policy applies, and the default is usually permissive.

Who owns the output when I prompt the AI to generate something for me?

Major vendors typically assign output ownership to the customer in their commercial terms. Ownership and copyrightability are two different things, though. The US Copyright Office confirmed in February 2025 that purely AI-generated work is not eligible for copyright unless a human has made meaningful creative decisions in prompting, editing or arranging it. You own the output as a matter of contract. Whether you can stop a competitor copying it is a separate question.

When should I take an AI contract to a solicitor?

Three triggers. When the data use rights are unusually broad and you cannot narrow them in negotiation. When the IP indemnity is absent, capped below the value of the engagement, or excludes outputs in commercial use. When the contract uses terms you cannot parse with the six questions in this post in front of you. Material contracts in any of those three states need a commercial solicitor before signature.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
