AI, GDPR and the EU AI Act: data retention rules for SMEs

TL;DR

Small firms using AI on customer or employee data now sit between two retention clocks. GDPR Article 5(1)(e) requires that personal data be kept in identifiable form no longer than its purpose requires. The EU AI Act requires providers of high-risk systems to keep technical documentation for ten years. The way through is to separate raw personal data, anonymised data, and system documentation at design time, with named owners and scheduled deletion.

Key takeaways

- GDPR storage limitation and the EU AI Act ten-year documentation rule apply to different categories of data, not the same data.
- Even firms that only deploy an AI tool inherit traceability and audit expectations through vendor contracts.
- Treat raw personal data, pseudonymised data, anonymised data, and system logs as four separate retention questions.
- Genuinely anonymised data sits outside GDPR and can be retained for model auditing and retraining; pseudonymised data does not.
- A workable retention policy names an owner per dataset, sets a retention period per use case, and schedules deletion or anonymisation rather than relying on memory.

The practice manager at a twelve-person professional services firm has spent six months quietly fine-tuning an AI assistant on five years of client emails. It drafts responses well. The partners are pleased. Last week a new enterprise client sent a vendor questionnaire asking for the firm’s AI data retention policy, the lawful basis for training on customer data, and the deletion schedule for any model artefacts. There is no policy. There is no deletion schedule. There is a Friday afternoon spent reading the EU AI Act on her phone in the kitchen.

That moment is now common. The tools have moved faster than the policies behind them, and the questionnaires from larger customers have caught up. The good news is that the rules, while real, are tractable for a small firm if you separate the questions properly at the start.

What is the actual rule on AI and data retention?

There is no single AI data retention rule. There are two regimes pulling in opposite directions on different categories of data, and the practical answer for any small firm sits in the gap between them. Both apply at once, and neither cancels the other out.

GDPR Article 5(1)(e), the storage limitation principle, requires personal data to be kept no longer than necessary for the purposes for which it is processed. The ICO restates this in plain language for small organisations and asks you to set a retention period, justify it, and securely delete or anonymise data at the end of it. The EU AI Act, in force since August 2024 and applying in phases over the following years, requires providers of high-risk AI systems to retain technical documentation and quality management records for at least ten years after the system is placed on the market or put into service. Automatically generated logs carry a much shorter statutory minimum of at least six months, which matters because logs are the records most likely to contain personal data.

Specialist commentary describes these as two regulatory clocks running against each other, and the resolution is hierarchical. Deletion of raw personal data must happen first, leaving behind only non-personal documentation and anonymised records to satisfy the AI Act’s longer archival duty. Both clocks can be honoured if the data is sorted into the right buckets at the start, which is the part that gets skipped when a firm builds an AI tool first and writes the policy second.

Why does it matter for your business?

The gap between the two regimes is where reputational and contractual risk sits, even for firms not in scope of the AI Act’s high-risk provisions. Enterprise buyers are already asking for AI data retention policies as standard procurement questions. Insurers are starting to ask the same. The ICO can act on GDPR breaches regardless of whether you have heard of the AI Act, and storage limitation has been a live enforcement area for years.

If you cannot answer a vendor questionnaire about how long you keep training data, what happens when a client asks to be forgotten, and who owns each dataset that feeds your AI tools, you will lose deals long before you face fines. That is the more immediate cost. The deals lost this way are usually the larger, slower-closing ones, the enterprise contracts that take three months of procurement and a clean answer to the data section to convert. From the buyer's side, a firm without a policy is indistinguishable from a firm with bad practice.

Where will you actually meet it?

You meet it in three places, none of which announces itself as an AI data retention question. Each surfaces in a different week, from a different person, and each needs the same underlying policy to answer cleanly. If the policy is not written down somewhere a colleague can find it without asking you, the answer will land late or land wrong.

The first place is the data processing addendum on any AI tool that handles client information, where the small print says the vendor may retain logs and metadata for stated periods and may use anonymised inputs for product improvement. The second is procurement, where a new enterprise customer sends a security questionnaire that includes specific questions on AI use, training data, and deletion schedules. The third is internal, when a former employee asks for their data to be removed and you realise their emails are already woven into the model that drafts the firm’s client correspondence. The third one is the hardest, because the cost of unpicking it after the fact is usually a full retrain.

When to ask versus when to ignore

There are categories of AI use where the retention question is small enough to handle with common sense, and others where it deserves serious attention. The test is whether the data leaves its original system, whether identifiers travel with it, and whether the model or the vendor retains anything derived from it. If any of those three is yes, treat retention as load-bearing rather than admin.

If your team is using a hosted AI tool for drafting individual emails or summarising a meeting transcript that nobody is storing centrally, the retention question is essentially the vendor’s data processing addendum and a sensible internal rule about what you paste into prompts. If you are fine-tuning a model on customer data, building a retrieval system over client files, or using AI on health information, financial records, or anything that would count as special category data under GDPR, the question changes shape. It becomes a design decision, not a policy footnote, and the cost of getting it wrong is usually borne by the next deal cycle or the next subject access request, whichever arrives first.
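The three-question test in this section can be written down as a small triage helper. This is a minimal sketch, not a compliance tool: the class name, field names, and the two example use cases are all hypothetical, chosen only to show that a single "yes" is enough to escalate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIUseCase:
    """One way the firm uses AI on data. All field names are illustrative."""
    name: str
    data_leaves_original_system: bool
    identifiers_travel_with_data: bool
    model_or_vendor_retains_derivatives: bool


def retention_is_load_bearing(use_case: AIUseCase) -> bool:
    """Apply the three-question test: any single 'yes' is enough."""
    return any((
        use_case.data_leaves_original_system,
        use_case.identifiers_travel_with_data,
        use_case.model_or_vendor_retains_derivatives,
    ))


# Hypothetical examples; the answers are assumptions, not findings.
grammar_check = AIUseCase("on-device grammar check, no logging", False, False, False)
fine_tune = AIUseCase("fine-tune on client emails", True, True, True)

print(retention_is_load_bearing(grammar_check))  # False
print(retention_is_load_bearing(fine_tune))      # True
```

Recording the three answers per use case, rather than only the verdict, also gives you something to show when a questionnaire asks how the decision was made.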

How does a small firm actually run this?

The practical answer is to treat retention as four separate questions rather than one. The four categories each have different rules, different retention clocks, and different audit consequences, which is why bundling them under a single policy line tends to produce a document that nobody can actually apply. Sort the data first, then write the rule for each bucket.

Raw personal data, the original emails, records, or transcripts, gets the shortest retention period, set per use case and justified by the lawful basis you relied on to collect it. Pseudonymised data, where identifiers have been replaced with tokens but the mapping still exists, stays under GDPR and inherits the same deletion clock. Anonymised data, where re-identification is genuinely not feasible, sits outside GDPR and can be kept for model auditing or retraining. Peer-reviewed work in digital health has shown how synthetic datasets that replicate statistical properties without identifiers can support AI training while staying compliant. System logs and model documentation, the AI Act's long-retention material, should be kept non-personal by design so they can be retained on a longer cycle; any log that still carries identifiers belongs in the raw or pseudonymised bucket instead.
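The four-way sort above can be sketched as a small classifier. The enum, the function name, and its three yes/no inputs are assumptions made for illustration; in practice, deciding whether re-identification is feasible is a judgement call that no boolean flag settles.

```python
from enum import Enum


class DataCategory(Enum):
    RAW_PERSONAL = "raw personal data"
    PSEUDONYMISED = "pseudonymised data"
    ANONYMISED = "anonymised data"
    SYSTEM_DOCUMENTATION = "system logs and model documentation"


# Categories that stay on the GDPR storage limitation clock, per the text.
GDPR_CLOCK = {DataCategory.RAW_PERSONAL, DataCategory.PSEUDONYMISED}


def classify(has_direct_identifiers: bool,
             reidentification_feasible: bool,
             is_system_record: bool) -> DataCategory:
    """Sort a dataset into one of the four buckets (illustrative logic only)."""
    if has_direct_identifiers:
        return DataCategory.RAW_PERSONAL
    if reidentification_feasible:
        # Tokens with a surviving mapping are still personal data.
        return DataCategory.PSEUDONYMISED
    if is_system_record:
        return DataCategory.SYSTEM_DOCUMENTATION
    return DataCategory.ANONYMISED


tokenised_emails = classify(False, True, False)
print(tokenised_emails.value)           # pseudonymised data
print(tokenised_emails in GDPR_CLOCK)   # True
```

The ordering of the checks is the design choice worth noticing: identifiers are tested first, so a system log that still contains names is treated as personal data rather than as documentation.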

Wrap those four categories in a short policy that names an owner per dataset, sets a retention period in months, schedules the deletion or anonymisation action, and logs that the action happened. Add a flow-down clause to your vendor contracts that mirrors your own policy. Review quarterly. The discipline is not glamorous. It is also the thing that turns a Friday-night vendor questionnaire into a fifteen-minute answer with a document attached.
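A policy of that shape is small enough to keep as a structured register and check automatically. The sketch below assumes a hypothetical register with one rule per dataset; the dataset names, owners, periods, and dates are invented for illustration and are not recommended retention periods.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    dataset: str
    owner: str
    retention_months: int
    end_action: str  # "delete" or "anonymise"


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamped to the month's last day."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def actions_due(rules, collected_on, today):
    """Return (rule, due_date) pairs whose retention period has expired."""
    due = []
    for rule in rules:
        due_date = add_months(collected_on[rule.dataset], rule.retention_months)
        if due_date <= today:
            due.append((rule, due_date))
    return due


def log_action(rule: RetentionRule, done_on: date) -> dict:
    """Build the written record that the scheduled action happened."""
    return {"dataset": rule.dataset, "action": rule.end_action,
            "owner": rule.owner, "done_on": done_on.isoformat()}


# Hypothetical register entries and collection dates.
rules = [
    RetentionRule("client_emails_raw", "practice manager", 24, "delete"),
    RetentionRule("support_transcripts", "operations lead", 12, "anonymise"),
]
collected_on = {
    "client_emails_raw": date(2024, 1, 31),
    "support_transcripts": date(2025, 6, 1),
}

for rule, due_date in actions_due(rules, collected_on, today=date(2026, 2, 1)):
    print(f"{rule.dataset}: {rule.end_action} by {due_date} (owner: {rule.owner})")
# client_emails_raw: delete by 2026-01-31 (owner: practice manager)
```

Running a check like this on a schedule is what replaces memory: the register says what is due, the owner acts, and the output of `log_action` is the evidence that goes with the questionnaire answer.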

If the practice manager at the start had had that document six months ago, she would still have built the assistant. She would have done it with a defined retention period on the source emails, an anonymised retraining set kept for audit, a deletion schedule logged in writing, and a one-paragraph answer ready for any client who asked. The work is the same. The exposure is different.

Sources

- ICO, storage limitation guidance for small organisations, https://ico.org.uk/for-organisations/advice-for-small-organisations/information-security/data-storage-advice/
- European Commission, regulatory framework for AI (EU AI Act overview and application dates), https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
- TechGDPR, reconciling the regulatory clock between GDPR and the AI Act, https://techgdpr.com/blog/reconciling-the-regulatory-clock/
- EPIC, data minimisation across GDPR, CCPA, and the Maryland Online Data Privacy Act, https://epic.org/issues/consumer-privacy/data-minimization/
- Concentric AI, technical guide to data retention policies, https://concentric.ai/a-technical-guide-to-data-retention/
- Frontiers in Digital Health, synthetic data for healthcare AI training under GDPR and HIPAA, https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2025.1563991/full
- JD Supra, healthcare AI deployment and HIPAA contractual duties, https://www.jdsupra.com/legalnews/healthcare-ai-deployment-compliance-1263585/
- ISACA, the AI audit trail from policy to proof, https://www.isaca.org/resources/news-and-trends/newsletters/atisaca/2026/volume-9/the-ai-audit-trail-from-ai-policy-to-ai-proof
- Intuitive Operations, AI regulations and their impact on SMEs, https://intuitive-operations.com/2025/12/15/ai-regulations-smes-2025-impact/

Frequently asked questions

We only use ChatGPT and a couple of vendor AI tools. Does the EU AI Act ten-year rule apply to us?

Probably not directly. The ten-year technical documentation duty falls on providers of high-risk systems, not on the small firm using them. The catch is that vendor contracts often push some of the logging, audit, and retention expectations down to you as the deployer. Read the data processing addendum on any AI tool that handles client information, and treat the ICO storage limitation guidance as your baseline regardless.

Can we keep training data forever if we anonymise it?

If the anonymisation is genuinely irreversible, the data falls outside GDPR and can be retained for longer, including for model auditing and retraining. The hard part is the word genuinely. Pseudonymised data, where identifiers are replaced with tokens but the mapping still exists somewhere, remains personal data under GDPR. Most firms underestimate how easily so-called anonymised datasets can be re-identified when combined with other sources.
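The difference is easy to see in code. The sketch below shows one common pseudonymisation approach, a keyed hash over an email address, using Python's standard `hmac` and `hashlib` modules. The key, the function name, and the sample address are hypothetical. It is an illustration of why tokenised data stays personal, not an anonymisation technique.

```python
import hashlib
import hmac

# Hypothetical key. Whoever holds it can regenerate every token.
SECRET_KEY = b"example-key-held-by-the-firm"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a stable token derived from a secret key."""
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()[:16]


token_in_dataset = pseudonymise("jane.doe@example.com")

# Re-linking: anyone with the key and a candidate address can test
# whether that person appears in the "tokenised" dataset.
candidate = pseudonymise("Jane.Doe@example.com")
print(candidate == token_in_dataset)  # True
```

As long as the key or a lookup table exists somewhere, the token can be tied back to a person, so the dataset remains on the GDPR clock. Destroying the key helps but does not settle the question on its own, because the remaining fields may still single someone out when combined with other sources.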

What does a sensible retention policy look like for a ten-person firm?

A one-page document is usually enough. List each dataset that touches an AI tool, name the owner, state the lawful basis, set the retention period in months, and say what happens at the end of that period, either secure deletion or irreversible anonymisation. Add a line on vendor flow-down obligations. Review the document quarterly and log the review. Most of the value is in the discipline of writing it down, not the length.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.
