How to classify your data for safe AI tool use

A member of staff at a small management consultancy has a client engagement letter open on one screen and an AI writing assistant on another. The engagement letter has the client’s company name, the project scope, and a few fee references. The question in their head is whether they can paste this in to draft the summary.

Whether they answer correctly depends entirely on what guidance your firm has given them. What gives them a usable answer in that moment is a classification label, simple enough to check in five seconds.

What does data classification mean when your team is using AI tools?

Data classification for AI tools is a labelling system that tells staff which category their content falls into and what they can do with it. It answers one question: can I put this into the tool I am about to use? For a typical UK services firm, four categories cover the full range of everyday situations: Public, Internal, Confidential, and Restricted. The label replaces a judgement call with a decision rule.

The ICO’s AI and data protection guidance frames classification as a control for minimisation. You should process only the data features relevant to the purpose. That principle maps directly to the prompt. If a task still works without the client’s name, invoice number, and address, those fields have no business being in the prompt.

The practical sequence for a small firm runs in this order: identify what data you have, assign a class to it, decide which class may go into which tool, strip identifiers where possible, test on low-risk use cases first, and review outputs before sharing. This order matters. Choosing tools before classifying your data is where firms commonly run into trouble, because a data handling problem surfaces after staff have been using a tool for months, not before.

A Public label means the content is already intended for public release, so any tool is acceptable. Internal, Confidential, and Restricted each carry progressively tighter restrictions on what the content can touch. The tool-mapping decision flows from the label, not from a case-by-case judgement.
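As a rough illustration of a label acting as a decision rule, the sketch below orders the four classes and gives each tool a ceiling. The tool names and their ceilings are hypothetical placeholders, not recommendations; a firm would substitute its own approved list.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """The four classes, ordered from least to most restricted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical tools and the highest class each one is approved for.
# RESTRICTED is deliberately absent: in this sketch no tool accepts it.
TOOL_CEILING = {
    "general_chat_assistant": DataClass.PUBLIC,
    "licensed_writing_assistant": DataClass.INTERNAL,
    "firm_managed_assistant": DataClass.CONFIDENTIAL,
}


def may_use(label: DataClass, tool: str) -> bool:
    """Return True if content with this label may go into this tool."""
    if label == DataClass.PUBLIC:
        # Public content is already intended for release, so any tool is fine.
        return True
    ceiling = TOOL_CEILING.get(tool)
    if ceiling is None:
        # Assumption: a tool that is not on the list is treated as unapproved.
        return False
    return label <= ceiling


if __name__ == "__main__":
    print(may_use(DataClass.CONFIDENTIAL, "general_chat_assistant"))
    print(may_use(DataClass.INTERNAL, "licensed_writing_assistant"))
```

The point of the design is that the staff member supplies only the label and the tool; the answer comes from the table, not from a fresh judgement each time.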

Why does it matter before your team starts using AI?

The ICO is explicit that there is no AI exemption under UK data protection law. If a prompt contains personal data, UK GDPR and the Data Protection Act 2018 apply to it, regardless of which tool is processing it. Classification gives that legal reality a practical form. Staff know, before they paste, whether the content they are holding can go into the tool in front of them.

This matters most acutely for two categories that services firms use routinely. First, personal data, which covers client names, contact details, employee records, and call notes with identifiers. Second, special category data under Article 9 of UK GDPR, which includes health information, racial or ethnic origin, trade union membership, religious belief, and similar categories. Article 9 requires a separate condition for processing, on top of the general Article 6 lawful basis, and for many everyday AI use cases no such condition applies.

Data Protection People’s guidance on AI and UK data protection highlights shadow AI and data leakage as the two primary operational risks for small firms. Shadow AI describes staff using unapproved tools without the firm knowing. Data leakage describes content ending up with a supplier who may use it to improve their model. Classification addresses both risks. A staff member who knows a document is Restricted does not need to call a senior partner for guidance. The label has already answered the question.

One point is worth making explicit: an approved tool should mean approved for a specific class of data, not just approved in general. A tool that works well for drafting public marketing copy may not be appropriate for HR correspondence or client files. Approvals need to be granular.

Where in your working day will this actually matter?

Staff in a small services firm face this type of decision repeatedly: drafting an HR letter and reaching for a writing assistant, pulling key terms from a client contract, summarising a recorded call. With a label in place, each of those moments becomes a quick reference check rather than a judgement call: does my label say I can use this here?

The specific risk profiles differ considerably. An HR letter containing a staff member’s name and payroll details sits in the Restricted or Confidential band depending on its content. A client call summary with financial figures is Confidential. A marketing brief for a new campaign is Public and carries no restriction.

Leakage in services firms typically comes from well-intentioned shortcuts: the prompt that was almost generic, with just one name or one number left in. Data Protection People’s guidance on prompt discipline makes the point directly. If the task still works after names, email addresses, invoice numbers, and contract references are removed, those fields should not be in the prompt.

The ICO’s guidance on security and data minimisation in AI makes clear that AI systems can make security risks harder to manage. Once content is in a third-party tool, the firm has substantially less control over what happens to it. That is precisely why the decision needs to be made before the paste, not after.

When do you need a formal classification scheme and when can you keep it simpler?

Not every firm needs four tiers from day one. If your work is mainly public-facing content, a two-rule approach works well: content is either safe to use freely or not to be used without approval. The four-tier model earns its place when the firm handles client data under confidentiality obligations, employs staff whose HR records sit in your systems, or works with any special category data under UK GDPR.

For a firm that has no personal data passing through AI tools and no confidential client material, a lighter scheme is a proportionate response. The ICO’s data minimisation principle supports this. The right control is the one that fits the actual risk, not a more elaborate framework than the risk warrants.

The harder question is knowing when you have crossed from “mainly public” into “sometimes sensitive.” The tell is almost always a people element. The moment your AI use involves staff records, client names, or contract terms provided under confidence, you need at minimum a Confidential classification and a clear rule about which tools it can go into.

For very small teams, the two-tier fallback is often more practical than a four-tier model they will struggle to maintain. A single written rule does most of the work: anything about a named person, or provided by a client under confidence, goes in the restricted pile, and the restricted pile does not go into general-purpose AI tools. Applied consistently, that one rule substantially reduces day-to-day exposure.
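The single written rule is small enough to express as a two-question check. This is only a sketch: the `Item` type, its field names, and the tier names `"restricted"` and `"open"` are invented here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Two yes/no questions a staff member answers about a piece of content."""
    about_named_person: bool
    provided_under_confidence: bool


def tier(item: Item) -> str:
    """Apply the single rule: either question answered 'yes' means restricted."""
    if item.about_named_person or item.provided_under_confidence:
        return "restricted"
    return "open"


def allowed_in_general_purpose_tool(item: Item) -> bool:
    """The restricted pile does not go into general-purpose AI tools."""
    return tier(item) == "open"


if __name__ == "__main__":
    hr_letter = Item(about_named_person=True, provided_under_confidence=False)
    print(tier(hr_letter), allowed_in_general_purpose_tool(hr_letter))
```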

What else supports a data classification scheme for AI?

Classification is the foundation, but it only holds if the things built on top of it are in place. Three habits make a scheme stick in a small firm: removing personal data from prompts unless the use case is explicitly approved, defaulting to synthetic or anonymised data when testing new tools, and keeping a short approved-tools list that maps each data tier to what staff can use.

On prompt habits, Data Protection People’s guidance recommends keeping prompts generic where possible and removing personal data and confidential detail unless the tool and the use case have been explicitly approved. In practice, this means treating a prompt as a template. Anything that identifies a specific person, client, or contract should be replaced with a placeholder or removed entirely, unless there is a documented reason to include it.
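Treating a prompt as a template can be partly mechanised. The sketch below swaps email addresses, invoice numbers, and a supplied list of names for placeholders. The `INV-` invoice format and the placeholder tokens are assumptions made for the example, and pattern matching of this kind will miss things, so it supports a human check rather than replacing one.

```python
import re

# Simple email pattern; good enough for a sketch, not a full address validator.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Assumed invoice format (e.g. INV-1042); adjust to the firm's own numbering.
INVOICE = re.compile(r"\bINV-\d+\b")


def to_template(text: str, known_names: list[str]) -> str:
    """Replace identifiers in a draft prompt with neutral placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = INVOICE.sub("[INVOICE_NO]", text)
    # Longest names first, so "Acme Ltd" is replaced before a shorter "Acme".
    for name in sorted(known_names, key=len, reverse=True):
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    draft = "Email jane.doe@example.co.uk about INV-1042 for Acme Ltd."
    print(to_template(draft, ["Acme Ltd"]))
```

If the task still produces a usable draft from the templated prompt, the removed fields did not need to be there.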

On synthetic data, if you want to pilot a new AI tool safely, synthetic data is a strong option. MOSTLY AI positions synthetic data as privacy-safe and structurally representative of production data, suitable for sharing, model training, and simulation. For a small services firm, testing with synthetic data means the tool gets a genuine run-through without any real personal information in play.
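For a quick pilot, even hand-rolled placeholder records keep real personal data out of the test. The sketch below uses only the Python standard library; it is not MOSTLY AI's product and makes no attempt to be structurally representative of production data. The field names, name lists, and fee range are invented for illustration.

```python
import random

# Invented name parts, chosen to be obviously fake.
FIRST_NAMES = ["Alex", "Sam", "Priya", "Jordan"]
LAST_NAMES = ["Example", "Sample", "Testcase"]


def synthetic_clients(n: int, seed: int = 0) -> list[dict]:
    """Build n made-up client records; the same seed gives the same records."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        rows.append({
            "client_id": f"TEST-{i:04d}",
            "contact_name": f"{first} {last}",
            # example.com is a domain reserved for documentation and testing.
            "email": f"{first.lower()}.{last.lower()}@example.com",
            "fee_gbp": rng.randrange(500, 20000, 50),
        })
    return rows


if __name__ == "__main__":
    for row in synthetic_clients(3):
        print(row)
```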

On embedding classification in the firm’s workflow, the UK Government’s AI skills tools package emphasises that responsible AI adoption should be accessible to non-technical staff. That means the classification scheme needs to sit in document templates, in the prompt library, and in the onboarding run-through. Embedding a classification label in the document template, or adding a one-line guide to the prompt library, is what changes behaviour at the moment it matters.
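If document templates carry a label line, a small helper can read it back before anything is pasted. The `Classification: <label>` line format is a hypothetical convention chosen for this sketch, not a standard.

```python
import re
from typing import Optional

# Assumed template convention: a line of its own reading "Classification: <label>".
LABEL_LINE = re.compile(
    r"^\s*Classification:\s*(Public|Internal|Confidential|Restricted)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def read_label(document_text: str) -> Optional[str]:
    """Return the document's label, or None if no label line is present."""
    match = LABEL_LINE.search(document_text)
    if match is None:
        return None
    return match.group(1).capitalize()


if __name__ == "__main__":
    letter = "Engagement letter\nClassification: confidential\n\nDear client,"
    print(read_label(letter))
```

A missing label is returned as `None` rather than guessed, so an unlabelled document can be routed to a person instead of defaulting to a permissive class.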
