Data classification in financial services firms

Picture a small financial planning firm with eight advisers, a compliance officer who wears several hats, and fifteen years of client data spread across a CRM, a shared drive, email, and three SaaS tools adopted over the past few years. Someone suggests using an AI assistant to summarise client meeting notes. The question nobody can answer confidently is which of that data is safe to feed into the tool, and which would create a regulatory problem if it ended up somewhere it should not.

That question is exactly what data classification is built to answer.

What is data classification?

Data classification is the process of sorting your data into defined sensitivity tiers and applying specific handling rules to each. In practice, it means labelling information as public, internal, confidential, or restricted, and writing clear rules about who can access each level, where it can be stored, and how it must be transmitted. The classification scheme is the organising framework from which every other security and compliance decision follows.

For an owner-managed firm, three or four levels is typically enough. UK-focused guidance, including material from the NCSC and sector-specific IT advisers, generally points to a simple scheme: public, internal, confidential, and restricted (or highly confidential). Each level carries its own handling rules covering access, storage location, encryption requirements, and retention periods.
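As a minimal sketch of what such a scheme looks like once it is written down as data rather than prose, the Python below models the four levels as an ordered enum and attaches one handling rule per level. The access groups, storage locations, and retention wording are hypothetical placeholders, not recommended values; your own rules replace them.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Sensitivity levels, ordered so that a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class HandlingRule:
    """The four kinds of rule each level carries."""
    access: str      # who may open data at this level
    storage: str     # where it may be stored
    encryption: str  # protection required in transit and at rest
    retention: str   # how long it is kept


# Hypothetical example rules; replace every value with your firm's own.
SCHEME: dict[Tier, HandlingRule] = {
    Tier.PUBLIC: HandlingRule(
        "anyone", "website or any firm system", "none required", "until withdrawn"),
    Tier.INTERNAL: HandlingRule(
        "all staff", "approved firm systems only", "in transit", "per retention schedule"),
    Tier.CONFIDENTIAL: HandlingRule(
        "named roles only", "CRM and approved file store", "in transit and at rest",
        "per retention schedule"),
    Tier.RESTRICTED: HandlingRule(
        "named individuals only", "CRM only", "in transit and at rest",
        "per retention schedule, then secure deletion"),
}

# A scheme is only complete if every level has a rule.
assert set(SCHEME) == set(Tier)
```

Using an ordered enum means later checks can compare levels directly (`Tier.RESTRICTED > Tier.INTERNAL`), which makes rules such as "at or above Confidential" easy to express.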

Classification does not stop at the label. Attaching a “Confidential” tag to a file folder does nothing useful if no controls are linked to it. Classification becomes operational when the label determines what actually happens next: which tools can process the data, who can share it externally, and what response is triggered if it is lost or accessed without authorisation. A scheme that lives in a policy document rather than in the configuration of your systems is a compliance risk waiting to surface.
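To make "the label determines what happens next" concrete, here is a minimal Python sketch in which a label resolves to the three controls the paragraph names: approved tools, external sharing, and incident response. The tool names (`crm`, `file_store`, `email`, `website`) and the response wording are hypothetical. One design choice is deliberate: an unrecognised label raises an error instead of silently falling back to permissive handling.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Controls:
    approved_tools: frozenset  # tools allowed to process this level
    external_sharing: bool     # may staff send it outside the firm?
    incident_response: str    # what a loss or unauthorised access triggers


# Hypothetical tool names and responses, for illustration only.
CONTROLS = {
    Tier.PUBLIC: Controls(
        frozenset({"crm", "file_store", "email", "website"}), True, "log only"),
    Tier.INTERNAL: Controls(
        frozenset({"crm", "file_store", "email"}), False, "notify compliance officer"),
    Tier.CONFIDENTIAL: Controls(
        frozenset({"crm", "file_store"}), False,
        "notify compliance officer; assess whether ICO notification is required"),
    Tier.RESTRICTED: Controls(
        frozenset({"crm"}), False, "invoke incident plan immediately"),
}


def controls_for(label: str) -> Controls:
    """Resolve a label to its controls; an unknown label is an error, not a pass."""
    try:
        return CONTROLS[Tier[label.strip().upper()]]
    except KeyError:
        raise ValueError(f"unrecognised classification label: {label!r}") from None


print(controls_for("Confidential").external_sharing)         # False
print("email" in controls_for("restricted").approved_tools)  # False
```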

Why does it matter more in financial services?

Financial services firms hold data that carries a disproportionate cost when it goes wrong. Customer PII, payment details, KYC documents, trading records, and regulated communications all sit under overlapping regimes: UK GDPR (enforced by the ICO), the FCA Principles and SYSC rules, PRA oversight for dual-regulated firms, and NCSC cyber security expectations. IBM’s 2023 Cost of a Data Breach Report found that financial services had the second-highest average breach cost of any industry globally, at US$5.90m (roughly £4.7m).

The regulatory enforcement record reinforces the point. Tesco Bank was fined £16.4m by the FCA following a cyber attack that exploited weaknesses in its customer debit card controls. British Airways was fined £20m by the ICO for failing to protect login, payment card, and booking data. Marriott International was fined £18.4m by the ICO for not properly understanding or securing data in a reservation system it had acquired. In each case, the regulator’s finding went beyond the fact of the breach: the firm’s controls were not proportionate to the sensitivity of the data it held.

The FCA’s Dear CEO letters on operational resilience and cyber risk have cited weak information classification as a gap on multiple occasions since 2021. The ICO’s accountability framework requires organisations to demonstrate controls appropriate to the risk of their data. Classification is the mechanism that makes “proportionate controls” a defensible claim rather than an aspiration.

Where will you actually meet it in a regulated firm?

You will meet data classification requirements in three places: your regulator’s expectations, your suppliers’ contractual requirements, and the operational decisions you make every day about where to store and share information. All three point to the same need. The FCA and Bank of England both operate formal four-tier classification schemes for their own data, and third parties handling that data are expected to mirror the same discipline.

The Bank of England’s third-party standard uses four levels: Public, Official, Official-Sensitive, and Secret. Each level carries specific rules on access control, storage, encrypted transmission, and incident reporting requirements. The FCA runs a parallel scheme with four categories of its own, covering different transmission and disposal requirements at each level. These are not aspirational frameworks. Suppliers and technology partners selling into UK financial institutions are increasingly asked to demonstrate that their products handle data in line with those classification and handling standards.

For an owner-managed financial services firm, classification arrives in practice at three moments: when you adopt a new SaaS tool and need to decide what data goes into it, when a regulatory review or client due-diligence exercise asks about your data risk controls, and when an AI tool enters the picture. Each moment demands working answers to the same three questions: which of our data is sensitive, where does it currently live, and who can reach it?

When is a formal scheme worth the effort, and when is it overkill?

The honest answer depends on how much data you hold and how many systems it sits across. An owner-managed firm that runs its entire operation through a single platform and keeps minimal local storage will gain more from tightening that platform’s own configuration than from building a four-level written scheme. The ICO’s accountability framework is explicit that controls should be appropriate to the risk, not maximal by default.

The calculus shifts as complexity grows. A ten-person advisory firm with client data across a CRM, file storage, email, and several cloud-based tools has genuine classification exposure. IBM’s research found that organisations making extensive use of data classification and data discovery tools had average breach costs US$1.03m lower than those without, equivalent to roughly 23% of the 2023 global average breach cost of US$4.45m. The cost of getting it wrong is not theoretical.
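The percentage is simple arithmetic on IBM’s figures. This short Python check divides the quoted saving by IBM’s reported 2023 global average breach cost of US$4.45m and shows the saving is just over a fifth of that average.

```python
# Figures from IBM's Cost of a Data Breach Report 2023, in millions of US dollars.
global_average_2023 = 4.45
classification_saving = 1.03

share = classification_saving / global_average_2023
print(f"Saving as a share of the global average: {share:.0%}")
# Saving as a share of the global average: 23%
```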

Over-classification is a real failure mode too. If everything is labelled “Restricted”, staff route around the labels and security weakens in practice. The opposite failure matters just as much: data that is never classified tends to stay open by default. Ponemon Institute’s 2023 Global Data Risk Report found that 76% of organisations have more than one million files accessible to every employee, often because access was never restricted in the first place. Three or four levels, applied consistently and enforced through system configuration and access controls, are far more effective than six levels on paper with no link to how the systems actually behave.
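Over-exposure of this kind can be measured once files carry a label. The sketch below assumes you can export, for each file, its classification and the groups that can read it; the `FileRecord` type, the `all_staff` group name, and the sample inventory are all hypothetical. It flags Confidential or Restricted files that every employee can open, which is the list to work through when tightening access.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class FileRecord:
    path: str
    tier: Tier
    readers: frozenset  # groups or users with read access


ALL_STAFF = "all_staff"  # hypothetical name for the firm-wide group


def overexposed(files, threshold=Tier.CONFIDENTIAL):
    """Return files at or above `threshold` that every employee can read."""
    return [f for f in files if f.tier >= threshold and ALL_STAFF in f.readers]


def exposure_summary(files, threshold=Tier.CONFIDENTIAL):
    """Count over-exposed files per level, a figure to track as access is tightened."""
    return Counter(f.tier.name for f in overexposed(files, threshold))


# Hypothetical inventory, standing in for a permissions export from your file store.
inventory = [
    FileRecord("marketing/brochure.pdf", Tier.PUBLIC, frozenset({ALL_STAFF})),
    FileRecord("clients/smith/factfind.docx", Tier.CONFIDENTIAL, frozenset({ALL_STAFF})),
    FileRecord("clients/smith/passport.pdf", Tier.RESTRICTED,
               frozenset({"advisers", "compliance"})),
    FileRecord("clients/jones/kyc.pdf", Tier.RESTRICTED, frozenset({ALL_STAFF})),
]

print([f.path for f in overexposed(inventory)])
# ['clients/smith/factfind.docx', 'clients/jones/kyc.pdf']
print(exposure_summary(inventory))
```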

How does data classification connect to AI adoption and operational resilience?

Data classification is the common thread running through three things that sit at the top of the UK regulatory agenda for financial services firms, namely operational resilience planning, AI governance, and access control. Once you know which of your data is sensitive, you can make defensible decisions about where AI tools can reach, which services are critical to the business, and who inside the firm genuinely needs access to what.

On AI specifically, the NCSC’s guidance on using public generative AI safely recommends that organisations classify data and prohibit feeding sensitive financial and customer data into public AI tools. The ICO’s 2023 generative AI guidance adds that prompts and outputs involving identifiable individuals should be treated as personal data under UK GDPR. Classification gives you the framework to act on both recommendations. If client account records are classified as Confidential, the handling rule can state plainly that they must not be entered into unmanaged AI tools.
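A handling rule like that can be enforced as a simple gate. In this hypothetical Python sketch, each category of AI tool has a ceiling (the most sensitive level it may receive), and a prompt takes the level of the most sensitive item pasted into it. The two categories and their ceilings are assumptions for illustration: whether a managed tool may receive Confidential data is a decision for your own risk assessment, and unknown tools are refused until assessed.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical ceilings: the most sensitive level each category of tool may receive.
AI_TOOL_CEILING = {
    "unmanaged": Tier.PUBLIC,       # public consumer tools with no contract in place
    "managed": Tier.CONFIDENTIAL,   # firm-licensed tool, only after the firm has assessed it
}


def prompt_tier(*inputs: Tier) -> Tier:
    """A prompt is as sensitive as the most sensitive item pasted into it."""
    return max(inputs, default=Tier.PUBLIC)


def may_send_to_ai(data_tier: Tier, tool_category: str) -> bool:
    """Allow a prompt only if its level is at or below the tool's ceiling.

    Unknown tool categories are refused, so a new tool must be assessed first.
    """
    ceiling = AI_TOOL_CEILING.get(tool_category)
    return ceiling is not None and data_tier <= ceiling


# Client meeting notes (Confidential) combined with a public article.
notes = prompt_tier(Tier.PUBLIC, Tier.CONFIDENTIAL)
print(may_send_to_ai(notes, "unmanaged"))             # False
print(may_send_to_ai(notes, "managed"))               # True
print(may_send_to_ai(Tier.RESTRICTED, "managed"))     # False
print(may_send_to_ai(Tier.PUBLIC, "browser_plugin"))  # False: not yet assessed
```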

On operational resilience, the rules set jointly by the Bank of England (including the PRA) and the FCA require firms to identify their important business services and map the resources, including data, that support them. That mapping is classification work, whether or not it is called that. On access control, classification is the prerequisite for tightening over-permissive permissions: you cannot apply role-based access controls sensibly until you know what data is in each system and how sensitive it is.
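Once systems are classified, role-based access can be expressed as a clearance per role. The Python below is a minimal sketch with hypothetical roles for a small advisory firm; a real deployment would also scope access by client or system, which this example leaves out.

```python
from __future__ import annotations

from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical roles and the most sensitive level each may read.
ROLE_CLEARANCE = {
    "administrator": Tier.INTERNAL,
    "paraplanner": Tier.CONFIDENTIAL,
    "adviser": Tier.RESTRICTED,
    "compliance_officer": Tier.RESTRICTED,
}


def can_read(roles, data_tier: Tier) -> bool:
    """True if any of the user's roles is cleared for data at this level.

    Unknown roles get PUBLIC clearance only (least privilege by default).
    """
    return any(ROLE_CLEARANCE.get(role, Tier.PUBLIC) >= data_tier for role in roles)


def roles_permitted(system_tier: Tier) -> list[str]:
    """Roles that may be granted access to a system holding data at this level."""
    return sorted(r for r, c in ROLE_CLEARANCE.items() if c >= system_tier)


print(roles_permitted(Tier.RESTRICTED))                # ['adviser', 'compliance_officer']
print(can_read(["administrator"], Tier.CONFIDENTIAL))  # False
```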

A proportionate starting point for a firm without a scheme has three parts. First, define three or four levels and write the handling rules for each. Second, take stock of your key systems and identify the highest-sensitivity data present in each one. Third, link each level to at least one concrete control, whether that is multi-factor authentication, encryption in transit and at rest, or a written rule about which tools may process that tier of data. Reviewed annually, that is a defensible foundation: it meets the regulator’s expectation of proportionate controls and gives you a clear answer the next time someone asks about feeding client data into an AI tool.
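The three parts of that starting point can be checked mechanically. This Python sketch assumes a hypothetical record of which controls are linked to each level and of the highest-sensitivity data found in each key system; the system names and control lists are illustrative. It reports levels with no linked control, systems holding data at such a level, and whether the annual review is due (approximating "annually" as 365 days).

```python
from datetime import date, timedelta
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical: concrete controls linked to each level.
CONTROLS = {
    Tier.PUBLIC: ["approved for publication"],
    Tier.INTERNAL: ["multi-factor authentication"],
    Tier.CONFIDENTIAL: ["multi-factor authentication", "encryption in transit and at rest"],
    Tier.RESTRICTED: [],  # gap: no control linked yet
}

# Hypothetical: key systems and the highest-sensitivity data found in each.
SYSTEMS = {
    "crm": Tier.RESTRICTED,
    "shared_drive": Tier.CONFIDENTIAL,
    "email": Tier.CONFIDENTIAL,
    "newsletter_tool": Tier.PUBLIC,
}


def gaps(controls, systems):
    """List levels with no linked control, then systems holding data at those levels."""
    findings = [f"{t.name}: no linked control" for t in Tier if not controls.get(t)]
    for name, tier in sorted(systems.items()):
        if not controls.get(tier):
            findings.append(f"{name}: holds {tier.name} data with no linked control")
    return findings


def review_due(last_review: date, today: date) -> bool:
    """Annual review, approximated here as 365 days since the last one."""
    return today - last_review >= timedelta(days=365)


print(gaps(CONTROLS, SYSTEMS))
# ['RESTRICTED: no linked control', 'crm: holds RESTRICTED data with no linked control']
print(review_due(date(2024, 1, 15), date(2025, 1, 15)))  # True
```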



Ready to talk it through?

If any of this sounds familiar, let's talk.
