How to prepare your business data for AI tools

TL;DR

Getting your data AI-ready means working through four steps in sequence: inventorying what you hold, cleaning and securing it, documenting your governance, and connecting to AI tools carefully. For a UK owner-managed services firm, that work addresses legal requirements under UK GDPR and ICO guidance, reduces security risk, and means your AI tools give you reliable outputs rather than compounding existing data problems.

Key takeaways

- Getting data ready for AI follows four steps: inventory what you hold, clean and secure it, document your governance, then connect to AI tools carefully.
- UK GDPR and ICO guidance applies to any AI use involving personal data; a Data Protection Impact Assessment is required for high-risk uses such as profiling and automated decisions.
- The NCSC warns that data submitted to online AI tools may be stored or used to improve those services; enterprise AI plans with formal data controls are meaningfully different from consumer tools.
- A risk-based approach applies: lighter governance is appropriate for low-stakes, non-personal-data uses; full documentation is required for profiling, automated decisions, and large-scale processing.
- Data readiness is an ongoing operational discipline rather than a one-off tidy-up; ISO/IEC 8183 and UK government guidance both frame it as a continuous lifecycle requiring clear data ownership.

A colleague mentions at a networking event that their team has started using Copilot to draft client reports. They are saving two hours a week. Then their IT support provider asks whether the SharePoint permissions have been reviewed lately. Silence. The data the AI was drawing on was accessible to the whole firm, including a contractor who had left eight months earlier.

The scenario is more common than you might expect. Many owner-managed services firms start using AI tools in a state of inherited disorder: records inconsistently entered, ownership unclear, access controls set up once and never revisited. The tools work well on clean, well-governed data. On messy, over-accessible data they produce unreliable outputs and create new risks. Getting your data ready is the foundation any AI investment rests on.

What does “data ready for AI” actually mean?

“AI-ready data” is business data that has been inventoried, cleaned, secured, and documented well enough to use with AI tools without exposing your firm to legal, security, or quality risk. The UK government’s AI-ready datasets guidance breaks this into four pillars: technical quality, documentation, organisational infrastructure, and legal and ethical compliance. For a small services firm, those pillars map to a four-step sequence of practical work.

The UK government’s framework points to ISO/IEC 8183, the international standard for AI data lifecycles, as a reference. The core idea there is that data readiness is a continuous discipline rather than a project you complete once. A clean-up before your first AI pilot will help initially. Without clear ownership and regular maintenance, data quality degrades and the AI tools depending on it become progressively less reliable.
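One way to make “regular maintenance” concrete is to give every dataset a named owner and a review date, then check periodically for anything that has slipped. The sketch below is illustrative only: the Dataset record, the overdue helper and the 90-day review interval are hypothetical choices, not something ISO/IEC 8183 or the government guidance prescribes.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    """One entry in a hypothetical data-ownership register."""
    name: str
    owner: str  # empty string means nobody has been assigned
    last_reviewed: date


def overdue(datasets: list[Dataset], today: date, interval_days: int = 90) -> list[str]:
    """Return names of datasets with no owner or no review within the interval."""
    cutoff = today - timedelta(days=interval_days)
    return [d.name for d in datasets if not d.owner.strip() or d.last_reviewed < cutoff]


register = [
    Dataset("CRM", "Priya", date(2024, 5, 1)),
    Dataset("Shared drive", "", date(2024, 6, 1)),
    Dataset("Timesheets", "Tom", date(2024, 1, 10)),
]
print(overdue(register, today=date(2024, 6, 30)))  # → ['Shared drive', 'Timesheets']
```

Run on a schedule, a check like this turns “clear ownership” from an intention into something you can see slipping.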

Why does this matter for a small services business?

The ICO’s guidance on AI and data protection is clear: any organisation using AI to process personal data needs a lawful basis, must carry out a Data Protection Impact Assessment (DPIA) where the processing is likely to result in a high risk to individuals, and must be able to explain how the AI uses people’s data. That applies to a ten-person consultancy as much as to a bank, and failing to prepare creates real legal exposure.

The fines give a sense of the stakes. The ICO fined British Airways £20m in 2020 following a data breach caused by security failings. In 2019, Bounty (UK) Limited was fined £400,000 for sharing the personal data of 14 million people with third parties without adequate transparency, a reminder that reusing customer data for purposes people were not told about requires clear notice and a lawful basis.

The Bank of England and FCA’s 2023 survey of 73 UK financial firms found that 72% reported using or developing machine-learning applications, but many cited data quality and availability as key barriers to deploying them safely. That gap between intention and safe deployment is what data preparation is designed to close.

The NCSC adds a practical warning: data submitted to online AI tools may be stored or used to improve those services. The NCSC recommends reviewing provider policies before sharing any sensitive business data with an AI-as-a-service tool.

Where do the problems show up in practice?

Data problems surface in predictable places once you start using AI tools. The commonest is inconsistent, duplicated, or missing records: client names entered differently across systems, dates in mixed formats, older files with no clear owner. A second pattern is access-control drift, where data is readable by anyone in the firm even when only two people need it. Both problems limit what AI can do safely.
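A quick profile of an export can show how widespread these problems are before you fix anything. This is a minimal sketch assuming a CSV export with hypothetical client_name, owner and last_contact columns; the date patterns and the crude name-matching rule (lower-case, strip punctuation, drop a trailing “Ltd” or “Limited”) are assumptions to adapt to your own systems.

```python
import csv
import io
import re
from collections import Counter

# Assumed date styles; add whatever your systems actually produce.
DATE_PATTERNS = {
    "ISO (2024-03-01)": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "UK slash (01/03/2024)": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
    "Written (1 March 2024)": re.compile(r"^\d{1,2} [A-Za-z]+ \d{4}$"),
}


def classify_date(value: str) -> str:
    """Label a raw date string by the first pattern it matches."""
    value = value.strip()
    if not value:
        return "missing"
    for label, pattern in DATE_PATTERNS.items():
        if pattern.match(value):
            return label
    return "unrecognised"


def name_key(name: str) -> str:
    """Crude matching key: lower-case, letters and digits only, no Ltd/Limited suffix."""
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    for suffix in ("limited", "ltd"):
        key = key.removesuffix(suffix)
    return key


def profile(rows: list[dict]) -> dict:
    """Count date styles and missing owners, and group likely name variants."""
    date_styles = Counter(classify_date(r.get("last_contact") or "") for r in rows)
    missing_owner = sum(1 for r in rows if not (r.get("owner") or "").strip())
    variants: dict[str, set] = {}
    for r in rows:
        name = r.get("client_name") or ""
        variants.setdefault(name_key(name), set()).add(name)
    return {
        "date_styles": dict(date_styles),
        "missing_owner": missing_owner,
        "name_variants": {k: sorted(v) for k, v in variants.items() if len(v) > 1},
    }


SAMPLE = """client_name,owner,last_contact
Acme Ltd,Priya,2024-03-01
ACME Limited,,01/03/2024
Acme Ltd.,Tom,1 March 2024
Brightwater Consulting,Priya,2024-02-14
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(profile(rows))
```

The matching key is deliberately simple and will miss variants such as abbreviations, so treat its groups as candidates for a person to review rather than confirmed duplicates.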

The first pass is an inventory. Build a simple spreadsheet listing each system, its data owner, the data types it holds, whether any of it is personal data or special-category data (such as health data or ethnic origin; financial details are sensitive but are not special-category data under UK GDPR), and its business use. Government guidance recommends understanding where data comes from and how it flows before using it for AI. For a services firm, that exercise typically takes a few hours rather than a few weeks.
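If you would rather keep the inventory alongside other scripts than in a spreadsheet, the same columns translate directly into a small register. This sketch uses hypothetical field names and review messages; it flags entries with no named owner or that hold personal or special-category data, and exports everything as CSV so it still opens in a spreadsheet. The flags are prompts for review, not legal conclusions.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryEntry:
    """One row of a hypothetical data inventory."""
    system: str
    owner: str
    data_types: str
    personal_data: bool
    special_category: bool  # e.g. health or ethnic origin under UK GDPR
    business_use: str


def review_flags(entry: InventoryEntry) -> list[str]:
    """Return reminders for anything that needs attention before AI use."""
    flags = []
    if not entry.owner.strip():
        flags.append("no named owner")
    if entry.special_category:
        flags.append("special-category data: needs a lawful basis and an Article 9 condition")
    elif entry.personal_data:
        flags.append("personal data: confirm the lawful basis")
    return flags


def to_csv(entries: list[InventoryEntry]) -> str:
    """Export the register as CSV text for use in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(InventoryEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


inventory = [
    InventoryEntry("CRM", "Priya", "client contacts, meeting notes", True, False, "client relationships"),
    InventoryEntry("HR folder", "", "staff records, sickness notes", True, True, "employment admin"),
    InventoryEntry("Proposal templates", "Tom", "marketing copy", False, False, "drafting proposals"),
]
for item in inventory:
    print(item.system, review_flags(item))
print(to_csv(inventory))
```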

The second pass is cleaning and securing. Standardise formats, remove obvious duplicates, and apply role-based access control. The NCSC recommends strong authentication and role-based access for any data used by AI services. Government guidance describes encrypting data at rest and in transit as “non-negotiable” for AI-ready data. Both are standard security practices; AI makes them more pressing because AI tools synthesise information across everything they can reach.
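The formatting and de-duplication part of this step is easy to script once you know which formats you are dealing with. A minimal sketch, assuming hypothetical client_name and last_contact fields and three known date formats; where records share a matching key it keeps the one with the most recent contact date, a simplifying rule you may want to replace. Access control and encryption are configured in your systems, not in a script like this.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed input formats: ISO, UK day/month/year, and written-out month names.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y")


def parse_date(value: str) -> Optional[date]:
    """Try each known format in turn; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def name_key(name: str) -> str:
    """Crude matching key: lower-case, letters and digits only, no Ltd/Limited suffix."""
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    for suffix in ("limited", "ltd"):
        key = key.removesuffix(suffix)
    return key


def clean(records: list[dict]) -> list[dict]:
    """Standardise dates to ISO 8601 and merge records that share a name key."""
    merged: dict[str, dict] = {}
    for record in records:
        parsed = parse_date(record.get("last_contact", ""))
        # Unparseable dates become "" so they sort before any real date.
        cleaned = {**record, "last_contact": parsed.isoformat() if parsed else ""}
        key = name_key(record.get("client_name", ""))
        current = merged.get(key)
        # ISO date strings compare correctly as text, so keep the most recent.
        if current is None or cleaned["last_contact"] > current["last_contact"]:
            merged[key] = cleaned
    return list(merged.values())


records = [
    {"client_name": "Acme Ltd", "last_contact": "2024-03-01"},
    {"client_name": "ACME Limited", "last_contact": "15/04/2024"},
    {"client_name": "Brightwater Consulting", "last_contact": "1 March 2024"},
]
print(clean(records))
```

Automatic merging discards whatever the older duplicate held, so it is usually safer to review the proposed merges before writing anything back to the source system.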

The third pass is documentation and governance. Write a short internal note per use-case: data sources, the legal basis for using personal data, retention period, and who can access it. If a use-case involves profiling clients or automating decisions about individuals, a DPIA is required before you proceed.
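A per-use-case note can be as simple as a fixed template that does not look complete until every field is filled in. The sketch below is hypothetical: the UseCaseNote fields mirror the list above, the DPIA line follows only the two triggers named in this step (profiling clients, automating decisions about individuals), and the lawful basis and retention values are illustrative, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class UseCaseNote:
    """Hypothetical one-page governance note for a single AI use-case."""
    name: str
    data_sources: list[str]
    lawful_basis: str  # e.g. "legitimate interests"; "n/a" if no personal data
    retention: str
    access: list[str]
    profiles_clients: bool = False
    automates_decisions: bool = False

    def missing_fields(self) -> list[str]:
        """List the required fields that have been left blank."""
        required = {
            "data_sources": self.data_sources,
            "lawful_basis": self.lawful_basis,
            "retention": self.retention,
            "access": self.access,
        }
        return [field_name for field_name, value in required.items() if not value]

    def render(self) -> str:
        """Produce the short internal note as plain text."""
        lines = [
            f"Use-case: {self.name}",
            f"Data sources: {', '.join(self.data_sources) or 'NOT RECORDED'}",
            f"Legal basis for personal data: {self.lawful_basis or 'NOT RECORDED'}",
            f"Retention: {self.retention or 'NOT RECORDED'}",
            f"Access: {', '.join(self.access) or 'NOT RECORDED'}",
        ]
        if self.profiles_clients or self.automates_decisions:
            lines.append("DPIA: required before proceeding")
        return "\n".join(lines)


note = UseCaseNote(
    name="Drafting client reports",
    data_sources=["SharePoint: Client Reports library"],
    lawful_basis="legitimate interests",
    retention="6 years after engagement ends",
    access=["Consultants", "Directors"],
)
print(note.render())
print(note.missing_fields())
```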

The fourth pass is connecting to AI tools carefully. Enterprise plans for tools such as Microsoft Copilot for Microsoft 365 inherit your existing permissions structure and, per Microsoft’s documentation, do not train the underlying model on your tenant data. Consumer plans do not necessarily offer the same protections, so check the provider’s terms before using them. Your access controls determine what the AI can see; getting them right before you connect is the step that makes enterprise AI tools safe to use.
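The point that access controls determine what the AI can see can be modelled in a few lines: if an assistant works within a user’s existing permissions, the documents it could draw on for that user are exactly the documents that user can already open. This is a conceptual sketch with hypothetical document names, groups and users. It is not a description of how Copilot works internally or an interface to Microsoft 365; real permission data would come from your own admin tools.

```python
# Hypothetical permissions export: document -> groups or users with read access.
PERMISSIONS = {
    "Client A report.docx": {"Consultants"},
    "Salary review.xlsx": {"Directors"},
    "Board minutes.docx": {"Everyone"},
}

# Hypothetical group membership, including a contractor who has since left.
GROUPS = {
    "Consultants": {"priya", "tom"},
    "Directors": {"sam"},
    "Everyone": {"priya", "tom", "sam", "old.contractor"},
}


def readers(doc: str) -> set[str]:
    """Expand a document's groups into the individual users who can read it."""
    users: set[str] = set()
    for principal in PERMISSIONS[doc]:
        # A principal that is not a group name is treated as a single user.
        users |= GROUPS.get(principal, {principal})
    return users


def reachable_by(user: str) -> list[str]:
    """Documents a permission-inheriting assistant could draw on for this user."""
    return sorted(doc for doc in PERMISSIONS if user in readers(doc))


def stale_access(active_users: set[str]) -> dict[str, set[str]]:
    """Documents still readable by people who are no longer active."""
    result = {}
    for doc in PERMISSIONS:
        leftovers = readers(doc) - active_users
        if leftovers:
            result[doc] = leftovers
    return result


print(reachable_by("tom"))
print(stale_access({"priya", "tom", "sam"}))
```

A single broad “Everyone” grant is enough to put a document within reach of every user, including the contractor from the opening example.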

When is a lighter-touch approach good enough?

Regulators take a risk-based approach. The ICO does not require a DPIA for every AI experiment; the trigger is processing “likely to result in a high risk” to individuals, such as profiling, large-scale data use, or automated decisions with significant effects. Using AI to summarise internal notes containing no personal data sits well below that threshold. Applying full governance to such tasks would slow you down without making anyone safer.
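The threshold can be expressed as a simple screening step for each proposed use. This sketch is a simplification for illustration, not the ICO’s legal test: it checks only the three examples named above (profiling, large-scale data use, automated decisions with significant effects), and the ICO’s DPIA guidance lists further criteria that a real screening should consider. The ProposedUse record and screen function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProposedUse:
    """Hypothetical description of one planned AI use."""
    description: str
    personal_data: bool
    profiling: bool = False
    large_scale: bool = False
    significant_automated_decisions: bool = False


def screen(use: ProposedUse) -> str:
    """Rough triage into DPIA, documented use, or light-touch check."""
    triggers = [
        label
        for label, present in (
            ("profiling", use.profiling),
            ("large-scale processing", use.large_scale),
            ("automated decisions with significant effects", use.significant_automated_decisions),
        )
        if present
    ]
    if triggers:
        return "DPIA before proceeding: " + ", ".join(triggers)
    if use.personal_data:
        return "document lawful basis, retention and access"
    return "light-touch: confirm no personal data is included"


for proposal in (
    ProposedUse("Summarise internal process notes", personal_data=False),
    ProposedUse("Draft client reports from project files", personal_data=True),
    ProposedUse("Score prospects from the CRM", personal_data=True, profiling=True),
):
    print(proposal.description, "->", screen(proposal))
```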

The dividing line is usually clear in practice. Using AI to generate a first draft from your own notes? A brief check that no client names have slipped in is enough. Using AI to profile prospects from your CRM, match clients to services, or automate onboarding communications? That sits in different territory, and the ICO’s automated decision-making guidance applies.
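The “brief check that no client names have slipped in” can be partly automated before text goes into an AI tool. A minimal sketch, assuming you maintain your own list of client names; it also flags anything that looks like an email address. Simple matching like this will miss nicknames, initials and misspellings, so it supports a human read-through rather than replacing it.

```python
import re

# Rough email shape; good enough to flag likely addresses, not to validate them.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def check_before_sending(text: str, client_names: list[str]) -> list[str]:
    """Return warnings for client names or email addresses found in the text."""
    warnings = []
    for name in client_names:
        # Match the whole name, case-insensitively, not as part of a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            warnings.append(f"client name: {name}")
    for address in EMAIL.findall(text):
        warnings.append(f"email address: {address}")
    return warnings


draft = "Summary for Acme Ltd: ask jo.bloggs@example.com to confirm next steps."
print(check_before_sending(draft, ["Acme Ltd", "Brightwater Consulting"]))
```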

One counterpoint worth keeping in mind: thorough data governance does not guarantee AI projects will deliver commercial value. The National Audit Office has repeatedly found that unclear business objectives and poor change management are also common causes of digital and AI project failure. Getting your data right is a prerequisite, not a guarantee: solid data alongside unclear objectives will still produce a project that misses its goals.

What else connects to data readiness?

Several concepts come up repeatedly once you start working on data readiness, and understanding them early prevents confusion. Role-based access control (RBAC) means assigning data access by job role rather than individual. Data minimisation means collecting and retaining only what you genuinely need, a key ICO expectation. The EU AI Act is adding compliance timelines that UK firms selling into Europe need to plan for.

RBAC matters beyond AI specifically. Your CRM, cloud storage, and project management tools should all have explicit role assignments so that new starters get access to what their role requires and departing contractors lose it cleanly. AI raises the stakes because an assistant can surface anything its user is able to open: the narrower the access, the lower the risk.
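Role-based assignment means permissions hang off roles, so onboarding and offboarding each become a single change. A minimal sketch with hypothetical roles and permission names; in practice this mapping lives in your identity provider or each tool’s admin settings rather than in code, but the logic is the same.

```python
# Hypothetical roles and the permissions each one needs.
ROLE_PERMISSIONS = {
    "consultant": {"crm:read", "client-reports:write"},
    "finance": {"crm:read", "invoicing:write"},
    "contractor": {"client-reports:read"},
}


class AccessRegister:
    """Tracks which role each person holds; access follows from the role."""

    def __init__(self, role_permissions: dict[str, set[str]]):
        self.role_permissions = role_permissions
        self.assignments: dict[str, str] = {}

    def onboard(self, user: str, role: str) -> None:
        if role not in self.role_permissions:
            raise ValueError(f"unknown role: {role}")
        self.assignments[user] = role

    def offboard(self, user: str) -> None:
        # Removing the role assignment removes every permission at once.
        self.assignments.pop(user, None)

    def permissions(self, user: str) -> set[str]:
        role = self.assignments.get(user)
        return set(self.role_permissions[role]) if role else set()

    def can(self, user: str, permission: str) -> bool:
        return permission in self.permissions(user)


register = AccessRegister(ROLE_PERMISSIONS)
register.onboard("alex", "contractor")
print(register.can("alex", "client-reports:read"), register.can("alex", "crm:read"))
register.offboard("alex")
print(register.permissions("alex"))
```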

Data minimisation is a principle in UK GDPR rather than a preference. Collecting and retaining only the personal data you can justify needing is almost always the right call. It reduces your ICO exposure, simplifies the DPIA process if you need one, and means AI tools drawing on your data focus on what actually matters rather than generating inferences from data you never intended to use.
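For an AI use-case, minimisation often comes down to passing only the fields that use-case needs rather than whole records. A minimal sketch, assuming a hypothetical allow-list per use-case; anything not listed is dropped before the data reaches the AI tool.

```python
# Hypothetical allow-lists: the only fields each AI use-case is given.
USE_CASE_FIELDS = {
    "draft_client_report": {"client_name", "engagement", "project_notes"},
    "summarise_feedback": {"feedback_text"},
}


def minimise(record: dict, use_case: str) -> dict:
    """Keep only the fields allowed for this use-case; drop everything else."""
    if use_case not in USE_CASE_FIELDS:
        raise KeyError(f"no allow-list defined for {use_case!r}")
    allowed = USE_CASE_FIELDS[use_case]
    return {key: value for key, value in record.items() if key in allowed}


crm_record = {
    "client_name": "Acme Ltd",
    "engagement": "Year-end advisory",
    "project_notes": "Agreed scope and timetable.",
    "director_home_address": "(example value)",
    "director_date_of_birth": "(example value)",
}
print(minimise(crm_record, "draft_client_report"))
```

An allow-list fails safe: a field added to the CRM later stays out of the AI tool until someone decides the use-case needs it.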

The EU AI Act sets requirements for organisations placing AI systems on the EU market or using them in the EU. Being based in the UK does not by itself put a firm outside its scope: it can apply to UK firms whose AI systems are supplied into the EU or whose AI outputs are used there, and those working with EU clients may face its data governance obligations for high-risk AI systems. The Act entered into force on 1 August 2024; most provisions apply 24 months later, from 2 August 2026, with some high-risk requirements following at 36 months. If you are growing cross-border, building your data governance now is simpler than retrofitting it later.

A data-readiness exercise pays twice: it makes your AI tools more reliable today, and it de-risks your business against a regulatory landscape that is still taking shape. The four steps (inventory, clean, document, connect) can be completed in a few focused days for a typical services firm. Start with the data your first AI use-case will actually touch, get that in order, and expand from there.

Sources

- UK Government, Central Digital and Data Office (2024). Artificial Intelligence Playbook for the UK Government. Sets out governance, security, and compliance principles for safe AI deployment, directly relevant to SME data handling frameworks. https://www.gov.uk/government/publications/ai-playbook-for-the-uk-government/artificial-intelligence-playbook-for-the-uk-government-html
- UK Government, Central Digital and Data Office (2024). Guidelines and best practices for making government datasets ready for AI. Defines the four-pillar AI-readiness framework and mandates role-based access control, encryption, and documentation as essentials. https://assets.publishing.service.gov.uk/media/696e43965a37ab534a9e23ac/Building_AI-Ready_Datasets_for_the_UK.pdf
- Information Commissioner's Office (2023). Guidance on AI and data protection. Covers lawful basis, DPIAs, data minimisation, and fairness obligations for organisations using AI to process personal data. https://ico.org.uk/for-organisations/guide-to-data-protection/key-dp-themes/guidance-on-ai-and-data-protection/
- Information Commissioner's Office (2023). Data Protection Impact Assessments (DPIAs). Explains when a DPIA is required, including for high-risk AI processing such as profiling and automated decision-making. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/accountability-and-governance/data-protection-impact-assessments/
- National Cyber Security Centre (2023). Secure use of AI as a service. Advises on risks of data submitted to online AI tools and recommends reviewing provider policies before sharing sensitive business data. https://www.ncsc.gov.uk/collection/guidance-for-securing-ai/secure-use-of-ai-as-a-service
- National Audit Office (2023). Good practice guide for organisations using AI. Practical checklist for boards and leaders on data provenance, governance, human oversight, and common AI project failure modes. https://www.nao.org.uk/insights/good-practice-guide-for-organisations-using-ai/
- Financial Conduct Authority and Bank of England (2023). Machine learning in UK financial services. Survey of 73 UK financial firms found data quality and availability are among the top barriers to safe AI and ML deployment. https://www.bankofengland.co.uk/report/2023/machine-learning-in-uk-financial-services
- Information Commissioner's Office (2020). ICO fines British Airways £20m for data breach. Illustrates the regulatory consequence of poor data security practices in digital systems, including the scale of financial exposure. https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2020/10/ico-fines-british-airways-20m-for-data-breach/
- EUR-Lex (2024). Regulation of the European Parliament and of the Council laying down harmonised rules on Artificial Intelligence (AI Act). Sets out compliance timelines and data governance obligations for high-risk AI systems, relevant for UK firms operating cross-border. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52021PC0206
- Microsoft (2024). Overview of privacy, security, and safety in Copilot for Microsoft 365. Documents that enterprise plans do not train underlying models on tenant data and inherit existing permissions structures. https://learn.microsoft.com/en-gb/microsoft-365-copilot/privacy-and-protections

Frequently asked questions

Do I need to carry out a DPIA before using AI tools in my business?

Not necessarily. The ICO requires a DPIA when AI processing is "likely to result in a high risk" to individuals, which includes profiling, large-scale personal data use, and automated decisions with significant effects. For lower-risk uses, such as using AI to draft internal documents with no personal data, a DPIA is not required. The key is to assess the risk level of each use-case rather than applying the same process to every experiment.

Is it safe to paste client information into a business AI tool like Microsoft Copilot?

Enterprise plans for tools such as Microsoft Copilot for Microsoft 365 are designed so that your data is not used to train the underlying models, and the tool inherits your existing permissions structure. Consumer AI tools such as free versions of ChatGPT do not carry the same protections, and the NCSC recommends reviewing provider policies carefully before sharing any sensitive business data with them.

What is the biggest data preparation mistake small firms make when adopting AI?

Treating it as a one-off exercise. Cleaning your data before an AI pilot will improve results initially, but without sustained ownership and clear role assignments, data quality degrades quickly. Both UK government guidance and ISO/IEC 8183 frame data readiness as a continuous lifecycle rather than a project with a defined end date. Assigning clear data ownership per dataset is the most direct way to prevent drift.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30-minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
