How to benchmark AI adoption across your teams and workflows

TL;DR

Benchmarking AI adoption means measuring AI use at workflow level across three dimensions: who uses it, what outcomes change, and what risks arise. UK adoption surveys range from 16% to 70% depending on definition, which illustrates why the framing matters. Set baselines before rollout, agree a one-page AI use policy before measurement starts, run a 30- to 60-day pilot, and present productivity data alongside that policy when sharing findings with clients or regulators.

Key takeaways

- Benchmarking AI adoption means measuring at workflow level, not just counting staff who say they use AI tools
- UK adoption rates range from 16% (strategic deployment, DSIT) to 70% (embedded tool use, QuickBooks SME survey), so definition discipline is the first benchmarking decision
- Set a pre-AI baseline for each workflow before rollout; without it, any claimed improvement is anecdotal
- A proper benchmark covers three layers: adoption (who uses what, how often), performance (time saved, quality, volume), and risk (data handling, unverified outputs, compliance exposure)
- Build regulatory expectations into the AI governance exercise itself rather than treating them as a separate compliance task: ICO requirements wherever personal data is processed, NCSC security guidance, and FCA requirements for regulated firms

You ask your operations manager how the team is getting on with AI. She says it’s going well, people are using it regularly. You ask what has changed. She mentions a few things, quicker email drafts and cleaner meeting notes. You ask whether any of it has actually moved the needle on client turnaround times or error rates. She’s not sure. That gap, between people using a tool and a business improving because of it, is precisely what a benchmark is designed to close.

What does benchmarking AI adoption actually mean?

Benchmarking AI adoption means measuring how AI is actually used across specific workflows: how often, by whom, with what results, and at what risk. A proper benchmark tracks three layers: adoption (which tools, how frequently), performance (hours saved, error rates, turnaround times), and risk (data handling, unverified outputs, compliance exposure). The UK Government’s AI Playbook treats this as a structured governance activity, not a casual one-off survey.

The three-layer frame matters because tool access does not equal valuable use. A firm that has bought 40 Microsoft Copilot licences may show strong adoption on paper, but whether the work is faster, better, or safer is a different question entirely, and benchmarking is the mechanism that answers it. The process starts by identifying which workflows are affected, then sets a pre-AI baseline for each one, and measures against that baseline once AI is introduced. No baseline, no benchmark: without pre-AI data on turnaround times, error rates, and workload volume per workflow, any improvement claim stays anecdotal. The UK Government’s AI Playbook makes the same point from the risk side: understanding and mitigating risks from AI tools requires documentation and review, not just tool access. That review process is the benchmark in operation.
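
As a rough illustration of the before-and-after comparison, the Python sketch below compares one workflow's pilot figures against its pre-AI baseline on the three measures just mentioned. The field names, the `WorkflowMetrics` and `compare` helpers, and all the numbers are hypothetical placeholders, not data from any survey or from the Playbook.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkflowMetrics:
    """Hypothetical performance measures for one workflow over one measurement window."""
    turnaround_hours: float      # average time from request to finished output
    errors_per_100_tasks: float  # outputs that needed correction or rework
    tasks_per_week: float        # workload volume


def percent_change(before: float, after: float) -> float:
    """Relative change from the baseline, in percent (negative means a reduction)."""
    if before == 0:
        raise ValueError("baseline is zero, so a percentage change is undefined")
    return (after - before) / before * 100


def compare(baseline: WorkflowMetrics, pilot: WorkflowMetrics) -> dict[str, float]:
    """Percentage change for every metric, pilot period versus pre-AI baseline."""
    return {
        f.name: percent_change(getattr(baseline, f.name), getattr(pilot, f.name))
        for f in fields(baseline)
    }


# Illustrative figures only: a proposal-drafting workflow before and during a pilot.
baseline = WorkflowMetrics(turnaround_hours=6, errors_per_100_tasks=10, tasks_per_week=12)
pilot = WorkflowMetrics(turnaround_hours=4, errors_per_100_tasks=12, tasks_per_week=15)

for metric, change in compare(baseline, pilot).items():
    print(f"{metric}: {change:+.1f}%")
```

In this made-up example the work is a third faster and volume is up a quarter, but errors have risen by a fifth: exactly the kind of trade-off that a count of active users would never surface.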

Why does this matter for your business right now?

UK surveys show sharply different AI adoption rates depending on how adoption is defined. The British Chambers of Commerce and Atos put active use at 54% of UK firms in March 2026. QuickBooks reported 70% of UK SMEs using AI regularly when embedded tools (those built into Microsoft 365 or accounting software) are counted. Meanwhile, DSIT research suggests only 16% of UK firms have made strategic AI deployments.

Each of those numbers measures something different. The 70% figure counts any firm whose everyday software includes AI somewhere. The 16% counts firms that have made a deliberate, governed, workflow-level commitment. If your board or senior team asks where you are on AI, the answer changes completely depending on which definition you use. A benchmark forces the right definition and makes the question answerable at the level that actually drives business outcomes: which workflows are better, and by how much. It also gives you a credible answer if a client, investor, or partner asks how you’re using AI and whether your controls are proportionate. The gap between 16% and 70% adoption is essentially the gap between a benchmark and a headcount. One drives decisions; the other flatters them.
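
The same definitional gap can show up inside a single firm. The sketch below assumes a simple, hypothetical inventory of workflows with two yes/no flags and counts adoption two ways: any AI touching the workflow (including features embedded in existing software) versus AI use that is governed with an approved tool, a baseline and a review step. The workflow names come from the examples in this article; the flags are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    ai_used: bool   # any AI involved, including features built into existing software
    governed: bool  # approved tool, pre-AI baseline, human review step, documented outcomes


def adoption_rates(workflows: list[Workflow]) -> tuple[float, float]:
    """Return (headcount-style rate, governed rate) as percentages of all workflows."""
    if not workflows:
        raise ValueError("no workflows listed")
    total = len(workflows)
    any_use = sum(w.ai_used for w in workflows)
    governed_use = sum(w.ai_used and w.governed for w in workflows)
    return any_use / total * 100, governed_use / total * 100


# Hypothetical inventory for a small services firm.
inventory = [
    Workflow("proposal drafting", ai_used=True, governed=True),
    Workflow("customer support triage", ai_used=True, governed=False),
    Workflow("meeting notes", ai_used=True, governed=False),
    Workflow("payroll administration", ai_used=True, governed=False),
    Workflow("compliance document checks", ai_used=False, governed=False),
]

headcount_rate, governed_rate = adoption_rates(inventory)
print(f"Any AI use: {headcount_rate:.0f}% of workflows")
print(f"Governed AI use: {governed_rate:.0f}% of workflows")
```

Both figures are accurate for this invented firm; only the second tells you which workflows have been deliberately changed and can be measured.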

Where will you actually meet it in a services firm?

For a services firm with five to fifty people, the benchmark shows up the moment you try to answer a simple question: is our AI use actually improving anything? Common workflow candidates include proposal drafting, customer support triage, meeting notes, payroll administration, and compliance document checks. Each has measurable inputs and outputs, which is what makes it benchmarkable rather than just observable.

The other place benchmarking becomes relevant is when you realise you have embedded AI you did not deliberately choose. Many staff already use AI features inside tools they have relied on for years, such as Microsoft 365, Google Workspace, and accounting or project management software. Counting only standalone tools like ChatGPT misses a substantial part of the picture. A useful benchmark asks not just what tools have been bought, but which workflows have changed and how. The UK Government’s AI Playbook is explicit that a review should cover documented processes, escalation routes, and evidence of quality checks, not just tool installation. For proposal drafting, that means knowing whether proposal quality has improved and whether errors have crept in from unverified AI output. For customer support triage, it means knowing whether resolution times have shortened and whether clients are satisfied with the responses they receive.

When should you benchmark, and when is it overkill?

Run a benchmark when you’re about to scale AI use beyond individual experimentation, when you need to show value to yourself or your team, or when a regulator might ask about your AI controls. Skip it if your firm’s AI use is purely personal productivity with no client-facing, financial, or compliance-relevant dimension. The test is whether AI touches a workflow that matters to the business.

The UK Government’s AI Playbook recommends a scan, pilot, scale approach. Pick one bounded workflow, run a 30- or 60-day pilot with clear controls and a defined endpoint, then review before scaling. That sequencing is the benchmark in practice: you identify the workflow, measure its pre-AI state, run the pilot, measure again, and decide. A kill criterion matters here. If quality or compliance deteriorates during the pilot, you need a pre-agreed rule for when to stop, not just a sense that something seems off; high usage with deteriorating quality is not a sign of success. One counterpoint is worth holding: if your firm has no consistent process in the chosen workflow, AI may not be the right first lever. Standardise the process first; otherwise you are benchmarking chaos against slightly assisted chaos.
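
A kill criterion is easier to apply if it is written down as a rule before the pilot starts. The sketch below shows one possible rule, not one prescribed by the Playbook: stop immediately on any compliance incident, or when the weekly error rate stays more than a chosen tolerance above the pre-AI baseline for a set number of consecutive weeks. The `kill_week` helper, the 20% tolerance and the two-week window are all assumptions to adjust for your own workflow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklyReading:
    week: int
    errors_per_100_tasks: float
    compliance_incidents: int


def kill_week(
    readings: list[WeeklyReading],
    baseline_errors: float,
    tolerance: float = 0.2,       # assumed: allow errors up to 20% above baseline
    consecutive_weeks: int = 2,   # assumed: act after two bad weeks in a row
) -> Optional[int]:
    """Return the week in which the pilot should stop, or None if the rule never trips."""
    limit = baseline_errors * (1 + tolerance)
    streak = 0
    for reading in readings:
        if reading.compliance_incidents > 0:
            return reading.week  # any compliance incident stops the pilot outright
        # Count consecutive weeks above the limit; a good week resets the count.
        streak = streak + 1 if reading.errors_per_100_tasks > limit else 0
        if streak >= consecutive_weeks:
            return reading.week
    return None


# Illustrative pilot data against a baseline of 10 errors per 100 tasks.
pilot = [
    WeeklyReading(week=1, errors_per_100_tasks=9, compliance_incidents=0),
    WeeklyReading(week=2, errors_per_100_tasks=11, compliance_incidents=0),
    WeeklyReading(week=3, errors_per_100_tasks=13, compliance_incidents=0),
    WeeklyReading(week=4, errors_per_100_tasks=14, compliance_incidents=0),
]
print(kill_week(pilot, baseline_errors=10))
```

Usage levels deliberately play no part in the rule: a pilot with enthusiastic uptake and rising errors should still stop.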

What should you understand alongside this?

A benchmark gives you the data, but data only matters inside the right governance frame. If AI processes personal data, the ICO expects a lawful basis, data protection impact assessments where required, and human review of outputs. The NCSC highlights prompt injection and data leakage risks in staff tool use. FCA-regulated firms face additional requirements around customer outcomes and operational resilience when AI is embedded in processes.

Beyond data protection, security and financial regulation, three further points are worth holding. First, have a one-page AI use policy in place before measurement starts. It should state which tools are approved, what data is off-limits, who reviews outputs, and how incidents are escalated. High benchmark scores mean nothing if staff are pasting client data into public models without a lawful basis for doing so. Second, if your firm or its vendors operate in EU markets, the EU AI Act introduces risk-based obligations and transparency requirements for certain AI uses. Third, the CMA has published guidance on AI and competition, including the risk of vendor lock-in where a firm’s AI capability sits with a single supplier. None of these invalidates the benchmark approach. They add context to what you’re measuring and show why the governance layer matters as much as the productivity layer.
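
The one-page policy can also serve as a simple check during the benchmark. The sketch below holds the four elements of the policy described above (approved tools, off-limits data, output reviewer, incident escalation) in a small data structure and flags a proposed use that breaks it. The `AIUsePolicy` and `policy_breaches` names, the tool labels, the data categories and the roles are all hypothetical placeholders, and a check like this supplements rather than replaces staff training and a lawful-basis assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIUsePolicy:
    approved_tools: frozenset[str]
    off_limits_data: frozenset[str]
    output_reviewer: str    # role that checks AI outputs before they are used
    incident_contact: str   # role that incidents are escalated to


def policy_breaches(policy: AIUsePolicy, tool: str, data_categories: set[str]) -> list[str]:
    """List the ways a proposed AI use breaks the policy (an empty list means none found)."""
    breaches = []
    if tool not in policy.approved_tools:
        breaches.append(f"'{tool}' is not an approved tool")
    blocked = sorted(data_categories & policy.off_limits_data)
    if blocked:
        breaches.append("off-limits data: " + ", ".join(blocked))
    return breaches


# Placeholder policy for illustration only.
policy = AIUsePolicy(
    approved_tools=frozenset({"firm-licensed assistant"}),
    off_limits_data=frozenset({"client personal data", "payroll records"}),
    output_reviewer="engagement lead",
    incident_contact="operations manager",
)

issues = policy_breaches(policy, "public chatbot", {"client personal data", "meeting agenda"})
for issue in issues:
    print(issue)
if issues:
    print(f"Escalate to: {policy.incident_contact}")
```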

If you want help working out where to start or what to measure in your own firm, the straightforward next step is a short conversation. Book a conversation and we can look at your specific workflows together.

Sources

- Spicy Advisory (2026). AI Adoption in UK SMBs: A Guide for 2026. Reports British Chambers of Commerce and Atos 54% active use figure (March 2026), QuickBooks 70% figure (January 2026), and DSIT 16% strategic deployment data; illustrates why definition discipline matters in benchmarking. https://spicyadvisory.com/blog/ai-adoption-uk-smb-guide-2026
- Afiniti (2025). AI Adoption Insights. Covers workflow-level measurement and the adoption, performance, risk framework for SME AI benchmarking. https://www.afiniti.co.uk/insights/ai-adoption/
- UK Government, Government Digital Service (2025). Artificial Intelligence Playbook for the UK Government. Sets out scan, pilot, scale sequencing, governance requirements, documented review processes, and escalation protocols for AI use in organisations. https://assets.publishing.service.gov.uk/media/67aca2f7e400ae62338324bd/AI_Playbook_for_the_UK_Government__12_02_.pdf
- GDS Blog (February 2025). Launching the Artificial Intelligence Playbook for the UK Government. Policy framing and launch context for the Playbook, published by the Government Digital Service. https://gds.blog.gov.uk/2025/02/10/launching-the-artificial-intelligence-playbook-for-the-uk-government/
- Two Birds (2025). An AI Playbook for the UK Government has been released by the UK Government Digital Service. Legal commentary on scan, pilot, scale sequencing and governance obligations. https://www.twobirds.com/en/insights/2025/uk/an-ai-playbook-for-the-uk-government-has-been-released-by-the-uk-government-digital-service
- ICO (2024). Artificial Intelligence and Data Protection. Guidance on lawful basis, data minimisation, DPIAs, security, and accountability obligations when AI systems process personal data. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- NCSC (2024). Guidance on Generative AI. Covers prompt injection risks, data leakage, access controls, and the requirement for human review of AI outputs in staff tool use. https://www.ncsc.gov.uk/guidance/generative-ai
- FCA (2024). AI in Financial Services. Sets out FCA expectations on customer outcomes, operational resilience, outsourcing, and governance for regulated firms using AI. https://www.fca.org.uk/firms/ai
- European Parliament and Council (2024). EU AI Act, Regulation (EU) 2024/1689. Risk-based obligations, prohibited practices, and transparency requirements for AI systems affecting EU market participants. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
- Competition and Markets Authority (2024). AI Fundamental Principles. CMA guidance on competition and AI, including vendor lock-in and market concentration risks in AI supply chains. https://www.gov.uk/government/publications/competition-and-markets-authority-ai-fundamental-principles

Frequently asked questions

How do I know if our AI use counts as strategic adoption or just casual use?

The distinction comes down to governance. Casual use means individuals choosing their own tools without a policy, baseline, or review process. Strategic adoption means specific workflows have defined AI use, documented outcomes, approved tools, and a human review step for outputs. The UK Government's AI Playbook sets out what a governed approach looks like in practice, including escalation processes and risk documentation.
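
The distinction can be turned into a per-workflow checklist. A minimal sketch, assuming you record the four conditions above as yes/no answers; the key names and the `classify` helper are hypothetical, and a workflow that meets only some of the conditions is labelled here as not yet strategic rather than purely casual.

```python
CRITERIA = (
    "defined_ai_use",       # the workflow has a specified role for AI
    "documented_outcomes",  # results are recorded, ideally against a pre-AI baseline
    "approved_tools",       # only tools named in the AI use policy are used
    "human_review",         # a named person checks outputs before they are used
)


def classify(answers: dict[str, bool]) -> tuple[str, list[str]]:
    """Return a label plus the criteria this workflow is still missing."""
    missing = [c for c in CRITERIA if not answers.get(c, False)]
    label = "strategic" if not missing else "not yet strategic"
    return label, missing


label, gaps = classify({"defined_ai_use": True, "approved_tools": True, "human_review": False})
print(label, gaps)
```

Unanswered criteria count as missing, so an incomplete checklist never produces a "strategic" label by default.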

What is the simplest baseline measurement we should set before running an AI pilot?

Pick two or three metrics that already exist in your workflow: average time to complete a task, error or rework rate, and volume processed per week. Capture those figures for two to four weeks before the pilot starts. You do not need sophisticated tooling. A shared spreadsheet works. The point is to have a pre-AI number to compare against, so improvement is evidence rather than impression.
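
If the shared spreadsheet is exported as CSV with one row per completed task, the three baseline figures take only a few lines to compute. The column names (`week`, `minutes`, `rework`), the `baseline_from_csv` helper and the sample rows are assumptions for illustration; adapt them to whatever your team already records.

```python
import csv
import io

# Hypothetical export: one row per completed task during the baseline weeks.
SAMPLE_CSV = """week,minutes,rework
1,50,no
1,70,yes
2,40,no
2,60,no
"""


def baseline_from_csv(text: str) -> dict[str, float]:
    """Average time per task, rework rate and weekly volume from task-level rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        raise ValueError("no tasks recorded, so there is no baseline yet")
    minutes = [float(row["minutes"]) for row in rows]
    reworked = sum(row["rework"].strip().lower() == "yes" for row in rows)
    # Only weeks with at least one recorded task are counted here.
    weeks = {row["week"] for row in rows}
    return {
        "avg_minutes_per_task": sum(minutes) / len(minutes),
        "rework_rate": reworked / len(rows),
        "tasks_per_week": len(rows) / len(weeks),
    }


print(baseline_from_csv(SAMPLE_CSV))
```

With a real export you would read the file contents first (for example with `open(path, newline="")`) and pass the text in; if some baseline weeks had no tasks at all, record them separately, because this sketch would otherwise overstate weekly volume.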

Do we need to worry about data protection when benchmarking our AI adoption?

Yes, if your AI tools process personal data. The ICO expects you to identify your lawful basis, consider whether a data protection impact assessment is needed, ensure outputs are human-reviewed, and keep records of your approach. The NCSC guidance on generative AI also flags risks around staff pasting sensitive data into public tools. A one-page AI use policy specifying what data is off-limits is the minimum control before any benchmarking or scaling exercise.

Written by Dr Dave Heath, AI consultant and business strategist.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.
