How to build a simple internal AI testing sandbox

TL;DR

Building an internal AI sandbox means setting up a separate, controlled environment where your team can test AI tools and workflows without risking live client data or production systems. For a small UK services firm, the simplest version is a dedicated cloud project with access controls, data minimisation, and logging. UK GDPR applies in test environments exactly as it does in production, so governance sits alongside the technology from day one.

Key takeaways

- An internal AI sandbox is a separate, isolated environment where your team can test AI tools and workflows without touching live client data or production systems.
- UK GDPR applies to AI testing just as it does to production use. "Test environment" is not a compliance exemption, and the ICO has reprimanded organisations for using live personal data without adequate controls in test contexts.
- For a 5-50 person services firm, the simplest starting point is a dedicated cloud project, such as a separate Azure subscription with Azure OpenAI, configured so your data stays out of model training and separate from your production workloads.
- Four controls are non-negotiable regardless of which technical approach you choose: network isolation from production, role-based access with MFA, data minimisation using pseudonymised or synthetic data, and logging of who accessed what and when.
- A sandbox needs governance alongside the technology: a named AI sponsor, a data protection lead, an acceptable-use note for anyone with access, and a fixed review date, typically eight to twelve weeks into the pilot.

A founder running a six-person legal services firm described her situation clearly when we spoke last month. She had been watching AI demos for almost a year and hadn’t started testing yet. The hold-up was straightforward: she had no safe place to try things.

That gap is common among small professional services firms. The interest in AI is real. The use cases are often obvious. What’s missing is a contained environment where experiments can’t accidentally touch client data, disrupt live workflows, or create a compliance problem nobody had planned for.

That environment has a name. It’s called a sandbox.

What is an internal AI testing sandbox?

An internal AI testing sandbox is a separate environment, usually a distinct cloud project or access-controlled workspace, where staff can trial AI tools, prompts, and workflows without touching live systems or client data. It has no connection to your production databases, CRMs, or payment systems: it is a walled-off section of your digital estate where experiments can run, and can be stopped cleanly if something doesn’t behave as expected.

The UK government’s portfolio of AI assurance techniques features exactly this pattern. NayaOne’s AI Sandbox, included there as a case study, lets public bodies and regulators test models in a secure environment that does not connect to government or regulator production networks. The same principle scales down to a five-person consultancy: keep the test environment separate, control what data flows into it, and keep a record of what gets tested and by whom.

For many small services firms, a sandbox doesn’t require specialist infrastructure. It can be as simple as a separate Microsoft 365 tenant, a dedicated Azure resource group, or a distinct cloud project with its own permissions. The defining features are isolation from production, limited and documented access, and a log of activity inside it. The UK AI Safety Institute’s Inspect toolkit follows the same logic, using provisioned containers to run AI agent evaluations so that any errant code behaviour stays isolated from surrounding systems.

Why does it matter to your business?

Testing AI directly in your live environment is a compliance and security risk that small firms frequently underestimate. The ICO is explicit: “test” is not an exemption from UK GDPR. Any processing of personal data in a test context carries the same obligations as production. A sandbox is the structural answer to that requirement, not an extra precaution for overly cautious organisations.

The Ticketmaster UK case shows what a missing boundary can cost. In November 2020 the ICO fined the company £1.25m after a third-party chatbot on its online payments page was compromised, allowing customers’ payment card details to be harvested over several months. The ICO found that Ticketmaster had failed to assess the risks of running the chatbot on its payment page and to put appropriate security measures in place. The failure was a specific, preventable gap between a third-party component and a live system.

The financial exposure from a breach can be significant. IBM’s 2023 Cost of a Data Breach Report puts the average global breach cost at $4.45m and found that 82% of breaches involved data stored in cloud environments. A sandbox puts the most experimental work behind a proper boundary before a mistake has a chance to compound into a breach.

Firms in financial services face particular scrutiny. A joint Bank of England and FCA survey found that 72% of the UK financial firms that responded were using or developing machine learning applications by 2022, and the regulators expect structured governance and oversight to apply from the pilot stage onwards.

Where will you actually build one?

For a 5-50 person services firm, there are three practical options at increasing levels of complexity. A vendor-hosted managed service is the simplest starting point: a separate Azure subscription running Azure OpenAI Service, where Microsoft states your data is not used for model training, gives you an isolated environment with minimal setup. A competent IT partner can have this running within a day.

Microsoft states that data submitted to Azure OpenAI Service is not used to train OpenAI models and is logically separated per customer tenant. Keep the sandbox in its own subscription, or at minimum its own resource group, apart from your production workloads, consistent with NCSC guidance on separating cloud environments. This is the appropriate starting point for prompt testing, document summarisation, and early workflow experiments.

If your experiments need custom integrations or open-source models, a self-hosted container environment becomes relevant. Platforms such as Northflank run AI workloads in hardened sandboxes using Kata Containers, which give each workload its own lightweight virtual machine (microVM), or gVisor, which intercepts system calls in a user-space kernel. Both provide stronger isolation between your test environment and other systems than standard containers. A capable IT partner can set up a basic version within one to two weeks.
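For IT partners weighing the self-hosted route, the sketch below shows the kind of lock-down settings involved. It assumes plain Docker, which is our substitution for illustration: a standard Docker container shares the host kernel, so on its own it is weaker than the Kata or gVisor isolation described above. Docker can hand a container to gVisor with `--runtime runsc` if gVisor is installed on the host. The helper name, image, paths and resource limits are hypothetical placeholders, not a recommended configuration.

```python
import shlex
from typing import Optional


def build_sandbox_command(
    image: str,
    host_dir: str,
    script_name: str,
    runtime: Optional[str] = None,
) -> list[str]:
    """Return a `docker run` argument list for a throwaway, locked-down container.

    Hypothetical helper for illustration; image, paths and limits are placeholders.
    """
    args = [
        "docker", "run",
        "--rm",                                 # remove the container when the run ends
        "--network", "none",                    # no network: no route to production or the internet
        "--read-only",                          # root filesystem is read-only
        "--tmpfs", "/tmp",                      # scratch space that disappears with the container
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--memory", "512m",                     # cap memory
        "--cpus", "1",                          # cap CPU
        "--pids-limit", "128",                  # cap the number of processes
        "-v", f"{host_dir}:/work:ro",           # mount the experiment folder read-only
    ]
    if runtime:
        # e.g. "runsc" to run under gVisor, if it is installed on the host
        args += ["--runtime", runtime]
    args += [image, "python", f"/work/{script_name}"]
    return args


if __name__ == "__main__":
    cmd = build_sandbox_command("python:3.12-slim", "/srv/sandbox", "experiment.py")
    # Print a copy-pasteable shell command; subprocess.run(cmd) would execute it.
    print(shlex.join(cmd))
```

The `--network none` flag is what enforces "no path to production" for this kind of experiment; the other flags limit what errant code can do inside the container itself.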

A third option, using microVMs specifically to isolate AI agents that write and execute code, applies only if you’re testing agent frameworks that run Python or shell commands autonomously. For the large majority of small services firms experimenting with document processing or client communication workflows, the vendor-hosted route is sufficient to start.

Whichever approach you choose, four controls apply: network isolation from production, role-based access through SSO and MFA, data minimisation using pseudonymised or synthetic data rather than full client files, and logging of who accessed the sandbox, when, and what requests were made.
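Managed services such as Azure keep their own activity and diagnostic logs, but any glue code you write for a custom integration should record the same three facts: who, when, and what. A minimal sketch, assuming a JSON Lines file on local disk; the function names, field names and file path are our own illustration, not part of any vendor’s API. It records a short description of each request rather than the full prompt, in keeping with data minimisation.

```python
import json
from datetime import datetime, timezone


def audit_record(user: str, action: str, detail: str) -> str:
    """Return one JSON line recording who did what in the sandbox, and when."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when, in UTC
        "user": user,      # who: the individual SSO identity, never a shared login
        "action": action,  # e.g. "login", "prompt", "upload"
        "detail": detail,  # what: a short description, not full prompt text
    }
    return json.dumps(record)


def append_audit(path: str, user: str, action: str, detail: str) -> None:
    """Append one record to a JSON Lines log file (hypothetical local path)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(audit_record(user, action, detail) + "\n")


if __name__ == "__main__":
    print(audit_record("alice@example.co.uk", "prompt", "summarise synthetic contract 3"))
```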

When does a sandbox make sense, and when is it overkill?

A sandbox is worth building once you start routing your own data through an AI service via a custom integration or external API. If your entire AI experiment runs inside Microsoft Copilot with permissions your IT team has already set, a separate sandbox is probably unnecessary. Once you’re connecting AI to data your clients expect to stay private, a dedicated environment is the appropriate choice.

Three questions help frame the decision quickly. Could a staff member feeding the wrong file into this tool create a data breach? Could AI outputs end up in a client-facing deliverable without human review? Are you connecting to any AI service in a way your existing data policies don’t explicitly address?

If any of those answers is yes, build the sandbox first. If all three answers are no, a documented acceptable-use policy within your existing tools may cover your current experiments.
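The three questions work as an "any yes" rule. A trivial sketch that makes the logic explicit, which can be handy if you want to record the answers for each proposed experiment; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExperimentCheck:
    """Answers to the three screening questions for one proposed AI experiment."""
    wrong_file_could_cause_breach: bool
    output_could_reach_client_unreviewed: bool
    service_not_covered_by_policy: bool


def needs_sandbox(check: ExperimentCheck) -> bool:
    """A single 'yes' is enough: build the sandbox before experimenting."""
    return any([
        check.wrong_file_could_cause_breach,
        check.output_could_reach_client_unreviewed,
        check.service_not_covered_by_policy,
    ])


if __name__ == "__main__":
    print(needs_sandbox(ExperimentCheck(False, False, True)))  # True
```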

Set governance from the start rather than retrofitting it after a problem surfaces. Appoint a named AI sponsor, typically the founder or managing director, to sign off the objectives and risk appetite. Assign a data protection lead, whether in-house or outsourced, to review and approve the datasets used in the sandbox. Fix a review date at which you decide explicitly whether to extend, tighten, or close the experiment; eight to twelve weeks is a practical pilot window.
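Putting the review dates in the calendar on the day the sandbox goes live is easy to automate. A minimal sketch, assuming the eight-to-twelve-week window above; the function name is hypothetical.

```python
from datetime import date, timedelta


def review_window(start: date, min_weeks: int = 8, max_weeks: int = 12) -> tuple[date, date]:
    """Return the earliest and latest review dates for a pilot starting on `start`."""
    return start + timedelta(weeks=min_weeks), start + timedelta(weeks=max_weeks)


if __name__ == "__main__":
    earliest, latest = review_window(date(2024, 1, 1))
    print(earliest, latest)  # 2024-02-26 2024-03-25
```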

What else do you need alongside it?

The sandbox is a technical control, and it only functions properly when governance sits around it. A short acceptable-use note for everyone with access covers the basics: no uploading of special-category data or cardholder data, no copying of AI outputs into client work without human review, and a clear requirement to report unexpected model behaviour. That document takes an hour to draft.
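The acceptable-use note is the main control, but a simple automated check can back it up. A minimal sketch of a hypothetical pre-upload screen that flags text containing something shaped like a payment card number: a run of 13 to 19 digits that passes the Luhn checksum real card numbers use. It is an illustration, not a data loss prevention product, and it cannot recognise special-category data such as health information, which still depends on the judgement of the person uploading.

```python
import re

# Runs of 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, which genuine payment card numbers satisfy."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def looks_like_card_data(text: str) -> bool:
    """True if the text contains a digit run that passes the Luhn check."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return True
    return False


if __name__ == "__main__":
    print(looks_like_card_data("Card: 4111 1111 1111 1111"))  # True
```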

Under UK GDPR, your firm, as controller, remains accountable for how personal data is processed in AI systems, including during testing, and the ICO’s guidance on AI and data protection expects clear internal responsibility for that. The right person almost certainly exists in your firm already. Making the designation explicit is a short conversation, not a compliance project.

If your AI use case involves profiling, creditworthiness assessment, hiring decisions, or any processing likely to result in high risk to individuals, UK GDPR requires a Data Protection Impact Assessment before the processing begins, and testing with real personal data counts as processing. The ICO’s DPIA guidance sets out a clear framework. Early-stage sandbox experiments in small services firms rarely cross this threshold, but knowing where the line sits is worth a few minutes with whoever handles your data protection.

Keep documentation proportionate to the stage of the experiment. A one-page architecture note showing how users reach the sandbox and where logs are stored, a list of allowed and prohibited data types, and a vendor contract with data processing terms are enough to begin. Add to it as the work develops.

The goal of a sandbox is to give your team a safe place to learn what AI can do for your business before any of that learning touches something you can’t undo. Building that structure before the first experiment is what separates a controlled pilot from a problem you didn’t plan for.

Book a conversation to think through what this would look like for your firm.

Sources

- UK Government / NayaOne (2024). AI Assurance Techniques: NayaOne's AI Sandbox. Describes a model for testing AI in an environment isolated from government and regulator production networks. https://www.gov.uk/ai-assurance-techniques/nayaones-ai-sandbox
- UK AI Safety Institute (2024). The Inspect Sandboxing Toolkit: Scalable and Secure AI Agent Evaluations. Details the AISI's approach to containerised sandbox execution for evaluating advanced AI agents. https://www.aisi.gov.uk/blog/the-inspect-sandboxing-toolkit-scalable-and-secure-ai-agent-evaluations
- Information Commissioner's Office (2023). Guidance on AI and Data Protection. Sets out UK GDPR obligations for organisations using AI, including in development and testing environments. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- Information Commissioner's Office (2020). ICO fines Ticketmaster UK Limited £1.25m for failing to protect customers' payment details. Illustrates the regulatory risk of integrating third-party components into live systems without adequate security measures. https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2020/11/ico-fines-ticketmaster-uk-limited-1-25million-for-failing-to-protect-customers-payment-details/
- National Cyber Security Centre (2023). Cloud Security Guidance. Covers secure configuration, environment separation, identity and access management, and logging as baseline controls for cloud workloads, including test environments. https://www.ncsc.gov.uk/collection/cloud-security
- IBM Security (2023). Cost of a Data Breach Report 2023. Reports the average global breach cost at $4.45m and that 82% of breaches involved data stored in cloud environments. https://www.ibm.com/reports/data-breach
- Bank of England and Financial Conduct Authority (2022). Machine Learning in UK Financial Services. Survey finding 72% of responding UK financial firms using or developing machine learning, with regulators emphasising governance requirements even during pilots. https://www.bankofengland.co.uk/report/2022/artificial-intelligence-and-machine-learning-in-uk-fm
- Information Commissioner's Office (2023). Data Protection Impact Assessments (DPIAs). Explains when a DPIA is required for high-risk AI processing, including in test environments. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/data-protection-impact-assessments-dpias/
- Microsoft (2024). Azure OpenAI Service Data, Privacy and Security. States that customer data submitted to Azure OpenAI Service is not used to train OpenAI models and is logically separated per tenant. https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy
- Northflank (2024). How to Sandbox AI Agents. Describes sandboxing AI workloads with Kata Containers (microVM-based) and gVisor (a user-space kernel) for stronger isolation than standard containers. https://northflank.com/blog/how-to-sandbox-ai-agents

Frequently asked questions

Do I need completely separate IT infrastructure, or can access controls within my existing tools be enough?

You don't necessarily need separate physical infrastructure. For many small firms, a dedicated Azure subscription or a separate cloud project with its own access controls, data boundaries, and logging is sufficient. What matters is that the sandbox has no direct data path to your production systems and that access is managed independently of your everyday tools.

If I'm only using anonymised data in my sandbox, does UK GDPR still apply?

It depends on the quality of the anonymisation. The ICO distinguishes between true anonymisation, where re-identification is not reasonably possible and UK GDPR no longer applies, and pseudonymisation, where data can be re-identified with the right key and UK GDPR still applies in full. Data that looks anonymised is often pseudonymised in practice. Treat it with the same controls you would apply to personal data unless you have a formal anonymisation assessment in place.
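The difference is easy to see in code. A minimal sketch of pseudonymisation with a separately held lookup table; the class name and token format are hypothetical. Anyone with access to the table can reverse every token, which is why the output is still personal data. Deleting the table does not automatically make the data anonymous either, because other fields in a record can still identify someone.

```python
import secrets


class Pseudonymiser:
    """Swap identifiers for random tokens, keeping the lookup table separately.

    Whoever holds the table can reverse the mapping, which is exactly why
    pseudonymised data is still personal data under UK GDPR.
    """

    def __init__(self) -> None:
        self._table: dict[str, str] = {}    # token -> original value (the "key")
        self._reverse: dict[str, str] = {}  # original value -> token

    def pseudonymise(self, value: str) -> str:
        """Return a stable token for `value`, creating one on first sight."""
        if value not in self._reverse:
            token = "PSN-" + secrets.token_hex(4)
            while token in self._table:  # avoid the rare random collision
                token = "PSN-" + secrets.token_hex(4)
            self._reverse[value] = token
            self._table[token] = value
        return self._reverse[value]

    def reidentify(self, token: str) -> str:
        """Reverse a token: only possible with access to the lookup table."""
        return self._table[token]


if __name__ == "__main__":
    p = Pseudonymiser()
    token = p.pseudonymise("Jane Smith")
    print(token, "->", p.reidentify(token))
```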

How long should a sandbox pilot run before I make a decision?

Eight to twelve weeks is a reasonable window for many small firms. That is enough time to test two or three specific use cases, gather structured feedback from the staff using it, and assess whether the outputs are reliable enough to build on. At the end of the window, decide explicitly whether to shut it down, extend with tighter controls, or promote a workflow into a more formal production environment.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.
