Where your data goes when you paste it into a chatbot

TL;DR

When you paste text into a consumer AI chatbot, the text leaves your browser, travels to the vendor's servers, gets tokenised, runs through a model on a GPU somewhere in the world, and then sits in a database under that vendor's retention rules. Two questions change everything: free tier or paid tier, and which provider. The controls available, and the legal exposure for a UK SME, depend on the answer to both.

Key takeaways

- Pasted text leaves your browser the moment you hit return. From that point, it sits on the vendor's infrastructure under their retention and training rules, not yours.
- The two questions that change everything: which tier (free, consumer paid, business, enterprise), and which vendor. Defaults differ sharply and many owners get the worst of both by accident.
- As of May 2026, ChatGPT free and Plus train on your conversations by default and store them indefinitely. Claude free and Pro do not train by default. Gemini and Copilot do not train consumer prompts. None of these consumer tiers give you a Data Processing Agreement.
- Four categories of content should never enter a consumer-tier chatbot under any setting: live credentials, regulated personal data, board-confidential financial figures, and M&A information. The Samsung 2023 source-code leak is the textbook example of why.
- Verify your own setup in three places before trusting any tool with anything sensitive: the training opt-out toggle, the retention setting, and the data residency region. If you cannot find all three in the account, treat the tool as off-limits for client data.

It is Friday afternoon. An owner is sitting with a coffee, three days after a busy Tuesday, and a thought arrives that will not leave. On Tuesday she pasted a draft client proposal into free ChatGPT to tighten the language before sending it. The proposal had the client’s name, the figures, and a paragraph that referenced a sensitive piece of context the client had only shared in confidence. She does not remember whether she was logged into a paid account or a free one. She does not know whether that text is now sitting in a database, on a server somewhere, being read by anyone, or training a future model. She would rather not think about it. She is thinking about it anyway.

This is the position a meaningful share of owners are sitting in right now. The chatbots are useful, the chatbots are fast, and the question of what actually happens to the text after the send button has never quite been answered in plain English. The answer matters, because the controls available depend on it, and the controls vary widely between providers and tiers.

What actually happens when you paste text into a chatbot?

The text leaves your browser the moment you hit return. It travels over an encrypted connection to the vendor’s servers, gets tokenised into numerical fragments the model can read, runs through the model on a GPU somewhere in their infrastructure, and the reply comes back. Both prompt and response are stored in a conversation database the vendor controls. The round trip takes seconds. The data is now under their retention rules, not yours.

The first thing to absorb is that the browser is not doing the work. The model is not on your laptop. Every byte of the prompt has crossed the public internet, landed on the vendor’s GPUs, and been written to a database before you see the reply. The second thing is that storage is a deliberate feature of the product. Conversation history is what lets the chatbot remember the last thing you asked. It also means a regulator asking what data left your firm last Tuesday can be answered, but only by the vendor, and only on the vendor’s terms.
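To make that concrete, here is a minimal sketch of what a chat request actually is under the hood: an HTTPS POST whose JSON body carries your full prompt. The endpoint-style field names below follow OpenAI's public chat-completions format, but the model name and pasted text are illustrative, and no real request is sent.

```python
import json

# Illustrative only: the pasted text and model name are made up.
pasted_text = "Draft proposal for a named client, fees and confidential context included."

# This is the shape of the request body a chatbot front-end builds
# before sending it to the vendor's servers.
payload = {
    "model": "gpt-4o",  # illustrative model name
    "messages": [
        {"role": "user", "content": pasted_text},
    ],
}

body = json.dumps(payload)

# Every byte of the pasted text sits inside the body that leaves your
# machine. TLS encrypts it in transit; the vendor decrypts and stores
# it on arrival.
assert pasted_text in body
print(f"{len(body)} bytes leave your browser")
```

The point of the sketch is simply that nothing is abstracted away on the wire: the prompt travels verbatim, and whatever the vendor's retention rules say then applies to the whole of it.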

Why does this matter for your business?

UK GDPR and the Data Use and Access Act 2025 do not stop applying because the data sits inside a chatbot prompt. If you process personal data of clients, staff, or customers, you are the controller. Pasting that data into a tool that trains on your conversations is processing without a lawful basis. The ICO has been explicit since 2023 that AI tools sit inside its enforcement remit.

The exposure scales with what is in the prompt. A draft blog post is one thing. A client proposal with named figures is another. A board pack with an acquisition target is in a different category entirely. Cyberhaven’s 2026 research found that 39.7% of AI interactions in surveyed organisations involve sensitive data, and 32.3% of ChatGPT use happens through personal accounts that bypass any enterprise controls the firm has set. That is the median pattern in working firms today.

The platform is not your data processor unless you hold a Data Processing Agreement with them, and consumer tiers do not provide one. Without a DPA, every prompt containing client data is a fresh exposure to an ICO complaint or a client breach claim.

Where the exposure actually sits: the four tiers that change everything

Free and consumer-paid tiers (ChatGPT Plus, Claude Pro, Gemini Advanced, Copilot Pro) are aimed at individuals. Business tiers (ChatGPT Business, Claude Team, Gemini for Workspace, Microsoft 365 Copilot) are aimed at firms. Enterprise tiers add data residency, audit logs, and a contractual DPA. The defaults flip sharply between consumer and business tiers, and that flip is where most of the legal exposure for an SME sits.

ChatGPT free and Plus train on your conversations by default and store them indefinitely on US infrastructure. The opt-out exists, but it is buried in settings at privacy.openai.com and many users have never found it. Claude free and Pro, since Anthropic’s October 2025 policy change, do not train on consumer chats by default and retain them for thirty days. Gemini consumer tiers do not train on prompts but auto-delete activity after eighteen months. Copilot Pro does not train but offers limited residency control. None of these consumer tiers give you a DPA, so for any prompt containing personal data they are technically off-limits if you are GDPR-bound.

Business and enterprise tiers reverse the defaults. ChatGPT Business, Claude Team, Gemini for Workspace, and Microsoft 365 Copilot all default to no-training, configurable retention, and a contractual DPA. Microsoft 365 Copilot and Gemini for Workspace also offer EU Data Boundary, meaning the data stays inside the EU. These are the tiers a UK SME actually needs if it is putting any client or staff data into a chatbot. The decision is covered in more detail in the paid LLM tier data risk decision, which walks through the £150-a-month threshold that usually flips the calculation.

What never goes in: the four content categories to keep out of any chatbot

Live credentials, regulated personal data, board-confidential financial figures, and M&A information should not be pasted into any consumer-tier chatbot regardless of opt-out settings. The opt-out only controls future training. It does not control who at the vendor can read the conversation during inference, and it does not undo data already used in a completed training run. For these four categories, the right question is whether the tool needs to see the data at all.

The Samsung incident in 2023 is the canonical case. Employees pasted proprietary semiconductor design specifications and internal source code into free ChatGPT to debug them. That code became part of OpenAI’s training corpus. The data could not be recalled. Samsung temporarily banned ChatGPT internally, then reintroduced it with strict classification rules. The same exposure applies in miniature for an SME pasting a client’s regulated personal data, a partner’s compensation figures, or the name of an acquisition target into a consumer chatbot. Once the prompt is sent, the only recovery is the vendor’s deletion policy, and deletion does not reach the trained model. Categorising data before it goes near a chatbot is covered in the four-tier data classification for AI, which gives the working SME version of the policy.

Everything else (marketing copy, anonymised process descriptions, public-domain background research, the wording of a generic email) is fine in any tier with the opt-out set correctly. The category line, not the tool, is what matters.

How to verify your own setup in three places

Before you trust any chatbot with anything sensitive, find three settings in the account: the training opt-out, the retention period, and the data residency region. If you cannot find all three, the tool is not configured for business use. The training opt-out controls a future model. Retention controls how long the conversation sits in the database. Residency controls which jurisdiction’s laws apply to the data right now.

For ChatGPT, the training opt-out is at privacy.openai.com and the residency setting only exists on Business and Enterprise plans. For Claude, the training toggle sits under Settings, Privacy, “Help improve Claude”, and residency control only exists on Enterprise. For Gemini, activity controls and the EU Data Boundary toggle both live in the Workspace admin console. For Microsoft 365 Copilot, residency and audit logging are configured in the Microsoft 365 admin centre under Copilot settings.

The setting that matters most for a UK SME is residency. Data residency determines whether your data is subject to US government access under the CLOUD Act, or stays inside the UK or EU under GDPR. For any tool you intend to use with client data, that setting is non-negotiable, and it only exists on the business and enterprise tiers.

If you are looking at the chatbot landscape from the inside of a small firm and you are not sure which tier you are on, which provider you should be on, or what to do about the prompts that have already gone out, book a conversation.

Sources

- OpenAI (2026). ChatGPT Data Residency and Inference Residency Help Center. Tier-by-tier residency options and inference geo controls. https://help.openai.com/en/articles/9903489-data-residency-and-inference-residency-for-chatgpt
- OpenAI (2026). How to turn off model training. Official opt-out path for ChatGPT free and Plus. https://help.openai.com/en/articles/8983082-how-do-i-turn-off-model-training-to-stop-openai-training-models-on-my-conversations
- Anthropic (2025). Updates to our consumer terms. The training-default flip and 30-day retention rule. https://www.anthropic.com/news/updates-to-our-consumer-terms
- Anthropic (2026). Claude data residency documentation. Workspace-level geo controls and Zero Data Retention for API. https://platform.claude.com/docs/en/manage-claude/data-residency
- Google (2026). Generative AI in Google Workspace privacy hub. Workspace data handling, EU Data Boundary, and admin retention settings. https://knowledge.workspace.google.com/admin/gemini/generative-ai-in-google-workspace-privacy-hub
- Microsoft (2026). Microsoft 365 Copilot privacy documentation. Data residency, EU Data Boundary, audit logging, and no-training commitments. https://learn.microsoft.com/en-us/microsoft-365/copilot/microsoft-365-copilot-privacy
- Information Commissioner's Office (2025). Guidance on AI and data protection. The UK regulator's view on lawful basis, transparency, and DPIA triggers for AI use. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- Cyberhaven (2026). Sensitive data flowing into AI tools. Research finding 39.7% of AI interactions involve sensitive data and 32.3% of ChatGPT use is via personal accounts. https://www.cyberhaven.com/blog/sensitive-data-flowing-into-ai-tools
- SamMobile (2025). Samsung lets employees use ChatGPT again after secret data leak. Coverage of the 2023 source-code leak that led to internal restrictions. https://www.sammobile.com/news/samsung-lets-employees-use-chatgpt-again-after-secret-data-leak-in-2023/
- The Hacker News (2023). OpenAI Redis bug behind ChatGPT chat history leak. Cross-user data exposure incident from 20 March 2023. https://thehackernews.com/2023/03/openai-reveals-redis-bug-behind-chatgpt.html

Frequently asked questions

If I'm using free ChatGPT for client work, what's actually happening to that data?

It is going to OpenAI's US infrastructure, being stored indefinitely, and by default it is being used to train future ChatGPT models. There is no Data Processing Agreement covering you, so under UK GDPR you are exposed if any client personal data is in those prompts. The opt-out exists at privacy.openai.com, but it does not apply retroactively to anything already pasted in.

Is Claude actually safer than ChatGPT for an SME?

For training behaviour, yes, since late 2025. Anthropic flipped its consumer tiers so chats are not used for training by default. Retention is 30 days rather than indefinite. But Claude does not offer UK or EU data residency on consumer tiers, so if you process EU personal data and need GDPR-compliant residency, Claude consumer tiers do not give you that. Microsoft 365 Copilot and Gemini for Workspace do.

I have already pasted sensitive things in. What now?

You cannot extract data that has already been used in a completed training run. The model is trained, the data is in the weights, and there is no way to remove it. What you can do: delete the conversations now so they are not used in future training runs, change the training opt-out so new conversations are not used, rotate any credentials that were ever pasted, and write down a short policy so the next person on the team does not repeat the mistake.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
