A services firm puts together a client briefing. One section summarises recent regulatory guidance, and three of the cited references are fabricated. ChatGPT generated the paragraph with complete confidence, no hedges, no uncertainty. The person who approved the document never searched for the sources, because the text looked right.
This is not a fringe scenario. OpenAI’s own guidance says users should not rely on model outputs as a sole source of truth, and recommends checking critical facts, including legal, financial and medical information, against authoritative sources before acting. For an owner-operated business using ChatGPT in client work, that means having a checking method: one that takes minutes rather than hours and that anyone in your team can apply consistently.
What does it mean to check a ChatGPT answer?
Checking a ChatGPT answer means separating what the model asserts from what you can actually verify. ChatGPT produces confident, well-formatted text whether its claims are correct or invented, and it does not signal the difference.
Three failure modes matter in practice. Fabricated citations: the model produces references that do not exist, correctly formatted and with plausible-sounding titles. Outdated information: training data has a cutoff, and the model cannot know what changed after it. Wrong attribution: the model associates a real source with a claim that source does not actually make.
None of these failures announces itself. The text reads normally. The numbers are formatted. The source looks real. For a small services firm using ChatGPT to draft proposals, briefings, or client communications, this is the core risk: the output looks authoritative because the language is polished, not because the facts have been verified.
Why do wrong answers cause real damage?
For an owner-operated business, accountability for what goes to clients stays with you regardless of how the content was generated. The Information Commissioner’s Office (ICO) is clear that organisations using generative AI remain responsible for meeting their accuracy obligations under UK data protection law. The Financial Conduct Authority (FCA) has made the same point for regulated firms: AI-generated outputs do not transfer a firm’s accountability for misleading or incorrect information to the model’s developer.
The Air Canada case from 2024 is the most cited illustration of what happens when this is ignored. British Columbia’s Civil Resolution Tribunal held Air Canada responsible after its chatbot gave a passenger a misleading explanation of the airline’s bereavement fares. The airline’s defence, that the chatbot was a separate entity responsible for its own actions, was rejected. The business owned the output.
For small firms, the damage often lands before any regulatory action. A fabricated citation in a client-facing document damages professional credibility. A wrong number in a pricing summary creates a false expectation. A misquoted regulation in a compliance briefing creates liability exposure well before anyone files a complaint.
The ICO’s guidance adds a UK-specific dimension: if ChatGPT output contains or concerns personal data, the firm needs a lawful basis, an accuracy check, and a clear processing purpose. AI use does not remove UK GDPR obligations, and for any firm preparing materials that reference individuals, clients, or employees, this applies directly.
Where are you most likely to meet this problem?
The highest-risk outputs are the ones that look the most credible. ChatGPT is fluent and well-structured even when it is wrong, which means the errors most likely to cause harm appear in high-stakes documents that nobody double-checks precisely because they look authoritative. Client briefings, proposals with financial figures, regulatory summaries, and complaint responses are all high-risk categories.
For services firms specifically, four output types carry the most risk: financial summaries, where the model produces plausible-looking numbers with no basis in the source data; regulatory or legal references, where the model invents citation-formatted claims about rules that do not exist; research that blends real and invented sources, presenting fabricated papers alongside genuine ones; and customer-facing explanations of products, policies, or processes, where imprecision creates misleading impressions.
The National Cyber Security Centre (NCSC) identifies a parallel risk: if staff are pasting sensitive client or business information into public AI tools to produce these outputs, data exposure runs alongside accuracy risk. Protecting what goes into the prompt matters as much as verifying what comes out.
When should you run the full check, and when can you be lighter?
The checking method should be proportionate to the stakes of the output. A ChatGPT response used for internal brainstorming or rough-drafting before a subject-matter expert reviews it carries lower risk than text going directly to clients or regulators. The question to ask before checking is: who acts on this output, and what do they lose if it turns out to be wrong?
Four categories warrant the full sequence every time: customer-facing communications, materials with legal or financial implications, anything that cites external sources, and any output that will be treated as a recommendation rather than a starting point.
A lighter approach is defensible in two categories: internal documents where the primary reader has the expertise to spot errors, and brainstorming outputs where factual precision is not the goal.
One counterintuitive point is worth holding on to: if your firm already has strong editorial or quality controls, a formal checking sequence adds to them rather than replacing them. The value of a named method is consistency. It means the review happens the same way every time, by everyone in the team, not only by the people who happen to be careful.
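For teams that track drafts in a spreadsheet or a small script, the triage above can be written down as a rule. What follows is a minimal sketch in Python, not a prescribed tool: `DraftProfile`, its field names, and `check_level` are hypothetical, and the choice to fall back to the full check when a draft fits neither list is an assumption rather than something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftProfile:
    """Hypothetical description of one ChatGPT-assisted draft."""
    customer_facing: bool = False
    legal_or_financial: bool = False
    cites_external_sources: bool = False
    treated_as_recommendation: bool = False
    expert_internal_reader: bool = False
    brainstorming_only: bool = False


def check_level(profile: DraftProfile) -> str:
    """Return "full" or "light" for a draft."""
    full_triggers = (
        profile.customer_facing,
        profile.legal_or_financial,
        profile.cites_external_sources,
        profile.treated_as_recommendation,
    )
    # Any one of the four high-stakes categories forces the full sequence.
    if any(full_triggers):
        return "full"
    # The two lighter categories apply only when no trigger is present.
    if profile.expert_internal_reader or profile.brainstorming_only:
        return "light"
    # Assumption: when a draft fits neither list, err towards the full check.
    return "full"


proposal = DraftProfile(customer_facing=True, legal_or_financial=True)
ideas = DraftProfile(brainstorming_only=True)
print(check_level(proposal), check_level(ideas))
```

The ordering matters: a brainstorming note that is later sent to a client picks up the `customer_facing` trigger and moves to the full check.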
What does a practical checking method look like?
The three-check rule is a method any non-technical team can apply in under ten minutes per output. Check names and numbers against the best available primary source. Check whether the dates and facts are current. Assess whether the sources cited actually exist and whether they are authoritative on the specific point the model has used them to support.
The sequence runs in this order. First, underline every specific claim in the output: numbers, dates, citations, named organisations, regulatory references. Separate facts from opinions and recommendations. Second, check each factual claim against a primary source, such as your own documents, Companies House, a regulator’s website, contract terms, or the cited source itself if it exists. Third, cross-check important claims against one independent secondary source, such as a reputable trade publication, a professional body, or an industry database.
Reject or rewrite any claim you cannot source. The temptation is to leave uncertain content in and add a hedge. That approach is usually worse than removing the claim, because a hedged wrong answer still propagates the error.
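The first step and the reject rule can be partly supported by a script, although the checking itself stays with a person. The sketch below is illustrative only: the patterns in `CLAIM_PATTERNS` are crude assumptions that catch money amounts, percentages, and years, and they will miss citations and named organisations, which still need to be underlined by eye. `flag_claims`, `unsourced`, and the sample sentence are hypothetical.

```python
import re

# Assumed, deliberately simple patterns for claims that are easy to spot.
CLAIM_PATTERNS = {
    "money": re.compile(r"[£$€]\s?\d[\d,]*(?:\.\d+)?"),
    "percentage": re.compile(r"\b\d+(?:\.\d+)?\s?%"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def flag_claims(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for every pattern hit in the draft."""
    found = []
    for kind, pattern in CLAIM_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group()))
    return found


def unsourced(claims: list[tuple[str, str]], sources: dict[str, str]) -> list[tuple[str, str]]:
    """Return the claims for which the reviewer has recorded no source."""
    return [claim for claim in claims if not sources.get(claim[1])]


draft = "Fees rose 12% in 2023 to £1,500."
claims = flag_claims(draft)
# The reviewer records where each claim was confirmed (made-up example entry).
sources = {"12%": "client fee schedule"}
for kind, text in unsourced(claims, sources):
    print(f"No source recorded for {kind}: {text}")
```

Anything the script prints is a candidate for removal or rewriting, not for a hedge.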
For customer-facing materials, add one more step: sign-off from someone who can justify the claim from source documents, not from the AI output. The reviewer’s job is to confirm accuracy, not to judge whether the text reads well.
Keep a short audit trail. Store the prompt, the output, the sources checked, and the approved wording. This takes two minutes and means you can reconstruct the checking process if a client queries a claim later. The ICO’s guidance on accuracy and proportionate human oversight supports exactly this kind of documented approach.
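One way to keep that trail consistent is to give it a fixed shape. The following is a minimal sketch, assuming records are kept as one JSON line per checked output; `AuditRecord` and `to_json_line` are hypothetical names, and the `reviewer` and `checked_on` fields are additions beyond the four items listed above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class AuditRecord:
    """One checked output: the four items to store, plus who and when."""
    prompt: str
    output: str
    sources_checked: list[str]
    approved_wording: str
    reviewer: str  # assumption: not in the original list of items
    checked_on: str = field(default_factory=lambda: date.today().isoformat())


def to_json_line(record: AuditRecord) -> str:
    """Serialise a record as a single line, ready to append to a log file."""
    return json.dumps(asdict(record), ensure_ascii=False)


record = AuditRecord(
    prompt="Summarise the fee changes for the client briefing.",
    output="Fees rose 12% in 2023.",
    sources_checked=["client fee schedule"],
    approved_wording="Fees rose 12% in 2023, per the client fee schedule.",
    reviewer="A. Reviewer",
)
print(to_json_line(record))
```

Appending each line to a dated file in a shared folder is enough for most small teams; the point is that every record has the same fields, so a queried claim can be traced back quickly.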
If you want to apply this method across your business rather than just to individual documents, book a conversation and we can work through where the verification gaps actually sit.



