A founder I spoke with last month sent a client email she had drafted in ChatGPT and lightly polished. It contained a specific number, a market-share figure, that she had not asked for, had not verified, and that turned out to be invented. The client noticed. She is still slightly mortified about it, and rightly so. The email read perfectly. That was the problem.
This is the discipline the AI-on-your-desk conversation quietly demands you sharpen, not soften. AI output looks plausible. The editor’s eye is the bit that still has to be yours.
What is the editor’s eye, and why does AI need it?
The editor’s eye is the small set of habits a careful writer applies to any draft before it leaves their desk: check the specifics, listen to the tone, ask whether each claim earns its place, read once aloud. It is what fact-checkers at The New Yorker and the FT have done for decades. The reason AI needs it is that AI is now fluent enough to disguise its own errors as competent prose.
That is a different problem from earlier machine output. A clumsy AI draft was its own warning. A fluent AI draft is not. Operators who get burned are usually the ones who treated fluency as a proxy for accuracy. The Vectara Hallucination Leaderboard’s 2026 dataset, run across 7,700 articles, puts even the best summarisation model at a 3.3 percent hallucination rate, with named reasoning systems including Claude Sonnet 4.5, GPT-5, Grok-4, and DeepSeek-R1 all sitting above 10 percent. Stanford’s HELM benchmark and Anthropic’s published system cards point the same way.
Why does it matter for your business?
Because the consequences of plausible-but-wrong scale with the seriousness of the artefact. The canonical example is Mata v. Avianca, where two New York attorneys filed a brief in 2023 containing six fabricated court citations generated by ChatGPT. Judge Castel sanctioned them, fined the firm 5,000 dollars, and ordered them to send the false affidavit to the judges whose names had been misappropriated in the invented opinions.
The pattern has now spread well beyond that one case. The Charlotin AI Hallucination Cases Database tracks over 50 US court cases and a growing UK High Court list in which parties were found to have relied on hallucinated content. Deloitte Australia delivered a 440,000 dollar government report containing fabricated academic references and a misattributed quote from a Federal Court judge, and offered a partial refund. GPTZero, analysing 4,000 NeurIPS 2025 papers, found over 100 hallucinated citations across 50 papers that had each cleared three to five expert reviewers. The implication for an SME founder is direct. If peer-reviewed academic conferences and named consulting firms are missing AI-fabricated specifics at scale, an unaided 10pm review of a Tuesday client email is not going to catch them either.
UK regulators have been explicit about what is now expected. The ICO’s accuracy guidance under UK GDPR, the FCA’s Mills Review of AI in retail financial services, the ICAEW’s audit-work guidance, and the Bar Council’s 2025 update all converge on the same point: meaningful human review of AI output is non-negotiable in regulated contexts, and organisations are expected to have documented their oversight before something lands wrong.
Where will you actually meet it?
You will meet it as one of three predictable failure modes, each of which passes a casual first read. The first is invented specifics: a confidently formatted citation, a precise statistic, a dated quote, all wearing the texture of authentic professional writing because the AI has learned the shape of citations more thoroughly than their substance. Mata’s fabricated cases included docket numbers and reporter pages. They looked entirely real until counsel checked the database.
The second is missed nuance, which is the more insidious of the three. AI tends to compress genuine disagreement, jurisdictional variation, or live evidence into a single confident narrative. A draft client advisory that quietly resolves a debate the field has not actually resolved commits you to a position you may not have taken unaided. A board memo that reads as clean consensus when the underlying evidence is split is the same failure mode in a different format.
The third is tonal drift, often toward a flattened North American business register. The Max Planck Institute, studying 740,000 hours of content, has documented a measurable rise in ChatGPT’s preferred vocabulary in everyday writing since 2023, including the words a USC writing-variation study tracked under the same heading. The em dash has become a strong enough AI tell that some careful writers now self-consciously avoid it. For a UK owner-operator writing in British English to UK clients and regulators, drift into that Americanised flatness reads as carelessness, whether or not the content is accurate. It signals to the reader that nobody finished the job.
When should you polish, and when should you throw it out?
A polish pass is the right move when three conditions hold. The claims are verifiable from sources you already have to hand. The tone reads as yours within one read-aloud. Nothing in the draft commits you to a position you would not have taken unaided. If those three are clean, the draft is a starting point worth keeping.
The four-pass review is the working tool. Claims: verify each specific against a primary source. Nuance: read once aloud and listen for false certainty or compressed disagreement. Tone: ask directly whether this sounds like you in British English. Structure: ask whether the draft earns its length, or whether the model has padded the middle to look thorough.
You throw it out when the verification pass turns up two or more invented specifics, when the read-aloud produces a flatness you cannot localise to one paragraph, or when the draft has resolved a nuance you were not yet ready to resolve. Throwing out is faster than rewriting from a corrupted base. The judgement question that catches the in-between cases is this: would a thoughtful peer who reads my work catch that this is not me? If the answer is yes, the draft is not ready. The discipline overlaps with the practice of drafting first passes with AI. The first-pass habit and the editor’s eye are the two halves of the same skill.
How does the editor’s eye build over time?
It compounds. The first ten reviews are slower because you are building a personal catalogue of your own AI’s failure patterns: which kinds of citation it tends to fabricate, which words signal tonal drift in your voice, which structural moves it defaults to when it does not know what you want. By the thirtieth review, the catalogue is fast and largely automatic, and you are spending the time on substance rather than on detection.
The supporting habits are mechanical. Read aloud, because the auditory cortex catches rhythm and false certainty that silent reading does not. Verify every quantitative claim against a primary source before sending; if you cannot find the source in 60 seconds, the claim is not solid enough to ship. Run a brief premortem on regulatory or client-facing documents, the technique Gary Klein published in HBR in 2007: assume the document will be scrutinised and identify in advance which claims are most likely to be questioned, then verify those first.
The voice work is the part that takes longest, because it is the part the AI is best at imitating shallowly. A useful test, taught in the Roy Peter Clark “Writing Tools” tradition, is to read the draft aloud and listen for sentence-length variety. Frontier models default to a metronome rhythm that drones once you hear it. Your own voice does not. This discipline sits within the wider conversation about AI for your own work, not just your business. The editor’s eye is the foundation that makes everything else in personal AI practice safe to ship, and it is the part of that conversation that does not get talked about enough.