AI-edited or AI-drafted: the evaluation thresholds are not the same

TL;DR

AI-edited means a human wrote it and a tool polished it; the human owns the facts, and the only risk worth checking is regression. AI-drafted means the tool wrote the first version and the human edited or accepted it; the model now owns the facts, and every claim has to be verified. Treating the two as one category leads to systematic over-trust in drafted material and under-credit of edited material.

Key takeaways

- AI-edited and AI-drafted are different activities with different risk profiles. Edited content carries regression risk, where a polish flattens nuance or weakens a claim. Drafted content carries invention risk, where the model fabricates facts, sources or quotations the human has to catch.
- Labels matter. Replace the catch-all "AI-assisted" with two explicit terms: AI-edited when a human wrote first, AI-drafted when the model wrote first. The label dictates which evaluation threshold applies.
- The three-question check surfaces which path was used. Did a human write this before AI touched it? If AI wrote first, what was the human's role afterwards? Which factual claims in this piece require independent verification?
- High-stakes work should almost never be AI-drafted without expert verification. Pangram Labs' detection research and Stanford HAI's 2024 AI Index both document hallucination rates high enough that uncritical publication erodes credibility over time.
- The cost of conflating the two paths is asymmetric. Owners systematically over-trust drafted material because the prose looks finished, and under-credit edited material because the time spent reads as inefficient. Both judgements are wrong.

The owner I am thinking of caught it in a phrase. Her marketing lead had been describing everything that came out of the team as “AI-assisted”, and the owner had let the term slide for a few months. Then she sat with two pieces side by side. One was a memo her senior consultant had written and run through a polishing tool. The other was a 1,200-word LinkedIn article a junior had generated from a single prompt and lightly edited. The team called both “AI-assisted”. The owner could see they were not the same thing at all, and that the second one needed a different kind of review the team had not been doing. She asked me whether the distinction was worth making a rule about.

It is. The two paths produce different output, carry different categories of risk, and demand different evaluation thresholds. Treating them as one category leads to predictable failures, and the fix is small enough that an owner can put it in place over a single team meeting.

What is the difference between AI-edited and AI-drafted writing?

AI-edited writing means a human wrote the first version and an AI tool polished or tightened it afterwards. AI-drafted writing means the AI produced the first complete version from a prompt and a human then accepted, edited or rewrote what it generated. The difference matters because in the first path the human owns the facts, and in the second the model does until verified.

In the edited path, the human has already committed to a factual claim by writing it down, and the AI is operating on existing prose. In the drafted path, every claim in the output came from the model’s pattern matching across its training data, and the human reading it has no way to know which claims are anchored in real sources and which are confident fabrications. Stanford HAI’s 2024 AI Index documents hallucination rates on frontier models that vary by task but remain meaningful at scale. Pangram Labs’ detection work confirms that the distinction between edited and drafted prose is not reliably caught by automated tools either. The label has to come from the workflow itself.

Why does the distinction matter for the evaluation threshold?

The two paths carry different risk profiles, so the human review that catches each one looks different. AI-edited content carries regression risk, where a polish accidentally weakens a claim, strips a qualifier or softens the author’s voice. AI-drafted content carries invention risk, where the model fabricates statistics, sources or quotations with the same fluency as it cites real ones. The thresholds need to be calibrated to each.

A sentence like “the supplier could meet the deadline if additional resources were available” gets tightened by an editing tool to “the supplier could meet the deadline”, and the conditional clause that changes the meaning is gone. The check is a side-by-side reading that takes 5 to 10 minutes per piece, focused on whether the specifics survived the edit. The check for AI-drafted content is heavier. A 1,500-word piece may contain 15 to 25 factual claims, each capable of being a hallucination. The Harvard Business Review’s editorial guidance on AI-generated content notes that verification often takes longer than writing the piece manually, because the human cannot trust their own knowledge and has to check every significant claim against an external source. NIST’s AI Risk Management Framework places the responsibility for that verification on the deployer of the system, and the ICO’s UK GDPR guidance does the same for any output that touches personal data.
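Part of that regression check can be automated. The sketch below, a minimal Python example, flags qualifier words that disappeared between the human draft and the polished version. The word list and function name are illustrative assumptions, not a standard tool, and the output is a prompt for the human side-by-side read, not a replacement for it.

```python
import re

# Qualifier words whose disappearance often signals a meaning change.
# Illustrative list only; extend it with the hedges your team actually uses.
QUALIFIERS = {"if", "unless", "could", "may", "might", "should",
              "approximately", "around", "roughly", "provided", "assuming"}

def dropped_qualifiers(original: str, edited: str) -> list[str]:
    """Return qualifiers present in the original but missing from the edit."""
    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z']+", text.lower()))
    return sorted(QUALIFIERS & (words(original) - words(edited)))

before = "The supplier could meet the deadline if additional resources were available."
after = "The supplier could meet the deadline."
print(dropped_qualifiers(before, after))  # ['if']
```

A check like this catches only the crudest regressions, dropped hedging words, which is exactly why the human reading stays in the loop.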

Where do owners actually meet this distinction in daily work?

You meet it at the points where AI output crosses a boundary: into a client’s inbox, into a published article, into a proposal, into a forecast that shapes a hiring decision. The boundary is the same in both paths, but the question being asked at the boundary is different. For edited content the question is whether the edit preserved meaning. For drafted content the question is whether the model invented anything.

The conflation problem usually shows up the same way. A team uses ChatGPT to draft a piece and Grammarly to clean it up. Both are AI tools, so the output gets filed as “AI-assisted” and runs through a single review gate. That gate is almost always calibrated for the lighter task. The drafted material slips through with no verification of the claims it contains, and the edited material gets reviewed too aggressively, with the team spending time hunting for invented facts that the human author never created in the first place. The asymmetry is invisible until someone reads two pieces back to back and sees what the single label has been hiding.

When does each path need a heavy threshold and when does a light one work?

It depends on the stakes of the piece. High-stakes content is anything that could damage credibility, affect a client relationship, create regulatory exposure or speak on behalf of a named individual or the firm. Thought leadership, client proposals, public statements and regulatory communications all sit in this band, and AI-drafted material here should not enter publication without expert verification of the primary factual claims.

AI-edited material in the high-stakes band needs the regression check and a voice consistency review, because the factual foundation is the human author’s already.

Medium-stakes content includes client background documents, technical memos for internal review, training materials and proposal appendices. For these, AI-drafted material is acceptable if the subject-matter owner has confirmed the major factual claims are reasonable and current. AI-edited material gets the regression check on its own.

Routine content includes social captions where no specific factual claim is being made, internal scheduling notices and refreshed evergreen content. For these, AI-drafted material is acceptable if it is verified once for accuracy and then reused in standard form, with a quarterly spot check for outdated references.

The point of the bands is that the team knows which threshold applies before the piece is produced, not after it has gone out.
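If you want the bands written down somewhere less ambiguous than a meeting note, they reduce to a small decision table. Here is a minimal Python sketch of one way to encode it; the threshold wording paraphrases the bands above, and the routine AI-edited entry is an assumption, since that cell is not spelled out in the text.

```python
from enum import Enum

class Label(Enum):
    EDITED = "AI-edited"
    DRAFTED = "AI-drafted"

class Stakes(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    ROUTINE = "routine"

# Decision table paraphrasing the bands above. The (EDITED, ROUTINE)
# entry is an assumption; the article leaves that combination unspecified.
REVIEW_THRESHOLD = {
    (Label.DRAFTED, Stakes.HIGH): "expert verification of primary factual claims before release",
    (Label.EDITED, Stakes.HIGH): "regression check plus voice consistency review",
    (Label.DRAFTED, Stakes.MEDIUM): "subject-matter owner confirms major claims are reasonable and current",
    (Label.EDITED, Stakes.MEDIUM): "regression check",
    (Label.DRAFTED, Stakes.ROUTINE): "verify once, reuse in standard form, quarterly spot check",
    (Label.EDITED, Stakes.ROUTINE): "light spot check",
}

def required_review(label: Label, stakes: Stakes) -> str:
    """Look up the review threshold before the piece is produced."""
    return REVIEW_THRESHOLD[(label, stakes)]

print(required_review(Label.DRAFTED, Stakes.HIGH))
```

Whether the table lives in code, a checklist or a shared document matters less than the fact that it is written down before the piece exists.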

What is the three-question check that surfaces which path is in use?

Three questions, asked of any piece before it enters the approval gate, separate the two paths reliably. Did a human write this before AI touched it? If AI wrote the first version, what was the human’s role afterwards: minor edits, major rewrites, or accepting most of it as-is? What are the factual claims in this piece, and does the person who approved it have direct knowledge of the source material?

The answers to those three questions tell a marketing lead which label to attach and tell the owner reading the label which review depth to expect. The team rule that holds it all together is simple. Drop “AI-assisted” as a category. Use AI-edited when the human is the primary author and AI was used for refinement. Use AI-drafted when AI produced the first substantial version. Attach the label to each piece at the point of approval. The labels make the right evaluation threshold visible at the moment it matters.
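The rule can live in a one-page checklist or, if your approval gate runs through a form or a ticketing workflow, in a few lines of code. A minimal sketch, assuming the first two check questions have been answered; the function name and the free-text role values are illustrative:

```python
def attach_label(human_wrote_first: bool, human_role_after: str = "") -> str:
    """Derive the workflow label from the first two check questions.

    human_role_after only matters when the model wrote first; the values
    mirror the options in the check: 'minor edits', 'major rewrites',
    or 'accepted as-is'.
    """
    if human_wrote_first:
        return "AI-edited"
    role = human_role_after or "role not recorded"
    return f"AI-drafted ({role})"

print(attach_label(True))                     # AI-edited
print(attach_label(False, "accepted as-is"))  # AI-drafted (accepted as-is)
```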

If you want to talk through how to embed that discipline in your own team’s operating rhythm without it becoming bureaucratic, book a conversation.

Sources

- Stanford HAI (2024). 2024 AI Index Report: the state of AI hallucination and benchmark performance. Cited for the documented frequency of factual errors in current frontier language models and the limits of generic detection tools. https://aiindex.stanford.edu/report/
- National Institute of Standards and Technology (2024). AI Risk Management Framework, generative AI profile. Cited for the deployer's responsibility for output verification and the recommendation that human review burden scales with stakes. https://www.nist.gov/itl/ai-risk-management-framework
- European Commission (2024). EU AI Act, Article 14 on human oversight and Article 50 on transparency for AI-generated content. Cited for the regulatory expectation that the entity deploying the AI is responsible for what it produces. https://artificialintelligenceact.eu
- Information Commissioner's Office (2024). Guidance on AI and data protection. Cited for the deployer's accountability for AI output under UK GDPR, regardless of which vendor supplied the model. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- Pangram Labs (2024). AI text detection research and the limits of stylistic identification. Cited for the finding that the difference between AI-edited and AI-drafted text is not reliably caught by automated detection and has to be tracked by workflow labelling instead. https://www.pangram.com/
- MIT Sloan Management Review (2024). How AI changes the editing process: regression risk in machine-assisted writing. Cited for the framing of regression risk in AI-edited workflows, where automated polish can flatten specificity or strip qualifying clauses. https://sloanreview.mit.edu/article/the-new-rules-of-ai-augmented-work/
- Harvard Business Review (2024). Don't trust AI summaries: the verification gap in generative content. Cited for the editorial finding that AI-drafted content typically contains roughly one factual claim per 60 to 100 words, each requiring independent verification before publication. https://hbr.org/2024/05/dont-trust-ai-summaries
- ICAEW (2024). AI in the audit: governance expectations for AI-generated work product. Cited for the professional-services framing that AI-drafted material entering a regulated communication requires subject-matter expert sign-off as a condition of release. https://www.icaew.com/technical/technology/artificial-intelligence
- OECD (2024). AI principles: accountability and human oversight in deployment. Cited for the international policy consensus that the entity deploying AI bears responsibility for its output, independent of the model provider. https://www.oecd.org/digital/artificial-intelligence/

Frequently asked questions

How do I tell whether a piece is AI-edited or AI-drafted if my team has been ambiguous about it?

Ask the person who produced it three questions. Did you write a first version yourself before any AI touched it? If the AI produced the first version, did you make minor edits, major rewrites, or accept most of it? Which factual claims came from your knowledge and which came from the model? The answers settle the category in under two minutes and tell you which evaluation threshold applies.

Is AI-edited writing safer than AI-drafted writing?

Yes, in a specific sense. The human author has already committed the facts, sources and core argument to paper, so the model is operating on the surface of the prose. The risks are regression and voice drift, which a 5-to-10-minute side-by-side comparison catches. AI-drafted writing carries the harder risk of invention, where a 1,500-word piece can contain 15 to 25 factual claims that each need verification.

Can I let AI draft routine content and only fact-check the high-stakes pieces?

Yes, if you are honest about which is which. Social captions, internal meeting recaps and refreshed evergreen content tolerate AI-drafted material if it is verified once and then reused. Client proposals, regulatory communications and anything attributable to a named individual or your firm should not be AI-drafted without expert verification. The error pattern that hurts credibility is medium-stakes work that gets drafted as if it were routine.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30-minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
