Standard operating procedures AI can actually read

TL;DR

Most SME standard operating procedures were written for humans who fill the gaps from experience. AI tools read only what is on the page and invent the missing pieces. Three format changes close almost all of the gap: explicit decision points, named inputs and outputs at every step, and a small sidebar of defined terms. None of them make the SOP worse for a human reader.

Key takeaways

- Human SOPs assume shared context that is almost never written down, which is exactly the context AI tools cannot supply for themselves.
- Three format changes close most of the gap: explicit decision points using MUST and SHOULD language, named inputs and outputs per step, and defined terms in a sidebar.
- The competence test is the cheapest validation: hand the SOP to someone who knows the work but not your business and watch where they pause.
- Retrieval discipline matters as much as drafting discipline: one source location, a stable naming convention, and versioned updates with a change log.
- Upgrade three to five high-volume, rule-based SOPs first and leave judgement-heavy or rarely-used procedures alone.

The owner asks the AI assistant to onboard a new client using the standard process. Twenty minutes later she comes back to a workflow that looks plausible, contains four steps that do not exist in her business, and skips the credit check entirely. The SOP it read was the one her operations manager wrote eighteen months ago. Three new starters have used it without incident. They knew what “verify the client” meant in her firm. The AI did not, so it invented a method that sounded professional and was wrong.

The AI was working from what was actually written down. The fix is small: three changes to format and one change to how completeness is checked. The result reads better for humans too.

Why do AI tools struggle with human-written SOPs?

Human SOPs rest on a foundation of shared context that is almost never on the page. A new starter brings accumulated knowledge of the business and the customers, and fills hundreds of small gaps daily because they have been in the room long enough to absorb how the work runs. An AI system reads only what is written, and when it meets a gap, it infers and proceeds.

Researchers at Suffolk University describe this behaviour as confabulation rather than hallucination. The model fills missing context with plausible-sounding inference. The output sounds confident and is sometimes invented. Virtasant’s 2024 analysis found that only 6 per cent of enterprises fully trust AI agents on core processes, and 84 per cent of those surveyed pointed at inadequate workflow documentation as the root cause. Many owners assume the documentation gap is smaller than it actually is, because new starters have been quietly filling it for years.

What are the three format changes that make an SOP AI-readable?

An SOP that AI can read reliably is almost always one that works better for humans too. The substance of the procedure does not change. The gaps where human SOPs rely on judgement are concentrated in three places, and making those gaps explicit improves the document for both audiences. It is the same discipline taken slightly more seriously, not extra work.

The first change is explicit decision points. Conditional language like “if the customer is difficult, escalate” or “if the order is large, check stock first” prompts a human to think and leaves an AI guessing. AWS’s work on agent SOPs applies the constraint language from IETF’s RFC 2119, the MUST, SHOULD, and MAY keywords used in technical specifications, to make required and permitted actions unambiguous. “If the customer response indicates dissatisfaction or raises a concern not listed in section 4, the agent MUST escalate to the customer service manager and MUST NOT attempt resolution independently” reads as cleanly to a new starter as to a model.
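
For readers who want to see the idea in a more concrete form, here is a minimal sketch of the same rule expressed as structured data, so the required and forbidden actions are impossible to miss. It is illustrative only; the field names (trigger, MUST, MUST_NOT, MAY) and the describe helper are invented for this example rather than taken from any particular tool's format.

```python
# Illustrative only: a decision point expressed as structured data rather than
# buried conditional prose. Field names are invented for this sketch.
escalation_rule = {
    "trigger": "the customer response indicates dissatisfaction or raises a concern not listed in section 4",
    "MUST": ["escalate to the customer service manager"],
    "MUST_NOT": ["attempt resolution independently"],
    "MAY": ["acknowledge receipt and give an expected response time"],
}

def describe(rule: dict) -> str:
    """Render the rule back into the unambiguous prose form used in the SOP."""
    lines = [f"If {rule['trigger']}:"]
    lines += [f"  the agent MUST {action}" for action in rule["MUST"]]
    lines += [f"  the agent MUST NOT {action}" for action in rule["MUST_NOT"]]
    lines += [f"  the agent MAY {action}" for action in rule["MAY"]]
    return "\n".join(lines)

print(describe(escalation_rule))
```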

The second change is named inputs and outputs at every step. “Process the refund” assumes the reader knows to find the order number, payment method, and customer email. An AI has to be told. Each step should state what information is required to begin (inputs: customer ID, order number, refund reason), what system state is assumed (the order has been marked returned in the warehouse system), and what the step produces (outputs: refund processed in Stripe, confirmation email queued, order record updated). Naming the inputs and outputs also catches steps that depend on information the previous step did not produce, which is the commonest failure mode in real procedures.
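
One hedged way to picture this, for those comfortable with a little code, is a step record that names its inputs, assumed state, and outputs explicitly. The SOPStep class and the refund example below are a sketch, not a prescribed format; the point is simply that every piece of information the prose mentions has a named place to live.

```python
from dataclasses import dataclass, field

@dataclass
class SOPStep:
    """One procedure step with its inputs, assumed state, and outputs named."""
    name: str
    inputs: list[str]                                   # information required to begin
    assumes: list[str] = field(default_factory=list)    # system state taken for granted
    outputs: list[str] = field(default_factory=list)    # what the step produces

refund_step = SOPStep(
    name="Process the refund",
    inputs=["customer ID", "order number", "refund reason"],
    assumes=["order marked as returned in the warehouse system"],
    outputs=["refund processed in Stripe", "confirmation email queued", "order record updated"],
)
print(refund_step)
```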

The third change is a defined terms sidebar. SOPs refer to “the standard response”, “our normal practice”, or “next working day” without saying what those mean in your business. A human in the firm knows. An AI does not, and it will reach for its training data instead, which may or may not match. The sidebar does not need to be long. It needs to say what counts as a “large order”, what the threshold is for “urgent”, and whether “next working day” excludes only weekends or also bank holidays. The UK Government Data Quality Framework treats clear definitions of terminology as foundational, not optional.
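
A minimal sketch of the sidebar as data shows how little is actually needed. The thresholds below are invented examples, not recommendations, and the small check at the end simply reports which defined terms a passage of SOP text relies on, so a reviewer can see whether the sidebar covers the language the procedure actually uses.

```python
# A defined-terms sidebar as data. The definitions below are invented examples;
# the point is that each vague phrase gets one concrete meaning for this business.
glossary = {
    "large order": "order value over £5,000 or more than 50 units",
    "urgent": "customer has asked for delivery within 2 working days",
    "next working day": "Monday to Friday, excluding English bank holidays",
}

sop_text = "For a large order, confirm stock before promising next working day delivery."

# Report which defined terms this passage depends on.
referenced = [term for term in glossary if term in sop_text.lower()]
for term in referenced:
    print(f"{term}: {glossary[term]}")
```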

How do you tell if your SOP actually works before an AI fails on it?

The cheapest validation is the competence test. Hand the SOP to someone who knows the kind of work but not your specific processes. Ask them to follow it step by step without asking for help. Watch where they pause. The pauses reveal gaps, the questions reveal ambiguities. Some of those gaps will not bother a human reader because they will ask a colleague. The same gaps are failure points for AI.

The method aligns with how regulators think about validation. The FDA’s framework for method validation rests on demonstrating, with objective evidence, that a trained operator can follow the procedure as written and produce a consistent result. The operator can be human or machine, and the test is whether the procedure can be executed without invention. A second use of the test is catching steps that look sequential but are not. “Generate an invoice” might assume the order has been confirmed and stock allocated. If those dependencies are not stated, an AI might attempt the steps in parallel, producing an invalid invoice. Naming the input dependencies makes the constraint explicit.
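
If the steps carry named inputs and outputs, that dependency check can even be mechanised. The sketch below is illustrative, assuming each step is recorded as a simple dictionary of inputs and outputs; it flags any input that neither the starting information nor an earlier step provides, which is exactly the gap the competence test surfaces by hand.

```python
# A rough completeness check, assuming each step is recorded with named inputs
# and outputs. The step names and fields here are invented for illustration.
steps = [
    {"name": "Confirm the order", "inputs": ["order number"], "outputs": ["order confirmed"]},
    {"name": "Allocate stock", "inputs": ["order confirmed"], "outputs": ["stock allocated"]},
    {"name": "Generate an invoice",
     "inputs": ["order confirmed", "stock allocated", "billing address"],
     "outputs": ["invoice PDF"]},
]
starting_information = {"order number"}  # what the procedure assumes at the very start

available = set(starting_information)
for step in steps:
    missing = [item for item in step["inputs"] if item not in available]
    if missing:
        print(f"'{step['name']}' needs information no earlier step provides: {missing}")
    available.update(step["outputs"])

# Here "billing address" is flagged: the invoice step needs it, but the procedure
# never says where it comes from.
```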

Where does retrieval discipline come in?

Writing an AI-readable SOP only solves half the problem. The other half is making sure the AI can find it, identify the current version, and know whether it has been updated. The Information Commissioner’s Office’s guidance for small business names the failure mode directly: information that cannot be found is worse than information that does not exist. Three disciplines close it.

The first is a single source location: one place that holds the authoritative version, whether a Notion workspace, a Confluence space, a Google Drive folder, or a dedicated SOP tool. Other systems may cache a copy or link to it, but they are not the source of truth. IBM’s framing of a system of record versus a source of truth applies cleanly. The second is a stable naming convention. Princeton’s records management guidance recommends a pattern like function-process-version-date, for example “Sales_NewClientOnboarding_v03_20260320”, consistently applied. The third is versioned updates with a visible change log, so an AI tool that has cached an older version can check whether it has been superseded and reload if it has.
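
For firms that want to check the convention automatically, a short script can validate file names against the pattern and pick out the current version. The regular expression and file names below are assumptions for illustration, matching the Function_Process_vNN_YYYYMMDD shape described above rather than any official standard.

```python
import re

# A check against the Function_Process_vNN_YYYYMMDD shape described above.
PATTERN = re.compile(r"^[A-Za-z]+_[A-Za-z]+_v(?P<version>\d{2})_(?P<date>\d{8})$")

filenames = [
    "Sales_NewClientOnboarding_v02_20250114",
    "Sales_NewClientOnboarding_v03_20260320",
    "new client onboarding FINAL (2).docx",  # fails the convention check
]

conforming = []
for name in filenames:
    match = PATTERN.match(name)
    if match:
        conforming.append((int(match["version"]), match["date"], name))
    else:
        print(f"Does not follow the naming convention: {name}")

if conforming:
    latest = max(conforming)  # highest version number wins, date as the tiebreaker
    print("Current version:", latest[2])
```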

When is this worth doing and when is it not?

Not every SOP needs to be AI-readable. The upgrade carries real upfront cost. For procedures that run rarely, change often, or involve heavy judgement where AI was never going in the loop, the investment is not justified. For procedures that run frequently, follow rules, and have consequences when they go wrong, it almost always is. The honest question is whether an AI tool will be asked to read this SOP in the next twelve months.

The working rule is to start with three to five procedures that are high volume, repeatable, rule-based, and where consistency matters. New client onboarding, refund processing, meeting scheduling, invoice creation, and standard customer questions are the usual candidates. Forrester’s research on strategic AI readiness recommends the same shape: start small, scale smart, and embed governance by design. Virtasant’s analysis flags the opposite failure mode: organisations that try to automate too broadly and end up with sprawl. For an owner-operated firm the trap is the same in miniature: doing twenty SOPs badly rather than five well.

The proportionate first move is one procedure, two hours, the three format changes, and the competence test. If that pays off in faster execution or fewer corrections, take the next one. The data side of the same picture is covered in "why AI projects fail at data, not at AI" and "the one-week data and knowledge audit"; the post on capturing tacit knowledge before key people leave sits alongside this one. If you would rather work through your top three SOPs with someone who has helped other owner-managed firms do it, book a conversation.

Sources

- Virtasant (2024). Why AI Automation Strategy Fails, 84% Have Not Documented Their Workflows. Source for the finding that only 6 per cent of enterprises fully trust AI agents on core processes and that 84 per cent cite inadequate workflow documentation as the root cause. https://www.virtasant.com/ai-today/ai-automation-strategy-starts-with-your-sop
- AWS Public Sector (2025). Why your AI agents give inconsistent results, and how Agent SOPs fix it. Source for the application of RFC 2119 constraint language (MUST, SHOULD, MAY) to agent SOP writing. https://aws.amazon.com/blogs/publicsector/why-your-ai-agents-give-inconsistent-results-and-how-agent-sops-fix-it/
- IETF (1997). RFC 2119, Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. Source for the canonical definitions of MUST, SHOULD, and MAY that AWS adapted for agent instructions. https://datatracker.ietf.org/doc/html/rfc2119
- Information Commissioner's Office (2024). Information Governance for Your Small Business. Source for the small-business guidance on document naming, version control, and the principle that unfindable information is worse than missing information. https://ico.org.uk/media2/migrated/4020350/information-governance-for-your-small-business-v-1-0.docx
- UK Government (2023). The Government Data Quality Framework. Source for the principle that clear definitions of terminology and pre-emptive communication of process changes are foundational data quality requirements. https://www.gov.uk/government/publications/the-government-data-quality-framework/the-government-data-quality-framework
- Princeton University Records Management (2023). File Naming Conventions and Version Control. Source for the naming pattern function-process-version-date and the distinction between major (v01, v02) and minor (v01_01) revision numbering. https://records.princeton.edu/records-management-manual/file-naming-conventions-version-control
- IBM (2024). System of Record vs Source of Truth, What's the Difference? Source for the distinction between a system of record (authoritative source for one domain) and a source of truth (harmonised across systems) applied to SOP storage. https://www.ibm.com/think/topics/system-of-record-vs-source-of-truth
- Forrester (2024). Strategic AI Readiness, How To Move From Hype To Scalable Impact. Source for the "minimum viable AI governance" approach of focused gap identification with clear ownership inside a 90-day window. https://www.forrester.com/blogs/strategic-ai-readiness-how-to-move-from-hype-to-scalable-impact/
- Suffolk University Law Review (2023). Academic analysis of large language model output as confabulation rather than hallucination, with inference-filling behaviour the explicit failure mode for context-thin instruction-following. https://dc.suffolk.edu/cgi/viewcontent.cgi?article=1386&context=suls-faculty
- Systemology (2024). How to write SOPs for AI agents, the practical format upgrades that close the human-context gap. Source for the working pattern most operators land on when retrofitting human SOPs for AI ingestion. https://www.systemology.com/write-sops-ai/

Frequently asked questions

How long does it take to rewrite an existing SOP so AI can read it?

For a procedure that is already documented and runs frequently, expect two to four hours of focused work to rewrite the steps, add named inputs and outputs, define key terms, and run the competence test. A procedure that is half-documented and half lived in someone's head will take longer, often a full day, because the time goes on extracting the assumptions rather than typing them up. Start with one SOP, time it, then decide.

Do I really need RFC 2119 keywords like MUST and SHOULD, or is that overkill for a small business?

For internal staff procedures you can use plain English with the same effect, write "the agent must escalate" rather than "escalate if needed". The point is precision about what is required versus what is optional, not the specific words. AWS uses RFC 2119 because the convention removes ambiguity in agent instruction sets. A small business that distinguishes required, recommended, and permitted actions in everyday English will get the same benefit.

My SOPs live in three different places, a wiki, a shared drive, and people's heads. Where should they live for AI to use them?

One place, with everything else as a copy or a link. The system you pick matters less than the rule that one location holds the authoritative version. A Google Drive folder with consistent naming, a Notion workspace, a Confluence space, or a dedicated SOP tool all work. What does not work is two systems both claiming to be current, because then neither a new starter nor an AI tool can tell which version to follow.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
