A small-firm owner I spoke with had a customer complaint land on his desk about an AI tool the firm had deployed six weeks earlier. The customer claimed the bot had given them the wrong answer, quoting a figure the owner could not account for. He asked the vendor for the conversation log. The vendor sent back a CSV of timestamps, model names, and token counts. The actual prompts and the actual responses had been purged thirty days earlier.
He had to defend the firm’s position with no record of what the AI had actually said. That gap, between what owners think their vendor is logging and what is actually being kept, is the centre of this post. The technical name for the missing thing is an AI audit trail.
What is an AI audit trail?
An AI audit trail is the record of what your AI system was asked, what context it was given, which model processed the request, what it produced, and who reviewed the output before it reached a customer or shaped a decision. It records the semantic content of the interaction (the actual prompt and the actual response), not just the surrounding system metrics. That is the line that separates it from a normal application log.
A normal log tracks server errors, response times, and authentication events. An AI audit trail captures the conversation itself, plus the model version, the key parameters, any tool calls the system made, and the human review step. A tamper-evident audit trail goes further and uses write-once storage or cryptographic hashing, so that a record someone might have changed becomes a record that can be defended.
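If you want to see what "tamper-evident" means in practice, here is a minimal sketch in Python. Each record's hash folds in the hash of the record before it, so editing any entry after the fact breaks every hash downstream. The function names and record shape here are illustrative, not any vendor's API.

```python
import hashlib
import json

def append_record(log, record):
    """Append a record whose hash chains to the previous entry,
    so any later edit breaks every hash that follows it."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"record": record, "prev_hash": prev_hash, "hash": entry_hash})
    return log

def verify_chain(log):
    """Recompute every hash from the start; one altered record
    makes the chain fail verification."""
    prev_hash = "0" * 64
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Write-once storage achieves the same defensibility by a different route; the hashing approach has the advantage that anyone holding the log can verify it without trusting the storage.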
Why your business needs one
The audit trail solves three real problems for a UK SME, and the first is evidence. A regulator can ask for proof that an AI-influenced decision was fair; in financial services, the FCA can commission a section 166 skilled person review to obtain exactly that. A customer can dispute a decision your AI helped shape, and the audit trail is what lets you uphold or defend the complaint with facts rather than recollection. Insurers underwriting cyber and professional indemnity cover increasingly want the same evidence.
The privacy edge is the part many owners miss. If your prompts contain customer personal information (names, email addresses, account numbers), the audit trail itself becomes a record of personal-data processing under UK GDPR. Retention turns into a DPIA question. Access turns into a security question. Keeping everything forever stops being a safe default. This is awareness, not legal advice, and the position for your specific facts belongs with your DPO and the ICO guidance on AI and data protection.
The third reason is internal. If your AI vendor mishandles your data or discontinues the service, your audit trail is the evidence of what actually moved through their systems and what the product actually did. It is also the cleanest way to spot a pattern of bad outputs before a customer points it out.
Where you will meet it in practice
Your first encounter with an audit trail is usually a vendor dashboard. OpenAI exposes an Audit Logs API at the organisation level. Microsoft 365 Copilot feeds Purview, which holds AI interaction logs across the suite. Google Workspace Vault captures Workspace AI activity. AWS Bedrock logs to CloudTrail. Azure OpenAI feeds Azure Monitor. The question worth asking each one is what level of detail is on by default.
The structural concern many owners miss sits inside those dashboards. Vendor logs are very often metadata only: who, when, what model, how many tokens. Full prompt-and-response logging is frequently a paid add-on, retained for a shorter window (commonly thirty to ninety days), and sometimes off until you turn it on. Six weeks after a customer complaint, the metadata is still there. The conversation is gone.
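If you have a developer on hand, the OpenAI Audit Logs API mentioned above can be queried directly. The sketch below only builds the request; the endpoint path and bearer-key authentication follow OpenAI's published organisation API, but check the current docs for exact parameters before relying on them, and note that what comes back is event metadata, not prompt content.

```python
def build_audit_log_request(admin_key, limit=20):
    """Return the URL, headers, and query parameters for a GET request to
    the OpenAI organisation audit-log endpoint. Requires an org admin key;
    send with any HTTP client. Response events describe who did what and
    when -- they do not contain prompts or responses."""
    return {
        "url": "https://api.openai.com/v1/organization/audit_logs",
        "headers": {"Authorization": f"Bearer {admin_key}"},
        "params": {"limit": limit},
    }
```

The point of the exercise is usually the negative result: pulling these events and seeing for yourself that the conversation content is not in them.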
The other place you will meet audit trails is a regulatory request. The ICO can ask for evidence during a data protection investigation or a Data Subject Access Request. An ISO/IEC 42001 auditor will examine your record-keeping. If your firm sells into the EU, the AI Act's Article 12 record-keeping rules for high-risk systems start to apply in stages from August 2026, with broader application in 2027. The UK has not adopted equivalent legislation, although DSIT has signalled that alignment is likely.
What to log and what to avoid
The decision is what level of detail balances accountability against privacy risk and cost. A tiered approach works for a typical SME. Metadata logs (timestamp, model name, model version, user ID, token count, latency, success or error flag) are cheap to keep and rarely contain personal data, so retain them for at least six months. Full prompt-and-response logs need a clearer reason and a clearer retention rule.
Common practice in regulated sectors is six years, aligned with the typical civil-litigation window in financial services, but this is convention rather than a single legal mandate. The right number for your firm depends on the data you are logging and the decisions it evidences. Route the specific retention question through your DPO. Never log secrets, API keys, or passwords that happen to appear in a prompt, and treat any prompt containing customer personal information as a personal-data processing record.
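A tiered log entry can be as simple as the sketch below: the metadata tier is always written, the content tier is opt-in, and anything that looks like a secret or an email address is masked before the content tier is stored. The two patterns are illustrative placeholders only; real secret and PII detection needs far broader coverage, and what counts as personal data here is your DPO's call, not a regex's.

```python
import re
from datetime import datetime, timezone

# Illustrative patterns only; real detection needs broader coverage.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{16,}")      # API-key-like strings
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text):
    """Mask API-key-like strings and email addresses before storage."""
    return EMAIL_PATTERN.sub("[EMAIL]", SECRET_PATTERN.sub("[SECRET]", text))

def build_log_entry(prompt, response, model, user_id, full_logging=False):
    """Always write the metadata tier; write the content tier only when
    full logging is switched on, and only after redaction."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "user_id": user_id,
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
    if full_logging:
        entry["prompt"] = redact(prompt)
        entry["response"] = redact(response)
    return entry
```

The useful property of the split is that the cheap tier can carry a long retention period while the expensive, riskier tier carries a short one.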
Access is the other half. A customer-service team may need to pull a specific customer’s interaction history to answer a DSAR, but they should not have standing access to every log. Document the access policy, enforce it through role-based controls in the platform, and review who can see what every quarter. The pattern is the same one your firm already applies to financial records, applied to a new surface.
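The role-based side can be sketched in a few lines. The role names and permissions below are hypothetical; the point of the shape is that a service agent's access is scoped to the one customer whose DSAR they are handling, while standing access to everything is reserved for a named, privileged role.

```python
# Hypothetical role model for illustration; map these names onto whatever
# role controls your logging platform actually exposes.
ROLE_PERMISSIONS = {
    "dpo":           {"read_any_log", "export_logs"},
    "service_agent": {"read_customer_log"},  # DSAR lookups, one customer at a time
    "developer":     {"read_metadata"},      # metrics only, never prompt content
}

def can_access(role, action, requested_customer=None, dsar_customer=None):
    """Grant a service agent access to a single customer's history when it
    matches an open DSAR; anything broader needs a privileged role."""
    perms = ROLE_PERMISSIONS.get(role, set())
    if action == "read_customer_log":
        if "read_any_log" in perms:
            return True
        return "read_customer_log" in perms and requested_customer == dsar_customer
    return action in perms
```

The quarterly review then becomes a diff of the permissions table, which is a meeting-sized job rather than an audit-sized one.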
The action this quarter
The cleanest action an owner can take this quarter is a three-question sweep across every AI vendor in the firm. What do you log on my behalf, by default and as an option? How long do you keep each tier of log? Can I export the full record in a usable format if I need to switch providers or hand it to a regulator?
For firms with engineering capacity and AI making decisions that affect hundreds of customers, a dedicated observability platform sits between the application and the model and captures audit trails independently. LangSmith, Langfuse, Arize, WhyLabs, Helicone, and Weights & Biases are the names you will see on vendor architecture diagrams. The trade-off is cost and infrastructure complexity. For a typical service SME, the answer is not yet, although the question is worth revisiting every twelve months as usage grows.
Audit trails are not exciting. They become non-negotiable the first time a customer disputes a decision or a regulator asks for evidence. The point of doing the work this quarter is that the alternative, scrambling to reconstruct a conversation that has already been purged, is harder, slower, and more expensive than the small amount of discipline it takes now. If you want a place to put the audit-trail risk on your firm’s one-page AI risk register, that is where it sits, and the vendor due diligence questions it triggers are the same three above.
If you want a second pair of eyes on what your AI vendors are actually logging, book a conversation and we will walk it through together.