A 10-person accountancy practice piloted ChatGPT-drafted client replies before they had set up any email triage. Two weeks in, a partner caught a draft reply to a client that misstated a tax-treatment assumption. The pilot stopped that afternoon. Six months later, the same practice deployed Crisp for email triage only. Triage freed five hours a day across the team, and the partner could not point to a single new client-facing risk. The tool was different, but more importantly, the deployment order was different. They started at the layer with no client-facing exposure.
This is the inbox AI mistake most SMEs make on their first attempt. The Klarna story makes autonomous response generation feel inevitable. The right starting move, though, is the layer Klarna is not advertising: classification and routing, the part that has nothing to do with the client and everything to do with how email moves through the firm.
What are the three layers of inbox AI?
Inbox AI works in three distinct layers, each with different risk and different value. Triage classifies and routes incoming emails. Briefing summarises long threads. Drafting generates first-pass replies. The risk concentrates in the draft layer, where AI directly touches client communication. The value distributes across all three. Most owners deploy them in reverse order.
The triage layer is mechanical. The AI reads the email and decides which category it belongs to (billing, status, document request, complaint, scheduling, escalation). It does not write anything. It does not reach the client. It moves the message to the right team member or the right priority lane. Classification accuracy on well-trained systems lands at 90 to 95 percent. Errors are caught when the wrong person opens the email.
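As an illustration of how small the triage step really is, here is a minimal classify-then-route sketch. This is not any vendor's actual system; the keyword rules, category names, and routing table are all hypothetical, and real tools use trained classifiers rather than keyword lists.

```python
# Illustrative triage sketch: classify an email, then route it.
# Categories, keywords, and the routing table are hypothetical examples.
CATEGORY_KEYWORDS = {
    "billing": ["invoice", "payment", "bill"],
    "status": ["update", "progress", "status"],
    "document_request": ["attach", "copy of", "send me"],
    "complaint": ["unhappy", "complaint", "disappointed"],
    "scheduling": ["reschedule", "appointment", "meeting"],
}

ROUTING = {  # category -> (team member, priority lane)
    "billing": ("accounts", "normal"),
    "status": ("case_handler", "normal"),
    "document_request": ("admin", "normal"),
    "complaint": ("partner", "urgent"),
    "scheduling": ("admin", "normal"),
    "escalation": ("partner", "urgent"),  # fallback for anything unmatched
}

def classify(subject: str, body: str) -> str:
    text = f"{subject} {body}".lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category
    return "escalation"  # unmatched mail goes to a human by default

def route(subject: str, body: str) -> tuple[str, str]:
    return ROUTING[classify(subject, body)]

print(route("Question about my invoice", "When is payment due?"))
# -> ('accounts', 'normal')
```

Note what the sketch does not do: it never generates text and never sends anything. The worst failure mode is a misrouted message, which the wrong recipient catches on opening.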
The briefing layer is also low risk. AI summarises long email threads or attached documents so a team member can read a paragraph instead of fifteen messages. It does not produce client-facing output. The drafting layer is where the risk lives, because the AI is now writing on behalf of the firm.
Why is triage the right place to start?
Triage delivers immediate value at low risk. For a 10-person practice receiving 20 to 30 support emails a day, eliminating the manual sorting step (which typically eats 30 to 60 minutes per day per team member) yields 4 to 8 hours a week of recovered team time. None of that touches the client. None of it produces a document the client will read. The win is purely operational.
IBM research shows AI can reduce average response times by up to 99 percent in scenarios where customers were waiting hours for a reply, by routing emails to the correct team member immediately on receipt. The reduction comes from the email reaching the right person without any waiting in a shared inbox, with no AI-generated reply involved.
The protocol that works: pilot the triage system on one team member for one week, with the AI classifying and routing while the team member verifies and corrects misclassifications. Track classification accuracy. Expect 85 to 90 percent in week one, rising to 95-plus percent by week two through correction loops. When the correction effort drops below five minutes a day, scale to the team.
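The correction loop above needs nothing more sophisticated than a tally of AI-versus-human decisions. A minimal sketch, with the scale-up thresholds taken from the protocol (95 percent accuracy, under five minutes of correction a day); the function name and data shape are illustrative, not from any tool:

```python
# Minimal pilot tracker for the triage correction loop.
# Scale-up thresholds follow the protocol: 95%+ accuracy and
# correction effort below five minutes a day.
def pilot_summary(decisions, minutes_correcting_per_day):
    """decisions: list of (ai_category, human_category) pairs for the week."""
    correct = sum(1 for ai, human in decisions if ai == human)
    accuracy = correct / len(decisions)
    ready = accuracy >= 0.95 and minutes_correcting_per_day < 5
    return accuracy, ready

# Example week: 96 correct classifications out of 100.
week = [("billing", "billing")] * 96 + [("status", "complaint")] * 4
accuracy, ready_to_scale = pilot_summary(week, minutes_correcting_per_day=3)
print(f"accuracy={accuracy:.0%}, scale to team: {ready_to_scale}")
# -> accuracy=96%, scale to team: True
```

The point of the second threshold is that accuracy alone can mislead: a 95 percent system on a very high-volume inbox can still cost more correction time than it saves.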
What does the briefing layer add?
Briefing turns a 15-message thread into a paragraph. For a team handling 50-plus emails a day, the time spent reading and contextualising each email falls from 2 to 5 minutes to 30 to 60 seconds. That recovers 1.5 to 4 hours of reading time a day across the team. The briefings are reviewed before any reply is sent, so client-facing risk stays at zero.
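The back-of-envelope arithmetic behind those hours is worth making explicit. A sketch, assuming roughly 60 emails a day (the upper end of the "50-plus" range) and the per-email timings above:

```python
# Back-of-envelope check on the briefing-layer time savings.
# Assumes ~60 emails/day, an illustrative figure at the top of "50-plus".
emails_per_day = 60
before_min, before_max = 2.0, 5.0   # minutes per email, manual reading
after_min, after_max = 0.5, 1.0     # minutes per email, with a briefing

low = emails_per_day * (before_min - after_min) / 60    # hours/day recovered
high = emails_per_day * (before_max - after_max) / 60
print(f"{low:.1f} to {high:.1f} hours/day recovered")
# -> 1.5 to 4.0 hours/day recovered
```

At lower volumes the savings scale down proportionally; 50 emails a day yields roughly 1.25 to 3.3 hours.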
Briefings work best on long threads, attached documents, and inquiries with multiple back-and-forth exchanges. They do not replace the team member's judgement. They give the team member a faster way to absorb context before they apply judgement. Teams using briefing tools report measurably reduced mental fatigue in high-volume email environments.
Add briefing once triage has been stable for three to four weeks. The team is already used to the AI moving emails. Adding a layer that summarises content the team will read is a small step from there.
When does draft generation become safe?
Draft generation becomes safe when the firm has documented review protocols, established categories of inquiry where AI drafts work well, and accepted that complex inquiries will not be drafted by AI in the first wave. Routine inquiries (billing questions, appointment status, document requests, simple updates) see 40 to 60 percent time savings on drafting. AI takes 20 to 30 seconds; manual drafting takes 2 to 3 minutes.
Complex or nuanced inquiries are different. AI drafts on these are often unusable without significant editing, and the editing time can match or exceed manual drafting time. The discipline is to limit AI drafting to a defined set of categories and route everything else to a human. The triage layer makes this routing reliable.
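That "defined set of categories" discipline is easy to enforce once triage is in place, because every email already carries a category label. An illustrative gate (the category names are hypothetical, and every AI draft still passes human review before sending):

```python
# Only a fixed allowlist of routine categories is eligible for AI drafting;
# everything else goes straight to a human. Category names are illustrative.
DRAFT_ELIGIBLE = {"billing", "appointment_status", "document_request", "simple_update"}

def drafting_path(category: str) -> str:
    if category in DRAFT_ELIGIBLE:
        return "ai_draft_then_human_review"   # AI drafts, human reviews, human sends
    return "human_drafts"                     # complex or nuanced: no AI draft

print(drafting_path("billing"))      # -> ai_draft_then_human_review
print(drafting_path("tax_advice"))   # -> human_drafts
```

The design choice is deliberate: the allowlist defaults closed, so a new or unrecognised category falls to a human rather than to the AI.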
End-to-end response time on routine inquiries can drop from 4 to 6 hours to 15 to 30 minutes. Off-hours ticket abandonment drops by over 50 percent. The client experience improves because clients get faster, more consistent responses, not because the firm has eliminated humans from the process.
What does Klarna actually tell SMEs?
Klarna's AI assistant handles two-thirds of customer service chats. In its first month, that was 2.3 million conversations, with customer satisfaction on par with human agents and a 25 percent drop in repeat inquiries. Klarna estimated a $40m profit improvement for 2024. The numbers are real and the success is documented.
The relevant lesson for an SME is not the percentage. Klarna handles a high volume of routine, transactional, well-categorised inquiries (refund status, payment plans, account questions). A professional services firm receives a different mix: questions specific to a client's matter, requests for advice, sensitive negotiations, status updates on bespoke engagements. The autonomous-handling rate for that mix is 30 to 40 percent, not two-thirds.
The principle that translates is the staged deployment, not the percentage. Klarna built the autonomous chatbot on top of years of email and chat triage. SMEs should expect the same sequence: classification first, briefing next, drafting on routine inquiries last, autonomous response only for the small subset where context-specific risk is low.
What compliance gates does the regulated sector hit?
For legal practices, email correspondence is potentially privileged. AI processing of client emails creates a confidentiality risk if the AI platform retains, processes, or shares the communication outside the firm-client relationship. The SRA's Code of Conduct requires client confidentiality to be protected. Most professional-grade tools (Crisp, Zendesk) have Data Processing Agreements and claim GDPR compliance. Consumer-grade tools (free ChatGPT, free Copilot) do not, and are not suitable for processing client personal data.
For accountancy firms, ICO and UK GDPR rules govern processing of personal data in client emails. Legitimate basis for processing is required, and a DPA with the AI vendor is the working standard. For healthcare clinics, NHS Digital governance plus GDPR plus HIPAA-equivalent UK protections apply. For financial services firms, FCA evidence-of-communication requirements mean records of who reviewed an AI-generated response and when must be maintained.
The practical gate is consistent: AI does not autonomously respond to client emails in regulated sectors without human review and a documented sign-off process. Triage, briefing, and draft generation are all acceptable within proper governance. Autonomous response is the last and highest-risk move, not the first.
If you are working out which inbox layer to deploy first and how to keep both the regulator and the client relationship clean, the deployment order is the part most vendors will not write down for you. Book a conversation.