The six-dimension diagnostic audit you should run before a second AI engagement

[Image: a founder at a kitchen table, three printed documents fanned out, a notebook open to a hand-drawn six-cell grid, pen mid-stroke]
TL;DR

A diagnostic audit before a second AI engagement should interrogate six dimensions of the organisation (people, process, tooling, data, vendor, governance) across three phases: documentation review, structured stakeholder interviews, and synthesis into business context. The first engagement failed for reasons that exist in the organisation, not just the engagement. Without the audit, the second engagement inherits the same constraints.

Key takeaways

- The audit interrogates six dimensions: people, process, tooling, data, vendor, governance. The first engagement's failure usually concentrates in two or three of them.
- Three phases: document review (business case, SOW, success metrics, project plan, post-implementation reviews), structured stakeholder interviews, and synthesis into financial and operational risk.
- The contradictions between the document phase and the interview phase carry the diagnostic value. One audit found a 48-hour client intake claimed in the SOP versus a two-week reality reported in interviews, exposing six figures of annual operational waste.
- Specific questions beat generic ones. "What was the training plan, who was responsible, how many staff completed it" produces evidence. "Was the training adequate" produces narrative.
- The audit should not be run by the team who ran the first engagement. They are too invested in the narrative.
- The output is a prioritised implementation roadmap, not a shelf report. PMI, ITIL, Gartner's AI Maturity Model, and Cognizant's framework all contribute scaffolding.

The owner of a thirty-person services firm has finished reading the failure-data piece and accepted that the engagement-design diagnostic from the parent post is sound but incomplete. She is staring at a blank document, asking what she actually looks at, and in what order, before she commits to round two. She does not have a procurement team or an internal audit function. What she needs is a structure she can run on a Wednesday afternoon with whoever inside the firm was closest to the first engagement.

The diagnostic that follows is what experienced operators run when the engagement-design questions do not fully explain the outcome. It is structured, finite, and inexpensive. The work is afternoon-scale, not consulting-engagement-scale. The point is to know what is actually in the way before the second engagement starts.

Which dimensions does the audit cover?

Six dimensions, every one of them load-bearing: people, process, tooling, data, vendor, and governance. Most failed first engagements concentrate in two or three of them, not all six. The audit’s job is to find which two or three. The Audity diagnostic playbook frames each dimension as a three-way cross-check: what the organisation claims to do, what it actually does, and what the gap costs.

What each dimension actually covers (a checklist sketch follows the list):

  • People. Who was trained, who was named accountable, who left during the engagement, who was overcommitted.
  • Process. Which workflows the AI was supposed to change, and where the documented version diverges from the observed one.
  • Tooling. What was deployed, what is in use, what overlaps with other tools the team is already running.
  • Data. What the AI was asked to use, where it lives, whether it was reconciled across systems.
  • Vendor. What was promised, what was delivered, what post-sales support actually looks like by week 18.
  • Governance. Who owned decisions, who reviewed progress, what the steering structure looked like in practice.
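
The three-way cross-check is easier to hold steady if it is written down as a structure before the first interview. The sketch below is a minimal illustration, assuming a simple claimed/observed/cost record per dimension; the field names are assumptions, not the Audity playbook's own schema.

```python
# A minimal sketch of the six-dimension audit as a structure. The dimension
# names come from the article; the claimed/observed/gap_cost fields mirror
# the three-way cross-check but are illustrative, not Audity's own schema.
from dataclasses import dataclass

@dataclass
class DimensionProbe:
    dimension: str      # one of the six dimensions
    claimed: str = ""   # what the organisation says it does (phase one, documents)
    observed: str = ""  # what it actually does (phase two, interviews)
    gap_cost: str = ""  # what the gap costs, in real numbers (phase three)

AUDIT_DIMENSIONS = ["people", "process", "tooling", "data", "vendor", "governance"]

def blank_audit() -> list[DimensionProbe]:
    # One empty probe per dimension; expect only two or three to fill up.
    return [DimensionProbe(dimension=d) for d in AUDIT_DIMENSIONS]
```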

How do the three phases work?

Phase one is document analysis. The original business case, the statement of work, the success metrics agreed in writing, the project plan and timeline, the post-implementation review or lessons-learned doc if one exists, the vendor’s statement of practice. This phase typically surfaces contradictions immediately.

A business case might describe automating contract review with claims of 80 percent time savings; the SOW might reference a proof-of-concept limited to ten documents over 30 days; the project plan might allocate three weeks for data preparation. The contradictions land before any conversation begins.

Phase two is stakeholder interviews. Five to ten conversations of 30 to 45 minutes each, with the project sponsor, the team trained on the tool, the operational staff whose workflows were supposed to change, the IT lead, and, ideally, the vendor’s account contact. The investigator asks for specific recollections of key moments: when the team first understood the scope, when success began to feel uncertain, what specific obstacles surfaced. These conversations typically surface a narrative significantly different from the documented record.

Phase three is synthesis. The investigator maps the contradictions between phase one and phase two to financial and operational exposure. Audity documents one audit where the formal SOP described a client intake process at approximately 48 hours, while every paralegal interviewed reported the real number at closer to two weeks. That single gap exposed a six-figure annual operational waste, invisible to dashboards and unsurfaced by vendor performance reviews.
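
To make phase three concrete, here is a back-of-envelope sketch of how a cycle-time gap like that one converts into annual waste. Audity's published example reports only the 48-hour claim, the two-week reality, and the six-figure outcome; every input below is an assumption for illustration.

```python
# Back-of-envelope: converting a claimed-vs-observed cycle-time gap into
# annual waste. All inputs are illustrative assumptions, not Audity's data.
HOURS_CLAIMED = 48                # intake time per the formal SOP
HOURS_OBSERVED = 14 * 24          # two weeks, as reported in interviews
INTAKES_PER_YEAR = 120            # hypothetical volume
STAFF_HOURS_PER_EXCESS_DAY = 1.5  # hypothetical effort spent per slipped day
COST_PER_STAFF_HOUR = 65.0        # hypothetical loaded hourly rate

excess_days = (HOURS_OBSERVED - HOURS_CLAIMED) / 24
waste_per_intake = excess_days * STAFF_HOURS_PER_EXCESS_DAY * COST_PER_STAFF_HOUR
annual_waste = waste_per_intake * INTAKES_PER_YEAR
print(f"Estimated annual waste: ${annual_waste:,.0f}")  # ~$140,000 at these inputs
```

At those inputs the estimate lands around $140,000 a year, which is exactly the kind of number a dashboard never shows, because no system records the gap between the SOP and the floor.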

Why do specific questions matter more than thorough ones?

Generic questions produce narrative the team has rehearsed. Specific questions produce evidence. “Was the training adequate” generates a debatable answer. “What was the training plan, who was responsible for delivering it, how many staff completed it, what did staff report they learned, and what changed in observed behaviour post-training” generates an evidentiary record.

The same shift applies across the six dimensions. “Did adoption happen” is a verdict. “What does adoption mean in this context, how is it being measured, which teams had the highest adoption and which had the lowest, and what was different between them” is a useable line of investigation. The investigator is looking for testable claims, not for someone to agree with the impression they walked in with.

Valid evidence in this phase includes quantitative metrics (login frequency, feature usage, time spent in the tool, ROI against the original case), qualitative evidence (staff interviews, change management records), and comparative evidence (claimed processes versus observed practice). When a vendor or sponsor offers narrative, the investigator asks to see the underlying numbers. The audit is a pattern-recognition exercise grounded in observable fact.
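
One way to keep those three evidence types from blurring together in interview notes is a record that forces every claim to carry its kind and its source. The structure and the example rows below are illustrative, not part of any named framework.

```python
# A sketch of recording findings as evidence rather than verdicts. The three
# kinds mirror the article's taxonomy; the example rows are invented.
from dataclasses import dataclass
from typing import Literal

@dataclass
class Evidence:
    claim: str    # the testable claim under investigation
    kind: Literal["quantitative", "qualitative", "comparative"]
    detail: str   # the metric, interview note, or claimed-vs-observed pair
    source: str   # where the underlying numbers or records live

adoption_file = [
    Evidence("Team A adopted the tool; Team B did not", "quantitative",
             "logins and feature usage per user per week, weeks 1-18",
             "tool usage export"),
    Evidence("Training landed unevenly across teams", "qualitative",
             "staff interviews and change management records",
             "interview notes"),
    Evidence("Client intake takes 48 hours", "comparative",
             "SOP claims 48h; every paralegal interviewed reports ~2 weeks",
             "SOP vs interview transcripts"),
]
```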

Who should run the diagnostic?

Not the team who ran the first engagement. They are too invested in the existing narrative of why it stalled. The IT team that integrated the tool may be invested in framing the failure as a business problem. The project sponsor may be invested in citing external factors like staff turnover or budget cuts rather than internal change-management gaps.

The work goes faster and reaches further when conducted by someone with no stake in the outcome. That can be an external adviser brought in specifically for the diagnostic, or an internal leader from a different part of the organisation who has bandwidth and credibility to ask hard questions. Either works. What matters is the absence of investment in the existing explanation.

This shift is also easier than it sounds. A neutral party brings two things at once: permission to ask awkward questions, and pattern recognition from having seen similar failures before. The team being interviewed often welcomes this. They have been holding the contradictions privately. Naming them out loud is often the first relief in months.

Which named frameworks contribute scaffolding?

Several mature frameworks exist for post-implementation review and AI maturity assessment. None are off-the-shelf for SMEs. Each contributes a useful checklist when borrowed selectively. PMI’s Post-Implementation Review process examines whether functional and operational tests were conducted, whether user acceptance testing occurred, whether the original objectives were met. ITIL’s service review process interrogates whether agreed service levels were delivered and whether change management around the deployment was adequate.

Gartner’s AI Maturity Model sorts organisations into five levels: Awareness, Repeatable, Defined, Managed, Optimized. Most failed-first SMEs sit at Awareness or Repeatable. A second engagement that assumes a higher maturity level than actually exists is predestined to fail in the same way as the first. Cognizant’s AI/ML maturity model focuses on ten pillars, including data strategy, talent cultivation, and business integration, and sets up the audit as iterative rather than one-shot.

The point of borrowing from these is the discipline they enforce, not their internal vocabularies. A small audit is not a Gartner consulting engagement. It is half a day of structured questions informed by what experienced operators have learned to look for.

What should the output actually look like?

A prioritised implementation roadmap, on one page if it can be. Three to five issues named with specific evidence behind each. For each issue, the financial or operational cost in real numbers, the dependency on any second-engagement design, and the work needed to address it. The order matters. The audit names what to fix first, why, and what happens if the second engagement starts before that issue is addressed.

This is what separates a useful audit from a shelf report. A shelf report describes the situation. A useful audit produces decisions. Three priorities. One page. Owned by named people with named timelines. The second engagement gets scoped against the audit’s priority list, not against a generic vendor SOW.
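
If it helps to keep the roadmap honest, it can be written as a structure in which no field is optional: an issue without a cost, an owner, or a timeline cannot be constructed at all. The field names follow the article's list; the example row is invented, and the cost figure reuses the illustrative estimate from earlier.

```python
# A sketch of the one-page roadmap as a structure with no optional fields.
# The example row is invented; the ~$140k figure is the illustrative
# estimate computed above, not a real audit finding.
from dataclasses import dataclass

@dataclass
class RoadmapItem:
    priority: int    # fix-first ordering; the order is the point
    issue: str       # the named issue
    evidence: str    # the specific evidence behind it
    cost: str        # financial or operational cost, in real numbers
    dependency: str  # what in the second-engagement design depends on this
    owner: str       # a named person
    timeline: str    # a named deadline

roadmap = [
    RoadmapItem(1, "Client intake runs two weeks, not the 48 hours the SOP claims",
                "SOP versus every paralegal interviewed",
                "~$140k/yr at the illustrative inputs above",
                "any intake automation scoped against the SOP inherits the gap",
                "COO", "end of next quarter"),
]
```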

The next post in this cluster covers the consolidation work that often shows up as the top priority when the audit lands honestly. The parent piece on the second-time buyer’s situation is the on-ramp if you have not yet done the four-question engagement-design diagnostic.

If you would like a neutral party to run this audit on a stalled engagement, book a conversation.

Sources

- Audity 2025: three-phase diagnostic audit, six-dimension scope, six-figure waste example. https://auditynow.com/blog/ai-implementation-failure-audit-diagnostic
- PMI Post-Implementation Review process and checklist. https://wiki.en.it-processmaps.com/index.php/Checklist_Post_Implementation_Review_(PIR)
- Gartner AI Maturity Model: Awareness, Repeatable, Defined, Managed, Optimized. Source via BMC. https://www.bmc.com/blogs/ai-maturity-models/
- Cognizant AI/ML maturity model: ten pillars, three-stage process. https://www.cognizant.com/blx/en/documents/2261766-ai-ml.pdf
- Atlassian post-implementation review playbook. https://www.atlassian.com/work-management/project-management/post-implementation-review
- IBM Think 2025: most enterprise AI projects stall before scale because the organisation is operating below the maturity tier the engagement assumed. https://www.ibm.com/think/insights/why-most-enterprise-ai-projects-stall-before-scale

Frequently asked questions

What does a diagnostic audit before a second AI engagement actually look at?

Six dimensions of the organisation (people, process, tooling, data, vendor, governance), across three phases: document analysis (business case, statement of work, success metrics, post-implementation review), structured stakeholder interviews, and synthesis into financial and operational risk. The output is a prioritised roadmap that names what to fix first.

Who should run the diagnostic?

Someone who was not involved in the first engagement and has no investment in the narrative. This is typically an external adviser brought in for the diagnostic phase, or an internal leader from a different part of the organisation. The team who ran the first engagement is too close to the explanation they have already settled on.

How long does a proper diagnostic audit take?

An afternoon to a week, depending on size. The document phase takes a few hours of reading and cross-referencing. Stakeholder interviews take five to ten conversations of 30 to 45 minutes each. Synthesis into a prioritised roadmap takes a day. The work is structured, not exhaustive. Bigger and slower is rarely better.

What is the difference between this audit and the four-question diagnostic in the parent post?

The four-question diagnostic checks whether the engagement design was sound: was the problem defined, was a measurable outcome agreed, did discovery come before tools, was someone accountable for adoption. This audit checks whether the organisation can support any AI engagement: people, process, tooling, data, vendor, governance. The two are complementary.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
