The owner of a 14-person professional services firm is forty-eight hours from signing a forty-thousand-pound twelve-month AI engagement. The vendor has given her three reference contacts. She has booked the calls. She has no script beyond “are you happy with them”, and a slightly sinking feeling that she is about to spend ninety minutes on the phone confirming what the vendor already told her.
That feeling is correct. Reference checks done the default way produce almost no information. The vendor selects the references. The references like the vendor or they would not still be customers. They have invested time, money and reputation in the decision, and they are not about to sit on a call with a stranger and concede they were misled. Asking better questions of biased sources does not solve the problem, because the sources are still biased. The fix is a different set of questions paired with a different set of sources.
What is a reference check that actually works?
A reference check that actually works is one buyer talking peer-to-peer to another buyer about a decision the second buyer has already lived through. It uses a small set of specific questions designed to surface gaps between the vendor’s pitch and the customer’s actual experience, and it triangulates the vendor-supplied references against independent contacts found through LinkedIn, community channels, review aggregators, and the vendor’s own published case studies.
The frame matters because it removes the social friction that makes the default conversation produce bland answers. You are not asking a stranger to criticise a supplier. You are a buyer learning from another buyer about scoping, costs, surprises and problem resolution. The vendor’s three-name list becomes one input in a wider picture, not the entire picture.
Why does it matter for your business?
It matters because the default approach has a fixed cost and a hidden one. The fixed cost is the time spent on calls that confirm the pitch. The hidden cost is the false sense of due diligence that licenses you to sign. Forrester’s work on B2B vendor evaluation and Gartner’s Voice of the Customer methodology both find that generic satisfaction questions correlate poorly with deployment success, renewal and actual customer experience.
The hidden costs that real reference work surfaces are also commercially material. BCG’s 2024 AI procurement survey and Deloitte’s State of Generative AI in the Enterprise both report common patterns of timeline slippage, underestimated data preparation, heavier-than-expected staff retraining and integration complexity that did not appear in vendor proposals. The MIT Sloan piece on evaluating AI vendors makes the same point more bluntly. A buyer who has not surfaced these patterns before signing has agreed to absorb them.
What five questions produce real information?
Five questions, in this order, produce more information in three calls than ten calls of generic satisfaction probing. They work because they ask for descriptive experience rather than verdicts, and because their phrasing makes it socially comfortable to answer honestly rather than diplomatically.
The first question is what surprised you that the vendor did not warn you about. The phrasing is generous to the vendor and forensic about gaps in communication. The respondent can describe a longer implementation, a deeper data cleansing burden, or a heavier internal change-management load without feeling they are attacking the vendor.
The second is what would you change about your scoping if you started again. This shifts the locus of evaluation from “did the vendor mislead us” to “what would we do differently as buyers”, which is far less threatening and produces more honest reflection. Common answers cluster around tighter requirements before vendor engagement, harder negotiation on implementation support, and earlier involvement of end-users.
The third is where did the actual costs land versus the proposal. Be specific about categories: software licence, implementation services, customisation, integration, training, ongoing support, internal staff time. Vendr’s practitioner research on SaaS total cost of ownership and the BCG AI procurement work both quantify the typical gap between proposed and actual cost.
The fourth is how have they handled it when something went wrong. Ask for a specific incident. Listen for how the vendor took ownership, the tenor of the communication, and whether the resolution fixed the root cause or only the symptom. The fifth is whether they would renew or switch when their contract ends, and why. Hedged renewals (“we probably will, but we’re looking at alternatives”) are weaker than confident ones and are themselves diagnostic.
How do you verify beyond the supplied references?
You verify by going to four independent channels, each with a selection bias different from the vendor’s. LinkedIn lets you search for the vendor’s name in the work histories and project descriptions of people at customer firms, and reach out to customers the vendor did not put forward. A short, transparent message that names what you are evaluating and asks one specific question gets a surprisingly high response rate, especially from people with strong views.
Professional community channels carry candour that no reference call produces. Industry Slack groups, sector subreddits like r/sysadmin or r/ITManagers, trade forums and Discord servers all contain practitioner-to-practitioner discussion of specific vendors with no concern about being relayed back. Filter what you find by company size and industry similar to yours.
Software review aggregators like G2, TrustRadius and Capterra carry hundreds of reviews where the vendor’s list carries three. The rating distribution itself is informative. Polarised ratings suggest a vendor that works well in some contexts and fails in others. Uniformly five-star ratings suggest a curated review population rather than a genuine signal. Read the negative reviews carefully, filter by company size and industry, and look for repeated patterns rather than isolated complaints.
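If you want to make the distribution check concrete, the heuristic is simple enough to script. A minimal sketch in Python; the `rating_profile` helper and both rating lists are invented for illustration, not real review data:

```python
from statistics import mean, pstdev

def rating_profile(ratings):
    """Summarise a list of 1-5 star ratings: average, spread,
    and the share of reviews at the low and top ends."""
    n = len(ratings)
    return {
        "mean": round(mean(ratings), 2),
        "spread": round(pstdev(ratings), 2),          # 0 means everyone agrees
        "low_share": round(sum(r <= 2 for r in ratings) / n, 2),
        "top_share": round(sum(r == 5 for r in ratings) / n, 2),
    }

# Invented examples: a polarised vendor versus a uniformly five-star one.
polarised = [5, 5, 5, 1, 1, 5, 1, 5, 1, 5]
curated = [5] * 10

print(rating_profile(polarised))  # high spread, reviews at both extremes
print(rating_profile(curated))    # zero spread: possibly a curated population
```

A high spread with substantial shares at both ends points to a product that works well in some contexts and fails in others; zero spread with every review at five stars is the curated pattern worth treating with suspicion.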
Cold outreach to customers named in the vendor’s published case studies is the fourth channel and the most under-used. Case studies are marketing, so the vendor’s framing is positive by design. The named customer is real, contactable on LinkedIn, and often willing to give twenty minutes to a peer making the same decision. The divergence between the case study’s framing and the customer’s actual account is often where the most useful information sits.
What do you do with the picture you assemble?
You convert it into patterns rather than verdicts. A single source mentioning a difficulty is anecdote. Five sources across independent channels mentioning the same surprise is a pattern that needs investigation before you sign. Map the answers across five buckets: timeline versus proposal, actual versus proposed cost, staff retraining burden, vendor responsiveness to problems, renewal intent. Where do sources agree, where do they diverge, what appears in independent sources but never in the vendor’s pitch.
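The pattern-versus-anecdote tally can live in a spreadsheet, or be sketched in a few lines of Python. Everything below is invented for illustration: the source names, the theme labels and the threshold of three mentions are assumptions, not a prescribed method:

```python
from collections import Counter

# Each entry: (source, themes that source mentioned). Invented example data.
notes = [
    ("vendor reference 1", ["timeline slip", "good support"]),
    ("vendor reference 2", ["good support"]),
    ("linkedin contact", ["timeline slip", "data prep burden"]),
    ("g2 reviews", ["timeline slip", "data prep burden"]),
    ("case study contact", ["timeline slip"]),
]

PATTERN_THRESHOLD = 3  # mentions across independent sources

counts = Counter(theme for _, themes in notes for theme in themes)
patterns = [t for t, c in counts.items() if c >= PATTERN_THRESHOLD]
anecdotes = [t for t, c in counts.items() if c < PATTERN_THRESHOLD]

print("investigate before signing:", patterns)
print("anecdotes for now:", anecdotes)
```

With this data the timeline theme crosses the threshold and flags for investigation, while single or double mentions stay filed as anecdote until more sources confirm them.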
Generalities themselves are data. If reference answers are uniformly bland, that thinness is its own signal: customers who cannot offer a specific story when asked for one usually have a relationship with the vendor that has produced no stories worth telling. Absence is also data. If you ask six contacts about data quality and none of them can speak to it, ask their IT teams instead, because data quality issues are often handled invisibly by technical staff and never surface to business users until they become operational problems.
The output of all of this is not “this vendor is good” or “this vendor is bad”. The output is a clearer picture of what working with this vendor will actually look like in a firm the size and shape of yours, where the gaps in their pitch sit, and which contract clauses, scoping conversations or implementation commitments deserve harder negotiation before you sign. Three reference calls done this way are worth more than ten done by default. If three vendor proposals are open on your desk this week and the default reference process is the only diligence you have planned, book a conversation.



