How to spot synthetic voices in calls and voicemails

Your accounts manager picks up the phone just before lunch. The caller sounds exactly like your operations director: familiar cadence, the right accent, a slightly clipped delivery that is recognisably his. He needs a contractor payment processed before end of day. The account number is different from the usual one, but there is a clear explanation. She is busy, the request sounds certain, and it takes less than four minutes to action.

Two hours later she mentions it in passing. Your operations director was at a client site all morning. He made no calls.

This is the scenario that the National Cyber Security Centre and Trading Standards teams are documenting with increasing frequency. The technology behind these calls has become cheap, accessible and startlingly convincing. The tells are subtle, but they exist.

What is a synthetic voice?

A synthetic voice is audio generated by a computer model trained on a real person’s speech. Modern systems analyse recordings for pitch, rhythm, tone and pronunciation, then produce new sentences in that voice saying anything the attacker chooses. The source material can be as brief as a three-second clip from a social media video, a podcast appearance, or a voicemail greeting.

The resulting voice profile, sometimes called a speaker embedding, lets attackers generate unlimited new audio from that vocal identity. Research suggests these clones can fool listeners in roughly half of attempts when built from as little as three seconds of audio. Attackers then typically pair the cloned audio with caller-ID spoofing, which displays whatever phone number they choose on the recipient’s screen, so the call appears to come from a trusted source.

Some systems convert speech in near real time, so the attacker speaks and the target hears the impersonated voice instead. Others play pre-scripted recordings. Either way, what the person receiving the call hears is audio that passes a gut-feel check, and increasingly a careful-listening one too.

Why does this matter for your business?

Voice scams targeting businesses grew sharply in 2024. For an owner-managed firm, the exposure sits in two areas. Your finance function is the first, where a convincing instruction to move money or change supplier details can be acted on before anyone pauses to question it. The second is your staff and client data, where a successful scam can trigger a UK GDPR breach.

Small finance teams are a particular target because one or two people typically hold payment authority with limited formal sign-off structure. The UK government estimated around eight million synthetic media clips were shared in 2024, up from roughly 500,000 in 2023, and voice cloning is a growing part of that picture. Staff trust direct calls from leadership because that is how things have always worked. A cloned voice requesting a transfer to an unfamiliar account can feel entirely consistent with how business normally gets done.

The regulatory exposure compounds the financial risk. If a synthetic voice call persuades a member of staff to share client data or export a payroll file, that can constitute a personal data breach under the UK GDPR. Where a breach is reportable, the ICO expects to be notified within 72 hours of the firm becoming aware of it. The FCA also expects regulated firms to have specific controls against vishing, and those expectations are a practical benchmark for any business, whether or not it holds FCA authorisation.
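To make the 72-hour window concrete, here is a minimal Python sketch that computes the latest notification time from the moment a breach is discovered. The function name `notification_deadline` and the example timestamp are assumptions made for illustration. The sketch does not decide whether a breach is reportable, and it is not legal advice.

```python
from datetime import datetime, timedelta, timezone

# Window taken from the 72-hour figure quoted above.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(discovered_at: datetime) -> datetime:
    """Return the end of a 72-hour window starting at the time of discovery."""
    if discovered_at.tzinfo is None:
        # Naive timestamps are ambiguous, so this sketch refuses them.
        raise ValueError("discovered_at must be timezone-aware")
    return discovered_at + NOTIFICATION_WINDOW


# Hypothetical example: a breach discovered on a Friday afternoon (UTC).
discovered = datetime(2025, 3, 7, 14, 30, tzinfo=timezone.utc)
print(notification_deadline(discovered).isoformat())
```

Because the window is counted in hours rather than working days, a discovery on Friday afternoon produces a deadline on Monday afternoon.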

Where will you actually encounter a synthetic voice?

The three scenarios your staff are most likely to encounter are a call impersonating your director requesting an urgent payment, a voicemail asking for a call-back to a different number, and a supplier ringing to notify you of changed bank details. In each case the voice sounds authentic, the caller ID may show a familiar number, and time pressure is built into the request.

Researchers have documented specific audio patterns in synthetic voice calls, though none is reliable as a sole indicator. Some clones show unnatural rhythm, absent background noise, or a delay of two to three seconds before a reply while the system processes what you have said. Suspiciously perfect speech, with no verbal stumbles or variation in tone, can itself be a signal when the caller claims to be rushed or stressed.

Conversational patterns tend to be more revealing. A cloned voice cannot reproduce memories, private knowledge, or the conversational habits of the real person. Ask a specific question about something only the real caller would know: an internal nickname, a meeting you both attended, or a decision taken in the last few days. An AI system may hesitate, deflect, or give a vague answer. A caller who resists normal conversation and keeps returning to the original request is showing a behavioural pattern worth noting.

Friends Against Scams and the NCSC both emphasise that the content and context of the request matter more than how authentic the voice sounds. Urgency combined with a request outside normal process is the clearest signal of all.

When should your staff verify a call?

Any verbal request that asks your staff to bypass normal approval, move money to a new or unverified account, share access credentials, or keep the request confidential should trigger out-of-band verification every time. The goal is a short list of high-risk actions where a voice call alone, however convincing, is not enough authority to proceed. Outside that list, normal discretion applies.

For a typical owner-managed services firm, the trigger list covers fund transfers to a new or changed account, bypassing normal purchase-order or approval processes, requests for MFA codes, passwords, or remote-access credentials, and any instruction accompanied by a demand for secrecy. Write these into your payment and data-handling policies so that acting on a verbal-only request for any of them is clearly outside the firm’s procedures.
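The trigger list above can be written down as a simple check. The Python sketch below is one hypothetical way to encode it. The class `VerbalRequest`, its field names and the two functions are illustrative assumptions, not part of any standard or tool. Note that nothing in the check asks how authentic the voice sounded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VerbalRequest:
    """Facts a member of staff can note about a request made by voice alone."""
    new_or_changed_account: bool = False  # funds going to a new or changed account
    bypasses_approval: bool = False       # skips purchase-order or approval steps
    asks_for_credentials: bool = False    # MFA codes, passwords, remote access
    demands_secrecy: bool = False         # instruction to keep the request quiet


def triggers_fired(request: VerbalRequest) -> List[str]:
    """Return the high-risk triggers present in the request, in a fixed order."""
    checks = [
        ("fund transfer to a new or changed account", request.new_or_changed_account),
        ("bypassing purchase-order or approval processes", request.bypasses_approval),
        ("request for MFA codes, passwords or remote-access credentials",
         request.asks_for_credentials),
        ("demand for secrecy", request.demands_secrecy),
    ]
    return [label for label, fired in checks if fired]


def needs_out_of_band_verification(request: VerbalRequest) -> bool:
    """One fired trigger is enough to require verification on another channel."""
    return len(triggers_fired(request)) > 0


# Hypothetical call: payment to a changed account, with a request to keep it quiet.
call = VerbalRequest(new_or_changed_account=True, demands_secrecy=True)
print(needs_out_of_band_verification(call))
print(triggers_fired(call))
```

A request with no triggers falls outside the list, so normal discretion applies.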

When a trigger fires, end the call politely and dial back using a number already held in your contact records rather than using redial, which connects to the number the caller supplied. For voicemails, the same applies. Use a number you already had before the message arrived, not the one left in it.
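The callback rule can be sketched the same way. In the hypothetical Python example below, the number to dial only ever comes from records held before the call. The number supplied by the caller or left in a voicemail is used for comparison at most. The `CONTACT_RECORDS` entries, the placeholder phone numbers and the function names are all assumptions for illustration.

```python
from typing import Dict, Optional

# Hypothetical contact records, held before any call or voicemail arrived.
# The numbers are placeholders, not real contacts.
CONTACT_RECORDS: Dict[str, str] = {
    "operations director": "020 7946 0101",
    "acme supplies accounts": "020 7946 0102",
}


def normalise(number: str) -> str:
    """Keep digits only so spacing differences do not affect comparison."""
    return "".join(ch for ch in number if ch.isdigit())


def callback_number(claimed_identity: str,
                    records: Dict[str, str] = CONTACT_RECORDS) -> Optional[str]:
    """Look up the number to dial from existing records only.

    Returns None when the claimed caller is not in the records, which in this
    sketch means the request is escalated rather than called back.
    """
    return records.get(claimed_identity.strip().lower())


def supplied_number_matches(claimed_identity: str, supplied_number: str,
                            records: Dict[str, str] = CONTACT_RECORDS) -> bool:
    """Compare the caller-supplied number with the record; it is never dialled."""
    known = callback_number(claimed_identity, records)
    return known is not None and normalise(known) == normalise(supplied_number)


# Hypothetical voicemail claiming to be the operations director.
print(callback_number("Operations Director"))
print(supplied_number_matches("Operations Director", "07700 900123"))
```

A mismatch is a useful warning sign. A match proves nothing on its own, because caller ID can be spoofed, so the callback still goes to the number on record.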

A shared codeword agreed in person between senior staff and the finance team adds a low-cost verification layer for urgent instructions that cannot wait for a full callback.

What else should you know about voice fraud?

Voice fraud sits alongside several related threats your firm should understand. Caller-ID spoofing lets attackers display any number they choose, including your bank's number or your director's mobile. Vishing (voice phishing) uses human callers rather than AI voices but follows the same social-engineering playbook. Business email compromise, where a fake email replaces the phone call, is often used alongside voice attacks to make the scam feel more credible.

Attackers frequently run these vectors together. A fake call is followed by a confirmatory text message or email from a spoofed address, making the interaction feel consistent across multiple channels. The NCSC describes this as a multi-channel social-engineering pattern and recommends that any confirmation should come through a single trusted, independently verified channel rather than any channel the attacker may already have accessed.

There is one practical step you can take on voice exposure. Trading Standards evidence shows that scammers build usable voice clones from social media clips, recorded webinars, conference videos, and personalised voicemail greetings. Standardising voicemail greetings for finance and operations roles, using a generic team message rather than an individually recorded one, reduces the pool of high-quality audio available to an attacker.

If your firm records calls, the ICO notes that voice recordings containing personal data carry their own data-protection obligations, including retention limits and security requirements. Voiceprints used to identify a person are classified as biometric data under the UK GDPR and require additional safeguards.

For the process controls that complement these detection signals, including callback rules, safeword protocols and payment segregation, see our post on practical controls to reduce voice-cloning fraud risk.
