Jul 2024

In Australia: My Voice Identifies Me

Voice biometrics' promise falters as AI advancements reveal critical vulnerabilities, urging reassessment of security measures in Australia’s tech landscape.

Voice biometrics technology, once heralded as a revolutionary leap in security, is now facing scrutiny due to rapid advancements in artificial intelligence (AI). In Australia, voice identification systems have been widely adopted by banks, government services, and access control systems, promising high-tech security through unique voiceprints.

However, as Ryan Williams explores in this article, the technology's reliance on lower quality audio inputs and the increasing capabilities of AI-driven voice cloning have exposed critical vulnerabilities. This article delves into the workings of voice biometrics and highlights the pressing need for reassessment of its security efficacy.

How Does Voice Biometrics Technology Work? 

The process of voice biometric identification consists of two steps: 

Voiceprint extraction – a voice biometric system analyzes a voice sample and creates a mathematical model of the person’s voice (a voiceprint). If the system is analyzing the person’s voice for the first time, this phase is also called voice enrollment. 

Voiceprint comparison – the extracted voiceprint is compared with other stored voiceprints to find a match necessary for successful speaker verification or speaker identification. 

Of these two steps, voiceprint extraction is more time-consuming while voiceprint comparison is very fast – millions of voiceprint comparisons can be performed in a second. 

Voice Enrollment  

 

A spectrogram of the acoustic wave of your voice is created. The vertical axis represents frequency, the horizontal axis represents time, and the brightness describes the amplitude of the wave.

Based on the spectrogram analysis, a voice biometric system analyzes the characteristics and dynamics of the acoustic wave the person produces (voice) and creates a mathematical model (typically a set of floating point numbers) that represents the unique features of the person’s voice. 

Statistical and AI methods are used to find the right set of numbers to represent the shapes, sizes, and movements of the person’s vocal organs. This mathematical model of a voice is called a voiceprint. 
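As a rough illustration of the spectrogram stage only, the sketch below slices a signal into short overlapping frames and measures how much energy sits in each frequency bin of each frame. The function name `spectrogram`, the frame size, the hop and the 1 kHz test tone are all assumptions made for this example. A real voice biometric system would go on to feed features like these into a trained model to produce the voiceprint, which this sketch does not attempt.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame holds one magnitude per frequency bin."""
    # Hann window to soften the edges of every frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Plain discrete Fourier transform over the non-negative frequencies.
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

# Hypothetical input: a pure 1 kHz tone sampled at telephone rate (8 kHz).
SAMPLE_RATE = 8000
FRAME_SIZE = 64
samples = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(800)]

spec = spectrogram(samples, frame_size=FRAME_SIZE, hop=32)

# The brightest bin of the first frame, converted back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(len(spec), "frames,", len(spec[0]), "bins per frame, peak at", peak_hz, "Hz")
```

Each inner list is one vertical slice of the picture described above: position in the list is frequency, position of the slice is time, and the value is the brightness.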

Voiceprint Comparison 

Once the voiceprint from the enrollment process is stored in a database, it can be instantly compared with any other voiceprint extracted from just a few seconds of speech. 

Voiceprints can be compared as: 

  • One-to-one (1:1) for speaker verification and forensic voice analysis 

  • One-to-many (1:N) for speaker identification, speaker search, and speaker spotting 

  • Many-to-many (N:M) for speaker clustering (as well as for speaker identification, speaker search, and speaker spotting) 

The result of each voiceprint comparison is presented as a score that reflects the probability that two voiceprints match (the speaker is verified) or that the voiceprint matches one of the stored ones (the speaker is identified). 
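The 1:1 and 1:N comparisons can be sketched with a simple similarity score. This is a minimal example that assumes cosine similarity and a fixed threshold of 0.8; the three-number "voiceprints", the names and the threshold are invented for illustration. Production systems typically use much longer vectors and calibrate the raw score into a probability-like value.

```python
import math

def cosine_similarity(a, b):
    """Score in [-1, 1]; higher means the two vectors point in a more similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(enrolled, probe, threshold=0.8):
    """One-to-one (1:1): does the probe match the claimed speaker's voiceprint?"""
    return cosine_similarity(enrolled, probe) >= threshold

def identify(database, probe, threshold=0.8):
    """One-to-many (1:N): return the best-scoring enrolled name, or None if below threshold."""
    best_name, best_score = None, -1.0
    for name, voiceprint in database.items():
        score = cosine_similarity(voiceprint, probe)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

# Hypothetical toy voiceprints, not real biometric data.
database = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}
probe = [0.85, 0.15, 0.35]  # a new sample that resembles "alice"

print(verify(database["alice"], probe))  # 1:1 check against alice
print(verify(database["bob"], probe))    # 1:1 check against bob
print(identify(database, probe))         # 1:N search over everyone enrolled
```

The threshold is the design choice that matters: everything at or above it is accepted, whoever or whatever produced the audio.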

In Australia My Voice Identifies Me Part 2 

I had planned on writing a paper on the following findings. While my discoveries were not groundbreaking or even that surprising, they were proof that a security control used by our banks, government services and access control systems was now obsolete. It is the first, and probably not the last, biometric identification system to fall victim to the exponential advances in AI technology. The problem is, while we (the public) were sold on the “high tech security” of VoiceID, the industry already knew the system was fundamentally flawed as far back as 2017.

Here’s the issue. It doesn’t matter what quality your audio input is, because it ultimately ends up at the quality of the line in to the call centre. That, ladies and gents, is a VoIP connection: Codec: G.711, Sample Rate: 8 kHz (8000 Hz), Bit Depth: 8 bits per sample. For reference, CD audio quality is: Sample Rate: 44.1 kHz (44,100 Hz), Bit Depth: 16 bits per sample. Yes, high quality codecs like G.722 are available but not as common.
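To put those figures side by side, here is a small back-of-the-envelope calculation. It assumes a single (mono) channel for both formats and uses the standard CD sample rate of 44,100 Hz; the helper names are made up for this example.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bits per second for a sampled audio stream."""
    return sample_rate_hz * bits_per_sample * channels

def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

g711_bps = pcm_bit_rate(8000, 8)        # telephone-grade G.711
cd_mono_bps = pcm_bit_rate(44100, 16)   # one channel of CD audio

print("G.711:", g711_bps, "bit/s, audio up to", nyquist_hz(8000), "Hz")
print("CD (mono):", cd_mono_bps, "bit/s, audio up to", nyquist_hz(44100), "Hz")
print("CD carries", cd_mono_bps / g711_bps, "times the data per channel")
```

In other words, the call centre line carries roughly one eleventh of the data of a single CD channel and nothing above 4 kHz, however good the microphone at the caller's end.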

Now, from this low resolution audio, the voice ID system creates what’s called the embedding, a fixed-length vector that uniquely represents the speaker's voice. This is then compared against the embeddings from the audio when a log in is attempted. Future is now, right? Well, not really. The thing about this specific biometric is there is quite a lot of variation in how an individual speaks. Inflections, cadence and rhythm can vary, and when combined with background noise, the tolerances voice ID needs to allow for create a “near enough is good enough” kind of scenario.
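The tolerance problem can be made concrete with a toy example. The scores below are invented purely for illustration and are not measurements from any real system; the point is only to show how loosening the acceptance threshold to cope with natural variation trades false rejections for false acceptances.

```python
# Hypothetical similarity scores, made up for this sketch.
genuine_scores = [0.91, 0.84, 0.78, 0.88, 0.72]   # real owner, different days and noise
impostor_scores = [0.35, 0.52, 0.61, 0.74, 0.44]  # other voices, imitations or clones

def error_rates(threshold):
    """Return (false_accept_rate, false_reject_rate) at a given threshold."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))

for threshold in (0.8, 0.7):
    far, frr = error_rates(threshold)
    print("threshold", threshold, "false accepts", far, "false rejects", frr)
```

With these made-up numbers, the strict threshold locks the real owner out on two of five attempts, while the relaxed one lets every genuine attempt through along with one impostor.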

All this would have been fine; security professionals only needed to keep their eye on tech-savvy impression artists. But then came the AI explosion. A technique similar to the one used to identify a voice was now reaching a point where it could simulate one. At first, large amounts of source audio were required and the result, while impressive, wouldn’t be fooling anyone. Fast forward to today. 30 seconds of audio pulled from social media is enough to make a clone high enough in quality to beat three popular open-source voiceprint ID platforms and one government department. The cost and time put into the attack? The service was free and in total it took about 20 minutes to get a workable end product.

Now, even this wouldn’t normally be an issue, except VoiceID is used to identify millions of account holders on myGov. More still use this feature to access certain banking features. To understand the scope of the issue, consider how many Australians would have 30 seconds of audio of their voice somewhere on social media. The technique even worked with audio snippets from various sources edited together (see audio in the clip below). The system isn’t secure, but hardly a word has been spoken about it.

In BioCatch’s 2024 report, concerns about AI's capacity for voice cloning led 84% of Australian respondents (and 91% worldwide) to state that their company is currently reconsidering the use of voice verification for high-profile clients. That’s very nice for them, but what about the millions of Australians who use myGov daily, or those using the numerous banking services that have adopted Voiceprint ID?

The Australian government rolled out VoiceID in 2017 at an estimated cost of $22.5 million for technology now obsolete. Let’s hope the $288 million just announced to secure the Digital ID project has a longer shelf life.

Tony Sales and the We Fight Fraud team performed voice cloning at their conference last year, and multiple podcasts have featured the technique. People have seen this coming, and as such it’s the silence rather than the audio that seduced me. The siren song of so many lips sealed dashing me on the rocks of discovery and disbelief. People need to know and things need to change.

The once-promising future of voice biometrics is now overshadowed by the relentless march of AI technology. The Australian experience with VoiceID, as detailed by Ryan Williams, underscores the urgent need for a reevaluation of biometric security measures. The system's inherent flaws, coupled with the ease and affordability of AI-based voice cloning, pose significant risks to millions of users.

As the Australian government invests further in digital security, it is imperative that these investments are informed by the lessons learned from the shortcomings of VoiceID. The silence surrounding these issues must be broken, and proactive steps must be taken to safeguard against the evolving threats of the digital age.