Distrito Telefónica. Innovation & Talent Hub

Protecting voice communications privacy and security from synthetic voice threats

Technology
Cybersecurity and Privacy, Artificial Intelligence
Artificial intelligence has reached a point where it can mimic human voices with astonishing accuracy. This has opened the door to new forms of fraud, impersonation and manipulation in digital environments. It is no longer just about protecting data: it is now necessary to protect the voice as well. 

Against this backdrop, the Digital Life Disruption Lab (DL2) team at Discovery, within Telefónica Innovación Digital, has patented a system that acts as an intelligent shield for audio and video communications.

Its proposal is simple but powerful: detect synthetic voices in real time and protect the user's identity by transforming their voice before it leaves the device.

The system operates completely locally, without sending data to the cloud or relying on external servers.

This guarantees total privacy and allows it to work even on conventional devices such as mobile phones or tablets. The key is its efficiency: it uses lightweight neural networks and continuous learning techniques that allow it to adapt to new threats without compromising performance. 

When a call comes in, the system analyses the incoming audio in real time. If it detects suspicious patterns that suggest that the voice has been artificially generated, it alerts the user.

This detection is based on the analysis of complex audio characteristics, such as phase and spectrogram magnitude, which are difficult for even the most advanced speech generators to fake. 
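To make the idea of magnitude and phase features concrete, here is a minimal sketch of how both can be extracted from a single audio frame. It is not the patented detector: the function names (`dft`, `magnitude_and_phase`), the Hann window and the 64-sample frame are assumptions chosen for illustration, and a real system would feed features like these into a trained classifier.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)  # keep the non-negative frequencies only
    ]


def magnitude_and_phase(frame):
    """Return per-bin magnitude and phase (radians) of one Hann-windowed frame."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    spectrum = dft([s * w for s, w in zip(frame, window)])
    magnitudes = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]
    return magnitudes, phases


# Usage: a 64-sample frame holding a pure tone centred on frequency bin 8.
frame = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
mags, phases = magnitude_and_phase(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)  # bin 8 for this tone
```

Stacking these per-frame vectors over time gives the magnitude spectrogram and its phase counterpart. The naive transform is quadratic in the frame length, so a production implementation would use a fast Fourier transform instead.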

But the protection does not stop there. The system can also anonymise the user's voice before it is transmitted. This means that even if someone records the call, they cannot use that voice to create a digital replica. The transformation preserves the content and clarity of the message, but removes identifying features such as pitch, timbre and natural speed of speech.
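As a toy illustration of altering voice characteristics on the device, the sketch below resamples a block of audio samples by linear interpolation. This is not the transformation described in the patent: plain resampling shifts pitch and speaking rate together and offers no real anonymity guarantee. The function name `resample_linear` and the `factor` parameter are hypothetical.

```python
def resample_linear(samples, factor):
    """Resample by linear interpolation.

    Played back at the original sample rate, factor > 1 raises the pitch and
    shortens the audio, while factor < 1 lowers the pitch and lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(samples) < 2:
        return list(samples)
    out_len = int((len(samples) - 1) / factor) + 1
    out = []
    for i in range(out_len):
        pos = i * factor                      # fractional read position
        lo = int(pos)                         # sample just before pos
        hi = min(lo + 1, len(samples) - 1)    # sample just after, clamped
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# Usage: reading twice as fast keeps every second sample.
faster = resample_linear([0, 1, 2, 3, 4], 2)
```

A practical anonymiser would need to change pitch, timbre and tempo independently while keeping speech intelligible, which requires considerably more than this single operation.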


Real-time detection of synthetic voice


In addition, the system learns with use. If the user notices a suspicious voice that was not automatically identified, they can report it. This information is used to improve the model through federated learning, a technique that allows algorithms to be trained without sharing personal data. Thus, the system becomes more accurate over time, without compromising privacy.
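The article does not say how the federated updates are combined. One widely used approach is federated averaging, where each device trains locally and only its model weights are merged, weighted by how much data it trained on. The sketch below shows that aggregation step under this assumption; `federated_average` and the shape of `client_updates` are illustrative names, not part of the described system.

```python
def federated_average(client_updates):
    """Merge per-device model weights into one global weight vector.

    client_updates is a list of (weights, num_examples) pairs, where weights
    is a list of floats. Each device counts in proportion to the number of
    examples it trained on; raw audio never leaves the device.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    size = len(client_updates[0][0])
    if any(len(w) != size for w, _ in client_updates):
        raise ValueError("all weight vectors must have the same length")
    return [
        sum(w[i] * n for w, n in client_updates) / total
        for i in range(size)
    ]


# Usage: two devices, the second trained on three times as many examples.
updates = [([0.0, 2.0], 1), ([4.0, 6.0], 3)]
global_weights = federated_average(updates)  # [3.0, 5.0]
```

Weighting by example count keeps a device with very few reports from pulling the shared model as strongly as one with many.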

This technology has immediate applications in sectors such as telecommunications, banking, customer care, healthcare and any environment where voice authenticity is critical. It can also be integrated into video calling platforms, virtual assistants or IoT devices.  

In a world where trust in what we hear is at stake, this solution represents a firm step towards more secure, private and resilient communications. It not only responds to today's threats, but anticipates those to come. 
