Both English and Mandarin speakers struggle to detect AI-generated voice deepfakes, even when they know they may be listening to artificially generated speech. This poses a significant risk for the billions of people who speak these languages, as they can easily be deceived by deepfake scams and misinformation.

In a study conducted by Kimberly Mai and her team at University College London, over 500 participants were asked to identify speech deepfakes in a series of audio clips. Some clips featured the genuine voice of a female speaker reading standard sentences in English or Mandarin, while others were deepfakes generated by AI models trained on female voices.

The participants were divided into two groups. The first group listened to 20 voice samples in their native language and had to decide whether each clip was real or fake. They classified the deepfakes and the authentic voices correctly about 70% of the time for both English and Mandarin samples. Because these listeners knew in advance that some clips might be fake, the finding suggests that detecting deepfakes in real-life situations would be even harder, as people would not be warned that they might be hearing AI-generated speech.

The second group was presented with 20 pairs of audio clips, each pair containing the same sentence spoken once by a human and once by a deepfake voice. Their task was to identify the fake clip in each pair. In this scenario, detection accuracy rose to over 85%. However, the researchers acknowledged that this setup gave listeners an unrealistic advantage, since in real-life situations listeners do not know whether what they are hearing is authentic.

The study did not test whether listeners can tell if a deepfake sounds like the specific person being mimicked. Yet recognizing the authentic voice of a particular individual is crucial: scammers have used voice cloning to deceive employees into making financial transfers, and deepfakes of well-known politicians have circulated on social media as part of misinformation campaigns.

The research helps gauge how far AI-generated deepfakes have advanced in mimicking human voices, but Hany Farid of the University of California, Berkeley, emphasizes the need for AI-powered deepfake detectors. The study also found that attempts to train participants to detect deepfakes better were largely unsuccessful, underscoring the importance of developing automated detection systems.

Mai and her colleagues are now exploring the use of large language models capable of processing speech data to determine if they can effectively detect AI-generated deepfakes.