Date: 2023-08-03

Even when people are aware that they might be listening to AI-generated speech, it remains difficult for both English and Mandarin speakers to accurately detect deepfake voices. This poses a potential risk for the billions of individuals who understand these widely spoken languages, as they may unknowingly fall victim to deepfake scams or misinformation.

A study conducted by Kimberly Mai at University College London and her colleagues involved over 500 participants who were challenged to identify speech deepfakes from a series of audio clips. Some clips contained the authentic voice of a female speaker reading generic sentences in English or Mandarin, while others were deepfakes generated by AI models trained on female voices.

In one experimental setup, a group of participants listened to 20 voice samples in their native language and had to decide whether each clip was real or fake. They correctly classified the deepfakes and authentic voices approximately 70% of the time for both English and Mandarin samples. This detection rate may be even lower in real-life scenarios, where people are not warned in advance that they might be listening to AI-generated speech.

Another group of participants was presented with randomly paired audio clips, each pair featuring the same sentence spoken once by a human and once by a deepfake, and was asked to identify the fake. In this scenario, detection accuracy rose to over 85%. Nevertheless, the researchers acknowledged that this setup does not fully reflect real-life situations: listeners would not normally know in advance that one of two clips is fake, nor would they have a side-by-side comparison to help them pick up subtle speech differences.

The study did not test whether participants could tell if the deepfakes sounded like the specific person being mimicked. Hany Farid at the University of California, Berkeley, emphasizes the importance of being able to identify the authentic voice of specific speakers in real-life scenarios. Scammers have been known to clone the voices of business leaders to deceive employees into making fraudulent money transfers, while misinformation campaigns have spread deepfakes of well-known politicians on social media platforms.

While this research helps evaluate how far AI-generated deepfakes have progressed in imitating human voices, it does not provide a comprehensive solution. Attempts to train participants to improve their deepfake detection skills were largely ineffective, which makes the development of AI-powered deepfake detectors crucial. Mai and her colleagues are exploring the use of large language models capable of processing speech data to tackle this challenge.