Google’s research team is using OpenAI’s GPT-4 in an experiment to test the defenses of other AI models. In their latest work, they broke the defense mechanism of the AI-Guardian audit system and published the technical details.

AI-Guardian is an AI review system designed to detect inappropriate content in images and to identify whether they have been modified by other AI models. When it detects signs of modification, the system alerts the administrator for further action.

Google DeepMind researcher Nicholas Carlini describes how GPT-4 was used to defeat AI-Guardian’s defense mechanism in a paper titled “A LLM Assisted Exploitation of AI-Guardian.” With GPT-4’s help, the research team was able to trick AI-Guardian into classifying a photo of someone holding a gun as a harmless apple. The attack bypassed the defense and cut the model’s accuracy from 98 percent to just 8 percent.
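To make the idea of tricking a classifier concrete, here is a minimal sketch of an adversarial perturbation against a toy linear model. It is not the attack from the paper and has nothing to do with AI-Guardian's internals. The weights, the four-value "image," and the function names are all invented for illustration.

```python
# Toy illustration only: a hand-made linear classifier, not AI-Guardian
# and not the attack described in the paper. All values are hypothetical.

def score(weights, bias, x):
    """Linear score; a positive value means 'gun', otherwise 'apple'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights, bias, x):
    return "gun" if score(weights, bias, x) > 0 else "apple"

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def perturb(weights, x, epsilon):
    """Shift each feature by at most epsilon to lower the 'gun' score.

    For a linear model, the gradient of the score with respect to the
    input is the weight vector itself, so stepping along -sign(w) lowers
    the score as much as possible under a per-feature budget of epsilon.
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]

weights = [0.9, -0.4, 0.7, 0.2]  # hypothetical model parameters
bias = -0.5
image = [0.8, 0.3, 0.6, 0.5]     # hypothetical 4-value input in [0, 1]

adversarial = perturb(weights, image, epsilon=0.35)
original_label = predict(weights, bias, image)
adversarial_label = predict(weights, bias, adversarial)
```

With these made-up numbers, the original input is labeled "gun" and the perturbed one is labeled "apple," even though no feature moved by more than 0.35. Attacks on real image models follow the same principle with far smaller changes per pixel, and a defense such as AI-Guardian has to be circumvented as well.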

The paper explaining the attack method has been published on arXiv. AI-Guardian’s developers, however, have said that this specific attack will not be effective against future versions of the system. They anticipate that other models will be developed to address the vulnerability, rendering Google’s current attack obsolete.

The experiment also raises concerns about how well AI systems can withstand malicious attacks. As AI technology continues to advance, developing robust defenses against vulnerabilities and breaches in AI models becomes crucial.