Date: 2023-08-02

A Google research team has used OpenAI’s GPT-4 to test the security measures of another AI model. The target of the experiment was AI-Guardian, an AI review system designed to detect inappropriate content in images and to identify images that have been modified by other AI.

The research team bypassed AI-Guardian’s audit system, exposing vulnerabilities in its defense mechanism. The technical details appear in a research paper titled “A LLM Assisted Exploitation of AI-Guardian,” authored by Nicholas Carlini, a researcher at Google DeepMind.

Using GPT-4, the team produced scripts and explanations designed to deceive AI-Guardian into misidentifying images. For instance, an image of someone holding a gun could be altered so that AI-Guardian perceived it as an innocent image of someone holding an apple and released it without flagging it. With this attack, the research team reduced AI-Guardian’s accuracy from 98 percent to just 8 percent.
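
For readers who want a feel for how a small change to an input can flip a model's decision, the following is a minimal toy sketch in Python. It is not AI-Guardian and not the attack described in the paper: the two-feature linear scorer, its weights, the input values, and the function names are all hypothetical, and the perturbation is a generic fast-gradient-sign step chosen only for illustration.

```python
# Toy illustration only. The model, weights, and inputs are hypothetical;
# this is neither AI-Guardian nor the attack from the paper.

WEIGHTS = [2.0, -1.0]  # hypothetical weights of a two-feature linear scorer
BIAS = -0.5            # hypothetical bias term


def score(x):
    """Return the linear score; a positive score means the input is flagged."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def is_flagged(x):
    """True if the toy model would flag the input."""
    return score(x) > 0


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def perturb(x, epsilon):
    """Shift every feature by at most epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is WEIGHTS,
    so stepping against sign(WEIGHTS) is a fast-gradient-sign step.
    """
    return [xi - epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


original = [1.0, 0.5]                          # flagged by the toy model
adversarial = perturb(original, epsilon=0.4)   # each feature moves by 0.4

print("original flagged:   ", is_flagged(original))
print("adversarial flagged:", is_flagged(adversarial))
```

In this toy setting, a bounded change to each feature is enough to move the input across the decision boundary. Real image classifiers and real defenses are far more complex, but the underlying idea of a small, targeted perturbation is the same.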

The paper is available on arXiv for those interested in exploring the experiment further. The developers of AI-Guardian have acknowledged this specific attack method, and they say that future versions of the system will address these vulnerabilities. They also expect other AI models to adapt and strengthen their defenses accordingly, which would render the current attack scheme obsolete.