Date: 2023-08-02

Large language models (LLMs) have revolutionized natural language processing by generating and comprehending text in a human-like way. However, there is growing concern that these models could be misused to generate objectionable content. Researchers from Carnegie Mellon University’s School of Computer Science, the CyLab Security and Privacy Institute, and the Center for AI Safety in San Francisco conducted a study to investigate this issue.

The researchers devised a novel attack method that appends a suffix to a wide array of queries. This technique significantly increased the likelihood that both open-source and closed-source language models would produce undesirable responses to requests they would typically reject. The attack suffix was successfully applied to various language models, including well-known interfaces like ChatGPT, Bard, and Claude, as well as open-source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others, inducing objectionable content in their outputs.
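
To make the measurement concrete, the following is a minimal sketch of how one might compare a model's refusal behavior with and without an appended suffix. It is not the researchers' code: `toy_model`, `REFUSAL_MARKERS`, the `<SUFFIX>` placeholder, and the query strings are all hypothetical stand-ins, and the refusal check is a deliberately crude prefix match.

```python
from typing import Callable, Iterable

# Hypothetical refusal phrases; a real evaluation would need a more careful check.
REFUSAL_MARKERS = ("I'm sorry", "I cannot", "I can't")


def is_refusal(response: str) -> bool:
    """Return True if the response starts with one of the refusal phrases."""
    return response.strip().startswith(REFUSAL_MARKERS)


def non_refusal_rate(
    model: Callable[[str], str],
    queries: Iterable[str],
    suffix: str = "",
) -> float:
    """Fraction of queries the model answers instead of refusing.

    The same suffix is appended to every query, mirroring the idea of a
    single suffix applied across a wide array of queries.
    """
    queries = list(queries)
    if not queries:
        raise ValueError("queries must not be empty")
    answered = sum(
        not is_refusal(model(f"{query} {suffix}".strip())) for query in queries
    )
    return answered / len(queries)


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: refuses unless the prompt ends with a placeholder."""
    if prompt.endswith("<SUFFIX>"):
        return "Sure, here is a response."
    return "I'm sorry, but I can't help with that."


queries = ["placeholder query A", "placeholder query B", "placeholder query C"]
baseline = non_refusal_rate(toy_model, queries)
with_suffix = non_refusal_rate(toy_model, queries, suffix="<SUFFIX>")
print(f"baseline: {baseline:.2f}, with suffix: {with_suffix:.2f}")
```

In this toy setup the stand-in model is rigged to respond only when the placeholder is present, so the gap between the two rates is by construction. With a real model, the same harness would only report how often responses avoid the listed refusal phrases, which is a rough proxy for whether the output is actually objectionable.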

The study demonstrated the broad applicability of this attack approach, revealing that it could impact language models with public interfaces and open-source implementations alike. As autonomous systems become more prevalent, countering such adversarial attacks becomes imperative.

Notably, the researchers did not initially set out to attack proprietary large language models and chatbots. However, their findings showed that even closed-source models are susceptible to attacks developed by analyzing freely available, smaller, and simpler open-source models.

Moving forward, efforts are underway to develop countermeasures that address these vulnerabilities and promote the secure and reliable use of language models in autonomous systems. Safeguarding against the generation of objectionable content is crucial to maintaining the ethical use of large language models across applications.