Date: 2023-08-15

OpenAI claims that it has found a way to use its flagship AI model, GPT-4, for content moderation, reducing the workload on human moderation teams. The technique involves giving GPT-4 a policy that guides its moderation decisions and creating a test set of content examples that may or may not violate that policy. Policy experts label these examples, which are then fed to GPT-4 to see how well its labels align with the experts’ determinations. The policy is refined based on the results.

OpenAI asserts that this process, which some of its customers are already using, lets a new content moderation policy be rolled out within hours. The company also presents the approach as superior to the methods used by other startups, such as Anthropic, because it relies on iterative refinement of the policy rather than solely on the models’ “internalized judgments.”
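
The loop described above (draft a policy, label a test set with experts, compare the model's labels against theirs, refine the policy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `Example`, `evaluate_policy`, and `keyword_classifier`, the two labels, and the tiny test set are hypothetical, and the toy keyword matcher merely stands in for a call to a model such as GPT-4. It is not OpenAI's implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    text: str
    expert_label: str  # label assigned by a policy expert: "violates" or "allowed"


def evaluate_policy(
    policy: str,
    examples: List[Example],
    classify: Callable[[str, str], str],
) -> Tuple[float, List[Tuple[Example, str]]]:
    """Return (agreement rate, disagreements) for one draft of the policy."""
    if not examples:
        raise ValueError("the test set must contain at least one example")
    disagreements = []
    for example in examples:
        model_label = classify(policy, example.text)
        if model_label != example.expert_label:
            # Keep the model's label so a reviewer can see how it differed.
            disagreements.append((example, model_label))
    agreement = 1 - len(disagreements) / len(examples)
    return agreement, disagreements


def keyword_classifier(policy: str, text: str) -> str:
    """Hypothetical stand-in for the model: flags text containing any
    comma-separated term from the policy. A real system would send the
    policy and the text to a language model instead."""
    terms = [term.strip().lower() for term in policy.split(",") if term.strip()]
    return "violates" if any(term in text.lower() for term in terms) else "allowed"


# Hypothetical expert-labeled test set.
TEST_SET = [
    Example("How do I pick a lock to steal a bike?", "violates"),
    Example("How do I hotwire a car?", "violates"),
    Example("What is the history of lock design?", "allowed"),
    Example("I love riding my bike.", "allowed"),
]

if __name__ == "__main__":
    # Two successive drafts of the policy; the second covers the missed case.
    for draft in ("steal", "steal, hotwire"):
        agreement, misses = evaluate_policy(draft, TEST_SET, keyword_classifier)
        print(f"policy={draft!r} agreement={agreement:.2f}")
        for example, model_label in misses:
            print(f"  expert={example.expert_label} model={model_label}: {example.text}")
```

In this sketch the first draft disagrees with the experts on one of the four examples. Reviewing that disagreement is what prompts the refined second draft, mirroring the iteration the article describes.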

However, AI-powered moderation tools should be approached with caution. Google’s Perspective, for example, has been available for several years but has not been flawless. Previous studies found that such tools can flag discussion of people with disabilities as negative or toxic. Some versions of Perspective also failed to identify hate speech that used certain slurs or alternative spellings. These shortcomings stem in part from biases that human annotators introduce when labeling training data.

OpenAI acknowledges that GPT-4 remains susceptible to biased judgments picked up from its training data, so careful monitoring, validation, and human involvement are essential to ensure the quality and fairness of its output. GPT-4 may offer improved performance, but even the best AI models make mistakes. That matters especially in content moderation, where accuracy and fairness are paramount.