ChatGPT, a widely used generative AI chatbot, is showing a perplexing trend: on some tasks, it appears to be getting worse over time. This runs against the common expectation that AI models become more capable as they are updated and trained on more user input. The explanation lies in a concept known as “drift.”

Drift occurs when large language models (LLMs) behave in unexpected or unpredictable ways that deviate from their earlier behavior. Researchers from the University of California, Berkeley and Stanford University conducted a study to investigate drift by analyzing changes in two popular LLMs: GPT-3.5 (behind ChatGPT) and GPT-4 (behind Bing Chat and ChatGPT Plus).

The study compared the performance of both LLMs on a range of tasks: solving math problems, answering sensitive questions, responding to opinion surveys, answering multi-hop knowledge-intensive questions, generating code, answering US medical licensing exam questions, and visual reasoning. For each model, the researchers compared the March version against the June version.

The results revealed that GPT-4’s March version outperformed the June version in several instances. The most notable difference was in basic math prompts, where the March version performed better in both examples. GPT-4’s performance also declined in code generation, medical exam questions, and opinion surveys. The study attributes all of these changes to drift.

One of the researchers, James Zou, expressed surprise at how quickly the drift was occurring. Despite the overall decline, both GPT-4 and GPT-3.5 also improved in some instances. The researchers advise users to keep using LLMs, but to evaluate their performance continually and to be careful about how much they rely on them.
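
One way to act on that advice is to keep a fixed set of prompts with known answers and re-run it whenever the model changes. The Python sketch below illustrates the idea only; it is not the researchers' methodology. The names `accuracy`, `drift_report`, `snapshot_a`, and `snapshot_b`, the 0.05 tolerance, and the four test prompts are all assumptions made for illustration, and the two "snapshots" are toy functions standing in for real model calls.

```python
from typing import Callable, Dict, List, Tuple

# A "model" here is any function that maps a prompt to an answer string.
# In practice this would wrap a call to a specific, dated model version.
Model = Callable[[str], str]
Case = Tuple[str, str]  # (prompt, expected answer)


def accuracy(model: Model, cases: List[Case]) -> float:
    """Return the fraction of prompts answered correctly (case-insensitive)."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = sum(
        1
        for prompt, expected in cases
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return correct / len(cases)


def drift_report(
    old: Model, new: Model, cases: List[Case], tolerance: float = 0.05
) -> Dict[str, object]:
    """Compare two model versions on the same prompts and flag regressions."""
    old_acc = accuracy(old, cases)
    new_acc = accuracy(new, cases)
    return {
        "old": old_acc,
        "new": new_acc,
        "delta": new_acc - old_acc,
        # Flag only drops larger than the (assumed) tolerance.
        "regressed": (old_acc - new_acc) > tolerance,
    }


# Hypothetical fixed test set; real suites would be much larger.
CASES: List[Case] = [
    ("Is 7 prime?", "yes"),
    ("Is 9 prime?", "no"),
    ("Is 11 prime?", "yes"),
    ("Is 15 prime?", "no"),
]


def snapshot_a(prompt: str) -> str:
    """Toy stand-in for an earlier model version: checks primality directly."""
    number = int(prompt.split()[1])
    return "Yes" if all(number % d for d in range(2, number)) else "No"


def snapshot_b(prompt: str) -> str:
    """Toy stand-in for a later model version that always answers 'No'."""
    return "No"


if __name__ == "__main__":
    print(drift_report(snapshot_a, snapshot_b, CASES))
```

Keeping the prompt set and the scoring rule fixed is the point of the design: if only the model changes between runs, a drop in the score can be traced to the model rather than to the test.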

This study sheds light on the evolving nature of language models and highlights the importance of ongoing monitoring and assessment to ensure their reliability and accuracy.