Date: 2023-08-04

New research conducted by Stanford University and the University of California, Berkeley has uncovered a significant challenge in the development of artificial intelligence (AI). The study focused on ChatGPT, the AI chatbot developed by OpenAI, and found that it struggles with certain basic math operations.

The research team aimed to evaluate how ChatGPT's performance changed over time and across various tasks. They tested the two models behind the tool: GPT-3.5, which is available for free, and GPT-4, which is accessible through a subscription. The results were less promising than anticipated.

When asked to identify prime numbers, GPT-4 correctly classified 84% of the numbers in March. By June, its success rate had dropped to 51%. GPT-4 also performed worse in six out of eight tasks. GPT-3.5 improved on six measures but still remained inferior to its more advanced counterpart in most tasks.
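The accuracy figures above come from comparing a model's yes/no answers against the true primality of each number. A minimal sketch of that kind of scoring follows. It is not the researchers' code: the function names, the list of numbers, and the model answers are all hypothetical and exist only to illustrate the calculation.

```python
def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def accuracy(numbers, answers):
    """Fraction of yes/no answers that match the ground truth."""
    correct = sum(is_prime(n) == a for n, a in zip(numbers, answers))
    return correct / len(numbers)


# Hypothetical data: made-up model answers, not results from the study.
numbers = [7, 9, 11, 15, 17]
march_answers = [True, False, True, False, True]
june_answers = [True, True, True, True, True]

print(accuracy(numbers, march_answers))
print(accuracy(numbers, june_answers))
```

In this invented example, a model that answers "prime" for every number still scores well when most of the test numbers are prime, so the mix of primes and composites in the test set affects how an accuracy figure should be read.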

Users who initially found ChatGPT impressive have started to notice more incorrect answers and instances where the chatbot fails to respond. The research conducted by the Stanford-Berkeley team empirically confirms these observations and demonstrates that the chatbot’s performance has worsened in certain functions, including math calculations, medical queries, and code generation.

OpenAI acknowledges the research findings and has stated its commitment to improving newer model versions across a comprehensive range of tasks. The company also recognizes the need for further enhancements in its evaluation methodology.

The study sheds light on the challenges faced in designing AI models that consistently improve across all aspects. It highlights the phenomenon of drift, where attempts to enhance one aspect of AI models may lead to a decline in performance in other areas. This research underscores the need for ongoing efforts to address these challenges and ensure the development of reliable and accurate AI tools.