Date: 2023-08-02

A recent study by psychologists at the University of California, Los Angeles has highlighted the strong reasoning capabilities of the language model GPT-3. Published in the journal Nature Human Behaviour, the study compared GPT-3's performance with that of 40 UCLA undergraduates on questions resembling those found in standardized exams like the SAT.

The research evaluated problem-solving skills, specifically the ability to apply previous knowledge to new and unfamiliar situations. One set of questions required selecting pairs of words that share the same type of relationship; another required analogical reasoning based on a short story passage. In both cases, GPT-3 outperformed the college students, and its results exceeded the average SAT score of college applicants.

GPT-3 also demonstrated proficiency in logical reasoning comparable to that of the human subjects, as measured with Raven’s Progressive Matrices. Success on standardized exams is not new for language models like GPT-3: previous studies have already tested their aptitude by having them take exams such as AP tests, the LSAT, and the MCAT.

To further enhance logical reasoning, OpenAI has incorporated image processing capabilities into GPT-4, the latest version of the model, and researchers have employed chain-of-thought prompting to break down complex problems into manageable steps.
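
For readers curious what chain-of-thought prompting looks like in practice, here is a minimal sketch in Python. It only assembles the text of a prompt and does not call any model. The function name, the dictionary keys, and the worked example are hypothetical and chosen for illustration; they do not come from the study or from any vendor's API. The idea is that each example in the prompt shows its intermediate reasoning before the answer, which nudges the model to reason in steps on the new question.

```python
def build_chain_of_thought_prompt(question, worked_examples):
    """Assemble a few-shot prompt in which every example spells out its
    reasoning before stating the answer (hypothetical helper)."""
    parts = []
    for example in worked_examples:
        parts.append(
            f"Q: {example['question']}\n"
            f"A: {example['reasoning']} The answer is {example['answer']}."
        )
    # End with the new question and a cue that invites step-by-step reasoning.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


# Illustrative example data, made up for this sketch.
examples = [
    {
        "question": "A shelf holds 3 boxes with 4 books each. How many books?",
        "reasoning": "There are 3 boxes and each has 4 books, so 3 * 4 = 12.",
        "answer": "12",
    }
]

prompt = build_chain_of_thought_prompt(
    "A class has 5 rows of 6 desks. How many desks are there?", examples
)
print(prompt)
```

The resulting string would then be sent to a language model of the reader's choice; how well the technique works depends on the model and the task.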

However, despite their impressive performance in certain areas, language models like GPT-3 and GPT-4 still have limitations. Recent studies have found flaws in the medical information they provide, the code they generate, and their solutions to math problems. These models also struggle with visual puzzles and with understanding real-world physics and physical space. To address these shortcomings, Google aims to combine multimodal language models with robots to enhance problem-solving abilities.

Although language models excel at test-taking, they lack the general intelligence of humans. It remains uncertain whether their cognitive processes resemble those of humans, and their limits and potential are not yet fully understood. To gain deeper insight, experts emphasize the need for greater transparency and openness about the models’ software and training data, a point on which the guarded nature of OpenAI’s LLM research has drawn criticism.