Date: 2023-07-31

SuperCLUE has released its July benchmark rankings. The benchmark includes 3,700 confidential test questions and covers 20 models from Chinese and international labs.

The SuperCLUE benchmark evaluates these language models based on three main dimensions: foundational capabilities, specialized and academic capabilities, and Chinese-language particularities. Among the models from Chinese labs, Baidu’s ErnieBot (v2.2.0) emerged as the top performer, surpassing even Anthropic’s Claude-2 in the overall SuperCLUE score.

Baidu’s ErnieBot excelled in Chinese-language particularities, which contributed to its high ranking. This is a significant achievement for Baidu: its Ernie models were absent from a previous ranking despite being recognized as strong Chinese language models.

The overall SuperCLUE ranking lists each model’s name, its lab or group, its scores in specific categories, and whether the model is exclusive or commercially available. Non-open-source models such as GPT-4, Claude, and gpt-3.5 also appear in the ranking, but they are not numerically ranked or awarded medals.

In the July update, two new models from Chinese labs posted good but not outstanding results: Baichuan Intelligence’s Baichuan-13B-Chat, and internlm-chat-7b from Shanghai Artificial Intelligence Laboratory and SenseTime. However, a significant gap in overall SuperCLUE score remains between these models and both the leading Chinese models and OpenAI’s GPT-4.

The SuperCLUE benchmark is vital in evaluating and comparing large language models, providing valuable insights into their capabilities and performance, particularly in Chinese language processing.