CLEVA: Chinese Language Models EVAluation Platform

Chen, Zhi; Hu, Zi-Yuan; Huang, Shijia; Huang, Yongfeng; Li, Yanyang; Lin, Dahua; Lyu, Michael R.; Su, Xiaohui; Wang, Liwei; Zhao, Jianqiao; Zheng, Duo

Publication date: 16 October 2023

Abstract

With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue. The absence of a comprehensive Chinese benchmark that thoroughly assesses a model's performance, unstandardized and incomparable prompting procedures, and the prevalent risk of contamination pose major challenges in the current evaluation of Chinese LLMs. We present CLEVA, a user-friendly platform crafted to holistically evaluate Chinese LLMs. Our platform employs a standardized workflow to assess LLMs' performance across various dimensions and regularly updates a competitive leaderboard. To alleviate contamination, CLEVA curates a significant proportion of new data and develops a sampling strategy that guarantees a unique subset for each leaderboard round. Empowered by an easy-to-use interface that requires just a few mouse clicks and a model API, users can conduct a thorough evaluation with minimal coding. Large-scale experiments featuring 23 Chinese LLMs have validated CLEVA's efficacy.

Comment: EMNLP 2023 System Demonstrations camera-ready
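The abstract states that CLEVA's sampling strategy guarantees a unique subset for each leaderboard round but does not describe how. As a rough illustration of that property only, here is a minimal sketch that assumes a fixed pool of unique test instances, a fixed random seed, and non-overlapping per-round blocks. The function name `sample_round_subset` and its parameters are hypothetical and are not CLEVA's actual API or algorithm.

```python
import random


def sample_round_subset(pool, subset_size, round_idx, seed=0):
    """Return the evaluation subset for one leaderboard round.

    Hypothetical sketch (not CLEVA's method): shuffle the pool once with a
    fixed seed, then give round r the r-th contiguous block of
    `subset_size` items. Because the blocks do not overlap, no two rounds
    share an instance (assuming the pool itself has no duplicates).
    """
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    order = list(pool)
    # A seeded generator yields the same permutation on every call,
    # so each round's block is reproducible.
    random.Random(seed).shuffle(order)
    start = round_idx * subset_size
    end = start + subset_size
    if round_idx < 0 or end > len(order):
        raise ValueError("pool exhausted: not enough unused instances for this round")
    return order[start:end]


if __name__ == "__main__":
    pool = list(range(100))
    rounds = [sample_round_subset(pool, 10, r, seed=42) for r in range(3)]
    for r, subset in enumerate(rounds):
        print(r, sorted(subset))
```

Disjoint blocks are the simplest way to make every round's subset unique. The trade-off is that the pool is used up after `len(pool) // subset_size` rounds, which a platform that keeps curating new data could offset by enlarging the pool over time.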

Available version: arXiv.org e-Print Archive (oai:arXiv.org:2308.04813)

Last updated: 12/08/2023