CMB: A Comprehensive Medical Benchmark in Chinese

Chen, Guiming Hardy; Chen, Zhihong; Jiang, Feng; Li, Haizhou; Li, Jianquan; Song, Dingjie; Wan, Xiang; Wang, Benyou; Wang, Xidong; Xiao, Qingying; Zhang, Zhiyi

Publication date: 17 August 2023
Abstract

Large Language Models (LLMs) offer the possibility of a major breakthrough in medicine. Establishing a standardized medical benchmark is a fundamental cornerstone for measuring progress. However, medical environments in different regions have their own local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Merely translating English-based medical evaluations may therefore result in contextual incongruities for a local region. To address this issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. It is worth noting that our benchmark is not devised as a leaderboard competition but as an instrument for self-assessment of model advancements. We hope this benchmark can facilitate the widespread adoption and enhancement of medical LLMs within China. Details are available at https://cmedbenchmark.llmzoo.com/

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2308.08833

Last time updated on 24/08/2023