Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task

Cao, Chao; Dai, Haixing; Li, Gang; Li, Quanzheng; Li, Xiang; Liu, Tianming; Liu, Wei; Liu, Zhengliang; Ma, Chong; Shen, Dinggang; Wu, Zihao; Yu, Xiaowei; Zhang, Lu; Zhao, Lin; Zhu, Dajiang

Publication date: 18 April 2023

Abstract

Recently, ChatGPT and GPT-4 have emerged and gained immense global attention due to their unparalleled performance in language processing. Although they demonstrate impressive capability in various open-domain tasks, their adequacy in highly specific fields like radiology remains untested. Radiology presents unique linguistic phenomena distinct from open-domain data due to its specificity and complexity. Assessing the performance of large language models (LLMs) in such specific domains is crucial, both for a thorough evaluation of their overall performance and for insight into whether future model design should be generic or domain-specific. To this end, we evaluate the performance of ChatGPT/GPT-4 on a radiology natural language inference (NLI) task and compare it to other models fine-tuned specifically on task-related data samples. We also conduct a comprehensive investigation of ChatGPT/GPT-4's reasoning ability by introducing varying levels of inference difficulty. Our results show that 1) GPT-4 outperforms ChatGPT in the radiology NLI task; 2) other specifically fine-tuned models require significant amounts of data samples to achieve performance comparable to ChatGPT/GPT-4. These findings demonstrate that constructing a generic model that is capable of solving various tasks across different domains is feasible.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2304.09138

Last updated on 22/04/2023