Exploring semantic information in disease: Simple Data Augmentation Techniques for Chinese Disease Normalization

Cui, Wenqian; Fu, Xiangling; Liu, Shaohui; Liu, Xien; Wu, Ji

Authors: Wenqian Cui, Xiangling Fu, Shaohui Liu, Xien Liu, Ji Wu
Publication date: 2 June 2023

Abstract

Disease is a core concept in the medical field, and normalizing disease names is the basis of all disease-related tasks. However, because disease names are multi-axis and multi-grain, general text data augmentation techniques often inject incorrect information and harm performance. To address this problem, we propose a set of data augmentation techniques that work together as an augmented training task for disease normalization. Our data augmentation methods are based on both a clinical disease corpus and a standard disease corpus derived from ICD-10 coding. Extensive experiments show the effectiveness of the proposed methods: they yield up to a 3% performance gain over non-augmented counterparts, and they work even better on smaller datasets.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2306.01931

Last time updated on 08/06/2023