Disease is a core concept in the medical field, and normalizing disease names
is the basis of all disease-related tasks. However,
due to the multi-axis and multi-grain nature of disease names, general text
data augmentation techniques often inject incorrect information and harm
performance. To address this problem, we propose a set of
data augmentation techniques that work together as an augmented training task
for disease normalization. Our data augmentation methods draw on both a
clinical disease corpus and a standard disease corpus derived from ICD-10 coding.
Extensive experiments show the effectiveness of the proposed methods. The
results demonstrate performance gains of up to 3\% over non-augmented
counterparts, and the methods work even better on smaller datasets.