Modeling Diagnostic Label Correlation for Automatic ICD Coding
Predicting diagnostic codes from the clinical notes in electronic health records (EHRs) is
challenging; the problem is formulated as a
multi-label classification task. The large set of labels, the hierarchical
dependency, and the imbalanced data make this prediction task extremely hard.
Most existing work builds a binary predictor for each label independently,
ignoring the dependencies between labels. To address this problem, we propose a
two-stage framework to improve automatic ICD coding by capturing the label
correlation. Specifically, we train a label set distribution estimator to
rescore the probability of each label set candidate generated by a base
predictor. This paper is the first attempt at learning the label set
distribution as a reranking module for medical code prediction. In the
experiments, our proposed framework improves upon the best-performing
predictors on the benchmark MIMIC datasets. The source code of this project is
available at https://github.com/MiuLab/ICD-Correlation.
Comment: NAACL 2021 Long Paper.
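
The two-stage idea described in the abstract (a base predictor proposes label set candidates, and a label set distribution estimator rescores them) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the function names, the brute-force candidate enumeration, the toy labels "a", "b", "c", the pairwise bonus standing in for a trained estimator, and the weight `alpha`.

```python
import math
from itertools import combinations


def base_log_prob(label_set, probs):
    """Log-probability of a label set if each label is predicted independently."""
    return sum(
        math.log(p) if label in label_set else math.log(1.0 - p)
        for label, p in probs.items()
    )


def generate_candidates(probs, max_size=3, top_k=5):
    """Stage 1 (assumed): enumerate small label sets, keep the top_k by base score."""
    labels = sorted(probs)
    candidates = [
        frozenset(combo)
        for size in range(max_size + 1)
        for combo in combinations(labels, size)
    ]
    candidates.sort(key=lambda s: base_log_prob(s, probs), reverse=True)
    return candidates[:top_k]


def rerank(candidates, probs, set_log_prob, alpha=1.0):
    """Stage 2 (assumed): pick the candidate maximizing base score + alpha * set score."""
    return max(
        candidates,
        key=lambda s: base_log_prob(s, probs) + alpha * set_log_prob(s),
    )


def toy_set_log_prob(label_set):
    """Hypothetical stand-in for a trained label set estimator:
    rewards sets in which labels "a" and "b" co-occur."""
    return 0.5 if {"a", "b"} <= label_set else 0.0


# Hypothetical per-label probabilities from an independent base predictor.
probs = {"a": 0.9, "b": 0.45, "c": 0.2}
candidates = generate_candidates(probs)
best_independent = candidates[0]
best_reranked = rerank(candidates, probs, toy_set_log_prob)
print(sorted(best_independent), sorted(best_reranked))
```

In this toy setting the independent predictor alone would drop label "b" because its probability is below 0.5, while the rescoring step can recover it when the set-level score favors the co-occurrence of "a" and "b". A real system would replace the enumeration and the toy scorer with a learned candidate generator and estimator.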