Domain-specific word embeddings for ICD-9-CM classification

Abstract

In this work we evaluate domain-specific embedding models induced from textual resources in the medical domain. The International Classification of Diseases (ICD) is a standard, broadly used classification system that codes a large number of specific diseases, symptoms, injuries and medical procedures into numerical classes. Assigning a code to a clinical case means classifying that case into one or more discrete classes, hence allowing further statistical studies and automated calculations. Having a discrete code instead of natural-language text is intuitively a great advantage for data processing systems. The use of such a classification is becoming increasingly important for, but not limited to, economic and policy-making purposes. Experiments show that domain-specific word embeddings, rather than general ones, improve classifiers in terms of frequency similarities between words.
