Does Enrichment of Clinical Texts by Ontology Concepts Increase Classification Accuracy?

Abstract

In the medical domain, multiple ontologies and terminology systems are available. However, existing classification and prediction algorithms in the clinical domain often ignore or insufficiently utilize the semantic information provided in those ontologies. To address this issue, we introduce a concept for augmenting embeddings, the input to deep neural networks, with semantic information retrieved from ontologies. To do this, words and phrases of sentences are mapped to concepts of a medical ontology, so that synonyms are aggregated in the same concept. A semantically enriched vector is generated and used for sentence classification. We study our approach on a sentence classification task using a real-world dataset which comprises 640 sentences belonging to 22 categories. A deep neural network model is defined with an embedding layer followed by two LSTM layers and two dense layers. Our experiments show that classification accuracy with content-enriched embeddings is, for some categories, higher than without enrichment. We conclude that semantic information from ontologies has the potential to provide a useful enrichment of text. Future research will assess to what extent semantic relationships from the ontology can be used for enrichment.
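
The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy synonym table, the concept identifiers (`C_MI`, `C_HTN`), and the function name `enrich` are hypothetical, and replacing matched phrases by a concept identifier is only one assumed way of letting synonyms share a single embedding entry.

```python
# Toy ontology: surface forms mapped to a concept identifier.
# The identifiers are invented for illustration; they are not real ontology codes.
SYNONYMS = {
    "heart attack": "C_MI",
    "myocardial infarction": "C_MI",
    "high blood pressure": "C_HTN",
    "hypertension": "C_HTN",
}

# Longest phrase (in tokens) that can appear in the table.
MAX_PHRASE_LEN = max(len(key.split()) for key in SYNONYMS)


def enrich(sentence):
    """Replace known words and phrases by their concept identifier.

    Uses greedy longest-match lookup over lowercased, whitespace-split
    tokens; tokens without a match are kept unchanged.
    """
    tokens = sentence.lower().split()
    enriched = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in SYNONYMS:
                enriched.append(SYNONYMS[phrase])
                i += n
                break
        else:
            # No phrase starting at position i is in the table.
            enriched.append(tokens[i])
            i += 1
    return enriched
```

Under this sketch, "heart attack" and "myocardial infarction" yield the same token, so a downstream embedding layer would assign both the same vector.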
