Substituting clinical features using synthetic medical phrases: Medical text data augmentation techniques.

Abstract

Biomedical natural language processing (NLP) plays an important role in extracting significant information from medical discharge notes. Detecting meaningful features in unstructured notes is a challenging task in medical document classification. Domain-specific phrases and the many synonyms used within medical documents make them hard to analyze, and the analysis becomes even more challenging for short documents such as abstracts. All of these factors can result in poor classification performance, especially when clinical data are in short supply, as they often are in practice. Two new approaches are proposed for augmenting medical data to enrich the training set: an ontology-guided approach, and an approach that combines ontology-based and dictionary-based augmentation. Three different deep learning models are used to evaluate the classification performance of the proposed methods. The results show that the proposed methods improve classification accuracy on clinical notes.
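As a rough illustration of the dictionary-based side of this kind of augmentation, the sketch below swaps known medical phrases for synonyms to create a new training sentence. It is a minimal example under stated assumptions, not the authors' method: the `SYNONYMS` table, the `augment` function, and the sample note are all hypothetical, and a real system would presumably draw its synonyms from a medical ontology or dictionary, which the abstract does not name.

```python
import random
import re

# Hypothetical toy synonym table (an assumption for illustration only);
# the abstract does not specify which ontology or dictionary is used.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
    "shortness of breath": ["dyspnea"],
}


def augment(note, synonyms=SYNONYMS, seed=0):
    """Return a copy of `note` with each known phrase replaced by a synonym."""
    if not synonyms:
        return note
    rng = random.Random(seed)
    # Try longer phrases first so multi-word terms are matched whole.
    phrases = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        flags=re.IGNORECASE,
    )
    # Look up the matched phrase case-insensitively and pick one synonym.
    return pattern.sub(
        lambda m: rng.choice(synonyms[m.group(1).lower()]), note
    )


note = "Patient with high blood pressure admitted after a heart attack."
augmented = augment(note)
print(augmented)
```

Each augmented note keeps the label of the note it was derived from, so the training set grows without new annotation. Matching whole phrases rather than single tokens is a design choice here, since many clinical terms span several words.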
