Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives
In secondary analysis of electronic health records, a crucial task is correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNNs) for text classification can augment the existing techniques by leveraging this representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP on ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with improvements of up to 26 percentage points in F1-score and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.
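The abstract describes a CNN text classifier whose convolutional filters learn which phrases signal a given condition. As a minimal, illustrative sketch of such an architecture (not the authors' published implementation; hyperparameters such as embedding size, filter widths, and dropout are assumptions), a Kim-style CNN in PyTorch could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Minimal Kim-style CNN for binary phenotype classification from note text.
    Hyperparameters below are illustrative, not the paper's configuration."""

    def __init__(self, vocab_size, embed_dim=100, num_filters=100,
                 kernel_sizes=(2, 3, 4, 5), num_classes=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per filter width; each width corresponds to
        # phrases (n-grams) of that length in the discharge summary.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        x = self.embedding(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        # Max-over-time pooling keeps the strongest phrase activation per filter;
        # tracing these maxima back to token positions is one way to surface
        # salient phrases for a prediction.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                         # (batch, num_classes) logits
```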
The ten different phenotypes used for this study.
The first column shows the name of the phenotype, the second column shows the number of positive examples out of the total 1,610 notes, and the third shows the κ coefficient as the inter-rater agreement measure. The last column lists the definition used to identify and annotate each phenotype.
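For reference, assuming the reported κ is the standard Cohen's kappa for two annotators, it is computed from the observed agreement p_o and the chance agreement p_e expected from the annotators' label distributions:

\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\]

A κ of 1 indicates perfect agreement, while a κ of 0 indicates agreement no better than chance.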
Comparison of achieved F1-scores across all tested phenotypes.
The left three models classify directly from text; the right two models are concept extraction based. The CNN outperforms the other models on most tasks.
The most salient phrases for advanced heart failure and alcohol abuse.
The salient cTAKES CUIs are extracted from the filtered RF model.
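As a rough sketch of how salient CUIs could be read off a random forest trained on per-note cTAKES concept counts (the filtering step and the variable names below are assumptions, not the paper's exact pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def top_salient_cuis(cui_counts, labels, cui_names, k=10):
    """Fit a random forest on per-note CUI count features and return the k CUIs
    with the highest impurity-based feature importances."""
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(cui_counts, labels)                  # cui_counts: (n_notes, n_cuis)
    ranked = np.argsort(rf.feature_importances_)[::-1]
    return [(cui_names[i], float(rf.feature_importances_[i])) for i in ranked[:k]]
```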
Impact of phrase length on model performance.
The figure shows the change in F1-score between a model that considers only single words and a model that considers phrases up to a length of 5.
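The caption does not state how phrase length enters the model; as one hedged illustration, a bag-of-n-grams baseline makes the single-word versus length-5 comparison concrete by varying the n-gram range (toy text below, not MIMIC-III data):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy note snippets for illustration only.
notes = [
    "patient with advanced heart failure started on milrinone",
    "history of alcohol abuse complicated by withdrawal seizures",
]

unigrams = CountVectorizer(ngram_range=(1, 1)).fit(notes)  # single words only
phrases = CountVectorizer(ngram_range=(1, 5)).fit(notes)   # phrases up to length 5

# The phrase vocabulary is far larger, letting a downstream model weight
# multi-word expressions such as "advanced heart failure" directly.
print(len(unigrams.vocabulary_), len(phrases.vocabulary_))
```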
This table shows the best-performing model for each approach and phenotype.
We show precision, recall, F1-score, and AUC.
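For reference, the reported metrics follow their standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN):

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

AUC is the area under the ROC curve, i.e., the probability that the model ranks a randomly chosen positive note above a randomly chosen negative one.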