Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records
Objective: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record. Materials and Methods: The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values. Results: Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers. Conclusion: Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.
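The AUC figures reported in the abstract can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison (Mann-Whitney formulation).

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher means "more positive".
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic pairwise loop is chosen for readability; a rank-based implementation gives the same value more efficiently on large test sets.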
Histogram of DAS28 scores for 25 discordant cases.
<p>These discordant cases are between DAS labels and domain expert labels among 93 random samples from the Training Set (the remaining 68 cases were concordant).</p>
Scatter plot of DAS28 scores and log transformed lab values.
<p>(Left) Scatter plot of DAS28 scores and log transformed lab values for 1320 correctly classified notes. (Right) Scatter plot of DAS28 scores and log transformed lab values for 429 misclassified notes. The lines are the regression lines.</p>
Error analysis of the best performing classifier.
<p>Out of 429 misclassified cases (using DAS28 derived dichotomous labels as gold standard), the majority are from the Moderate and Low disease activity categories.</p>
Ranges of lab values.
<p>(Left) Range of lab values for Moderate/High (MH) disease activity cases vs. range of lab values for Low/Remission (LR) disease activity cases among 1320 correctly classified notes. (Right) Range of lab values for Moderate/High (MH) disease activity cases vs. range of lab values for Low/Remission (LR) disease activity cases among 429 misclassified notes.</p>
Lab-value and 20 top-ranked CUIs.
<p>Their Chi-square values were visualized as bars. Longer bars suggest higher impact. The negative signs “-” before some of the CUIs suggest negation (CUI – Unified Medical Language System Concept Unique Identifier).</p>
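To make the ranking idea concrete, here is a minimal sketch of a chi-square score for a single binary feature (for example, whether a given CUI occurs in a note) against a dichotomous label. It assumes the classical Pearson statistic on a 2x2 contingency table; the source does not state which chi-square variant was used, and the function name `chi_square_2x2` and the toy data are hypothetical.

```python
def chi_square_2x2(feature_present, labels):
    """Pearson chi-square statistic for a binary feature vs. a binary label.

    feature_present: iterable of 0/1 flags (1 = feature occurs in the note).
    labels: iterable of 0/1 class labels.
    Returns 0.0 when any row or column of the table is empty.
    """
    a = b = c = d = 0  # a: present & positive, b: present & negative,
                       # c: absent & positive,  d: absent & negative
    for f, y in zip(feature_present, labels):
        if f and y:
            a += 1
        elif f and not y:
            b += 1
        elif not f and y:
            c += 1
        else:
            d += 1

    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Toy data (hypothetical): a feature that perfectly separates the two classes,
# and one that is independent of the label.
perfect = chi_square_2x2([1, 1, 0, 0], [1, 1, 0, 0])
independent = chi_square_2x2([1, 0, 1, 0], [1, 1, 0, 0])
```

Scoring every candidate feature this way and sorting in descending order yields a ranking in which larger values indicate a stronger association with the label, which corresponds to the longer bars in the figure.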
Corpus selection effect on Test set 1 using a linear-kernel SVM model.