7 research outputs found

    Combining deep learning with token selection for patient phenotyping from electronic health records.

    Artificial intelligence provides the opportunity to reveal important information buried in large amounts of complex data. Electronic health records (eHRs) are a source of such big data that provide a multitude of health-related clinical information about patients. However, text data from eHRs, e.g., discharge summary notes, are challenging to analyze because these notes are free-form texts and the writing formats and styles vary considerably between different records. For this reason, in this paper we study deep learning neural networks in combination with natural language processing to analyze text data from clinical discharge summaries. We provide a detailed analysis of patient phenotyping, i.e., the automatic prediction of ten patient disorders, by investigating the influence of network architectures, sample sizes and information content of tokens. Importantly, for patients suffering from Chronic Pain, the disorder that is the most difficult one to classify, we find the largest performance gain for a combined word- and sentence-level input convolutional neural network (ws-CNN). As a general result, we find that the combination of data quality and data quantity of the text data plays a crucial role in using more complex network architectures that improve significantly beyond a word-level input CNN model. From our investigations of learning curves and token selection mechanisms, we conclude that for such a transition one requires larger sample sizes because the amount of information per sample is quite small and only carried by few tokens and token categories. Interestingly, we found that the token frequencies in the eHRs follow a Zipf law and we utilized this behavior to investigate the information content of tokens by defining a token selection mechanism. The latter also addresses issues of explainable AI.
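
The Zipf-law observation and the idea of selecting tokens by frequency can be illustrated with a minimal sketch. It is not the paper's implementation: the whitespace tokenizer, the rank-band selection rule, and the function names `rank_tokens`, `zipf_slope`, and `select_tokens` are all assumptions made for illustration.

```python
from collections import Counter
import math


def rank_tokens(documents):
    """Return (token, count) pairs sorted by descending frequency.

    Assumption: lowercased whitespace tokenization stands in for
    whatever tokenizer the original study used.
    """
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    return counts.most_common()


def zipf_slope(ranked):
    """Least-squares slope of log(count) against log(rank).

    A slope close to -1 is what a Zipf law would predict.
    Needs at least two distinct tokens.
    """
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def select_tokens(ranked, lo_rank, hi_rank):
    """Keep tokens whose 1-based frequency rank lies in [lo_rank, hi_rank].

    Hypothetical selection rule: a rank band lets one drop the most
    frequent tokens, the rarest tokens, or both.
    """
    return {tok for tok, _ in ranked[lo_rank - 1:hi_rank]}
```

Restricting a classifier's vocabulary to different rank bands and comparing the resulting performance is one way to probe which tokens carry information, under the assumptions stated above.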

    Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale

    Deriving disease subtypes from electronic health records (EHRs) can guide next-generation personalized medicine. However, challenges in summarizing and representing patient data prevent widespread practice of scalable EHR-based stratification analysis. Here we present an unsupervised framework based on deep learning to process heterogeneous EHRs and derive patient representations that can efficiently and effectively enable patient stratification at scale. We considered EHRs of 1,608,741 patients from a diverse hospital cohort comprising a total of 57,464 clinical concepts. We introduce a representation learning model based on word embeddings, convolutional neural networks, and autoencoders (i.e., ConvAE) to transform patient trajectories into low-dimensional latent vectors. We evaluated these representations as broadly enabling patient stratification by applying hierarchical clustering to different multi-disease and disease-specific patient cohorts. ConvAE significantly outperformed several baselines in a clustering task to identify patients with different complex conditions, with 2.61 entropy and 0.31 purity average scores. When applied to stratify patients within a certain condition, ConvAE led to various clinically relevant subtypes for different disorders, including type 2 diabetes, Parkinson's disease and Alzheimer's disease, largely related to comorbidities, disease progression, and symptom severity. With these results, we demonstrate that ConvAE can generate patient representations that lead to clinically meaningful insights. This scalable framework can help better understand varying etiologies in heterogeneous sub-populations and unlock patterns for EHR-based research in the realm of personalized medicine. Comment: C.F. and R.M. share senior authorship
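
For readers unfamiliar with the two clustering scores quoted above, the following sketch shows one common way to compute purity and entropy from cluster assignments and reference labels. The abstract does not give the paper's exact definitions, so the formulas here (weighted per-cluster entropy with base-2 logarithms) and the function names are assumptions.

```python
from collections import Counter
import math


def _label_counts_per_cluster(cluster_ids, class_labels):
    """Group reference labels by cluster and count them."""
    clusters = {}
    for cluster, label in zip(cluster_ids, class_labels):
        clusters.setdefault(cluster, Counter())[label] += 1
    return clusters


def purity(cluster_ids, class_labels):
    """Fraction of samples that belong to the majority label of their cluster."""
    clusters = _label_counts_per_cluster(cluster_ids, class_labels)
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(class_labels)


def entropy(cluster_ids, class_labels):
    """Size-weighted average of the label entropy inside each cluster.

    Assumption: base-2 logarithm; lower values mean purer clusters.
    """
    clusters = _label_counts_per_cluster(cluster_ids, class_labels)
    total = len(class_labels)
    result = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        cluster_entropy = -sum(
            (n / size) * math.log2(n / size) for n in counts.values()
        )
        result += (size / total) * cluster_entropy
    return result
```

Under these definitions, higher purity and lower entropy both indicate clusters that align more closely with the reference conditions.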

    Deep Learning for Medication Recommendation: A Systematic Survey

    Making medication prescriptions in response to the patient's diagnosis is a challenging task. The number of pharmaceutical companies, their inventory of medicines, and the recommended dosage confront a doctor with the well-known problem of information and cognitive overload. To assist a medical practitioner in making informed decisions regarding a medical prescription to a patient, researchers have exploited electronic health records (EHRs) in automatically recommending medication. In recent years, medication recommendation using EHRs has been a salient research direction, which has attracted researchers to apply various deep learning (DL) models to the EHRs of patients in recommending prescriptions. Yet, in the absence of a holistic survey article, studying these publications to understand the current state of research and identify the best-performing models, along with the trends and challenges, takes considerable time and effort. To fill this research gap, this survey reports on state-of-the-art DL-based medication recommendation methods. It reviews the classification of DL-based medication recommendation (MR) models, compares their performance, and discusses the unavoidable issues they face. It reports on the most common datasets and metrics used in evaluating MR models. The findings of this study have implications for researchers interested in MR models.

    Toward Precision Medicine in Intensive Care: Leveraging Electronic Health Records and Patient Similarity

    The growing adoption of Electronic Health Record (EHR) systems has resulted in an unprecedented amount of data. This availability of data has also opened up the opportunity to utilize EHRs for providing more customized care for each patient by considering individual variability, which is the goal of precision medicine. In this context, patient similarity (PS) analytics have been introduced to facilitate data analysis through investigating the similarities in patients’ data, and, ultimately, to help improve the healthcare system. This dissertation is presented in six chapters and focuses on employing PS analytics in data-rich intensive care units. Chapter 1 provides a review of the literature and summarizes studies describing approaches for predicting patients’ future health status based on EHR and PS. Chapter 2 demonstrates the informativeness of missing data in patient profiles and introduces missing data indicators to use this information in mortality prediction. The results demonstrate that including indicators with observed measurements in a set of well-known prediction models (logistic regression, decision tree, and random forest) can improve the predictive accuracy. Chapter 3 builds upon the previous results and utilizes these missing indicators to reveal patient subpopulations based on the similarity of the laboratory tests ordered for them. In this chapter, the Density-based Spatial Clustering of Applications with Noise method was employed to group the patients into clusters using the indicators generated in the previous study. Results confirmed that missing indicators capture laboratory-test-ordering patterns that are informative and can be used to identify similar patient subpopulations. Chapter 4 investigates the performance of a multifaceted PS metric constructed by utilizing appropriate similarity metrics for specific clinical variables (e.g. vital signs, ICD-9, etc.). The proposed PS metric was evaluated in a 30-day post-discharge mortality prediction problem. Results demonstrate that PS-based prediction models with the new PS metric outperformed population-based prediction models. Moreover, the multifaceted PS metric significantly outperformed cosine and Euclidean PS metrics in a k-nearest neighbors setting. Chapter 5 takes the previous results into consideration and looks for potential subpopulations among septic patients. Sepsis is one of the most common causes of death in Canada. The focus of this chapter is on longitudinal EHR data, which are a collection of observations of measurements made chronologically for each patient. This chapter employs Functional Principal Component Analysis to derive the dominant modes of variation in septic patients’ EHRs. Results confirm that including temporal data in the analysis can help in identifying subgroups of septic patients. Finally, Chapter 6 provides a discussion of results from previous chapters. The results indicate the informativeness of missing data and how PS can help in improving the performance of predictive modeling. Moreover, results show that utilizing the temporal information in PS calculation improves patient stratification. Finally, the discussion identifies limitations and directions for future research.
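
The missing data indicators described for Chapter 2 can be sketched as follows. This is only an illustration of the general idea: the record layout (a list of dicts with `None` for unmeasured values), the `_missing` suffix, and the function name `add_missing_indicators` are assumptions, not details taken from the dissertation.

```python
def add_missing_indicators(records, variables):
    """Return copies of the records with one binary indicator per variable.

    For each variable, `<variable>_missing` is 1 when the value is absent
    or None (the test was not ordered or not recorded) and 0 otherwise.
    The observed measurements are kept alongside the indicators.
    """
    augmented = []
    for record in records:
        row = dict(record)  # copy so the input is left unchanged
        for var in variables:
            row[f"{var}_missing"] = 1 if record.get(var) is None else 0
        augmented.append(row)
    return augmented


# Hypothetical patient profiles; variable names are illustrative only.
patients = [
    {"lactate": 2.1, "creatinine": None},
    {"lactate": None, "creatinine": 0.9},
]
augmented = add_missing_indicators(patients, ["lactate", "creatinine"])
```

The indicator columns alone form a binary vector per patient that reflects which tests were ordered, which is the kind of input a density-based clustering step could group on.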

    Preface
