15 research outputs found

    Phenotyping hypotensive patients in critical care using hospital discharge summaries

    Among critically ill patients, hypotension represents a failure in compensatory mechanisms and may lead to organ hypoperfusion and failure. In this work, we adopt a data-driven approach for phenotype discovery and visualization of patient similarity and cohort structure in the intensive care unit (ICU). We used the Hierarchical Dirichlet Process (HDP) as a non-parametric topic modeling technique to automatically learn a d-dimensional feature representation of patients that captures the latent 'topic' structure of diseases, symptoms, medications, and findings documented in hospital discharge summaries. We then used the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm to convert the d-dimensional latent structure learned from HDP into a matrix of pairwise similarities for visualizing patient similarity and cohort structure. Using discharge summaries of a large patient cohort from the MIMIC II database, we evaluated the clinical utility of the discovered topic structure in phenotyping critically ill patients who experienced hypotensive episodes. Our results indicate that the approach is able to reveal clinically interpretable clustering structure within our cohort and may provide valuable insights into the association between disease phenotypes and outcomes.
    Funding: National Institutes of Health (U.S.) (Grant R01-EB017205); National Institutes of Health (U.S.) (Grant R01-EB001659); National Institutes of Health (U.S.) (Grant R01GM104987)
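
    The step of turning d-dimensional patient vectors into pairwise similarities can be illustrated with a minimal sketch. This is not the authors' code: it assumes a single fixed Gaussian bandwidth `sigma` for every patient, whereas t-SNE normally tunes a per-point bandwidth from a perplexity setting, and the function name and toy vectors are hypothetical.

```python
from math import exp

def conditional_similarities(points, sigma=1.0):
    """Row-normalized Gaussian similarities p(j|i) between feature vectors.

    Simplification (assumption): one shared sigma instead of the
    per-point, perplexity-tuned bandwidths used by full t-SNE.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    rows = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # a point is not its own neighbor
            else:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights.append(exp(-sq_dist / (2 * sigma ** 2)))
        total = sum(weights)
        if total == 0.0:
            raise ValueError("all similarities underflowed; increase sigma")
        rows.append([w / total for w in weights])
    return rows

# Toy 2-dimensional "topic proportion" vectors for three hypothetical patients.
patients = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
P = conditional_similarities(patients)
```

    Each row of `P` sums to 1, and a patient's nearer neighbors receive more of that mass than distant ones.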

    Latent topic discovery of clinical concepts from hospital discharge summaries of a heterogeneous patient cohort

    Patients in critical care often exhibit complex disease patterns. A fundamental challenge in clinical research is to identify clinical features that may be characteristic of adverse patient outcomes. In this work, we propose a data-driven approach for phenotype discovery of patients in critical care. We used the Hierarchical Dirichlet Process (HDP) as a non-parametric topic modeling technique to automatically discover the latent "topic" structure of diseases, symptoms, and findings documented in hospital discharge summaries. We show that the latent topic structure can be used to reveal phenotypic patterns of diseases and symptoms shared across subgroups of a patient cohort, and may have prognostic value in stratifying patients' post-discharge mortality risks. Using discharge summaries of a large patient cohort from the MIMIC II database, we evaluate the clinical utility of the discovered topic structure in identifying patients who are at high risk of mortality within one year of hospital discharge. We demonstrate that the learned topic structure has statistically significant associations with post-discharge mortality, and may provide valuable insights for defining new feature sets for predicting patient outcomes.
    Funding: National Institutes of Health (U.S.) (Grant R01-EB001659); National Institute of Biomedical Imaging and Bioengineering (U.S.) (Grant R01GM104987)

    AD-BERT: Using Pre-trained contextualized embeddings to Predict the Progression from Mild Cognitive Impairment to Alzheimer's Disease

    Objective: We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). Materials and Methods: We identified 3657 patients diagnosed with MCI together with their progress notes from the Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. Progress notes dated no later than the first MCI diagnosis were used for the prediction. We first preprocessed the notes by deidentification, cleaning, and splitting, and then pretrained a BERT model for AD (AD-BERT), starting from the publicly available Bio+Clinical BERT, on the preprocessed notes. The embeddings of all the sections of a patient's notes processed by AD-BERT were combined by MaxPooling to compute the probability of MCI-to-AD progression. For replication, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. Results: Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with an area under the receiver operating characteristic curve (AUC) of 0.8170 and an F1 score of 0.4178 on the NMEDW dataset, and an AUC of 0.8830 and an F1 score of 0.6836 on the WCM dataset. Conclusion: We developed a deep learning framework using BERT models that provides an effective solution for predicting MCI-to-AD progression through clinical note analysis.
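
    The pooling step described above can be sketched as follows. This is only an illustration under stated assumptions, not the AD-BERT implementation: section embeddings are plain lists of floats, and the logistic output layer (`progression_probability`, with its `weights` and `bias`) is a hypothetical stand-in for whatever classifier head the authors used.

```python
from math import exp

def max_pool(section_embeddings):
    """Element-wise maximum over equal-length section embedding vectors."""
    if not section_embeddings:
        raise ValueError("need at least one section embedding")
    dim = len(section_embeddings[0])
    if any(len(vec) != dim for vec in section_embeddings):
        raise ValueError("all section embeddings must have the same length")
    # zip(*...) groups the k-th component of every section together.
    return [max(component) for component in zip(*section_embeddings)]

def progression_probability(pooled, weights, bias):
    """Hypothetical logistic head mapping a pooled vector to a probability."""
    score = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + exp(-score))

# Two toy section embeddings for one hypothetical patient.
sections = [[0.1, -0.5, 0.3], [0.4, -0.2, 0.0]]
patient_vector = max_pool(sections)
risk = progression_probability(patient_vector, weights=[1.0, 1.0, 1.0], bias=0.0)
```

    Max pooling yields one fixed-length vector per patient regardless of how many note sections that patient has, which is what lets a single classifier score patients with differing amounts of text.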

    Cooperative Semantic Information Processing for Literature-Based Biomedical Knowledge Discovery

    Given that data is increasing exponentially every day, extracting and understanding the information, themes, and relationships from large collections of documents is increasingly important to researchers in many areas. In this paper, we present a cooperative semantic information processing system to help biomedical researchers understand and discover knowledge in large numbers of titles and abstracts from PubMed query results. Our system is based on a prevalent technique, topic modeling, which is an unsupervised machine learning approach for discovering the set of semantic themes in a large set of documents. In addition, we apply a natural language processing technique to transform the “bag-of-words” assumption of topic models into a “bag-of-important-phrases” assumption, and build an interactive visualization tool using a modified, open-source Topic Browser. Finally, we conduct two experiments to evaluate the approach. The first evaluates whether the “bag-of-important-phrases” approach is better at identifying semantic themes than the standard “bag-of-words” approach. This is an empirical study in which human subjects evaluate the quality of the resulting topics using a standard “word intrusion test” to determine whether subjects can identify a word (or phrase) that does not belong in the topic. The second is a qualitative empirical study to evaluate how well the system helps biomedical researchers explore a set of documents to discover previously hidden semantic themes and connections. The methodology for this study has been successfully used to evaluate other knowledge-discovery tools in biomedicine.
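
    The scoring behind a word intrusion test can be sketched in a few lines. The abstract does not specify the study's exact procedure, so this assumes the common setup in which each subject sees a topic's top terms plus one out-of-topic intruder and picks the term that does not belong; the function name and sample responses are hypothetical.

```python
def intrusion_precision(responses):
    """Fraction of subjects who picked the true intruder for one topic.

    `responses` is a list of (chosen_term, true_intruder) pairs.
    Higher values suggest the topic's terms form a coherent theme.
    """
    if not responses:
        raise ValueError("need at least one response")
    hits = sum(1 for chosen, intruder in responses if chosen == intruder)
    return hits / len(responses)

# Four hypothetical subjects judging one topic whose planted intruder is "guitar".
answers = [
    ("guitar", "guitar"),
    ("guitar", "guitar"),
    ("insulin", "guitar"),
    ("guitar", "guitar"),
]
score = intrusion_precision(answers)  # 3 of 4 subjects found the intruder
```

    Comparing this score between bag-of-words topics and phrase-based topics is one way such a test could rank the two representations.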