133 research outputs found

Recommended from our members

Electronic Health Record Summarization over Heterogeneous and Irregularly Sampled Clinical Data

Author: Pivovarov Rimma
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2015

The increasing adoption of electronic health records (EHRs) has led to an unprecedented amount of patient health information stored in an electronic format. The ability to comb through this information is imperative, both for patient care and computational modeling. Creating a system to minimize unnecessary EHR data, automatically distill longitudinal patient information, and highlight salient parts of a patient’s record is currently an unmet need. However, summarization of EHR data is not a trivial task, as reasoning over these data poses many challenges. EHR data elements are most often obtained at irregular intervals, as patients are more likely to receive medical care when they are ill than when they are healthy. The presence of narrative documentation adds another layer of complexity, as the notes are riddled with over-sampled text, often caused by frequent copying and pasting during the documentation process. This dissertation synthesizes a set of challenges for automated EHR summarization identified in the literature and presents an array of methods for dealing with some of these challenges. We used hybrid data-driven and knowledge-based approaches to examine abundant redundancy in clinical narrative text, a data-driven approach to identify and mitigate biases in laboratory testing patterns with implications for using clinical data for research, and a probabilistic modeling approach to automatically summarize patient records and learn computational models of disease with heterogeneous data types. The dissertation also demonstrates two applications of the developed methods to important clinical questions: laboratory test overutilization and cohort selection from EHR data.

Columbia University Academic Commons

Understanding PubMed Search Results using Topic Models and Interactive Information Visualization

Author: Yu Zhiguo
Publication venue: DigitalCommons@TMC
Publication date: 01/01/2017

With data increasing exponentially, extracting and understanding information, themes and relationships from large collections of documents is becoming more and more important to researchers in many areas. PubMed, which comprises more than 25 million citations, uses Medical Subject Headings (MeSH) to index articles to better facilitate their management, searching and indexing. However, researchers are still challenged to find and then get a meaningful overview of a set of documents in a specific area of interest. This is due in part to several limitations of MeSH terms, including: the need to monitor and expand the vocabulary; the lack of concept coverage for newly developing areas; human inconsistency in assigning codes; and the time required to manually index an exponentially growing corpus. Another reason for this challenge is that neither PubMed itself nor its related Web tools can help users see high-level themes and hidden semantic structures in the biomedical literature. Topic models are a class of statistical machine learning algorithms that, when given a set of natural language documents, extract the semantic themes (topics) from the set of documents, describe the topics for each document, and estimate the semantic similarity of topics and documents. Researchers have shown that these latent themes can help humans better understand and search documents. Unlike MeSH terms, which are created based on important concepts throughout the literature, topics extracted from a subset of documents are specific to those documents. Thus they can find document-specific themes that may not exist in MeSH terms. Such themes may give a subject-area-specific vocabulary for browsing search results, and provide a broader overview of the search results.
The first part of this dissertation presents the TopicalMeSH representation, which exploits the ‘correspondence’ between topics generated using latent Dirichlet allocation (LDA) and MeSH terms to create new document representations that combine MeSH terms and latent topic vectors. In an evaluation with 15 systematic drug review corpora, TopicalMeSH performed better than MeSH in both document retrieval and classification tasks. The second part of this work introduces the “Hybrid Topic”, an alternative LDA approach that uses a ‘bag-of-MeSH&words’ approach, instead of just ‘bag-of-words’, to test whether the addition of labels (e.g. MeSH descriptors) can improve the quality and facilitate the interpretation of LDA-generated topics. An evaluation of this approach on the quality and interpretability of topics in both a general corpus and a specialized corpus demonstrated that the coherence of ‘hybrid topics’ is higher than that of regular bag-of-words topics in both specialized and general corpora. The last part of this dissertation presents a visualization tool based on the ‘hybrid topics’ model that could allow users to interactively use topic models and MeSH terms to efficiently and effectively retrieve relevant information from large sets of PubMed search results. A preliminary user study was conducted with 6 participants. All of them agreed that this tool can quickly help them understand PubMed search results and identify target articles.
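As a rough illustration of the ‘bag-of-MeSH&words’ idea described in the abstract above, the sketch below merges a document's word tokens and its MeSH descriptors into a single bag of counts that an LDA implementation could take as input. The function name, the `MESH:` prefix, the tokenizer, and the sample sentence are assumptions made for illustration only; the dissertation's actual preprocessing is not given in this listing.

```python
import re
from collections import Counter


def bag_of_mesh_and_words(text, mesh_terms, prefix="MESH:"):
    """Count word tokens and MeSH descriptors together in one bag.

    The prefix is a hypothetical convention that keeps a descriptor such as
    "Cholesterol" distinct from the ordinary word "cholesterol".
    """
    # Lowercase alphabetic tokens; a real pipeline would also remove stop words.
    words = re.findall(r"[a-z]+", text.lower())
    # Keep each descriptor as one token so multi-word headings stay intact.
    labels = [prefix + term.strip().lower().replace(" ", "_") for term in mesh_terms]
    return Counter(words + labels)


doc_bag = bag_of_mesh_and_words(
    "Statins lower cholesterol; statins reduce cardiac risk.",
    ["Hydroxymethylglutaryl-CoA Reductase Inhibitors", "Cholesterol"],
)
```

Bags built this way for every document can be stacked into a document-term matrix; whether prefixed label tokens should be weighted differently from words is a design choice this sketch does not address.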

DigitalCommons@The Texas Medical Center

Patient Record Summarization Through Joint Phenotype Learning and Interactive Visualization

Author: Levy-Fix Gal
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2020

Complex patients are becoming more and more of a challenge to the health care system given the amount of care they require and the amount of documentation needed to keep track of their state of health and treatment. Record keeping using the EHR makes this easier, but mounting amounts of patient data also mean that clinicians are faced with information overload. Information overload has been shown to have deleterious effects on care, with increased safety concerns due to missed information. Patient record summarization has been a promising mitigator for information overload. Accordingly, much research has been dedicated to record summarization since the introduction of EHRs. In this dissertation we examine whether unsupervised inference methods can derive patient problem-oriented summaries that are robust to different patients. By grounding our experiments with HIV patients, we leverage the data of a group of patients that are similar in that they share one common disease (HIV) but also exhibit complex histories of diverse comorbidities. Using a user-centered, iterative design process, we design an interactive, longitudinal patient record summarization tool that leverages automated inferences about the patient's problems. We find that unsupervised, joint learning of problems using correlated topic models, adapted to handle the multiple data types (structured and unstructured) of the EHR, is successful in identifying the salient problems of complex patients. Utilizing interactive visualization that exposes inference results to users enables them to make sense of a patient's problems over time and to answer questions about a patient more accurately and faster than using the EHR alone.

Columbia University Academic Commons

Medical Secretaries’ Registration Work in the Data-Driven Healthcare Era

Author: Bertelsen Pernille Scholdan; Knudsen Casper
Publication venue: IOS Press
Publication date: 01/01/2023

VBN

Proceedings from The 16th Scandinavian Conference on Health Informatics 2018, Aalborg, Denmark August 28–29, 2018

Author: Bygholm Ann; Hejlesen Ole; Niss Karsten Ulrik; Pape-Haugaard Louise; Zhou Chunfang
Publication venue: 'Linkoping University Electronic Press'
Publication date: 24/08/2018

VBN

Unsupervised learning methods for identifying and evaluating disease clusters in electronic health records

Author: Alexander Nonie
Publication venue: UCL (University College London)
Publication date: 28/01/2023

Introduction: Clustering algorithms are a class of algorithms that can discover groups of observations in complex data and are often used to identify subtypes of heterogeneous diseases in electronic health records (EHR). Evaluating clustering experiments for biological and clinical significance is a vital but challenging task due to the lack of consensus on best practices. As a result, the translation of findings from clustering experiments to clinical practice is limited.
Aim: The aim of this thesis was to investigate and evaluate approaches that enable the evaluation of clustering experiments using EHR.
Methods: We conducted a scoping review of clustering studies in EHR to identify common evaluation approaches. We systematically investigated the performance of the identified approaches using a cohort of Alzheimer's Disease (AD) patients as an exemplar, comparing four different clustering methods (K-means, Kernel K-means, Affinity Propagation and Latent Class Analysis). Using the same population, we developed and evaluated a method (MCHAMMER) that tested whether clusterable structures exist in EHR. To develop this method we tested several cluster validation indexes and methods of generating null data to see which are the best at discovering clusters. In order to enable the robust benchmarking of evaluation approaches, we created a tool that generated synthetic EHR data that contain known cluster labels across a range of clustering scenarios.
Results: Across 67 EHR clustering studies, the most popular internal evaluation metric was comparing cluster results across multiple algorithms (30% of studies). We examined this approach in a clustering experiment on a population of 10,065 AD patients with 21 demographic, symptom and comorbidity features. K-means found 5 clusters, Kernel K-means found 2 clusters, Affinity Propagation found 5 and Latent Class Analysis found 6. K-means 4 was found to have the best clustering solution with the highest silhouette score (0.19) and was more predictive of outcomes. The five clusters found were: typical AD (n=2026), non-typical AD (n=1640), a cardiovascular disease cluster (n=686), a cancer cluster (n=1710) and a cluster of mental health issues, smoking and early disease onset (n=1528), which has been found in previous research as well as in the results of other clustering methods. We created a synthetic data generation tool which allows for the generation of realistic EHR clusters that can vary in separation and number of noise variables to alter the difficulty of the clustering problem. We found that decreasing cluster separation increased clustering difficulty significantly, whereas noise variables increased clustering difficulty but not significantly. To develop the tool to assess cluster existence we tested different methods of null dataset generation and cluster validation indices. The best performing null dataset method was the min max method, and the best performing indices were the Calinski-Harabasz index, which had an accuracy of 94%, the Davies-Bouldin index (97%), the silhouette score (93%) and the BWC index (90%). We further found that when clusters were identified using the Calinski-Harabasz index they were more likely to have significantly different outcomes between clusters. Lastly we repeated the initial clustering experiment, comparing 10 different pre-processing methods. The three best performing methods were RBF kernel (2 clusters), MCA (4 clusters) and MCA and PCA (6 clusters). The MCA approach gave the best results, with the highest silhouette score (0.23) and meaningful clusters. It produced 4 clusters: heart and circulatory (n=1379), early onset mental health (n=1761), male cluster with memory loss (n=1823) and female cluster with more problems (n=2244).
Conclusion: We have developed and tested a series of methods and tools to enable the evaluation of EHR clustering experiments. We developed and proposed a novel cluster evaluation metric and provided a tool for benchmarking evaluation approaches on synthetic but realistic EHR data.
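Two ingredients named in the abstract above, the silhouette score and null data generated by a "min max method", can be sketched in a few lines. This is not the thesis's MCHAMMER implementation: the reading of "min max" as uniform sampling between each feature's observed minimum and maximum is an assumption, and the function names and toy points are hypothetical.

```python
import math
import random


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def min_max_null(points, rng):
    """Assumed 'min max' null data: each feature drawn uniformly in its observed range."""
    bounds = [(min(col), max(col)) for col in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in points]


# Toy data with two clearly separated groups (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
observed = silhouette_score(points, labels)
null_points = min_max_null(points, random.Random(0))
```

In a fuller test of cluster existence, the null data would be re-clustered with the same algorithm many times, and the observed index compared against the resulting null distribution.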

UCL Discovery

Pacific Symposium on Biocomputing 2023

Author
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date

The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference. PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.

OAPEN Library