
    Relation Classification for Bleeding Events From Electronic Health Records Using Deep Learning Systems: An Empirical Study

    BACKGROUND: Accurate detection of bleeding events from electronic health records (EHRs) is crucial for identifying and characterizing different common and serious medical problems. To extract such information from EHRs, it is essential to identify the relations between bleeding events and related clinical entities (eg, bleeding anatomic sites and lab tests). With the advent of natural language processing (NLP) and deep learning (DL)-based techniques, many studies have focused on their applicability for various clinical applications. However, no prior work has utilized DL to extract relations between bleeding events and relevant entities. OBJECTIVE: In this study, we aimed to evaluate multiple DL systems on a novel EHR data set for bleeding event-related relation classification. METHODS: We first expert-annotated a new data set of 1046 deidentified EHR notes for bleeding events and their attributes. On this data set, we evaluated three state-of-the-art DL architectures for the bleeding event relation classification task, namely, convolutional neural network (CNN), attention-guided graph convolutional network (AGGCN), and Bidirectional Encoder Representations from Transformers (BERT). We used three BERT-based models, namely, BERT pretrained on biomedical data (BioBERT), BioBERT pretrained on clinical text (Bio+Clinical BERT), and BioBERT pretrained on EHR notes (EhrBERT). RESULTS: Our experiments showed that the BERT-based models significantly outperformed the CNN and AGGCN models. Specifically, BioBERT achieved a macro F1 score of 0.842, outperforming both the AGGCN (macro F1 score, 0.828) and CNN models (macro F1 score, 0.763) by 1.4% (P < .001) and 7.9% (P < .001), respectively. CONCLUSIONS: In this comprehensive study, we explored and compared different DL systems to classify relations between bleeding events and other medical concepts. On our corpus, BERT-based models outperformed other DL models for identifying the relations of bleeding-related entities. In addition to pretrained contextualized word representation, BERT-based models benefited from the use of target entity representation over traditional sequence representation.
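    The macro F1 score reported in this abstract averages the per-class F1 scores with equal weight, so rare relation types count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the label names (`site`, `lab`, `none`) and the toy gold/predicted lists are hypothetical and chosen only for illustration.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    pairs = list(zip(gold, pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical relation labels for six candidate entity pairs.
gold = ["site", "site", "lab", "none", "none", "none"]
pred = ["site", "lab", "lab", "none", "none", "site"]

score = macro_f1(gold, pred)
print(round(score, 3))
```

    In practice a library routine such as scikit-learn's `f1_score` with `average="macro"` computes the same quantity; whether the study used it is not stated in the abstract.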

    Using the Literature to Identify Confounders

    Prior work in causal modeling has focused primarily on learning graph structures and parameters to model data generating processes from observational or experimental data, while the focus of the literature-based discovery paradigm was to identify novel therapeutic hypotheses in publicly available knowledge. The critical contribution of this dissertation is to refashion the literature-based discovery paradigm as a means to populate causal models with relevant covariates to abet causal inference. In particular, this dissertation describes a generalizable framework for mapping from causal propositions in the literature to subgraphs populated by instantiated variables that reflect observational data. The observational data are those derived from electronic health records. The purpose of causal inference is to detect adverse drug event signals. The Principle of the Common Cause is exploited as a heuristic for a defeasible practical logic. The fundamental intuition is that improbable co-occurrences can be “explained away” with reference to a common cause, or confounder. Semantic constraints in literature-based discovery can be leveraged to identify such covariates. Further, the asymmetric semantic constraints of causal propositions map directly to the topology of causal graphs as directed edges. The hypothesis is that causal models conditioned on sets of such covariates will improve upon the performance of purely statistical techniques for detecting adverse drug event signals. By improving upon previous work in purely EHR-based pharmacovigilance, these results establish the utility of this scalable approach to automated causal inference.

    Understanding Patient Safety Reports via Multi-label Text Classification and Semantic Representation

    Medical errors are the results of problems in health care delivery. One of the key steps to eliminate errors and improve patient safety is patient safety event reporting. A patient safety report may record a number of critical factors that are involved in health care when incidents, near misses, and unsafe conditions occur. Therefore, clinicians and risk management can generate actionable knowledge by harnessing useful information from reports. To date, efforts have been made to establish a nationwide reporting and error analysis mechanism. The increasing volume of reports has been driving improvement in quantity measures of patient safety. For example, statistical distributions of errors across types of error and health care settings have been well documented. Nevertheless, a shift to quality measures is in high demand. In a health care system, errors are likely to occur if one or more intrinsically associated components (e.g., procedures, equipment, etc.) go wrong. However, our understanding of which components are connected and how is limited for at least two reasons. Firstly, patient safety reports present difficulties in aggregate analysis since they are large in volume and complicated in semantic representation. Secondly, an efficient and clinically valuable mechanism to identify and categorize these components is absent. I strive to make my contribution by investigating the multi-labeled nature of patient safety reports. To facilitate clinical implementation, I propose that machine learning and semantic information of reports, e.g., semantic similarity between terms, can be used to jointly perform automated multi-label classification. My work is divided into three specific aims. In the first aim, I developed a patient safety ontology to enhance semantic representation of patient safety reports. The ontology supports a number of applications including automated text classification. In the second aim, I evaluated multi-label text classification algorithms on patient safety reports. The results demonstrated a list of productive algorithms with balanced predictive power and efficiency. In the third aim, to improve the performance of text classification, I developed a framework for incorporating semantic similarity and kernel-based multi-label text classification. Semantic similarity values produced by different semantic representation models are evaluated in the classification tasks. Both ontology-based and distributional semantic similarity exerted a positive influence on classification performance, but the latter showed significant efficiency in terms of the measure of semantic similarity. This work provides insights into the nature of patient safety reports: a report can be labeled by the multiple components (e.g., different procedures, settings, error types, and contributing factors) it contains. Multi-labeled reports hold promise to disclose system vulnerabilities since they provide insight into the intrinsically correlated components of health care systems. I demonstrated the effectiveness and efficiency of automated multi-label text classification embedded with semantic similarity information on patient safety reports. The proposed solution has the potential to be incorporated into existing reporting systems, significantly reducing the workload of aggregate report analysis.

    Savana: Un Entorno Integral de Extracción de Información y Expansión de Terminologías en el Dominio de la Medicina

    Terminological databases constitute a fundamental source of information in the medical domain. They are used daily both by practitioners in the area and in academia. Several resources of this kind are available, e.g. CIE (Clasificación Internacional de Enfermedades, the International Classification of Diseases), SnomedCT or UMLS (Unified Medical Language System). These terminological databases are of high quality because they are the result of collaborative expert knowledge. However, they may show certain drawbacks in terms of faithfully representing the ever-changing medical domain. Therefore, systems aimed at capturing novel terminological knowledge in heterogeneous text sources, and able to include it in standard terminologies, have the potential to add great value to such repositories. This paper presents, first, Savana, a Biomedical Information Extraction system which, combined with a validation phase carried out by medical practitioners, is used to populate the Spanish branch of SnomedCT with novel knowledge. Second, we describe and evaluate a system which, given a novel medical term, finds its most likely hypernym, thus becoming an enabler in the task of terminological database enrichment and expansion.
    This work is partially funded by the Spanish Ministry of Economy and Competitiveness under the following sponsorships: Maria de Maeztu Units of Excellence Programme (MDM-2015-0502), and TUNER project (TIN2015-65308-C5-5-R, MINECO/FEDER, UE).