5,720 research outputs found

    Clinical Relation Extraction Toward Drug Safety Surveillance Using Electronic Health Record Narratives: Classical Learning Versus Deep Learning

    BACKGROUND: Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. OBJECTIVE: To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed to evaluate natural language processing and machine learning approaches using the expert-annotated medical entities and relations in the context of drug safety surveillance, and investigate how different learning approaches perform under different configurations. METHODS: We have manually annotated 791 EHR notes with 9 named entities (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-ADE, and severity-ADE). Then, we explored 3 supervised machine learning systems for relation identification: (1) a support vector machines (SVM) system, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system. For the neural network system, we exploited the state-of-the-art recurrent neural network (RNN) and attention models. We report the performance by macro-averaged precision, recall, and F1-score across the relation types. RESULTS: Our results show that the SVM model achieved the best average F1-score of 89.1% on test data, outperforming the long short-term memory (LSTM) model with attention (F1-score of 65.72%) as well as the rule induction baseline system (F1-score of 7.47%) by a large margin. The bidirectional LSTM model with attention achieved the best performance among different RNN models. 
With the inclusion of additional features in the LSTM model, its performance can be boosted to an average F1-score of 77.35%. CONCLUSIONS: Our results show that classical learning models (SVM) remain advantageous over deep learning models (RNN variants) for clinical relation identification, especially for long-distance intersentential relations. However, RNNs show great potential for significant improvement if more training data become available. Our work is an important step toward mining EHRs to improve the efficacy of drug safety surveillance. Most importantly, the annotated data used in this study will be made publicly available, which will further promote drug safety research in the community.
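The abstract above reports macro-averaged precision, recall, and F1-score across relation types. A minimal sketch of that computation follows. It assumes macro F1 is the mean of per-type F1 scores (the abstract does not say which variant was used), and the function name `macro_prf` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_prf(gold, pred):
    """Macro-averaged precision, recall, and F1 over all relation types.

    Assumption: macro F1 is the unweighted mean of per-type F1 scores.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted type p, but it was wrong
            fn[g] += 1  # gold type g was missed
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with three of the relation types named in the abstract.
gold = ["medication-ADE", "medication-ADE", "medication-dosage", "severity-ADE"]
pred = ["medication-ADE", "medication-dosage", "medication-dosage", "severity-ADE"]
precision, recall, f1 = macro_prf(gold, pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Because every relation type counts equally, a rare type that is classified poorly lowers the macro average as much as a frequent one would.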

    Medical Concept and Patient Representation Learning Using Deep Neural Networks and Its Application to Healthcare Problems

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2022. 8. 정교민. Using the National Sample Cohort DB, a nationwide health insurance dataset, this dissertation proposes deep neural network-based methods for learning medical concept and patient representations and for solving healthcare problems. First, we propose a recurrent neural network model that learns patient representations from sequential medical records and personal profile information and predicts the likelihood of a future disease diagnosis. We introduce a structure that efficiently fuses patient information of different kinds and obtain a large performance gain. Representing the medical codes that make up a patient's records as distributed representations yields a further improvement. This confirmed that distributed representations of medical codes carry important temporal information, and in the follow-up study we introduce a graph structure to strengthen that temporal information. We build a graph from the similarity between the distributed code representations and from statistical information, and use a graph neural network to obtain code representation vectors enriched with temporal and statistical information. A model that uses these code vectors to detect potential adverse reaction signals of marketed drugs was shown to predict even cases that are absent from existing adverse reaction databases. Finally, to overcome the limitation that key information in medical records is sparse relative to their volume, we use a knowledge graph to supplement prior medical knowledge. We extract only the part of the knowledge graph that corresponds to a patient's medical records to build a personalized knowledge graph, and obtain its representation vector with a graph neural network. 
Ultimately, the patient representation that encodes the sequential medical records is used together with the representation that encodes personalized medical knowledge for future disease and diagnosis prediction. This dissertation proposes deep neural network-based methods for learning medical concept and patient representations from medical claims data to solve two healthcare tasks, i.e., clinical outcome prediction and post-marketing adverse drug reaction (ADR) signal detection. First, we propose SAF-RNN, a Recurrent Neural Network (RNN)-based model that learns a deep patient representation based on the clinical sequences and patient characteristics. Our proposed model fuses different types of patient records using feature-based gating and self-attention. We demonstrate that high-level associations between two heterogeneous records are effectively extracted by our model, thus achieving state-of-the-art performance in predicting the risk probability of cardiovascular disease. Secondly, based on the observation that the distributed medical code embeddings represent temporal proximity between the medical codes, we introduce a graph structure to enhance the code embeddings with such temporal information. We construct a graph using the distributed code embeddings and the statistical information from the claims data. We then propose Graph Neural Network (GNN)-based representation learning for post-marketing ADR detection. Our model shows competitive performance and provides valid ADR candidates. Finally, rather than using patient records alone, we utilize a knowledge graph to augment the patient representation with prior medical knowledge. Using SAF-RNN and GNN, the deep patient representation is learned from the clinical sequences and the personalized medical knowledge. 
It is then used to predict clinical outcomes, i.e., next diagnosis prediction and CVD risk prediction, resulting in state-of-the-art performance.
    Contents:
    1 Introduction
    2 Background
        2.1 Medical Concept Embedding
        2.2 Encoding Sequential Information in Clinical Records
    3 Deep Patient Representation with Heterogeneous Information
        3.1 Related Work
        3.2 Problem Statement
        3.3 Method
            3.3.1 RNN-based Disease Prediction Model
            3.3.2 Self-Attentive Fusion (SAF) Encoder
        3.4 Dataset and Experimental Setup
            3.4.1 Dataset
            3.4.2 Experimental Design
            3.4.3 Implementation Details
        3.5 Experimental Results
            3.5.1 Evaluation of CVD Prediction
            3.5.2 Sensitivity Analysis
            3.5.3 Ablation Studies
        3.6 Further Investigation
            3.6.1 Case Study: Patient-Centered Analysis
            3.6.2 Data-Driven CVD Risk Factors
        3.7 Conclusion
    4 Graph-Enhanced Medical Concept Embedding
        4.1 Related Work
        4.2 Problem Statement
        4.3 Method
            4.3.1 Code Embedding Learning with Skip-gram Model
            4.3.2 Drug-disease Graph Construction
            4.3.3 A GNN-based Method for Learning Graph Structure
        4.4 Dataset and Experimental Setup
            4.4.1 Dataset
            4.4.2 Experimental Design
            4.4.3 Implementation Details
        4.5 Experimental Results
            4.5.1 Evaluation of ADR Detection
            4.5.2 Newly-Described ADR Candidates
        4.6 Conclusion
    5 Knowledge-Augmented Deep Patient Representation
        5.1 Related Work
            5.1.1 Incorporating Prior Medical Knowledge for Clinical Outcome Prediction
            5.1.2 Inductive KGC based on Subgraph Learning
        5.2 Method
            5.2.1 Extracting Personalized KG
            5.2.2 KA-SAF: Knowledge-Augmented Self-Attentive Fusion Encoder
            5.2.3 KGC as a Pre-training Task
            5.2.4 Subgraph Infomax: SGI
        5.3 Dataset and Experimental Setup
            5.3.1 Clinical Outcome Prediction
            5.3.2 Next Diagnosis Prediction
        5.4 Experimental Results
            5.4.1 Cardiovascular Disease Prediction
            5.4.2 Next Diagnosis Prediction
            5.4.3 KGC on SemMed KG
        5.5 Conclusion
    6 Conclusion
    Abstract (In Korean)
    Acknowledgement

    Navigating Healthcare Insights: A Birds Eye View of Explainability with Knowledge Graphs

    Knowledge graphs (KGs) are gaining prominence in Healthcare AI, especially in drug discovery and pharmaceutical research, as they provide a structured way to integrate diverse information sources, enhancing AI system interpretability. This interpretability is crucial in healthcare, where trust and transparency matter, and eXplainable AI (XAI) supports decision making for healthcare professionals. This overview summarizes recent literature on the impact of KGs in healthcare and their role in developing explainable AI models. We cover the KG workflow, including construction, relationship extraction, and reasoning, and its applications in areas like Drug-Drug Interactions (DDI), Drug Target Interactions (DTI), Drug Development (DD), Adverse Drug Reactions (ADR), and bioinformatics. We emphasize the importance of making KGs more interpretable through knowledge-infused learning in healthcare. Finally, we highlight research challenges and provide insights for future directions. Comment: IEEE AIKE 2023, 8 pages

    A Neural Attention Model for Categorizing Patient Safety Events

    Medical errors are a leading cause of death in the US, and as such, prevention of these errors is paramount to promoting health care. Patient Safety Event reports are narratives describing potential adverse events to patients and are important in identifying and preventing medical errors. We present a neural network architecture for identifying the type of safety event, which is the first step in understanding these narratives. Our proposed model is based on a soft neural attention model to improve the effectiveness of encoding long sequences. Empirical results on two large-scale real-world datasets of patient safety reports demonstrate the effectiveness of our method, with significant improvements over existing methods. Comment: ECIR 201
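The abstract above relies on soft attention to encode long sequences. A minimal sketch of the general idea follows: each encoded token receives a softmax weight, and the sequence is summarized as the weighted sum. The dot-product scoring against a single query vector is an assumption made for illustration, not the paper's stated scoring function, and `soft_attention` is a hypothetical name.

```python
import math


def soft_attention(hidden_states, query):
    """Return (weights, context) for dot-product soft attention.

    hidden_states: list of equal-length float vectors, one per token.
    query: float vector of the same length (assumed to be learned elsewhere).
    """
    # Score each token by its dot product with the query vector.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the token vectors.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Two toy token encodings; the query favors the first dimension.
states = [[1.0, 0.0], [0.0, 1.0]]
weights, context = soft_attention(states, [2.0, 0.0])
print(weights, context)
```

Since the weights always sum to one, the context vector stays on the same scale regardless of sequence length, which is one reason this kind of pooling suits long narratives.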

    Adverse drug reaction extraction on electronic health records written in Spanish

    148 p. This work focuses on the automatic extraction of Adverse Drug Reactions (ADRs) from Electronic Health Records (EHRs), that is, extracting a response to a medicine which is noxious and unintended and which occurs at doses normally used. From a Natural Language Processing (NLP) perspective, this was approached as a relation extraction task in which the drug is the causative agent of a disease, sign, or symptom, that is, the adverse reaction. ADR extraction from EHRs involves major challenges. First, ADRs are rare events: relations between drugs and diseases found in an EHR are seldom ADRs (they are often unrelated or, instead, related as treatment). This implies inference from samples with a skewed class distribution. Second, EHRs are written by experts, often under time pressure, mixing rich medical jargon with colloquial expressions (not always grammatical), and it is not infrequent to find misspellings and both standard and non-standard abbreviations. All this leads to high lexical variability. We explored several ADR detection algorithms and representations to characterize the ADR candidates. In addition, we assessed the tolerance of the ADR detection model to external noise, such as the incorrect detection of the medical entities involved in ADR extraction, i.e., drugs and diseases. We took the first steps toward ADR extraction in Spanish using a corpus of real EHRs.