    People on Drugs: Credibility of User Statements in Health Communities

    Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs, one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users who are likely to contribute valuable medical information.
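The core intuition behind jointly estimating user trustworthiness and statement credibility can be shown with a much simpler technique than the one described above. The sketch below is NOT the paper's probabilistic graphical model. It is a plain iterative truth-discovery loop with two alternating rules: a statement is more credible when trusted users assert it, and a user is more trustworthy when the statements they assert are credible. Linguistic objectivity cues and expert distant supervision are omitted. The function name `estimate_trust`, the 0.5 starting trust, and the example users and statements are all hypothetical.

```python
from typing import Dict, Set, Tuple


def estimate_trust(
    claims: Dict[str, Set[str]], iterations: int = 20
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Alternate between statement credibility and user trustworthiness.

    `claims` maps each user to the set of statements that user asserts.
    This is a plain truth-discovery loop, NOT the paper's graphical model:
    there are no linguistic objectivity features and no expert supervision.
    """
    statements = {s for asserted in claims.values() for s in asserted}
    trust = {user: 0.5 for user in claims}  # uninformative starting trust
    credibility: Dict[str, float] = {}

    for _ in range(iterations):
        # Credibility: fraction of all trust mass held by users backing the statement.
        total_trust = sum(trust.values()) or 1.0
        credibility = {
            s: sum(t for user, t in trust.items() if s in claims[user]) / total_trust
            for s in statements
        }
        # Trustworthiness: mean credibility of the statements a user asserts.
        trust = {
            user: sum(credibility[s] for s in asserted) / len(asserted) if asserted else 0.0
            for user, asserted in claims.items()
        }
    return trust, credibility


if __name__ == "__main__":
    # Hypothetical forum posts about a hypothetical drug.
    posts = {
        "alice": {"drugX causes nausea", "drugX causes headache"},
        "bob": {"drugX causes nausea"},
        "carol": {"drugX cures baldness"},
    }
    trust, credibility = estimate_trust(posts)
    for statement, score in sorted(credibility.items(), key=lambda kv: -kv[1]):
        print(f"{score:.2f}  {statement}")
```

In this toy example, the statement backed by two users ends up most credible, and the user whose only claim is unsupported loses trust over the iterations. A graphical model such as the one in the abstract replaces these hand-written update rules with learned factors.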

    MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions

    Predicting interactions between structured entities lies at the core of numerous tasks such as drug regimen and new material design. In recent years, graph neural networks have become an attractive approach: they represent structured entities as graphs and then extract features from each individual graph using graph convolution operations. However, these methods have two limitations: i) their networks only extract features from a fixed-size subgraph structure (i.e., a fixed-size receptive field) around each node, ignoring features in substructures of different sizes, and ii) features are extracted from each entity independently, which may not effectively reflect the interaction between two entities. To resolve these problems, we present MR-GNN, an end-to-end graph neural network with the following features: i) it uses a multi-resolution architecture to extract node features from neighborhoods of different sizes around each node, and ii) it uses dual graph-state long short-term memory networks (LSTMs) to summarize the local features of each graph and extract the interaction features between pairs of graphs. Experiments conducted on real-world datasets show that MR-GNN improves on the predictions of state-of-the-art methods. Comment: Accepted by IJCAI 201
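The multi-resolution idea, which summarizes each graph at several receptive-field sizes instead of one, can be illustrated without any learned parameters. The sketch below is a toy and not MR-GNN itself. It assumes mean aggregation with self-loops for each propagation step and sum pooling for the graph-level summary. It also replaces the dual graph-state LSTMs with a hypothetical `pair_features` function that concatenates the two graph readouts with their element-wise product. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def propagate(adj: Matrix, feats: Matrix) -> Matrix:
    """One hop of mean aggregation over each node and its neighbours."""
    n, dim = len(adj), len(feats[0])
    out: Matrix = []
    for i in range(n):
        members = [j for j in range(n) if adj[i][j] or j == i]  # self-loop included
        out.append([sum(feats[j][d] for j in members) / len(members) for d in range(dim)])
    return out


def sum_pool(feats: Matrix) -> List[float]:
    """Graph-level summary: sum each feature dimension over all nodes."""
    return [sum(row[d] for row in feats) for d in range(len(feats[0]))]


def multi_resolution_readout(adj: Matrix, feats: Matrix, hops: int) -> List[float]:
    """Concatenate graph summaries for receptive fields of 0, 1, ..., `hops` hops."""
    h = feats
    readout = sum_pool(h)
    for _ in range(hops):
        h = propagate(adj, h)  # each pass widens the receptive field by one hop
        readout.extend(sum_pool(h))
    return readout


def pair_features(a: List[float], b: List[float]) -> List[float]:
    """Hypothetical pairwise input: both readouts plus their element-wise product."""
    return a + b + [x * y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Toy 3-node path graph (0-1-2) with a single input feature per node.
    path_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    path_feats = [[1.0], [0.0], [0.0]]
    r = multi_resolution_readout(path_adj, path_feats, hops=2)
    print(pair_features(r, r))
```

Because the readout keeps one summary per hop count, a downstream predictor can see small and large substructures side by side, rather than only the summary from the largest receptive field.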

    DHLP 1&2: Giraph based distributed label propagation algorithms on heterogeneous drug-related networks

    Background and Objective: Heterogeneous complex networks are large graphs consisting of different types of nodes and edges. Extracting knowledge from these networks is complicated, and their scale is steadily increasing, so scalable methods are required. Methods: In this paper, we introduce two distributed label propagation algorithms for heterogeneous networks, DHLP-1 and DHLP-2. Biological networks are one type of heterogeneous complex network. As a case study, we measured the efficiency of DHLP-1 and DHLP-2 on a biological network consisting of drugs, diseases, and targets. The problem studied on this network is drug repositioning, but the algorithms can be used as general methods for heterogeneous networks other than biological ones. Results: We compared the proposed algorithms with similar non-distributed algorithms, MINProp and Heter-LP. The experiments showed good performance of the proposed algorithms in terms of both running time and accuracy. Comment: Source code available for Apache Giraph on Hadoo
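For reference, here is a minimal, single-machine sketch of classic label propagation on a homogeneous graph, the kind of computation that algorithms like these distribute. It is not DHLP, MINProp, or Heter-LP: it ignores node and edge types and runs in one process rather than on Apache Giraph. The function names, the node names, and the `alpha` and `iterations` defaults are hypothetical. The sketch shows that each update needs only a node's own neighbors, which is why label propagation maps naturally onto vertex-centric frameworks such as Giraph.

```python
import math
from typing import Dict, Iterable, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def label_propagation(
    graph: Graph, seeds: Dict[str, float], alpha: float = 0.8, iterations: int = 50
) -> Dict[str, float]:
    """Iterate f(v) = alpha * sum_u S[v][u] * f(u) + (1 - alpha) * y(v).

    S is the symmetrically normalised weight matrix D^-1/2 W D^-1/2 and y holds
    the seed scores. Node and edge types are ignored in this single-machine sketch.
    """
    degree = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    y = {v: seeds.get(v, 0.0) for v in graph}
    f = dict(y)
    for _ in range(iterations):
        # Every node reads only its neighbours' previous scores, so one pass
        # corresponds to one superstep in a vertex-centric (Pregel-style) system.
        f = {
            v: alpha * sum(w * f[u] / math.sqrt(degree[v] * degree[u]) for u, w in nbrs.items())
            + (1 - alpha) * y[v]
            for v, nbrs in graph.items()
        }
    return f


if __name__ == "__main__":
    # Hypothetical drug-target-disease toy network; the disease is the seed.
    g = build_graph([
        ("drugA", "targetT", 1.0),
        ("targetT", "diseaseD", 1.0),
        ("drugB", "targetU", 1.0),
    ])
    scores = label_propagation(g, seeds={"diseaseD": 1.0})
    print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this drug-repositioning-style toy, seeding a disease gives a positive score to the drug that shares a target with it. The disconnected drug stays at zero.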

    μ•½λ¬Ό κ°μ‹œλ₯Ό μœ„ν•œ λΉ„μ •ν˜• ν…μŠ€νŠΈ λ‚΄ μž„μƒ 정보 μΆ”μΆœ 연ꡬ

    ν•™μœ„λ…Όλ¬Έ(박사) -- μ„œμšΈλŒ€ν•™κ΅λŒ€ν•™μ› : μœ΅ν•©κ³Όν•™κΈ°μˆ λŒ€ν•™μ› μ‘μš©λ°”μ΄μ˜€κ³΅ν•™κ³Ό, 2023. 2. μ΄ν˜•κΈ°.Pharmacovigilance is a scientific activity to detect, evaluate and understand the occurrence of adverse drug events or other problems related to drug safety. However, concerns have been raised over the quality of drug safety information for pharmacovigilance, and there is also a need to secure a new data source to acquire drug safety information. On the other hand, the rise of pre-trained language models based on a transformer architecture has accelerated the application of natural language processing (NLP) techniques in diverse domains. In this context, I tried to define two problems in pharmacovigilance as an NLP task and provide baseline models for the defined tasks: 1) extracting comprehensive drug safety information from adverse drug events narratives reported through a spontaneous reporting system (SRS) and 2) extracting drug-food interaction information from abstracts of biomedical articles. I developed annotation guidelines and performed manual annotation, demonstrating that strong NLP models can be trained to extracted clinical information from unstructrued free-texts by fine-tuning transformer-based language models on a high-quality annotated corpus. Finally, I discuss issues to consider when when developing annotation guidelines for extracting clinical information related to pharmacovigilance. The annotated corpora and the NLP models in this dissertation can streamline pharmacovigilance activities by enhancing the data quality of reported drug safety information and expanding the data sources.μ•½λ¬Ό κ°μ‹œλŠ” μ•½λ¬Ό λΆ€μž‘μš© λ˜λŠ” μ•½λ¬Ό μ•ˆμ „μ„±κ³Ό κ΄€λ ¨λœ 문제의 λ°œμƒμ„ 감지, 평가 및 μ΄ν•΄ν•˜κΈ° μœ„ν•œ 과학적 ν™œλ™μ΄λ‹€. 
κ·ΈλŸ¬λ‚˜ μ•½λ¬Ό κ°μ‹œμ— μ‚¬μš©λ˜λŠ” μ˜μ•½ν’ˆ μ•ˆμ „μ„± μ •λ³΄μ˜ 보고 ν’ˆμ§ˆμ— λŒ€ν•œ μš°λ €κ°€ κΎΈμ€€νžˆ μ œκΈ°λ˜μ—ˆμœΌλ©°, ν•΄λ‹Ή 보고 ν’ˆμ§ˆμ„ 높이기 μœ„ν•΄μ„œλŠ” μ•ˆμ „μ„± 정보λ₯Ό 확보할 μƒˆλ‘œμš΄ μžλ£Œμ›μ΄ ν•„μš”ν•˜λ‹€. ν•œνŽΈ 트랜슀포머 μ•„ν‚€ν…μ²˜λ₯Ό 기반으둜 μ‚¬μ „ν›ˆλ ¨ μ–Έμ–΄λͺ¨λΈμ΄ λ“±μž₯ν•˜λ©΄μ„œ λ‹€μ–‘ν•œ λ„λ©”μΈμ—μ„œ μžμ—°μ–΄μ²˜λ¦¬ 기술 적용이 κ°€μ†ν™”λ˜μ—ˆλ‹€. μ΄λŸ¬ν•œ λ§₯λ½μ—μ„œ λ³Έ ν•™μœ„ λ…Όλ¬Έμ—μ„œλŠ” μ•½λ¬Ό κ°μ‹œλ₯Ό μœ„ν•œ λ‹€μŒ 2가지 정보 μΆ”μΆœ 문제λ₯Ό μžμ—°μ–΄μ²˜λ¦¬ 문제 ν˜•νƒœλ‘œ μ •μ˜ν•˜κ³  κ΄€λ ¨ κΈ°μ€€ λͺ¨λΈμ„ κ°œλ°œν•˜μ˜€λ‹€: 1) μˆ˜λ™μ  μ•½λ¬Ό κ°μ‹œ 체계에 보고된 이상사둀 μ„œμˆ μžλ£Œμ—μ„œ 포괄적인 μ•½λ¬Ό μ•ˆμ „μ„± 정보λ₯Ό μΆ”μΆœν•œλ‹€. 2) 영문 μ˜μ•½ν•™ λ…Όλ¬Έ μ΄ˆλ‘μ—μ„œ μ•½λ¬Ό-μ‹ν’ˆ μƒν˜Έμž‘μš© 정보λ₯Ό μΆ”μΆœν•œλ‹€. 이λ₯Ό μœ„ν•΄ μ•ˆμ „μ„± 정보 μΆ”μΆœμ„ μœ„ν•œ μ–΄λ…Έν…Œμ΄μ…˜ κ°€μ΄λ“œλΌμΈμ„ κ°œλ°œν•˜κ³  μˆ˜μž‘μ—…μœΌλ‘œ μ–΄λ…Έν…Œμ΄μ…˜μ„ μˆ˜ν–‰ν•˜μ˜€λ‹€. 결과적으둜 κ³ ν’ˆμ§ˆμ˜ μžμ—°μ–΄ ν•™μŠ΅λ°μ΄ν„°λ₯Ό 기반으둜 μ‚¬μ „ν•™μŠ΅ μ–Έμ–΄λͺ¨λΈμ„ λ―Έμ„Έ μ‘°μ •ν•¨μœΌλ‘œμ¨ λΉ„μ •ν˜• ν…μŠ€νŠΈμ—μ„œ μž„μƒ 정보λ₯Ό μΆ”μΆœν•˜λŠ” κ°•λ ₯ν•œ μžμ—°μ–΄μ²˜λ¦¬ λͺ¨λΈ 개발이 κ°€λŠ₯함을 ν™•μΈν•˜μ˜€λ‹€. λ§ˆμ§€λ§‰μœΌλ‘œ λ³Έ ν•™μœ„ λ…Όλ¬Έμ—μ„œλŠ” μ•½λ¬Όκ°μ‹œμ™€ κ΄€λ ¨λœμž„μƒ 정보 μΆ”μΆœμ„ μœ„ν•œ μ–΄λ…Έν…Œμ΄μ…˜ κ°€μ΄λ“œλΌμΈμ„ κ°œλ°œν•  λ•Œ κ³ λ €ν•΄μ•Ό ν•  주의 사항에 λŒ€ν•΄ λ…Όμ˜ν•˜μ˜€λ‹€. 
λ³Έ ν•™μœ„ λ…Όλ¬Έμ—μ„œ μ†Œκ°œν•œ μžμ—°μ–΄ ν•™μŠ΅λ°μ΄ν„°μ™€ μžμ—°μ–΄μ²˜λ¦¬ λͺ¨λΈμ€ μ•½λ¬Ό μ•ˆμ „μ„± μ •λ³΄μ˜ 보고 ν’ˆμ§ˆμ„ ν–₯μƒμ‹œν‚€κ³  μžλ£Œμ›μ„ ν™•μž₯ν•˜μ—¬ μ•½λ¬Ό κ°μ‹œ ν™œλ™μ„ 보쑰할 κ²ƒμœΌλ‘œ κΈ°λŒ€λœλ‹€.Chapter 1 1 1.1 Contributions of this dissertation 2 1.2 Overview of this dissertation 2 1.3 Other works 3 Chapter 2 4 2.1 Pharmacovigilance 4 2.2 Biomedical NLP for pharmacovigilance 6 2.2.1 Pre-trained language models 6 2.2.2 Corpora to extract clinical information for pharmacovigilance 9 Chapter 3 11 3.1 Motivation 12 3.2 Proposed Methods 14 3.2.1 Data source and text corpus 15 3.2.2 Annotation of ADE narratives 16 3.2.3 Quality control of annotation 17 3.2.4 Pretraining KAERS-BERT 18 3.2.6 Named entity recognition 20 3.2.7 Entity label classification and sentence extraction 21 3.2.8 Relation extraction 21 3.2.9 Model evaluation 22 3.2.10 Ablation experiment 23 3.3 Results 24 3.3.1 Annotated ICSRs 24 3.3.2 Corpus statistics 26 3.3.3 Performance of NLP models to extract drug safety information 28 3.3.4 Ablation experiment 31 3.4 Discussion 33 3.5 Conclusion 38 Chapter 4 39 4.1 Motivation 39 4.2 Proposed Methods 43 4.2.1 Data source 44 4.2.2 Annotation 45 4.2.3 Quality control of annotation 49 4.2.4 Baseline model development 49 4.3 Results 50 4.3.1 Corpus statistics 50 4.3.2 Annotation Quality 54 4.3.3 Performance of baseline models 55 4.3.4 Qualitative error analysis 56 4.4 Discussion 59 4.5 Conclusion 63 Chapter 5 64 5.1 Issues around defining a word entity 64 5.2 Issues around defining a relation between word entities 66 5.3 Issues around defining entity labels 68 5.4 Issues around selecting and preprocessing annotated documents 68 Chapter 6 71 6.1 Dissertation summary 71 6.2 Limitation and future works 72 6.2.1 Development of end-to-end information extraction models from free-texts to database based on existing structured information 72 6.2.2 Application of in-context learning framework in clinical information extraction 74 Chapter 7 76 7.1 
Annotation Guideline for "Extraction of Comprehensive Drug Safety Information from Adverse Event Narratives Reported through Spontaneous Reporting System" 76 7.2 Annotation Guideline for "Extraction of Drug-Food Interactions from the Abtracts of Biomedical Articles" 100λ°•
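Extracting clinical information with a fine-tuned transformer is commonly framed as token-level tagging followed by entity-level scoring. The sketch below assumes a BIO tagging scheme with exact-match, micro-averaged precision, recall, and F1. The dissertation's actual label set, tagging scheme, and evaluation protocol may differ, and the entity types `DRUG` and `AE` and the function names are hypothetical. The code decodes token tags, such as those a fine-tuned model would output, into entity spans and compares predicted spans with gold spans.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into (start, end, type) entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        closes = tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if closes:
            if start is not None and etype is not None:
                spans.add((start, i, etype))
            start, etype = None, None
            if tag.startswith(("B-", "I-")):  # a stray I- is treated as a new entity
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over all sentences."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # a span counts only if boundaries and type both match
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical tokens: ["Took", "drugX", "tablet", "then", "nausea"]
    gold = [["O", "B-DRUG", "I-DRUG", "O", "B-AE"]]
    pred = [["O", "B-DRUG", "I-DRUG", "O", "O"]]
    print(entity_f1(gold, pred))
```

Exact matching is strict: a prediction that gets an entity's type right but its boundary wrong by one token counts as both a false positive and a false negative. Some evaluations report a relaxed partial-overlap score alongside it.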