    Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017

    Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a drug or a combination of drugs, and they present a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mix of rule-based, machine learning (CRF) and deep learning (BLSTM with word2vec embeddings) methods to annotate the data. They were submitted to the adverse drug reaction shared task organised at the 2017 Text Analysis Conference (TAC) by the National Institute of Standards and Technology, achieving F1-scores of 76.00 and 75.61, respectively.
    Comment: Paper describing a submission for the TAC ADR shared task.
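    The abstract reports F1-scores for sequence-labelling systems without describing the scoring. As an illustration only, the sketch below computes exact-match, entity-level micro F1 over BIO-tagged sentences. The BIO encoding, the label names `Severity` and `AdverseReaction`, and the helper names are assumptions; this is not the official TAC 2017 scorer.

```python
from typing import List, Optional, Set, Tuple

# (start token, end token exclusive, entity label)
Span = Tuple[int, int, str]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag != "O":
                # B-X starts a span; a stray I-X is also treated as a span start
                start, label = i, tag[2:]
    return spans


def entity_micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Exact-match, entity-level micro F1 over aligned sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(g), bio_spans(p)
        tp += len(gold_spans & pred_spans)  # same boundaries and same label
        fp += len(pred_spans - gold_spans)  # predicted but not in gold
        fn += len(gold_spans - pred_spans)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical labels: the tagger misses the severity mention.
gold = [["B-Severity", "B-AdverseReaction", "I-AdverseReaction", "O"]]
pred = [["O", "B-AdverseReaction", "I-AdverseReaction", "O"]]
score = entity_micro_f1(gold, pred)  # precision 1.0, recall 0.5
```

    Exact matching is the strictest choice; shared tasks sometimes also report relaxed (overlap-based) matching, which would change only the span comparison.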

    An annotated corpus with nanomedicine and pharmacokinetic parameters

    A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods that employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions covering the physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is openly available, and, based on these results, guidelines and suggestions for the future development of additional nanomedicine corpora are provided.
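    The abstract says the reliability of the manual annotations was evaluated but does not name the statistic. Cohen's kappa is one common choice for two annotators, so the sketch below assumes it, applied to aligned token-level labels. The labels `Size` and `Dose` are hypothetical stand-ins, not the corpus's actual entity types.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same aligned items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and aligned")
    n = len(a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label everywhere
    return (p_o - p_e) / (1 - p_e)


# Hypothetical token-level labels from two annotators ("O" = no entity).
annotator_1 = ["Size", "O", "O", "Dose"]
annotator_2 = ["Size", "O", "Dose", "Dose"]
kappa = cohens_kappa(annotator_1, annotator_2)  # (0.75 - 5/16) / (1 - 5/16)
```

    Kappa discounts the agreement two annotators would reach by chance, which raw percent agreement (0.75 here) does not.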

    ์•ฝ๋ฌผ ๊ฐ์‹œ๋ฅผ ์œ„ํ•œ ๋น„์ •ํ˜• ํ…์ŠคํŠธ ๋‚ด ์ž„์ƒ ์ •๋ณด ์ถ”์ถœ ์—ฐ๊ตฌ

    Doctoral dissertation, Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Applied Bioengineering, February 2023. Advisor: 이형기.
    Pharmacovigilance is a scientific activity to detect, evaluate and understand the occurrence of adverse drug events or other problems related to drug safety. However, concerns have been raised over the quality of drug safety information for pharmacovigilance, and there is also a need to secure new data sources for acquiring drug safety information. Meanwhile, the rise of pre-trained language models based on the transformer architecture has accelerated the application of natural language processing (NLP) techniques in diverse domains. In this context, I defined two problems in pharmacovigilance as NLP tasks and provided baseline models for them: 1) extracting comprehensive drug safety information from adverse drug event narratives reported through a spontaneous reporting system (SRS), and 2) extracting drug-food interaction information from abstracts of biomedical articles. I developed annotation guidelines and performed manual annotation, demonstrating that strong NLP models can be trained to extract clinical information from unstructured free text by fine-tuning transformer-based language models on a high-quality annotated corpus. Finally, I discuss issues to consider when developing annotation guidelines for extracting clinical information related to pharmacovigilance. The annotated corpora and the NLP models in this dissertation can streamline pharmacovigilance activities by enhancing the data quality of reported drug safety information and expanding the available data sources.
๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์—์„œ ์†Œ๊ฐœํ•œ ์ž์—ฐ์–ด ํ•™์Šต๋ฐ์ดํ„ฐ์™€ ์ž์—ฐ์–ด์ฒ˜๋ฆฌ ๋ชจ๋ธ์€ ์•ฝ๋ฌผ ์•ˆ์ „์„ฑ ์ •๋ณด์˜ ๋ณด๊ณ  ํ’ˆ์งˆ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ณ  ์ž๋ฃŒ์›์„ ํ™•์žฅํ•˜์—ฌ ์•ฝ๋ฌผ ๊ฐ์‹œ ํ™œ๋™์„ ๋ณด์กฐํ•  ๊ฒƒ์œผ๋กœ ๊ธฐ๋Œ€๋œ๋‹ค.Chapter 1 1 1.1 Contributions of this dissertation 2 1.2 Overview of this dissertation 2 1.3 Other works 3 Chapter 2 4 2.1 Pharmacovigilance 4 2.2 Biomedical NLP for pharmacovigilance 6 2.2.1 Pre-trained language models 6 2.2.2 Corpora to extract clinical information for pharmacovigilance 9 Chapter 3 11 3.1 Motivation 12 3.2 Proposed Methods 14 3.2.1 Data source and text corpus 15 3.2.2 Annotation of ADE narratives 16 3.2.3 Quality control of annotation 17 3.2.4 Pretraining KAERS-BERT 18 3.2.6 Named entity recognition 20 3.2.7 Entity label classification and sentence extraction 21 3.2.8 Relation extraction 21 3.2.9 Model evaluation 22 3.2.10 Ablation experiment 23 3.3 Results 24 3.3.1 Annotated ICSRs 24 3.3.2 Corpus statistics 26 3.3.3 Performance of NLP models to extract drug safety information 28 3.3.4 Ablation experiment 31 3.4 Discussion 33 3.5 Conclusion 38 Chapter 4 39 4.1 Motivation 39 4.2 Proposed Methods 43 4.2.1 Data source 44 4.2.2 Annotation 45 4.2.3 Quality control of annotation 49 4.2.4 Baseline model development 49 4.3 Results 50 4.3.1 Corpus statistics 50 4.3.2 Annotation Quality 54 4.3.3 Performance of baseline models 55 4.3.4 Qualitative error analysis 56 4.4 Discussion 59 4.5 Conclusion 63 Chapter 5 64 5.1 Issues around defining a word entity 64 5.2 Issues around defining a relation between word entities 66 5.3 Issues around defining entity labels 68 5.4 Issues around selecting and preprocessing annotated documents 68 Chapter 6 71 6.1 Dissertation summary 71 6.2 Limitation and future works 72 6.2.1 Development of end-to-end information extraction models from free-texts to database based on existing structured information 72 6.2.2 Application of in-context learning framework in clinical information extraction 74 Chapter 
7 76 7.1 Annotation Guideline for "Extraction of Comprehensive Drug Safety Information from Adverse Event Narratives Reported through Spontaneous Reporting System" 76 7.2 Annotation Guideline for "Extraction of Drug-Food Interactions from the Abtracts of Biomedical Articles" 100๋ฐ•

    Information Extraction from Biomedical Text Using Machine Learning

    Inadequate experimental data on drugs and the use of unlicensed drugs may cause adverse drug reactions, especially in pediatric populations. Every year the U.S. Food and Drug Administration approves human prescription drugs for marketing, and the labels associated with these drugs include information about clinical trials and drug response in pediatric populations. For doctors to make informed decisions about the safety and effectiveness of these drugs for children, these complex and often unstructured drug labels need to be analyzed. In this work, first, an exploratory analysis of drug labels is performed using a natural language processing pipeline. Second, machine learning algorithms are employed to build baseline binary classification models that identify pediatric text in unstructured drug labels. Third, a series of experiments is run to evaluate the accuracy of the models. The prototype classifies pediatrics-related text with a recall of 0.93 and a precision of 0.86.
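    The abstract reports precision and recall separately. For comparison with the F1-based results elsewhere in this list, the standard F1 harmonic mean of the two reported figures works out as below; this is a derived value, not one reported by the authors.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.86 \times 0.93}{0.86 + 0.93}
    = \frac{1.5996}{1.79}
    \approx 0.894
```

    Because the harmonic mean is pulled toward the smaller of the two values, the result sits slightly closer to the precision (0.86) than to the recall (0.93).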

    DrugExBERT for Pharmacovigilance – A Novel Approach for Detecting Drug Experiences from User-Generated Content

    Pharmaceutical companies have to maintain drug safety through pharmacovigilance systems by monitoring various sources of information about adverse drug experiences. Recently, user-generated content (UGC) has emerged as a valuable source of real-world drug experiences, posing new challenges due to its high volume and variety. We present DrugExBERT, a novel approach to extract adverse drug experiences (adverse reaction, lack of effect) and supportive drug experiences (effectiveness, intervention, indication, and off-label use) from UGC. So that the extracted drug experiences can be verified, DrugExBERT additionally provides explications in the form of the UGC phrases that were critical for the extraction. In our evaluation, we demonstrate that DrugExBERT outperforms state-of-the-art pharmacovigilance approaches as well as ChatGPT on several performance measures, and that DrugExBERT is data- and drug-agnostic. Thus, our approach can help pharmaceutical companies meet their legal obligations and ethical responsibilities while ensuring patient safety and monitoring drug effectiveness.

    Challenges and opportunities for mining adverse drug reactions: perspectives from pharma, regulatory agencies, healthcare providers and consumers

    Monitoring drug safety is a central concern throughout the drug life cycle. Information about toxicity and adverse events is generated at every stage of this life cycle, and stakeholders have a strong interest in applying text mining and artificial intelligence (AI) methods to manage the ever-increasing volume of this information. Recognizing the importance of these applications and the role of challenge evaluations in driving progress in text mining, the organizers of BioCreative VII (Critical Assessment of Information Extraction in Biology) convened a panel of experts to explore ‘Challenges in Mining Drug Adverse Reactions’. This article is an outgrowth of the panel; each panelist has highlighted specific text mining application(s), based on their research and their experience in organizing text mining challenge evaluations. While these highlighted applications only sample the complexity of this problem space, they reveal both opportunities and challenges for text mining to aid in the complex process of drug discovery, testing, marketing and post-market surveillance. Stakeholders are eager to embrace natural language processing and AI tools to help in this process, provided that these tools can be demonstrated to add value to stakeholder workflows. This creates an opportunity for the BioCreative community to work in partnership with regulatory agencies, pharma and the text mining community to identify next steps for future challenge evaluations.
    M.K.: This work was supported in part through the collaboration between the Spanish Plan for the Advancement of Language Technology (Plan TL) and the Barcelona Supercomputing Center; we also acknowledge the 2020 Proyectos de I+D+i - RTI Tipo A (PID2020-119266RA-I00) for support. Ö.U.: This study was supported in part by the National Library of Medicine under Award Numbers R15LM013209 and R13LM013127.
    Peer reviewed. Postprint (published version).

    Identifying Potential Adverse Effects Using the Web: A New Approach to Medical Hypothesis Generation

    Medical message boards are online resources where users with a particular condition exchange information, some of which they might not otherwise share with medical providers. Many of these boards contain large numbers of posts with patient opinions and experiences that could be useful to clinicians and researchers. We present an approach that collects a corpus of medical message board posts, de-identifies the corpus, and extracts information on potential adverse drug effects discussed by users. Using a corpus of posts from breast cancer message boards, we identified drug-event pairs using co-occurrence statistics. We then compared the identified drug-event pairs with the adverse effects listed on the package labels of tamoxifen, anastrozole, exemestane, and letrozole. Of the pairs identified by our system, 75–80% were documented on the drug labels. Some of the undocumented pairs may represent previously unidentified adverse drug effects.
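    The abstract mentions co-occurrence statistics without giving details. A minimal sketch of the general idea, assuming post-level counting with a minimum-support threshold: a pair is kept when a drug term and an event term appear in the same post often enough, and the kept pairs are then checked against a label lookup. The lexicons, posts, `min_posts` threshold, and `toy_label_events` mapping are all hypothetical toy data, not taken from the study or from actual package labels.

```python
import re
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def cooccurring_pairs(posts: Iterable[str], drugs: Set[str], events: Set[str],
                      min_posts: int = 2) -> Counter:
    """Count posts in which a drug term and an event term both appear."""
    counts: Counter = Counter()
    for post in posts:
        tokens = set(re.findall(r"[a-z]+", post.lower()))
        # Each (drug, event) pair is counted at most once per post.
        for pair in product(sorted(drugs & tokens), sorted(events & tokens)):
            counts[pair] += 1
    # Keep only pairs seen in at least `min_posts` posts.
    return Counter({p: c for p, c in counts.items() if c >= min_posts})


def documented_fraction(pairs: Iterable[Pair], label_events: Dict[str, Set[str]]) -> float:
    """Share of candidate pairs whose event is listed for that drug."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for drug, event in pairs if event in label_events.get(drug, set()))
    return hits / len(pairs)


# Hypothetical toy data (not real posts or real label content).
posts = [
    "tamoxifen made me nauseous, constant nausea",
    "On tamoxifen: nausea and insomnia",
    "letrozole fatigue is rough",
    "letrozole and fatigue again",
    "tamoxifen insomnia",
]
drugs = {"tamoxifen", "letrozole"}
events = {"nausea", "insomnia", "fatigue"}
toy_label_events = {"tamoxifen": {"nausea"}, "letrozole": {"fatigue"}}

pairs = cooccurring_pairs(posts, drugs, events)
frac = documented_fraction(pairs, toy_label_events)  # 2 of 3 pairs "documented"
```

    Raw counts favour frequently discussed drugs; a real system would more likely rank pairs with an association measure (for example, a likelihood ratio or pointwise mutual information) rather than a fixed count threshold.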