181 research outputs found

    Biomedical Event Trigger Identification Using Bidirectional Recurrent Neural Network Based Models

    Full text link
    Biomedical events describe complex interactions between various biomedical entities. Event trigger is a word or a phrase which typically signifies the occurrence of an event. Event trigger identification is an important first step in all event extraction methods. However many of the current approaches either rely on complex hand-crafted features or consider features only within a window. In this paper we propose a method that takes the advantage of recurrent neural network (RNN) to extract higher level features present across the sentence. Thus hidden state representation of RNN along with word and entity type embedding as features avoid relying on the complex hand-crafted features generated using various NLP toolkits. Our experiments have shown to achieve state-of-art F1-score on Multi Level Event Extraction (MLEE) corpus. We have also performed category-wise analysis of the result and discussed the importance of various features in trigger identification task.Comment: The work has been accepted in BioNLP at ACL-201

    Extracting Drug-Drug Interactions with Character-Level and Dependency-Based Embeddings

    Get PDF
    The DDI track of TAC-2018 challenge addresses the problem of an information retrieval of drug-drug interactions on structured product labeling documents with discontinuous and overlapping entities. In this paper, we present our participation for event extraction subtask (Task 1). We used a supervised long-short-term memory (LSTM) network with conditional random fields decoding (LSTM-CRF) approach with an automatic exploring of words and characters features. Additional dependency-based information was integrated into word embeddings to allow better word representation. Our system performed with above median score

    Adverse drug reaction extraction on electronic health records written in Spanish

    Get PDF
    148 p.This work focuses on the automatic extraction of Adverse Drug Reactions (ADRs) in Electronic HealthRecords (EHRs). That is, extracting a response to a medicine which is noxious and unintended and whichoccurs at doses normally used. From Natural Language Processing (NLP) perspective, this wasapproached as a relation extraction task in which the drug is the causative agent of a disease, sign orsymptom, that is, the adverse reaction.ADR extraction from EHRs involves major challenges. First, ADRs are rare events. That is, relationsbetween drugs and diseases found in an EHR are seldom ADRs (are often unrelated or, instead, related astreatment). This implies the inference from samples with skewed class distribution. Second, EHRs arewritten by experts often under time pressure, employing both rich medical jargon together with colloquialexpressions (not always grammatical) and it is not infrequent to find misspells and both standard andnon-standard abbreviations. All this leads to a high lexical variability.We explored several ADR detection algorithms and representations to characterize the ADR candidates.In addition, we have assessed the tolerance of the ADR detection model to external noise such as theincorrect detection of implied medical entities implied in the ADR extraction, i.e. drugs and diseases. Westtled the first steps on ADR extraction in Spanish using a corpus of real EHRs

    Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge

    Full text link
    In formal procedure of civil cases, the textual materials provided by different parties describe the development process of the cases. It is a difficult but necessary task to extract the key information for the cases from these textual materials and to clarify the dispute focus of related parties. Currently, officers read the materials manually and use methods, such as keyword searching and regular matching, to get the target information. These approaches are time-consuming and heavily depending on prior knowledge and carefulness of the officers. To assist the officers to enhance working efficiency and accuracy, we propose an approach to detect disputes from divorce cases based on a two-round-labeling event extracting technique in this paper. We implement the Judicial Intelligent Assistant (JIA) system according to the proposed approach to 1) automatically extract focus events from divorce case materials, 2) align events by identifying co-reference among them, and 3) detect conflicts among events brought by the plaintiff and the defendant. With the JIA system, it is convenient for judges to determine the disputed issues. Experimental results demonstrate that the proposed approach and system can obtain the focus of cases and detect conflicts more effectively and efficiently comparing with existing method.Comment: 20 page

    Clinical Text Classification with Rule-based Features and Knowledge-guided Convolutional Neural Networks

    Full text link
    Clinical text classification is an important problem in medical natural language processing. Existing studies have conventionally focused on rules or knowledge sources-based feature engineering, but only a few have exploited effective feature learning capability of deep learning methods. In this study, we propose a novel approach which combines rule-based features and knowledge-guided deep learning techniques for effective disease classification. Critical Steps of our method include identifying trigger phrases, predicting classes with very few examples using trigger phrases and training a convolutional neural network with word embeddings and Unified Medical Language System (UMLS) entity embeddings. We evaluated our method on the 2008 Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge. The results show that our method outperforms the state of the art methods.Comment: arXiv admin note: text overlap with arXiv:1806.04820 by other author

    Optimizing text mining methods for improving biomedical natural language processing

    Get PDF
    The overwhelming amount and the increasing rate of publication in the biomedical domain make it difficult for life sciences researchers to acquire and maintain all information that is necessary for their research. Pubmed (the primary citation database for the biomedical literature) currently contains over 21 million article abstracts and more than one million of them were published in 2020 alone. Even though existing article databases provide capable keyword search services, typical everyday-life queries usually return thousands of relevant articles. For instance, a cancer research scientist may need to acquire a complete list of genes that interact with BRCA1 (breast cancer 1) gene. The PubMed keyword search for BRCA1 returns over 16,500 article abstracts, making manual inspection of the retrieved documents impractical. Missing even one of the interacting gene partners in this scenario may jeopardize successful development of a potential new drug or vaccine. Although manually curated databases of biomolecular interactions exist, they are usually not up-to-date and they require notable human effort to maintain. To summarize, new discoveries are constantly being shared within the community via scientific publishing, but unfortunately the probability of missing vital information for research in life sciences is increasing. In response to this problem, the biomedical natural language processing (BioNLP) community of researchers has emerged and strives to assist life sciences researchers by building modern language processing and text mining tools that can be applied at large-scale and scan the whole publicly available literature and extract, classify, and aggregate the information found within, thus keeping life sciences researchers always up-to-date with the recent relevant discoveries and facilitating their research in numerous fields such as molecular biology, biomedical engineering, bioinformatics, genetics engineering and biochemistry. My research has almost exclusively focused on biomedical relation and event extraction tasks. These foundational information extraction tasks deal with automatic detection of biological processes, interactions and relations described in the biomedical literature. Precisely speaking, biomedical relation and event extraction systems can scan through a vast amount of biomedical texts and automatically detect and extract the semantic relations of biomedical named entities (e.g. genes, proteins, chemical compounds, and diseases). The structured outputs of such systems (i.e., the extracted relations or events) can be stored as relational databases or molecular interaction networks which can easily be queried, filtered, analyzed, visualized and integrated with other structured data sources. Extracting biomolecular interactions has always been the primary interest of BioNLP researcher because having knowledge about such interactions is crucially important in various research areas including precision medicine, drug discovery, drug repurposing, hypothesis generation, construction and curation of signaling pathways, and protein function and structure prediction. State-of-the-art relation and event extraction methods are based on supervised machine learning, requiring manually annotated data for training. Manual annotation for the biomedical domain requires domain expertise and it is time-consuming. Hence, having minimal training data for building information extraction systems is a common case in the biomedical domain. This demands development of methods that can make the most out of available training data and this thesis gathers all my research efforts and contributions in that direction. It is worth mentioning that biomedical natural language processing has undergone a revolution since I started my research in this field almost ten years ago. As a member of the BioNLP community, I have witnessed the emergence, improvement– and in some cases, the disappearance–of many methods, each pushing the performance of the best previous method one step further. I can broadly divide the last ten years into three periods. Once I started my research, feature-based methods that relied on heavy feature engineering were dominant and popular. Then, significant advancements in the hardware technology, as well as several breakthroughs in the algorithms and methods enabled machine learning practitioners to seriously utilize artificial neural networks for real-world applications. In this period, convolutional, recurrent, and attention-based neural network models became dominant and superior. Finally, the introduction of transformer-based language representation models such as BERT and GPT impacted the field and resulted in unprecedented performance improvements on many data sets. When reading this thesis, I demand the reader to take into account the course of history and judge the methods and results based on what could have been done in that particular period of the history

    Event Extraction: A Survey

    Full text link
    Extracting the reported events from text is one of the key research themes in natural language processing. This process includes several tasks such as event detection, argument extraction, role labeling. As one of the most important topics in natural language processing and natural language understanding, the applications of event extraction spans across a wide range of domains such as newswire, biomedical domain, history and humanity, and cyber security. This report presents a comprehensive survey for event detection from textual documents. In this report, we provide the task definition, the evaluation method, as well as the benchmark datasets and a taxonomy of methodologies for event extraction. We also present our vision of future research direction in event detection.Comment: 20 page
    • …
    corecore