3,223 research outputs found

    Neural Relation Extraction Within and Across Sentence Boundaries

    Full text link
    Past work in relation extraction mostly focuses on binary relation between entity pairs within single sentence. Recently, the NLP community has gained interest in relation extraction in entity pairs spanning multiple sentences. In this paper, we propose a novel architecture for this task: inter-sentential dependency-based neural networks (iDepNN). iDepNN models the shortest and augmented dependency paths via recurrent and recursive neural networks to extract relationships within (intra-) and across (inter-) sentence boundaries. Compared to SVM and neural network baselines, iDepNN is more robust to false positives in relationships spanning sentences. We evaluate our models on four datasets from newswire (MUC6) and medical (BioNLP shared task) domains that achieve state-of-the-art performance and show a better balance in precision and recall for inter-sentential relationships. We perform better than 11 teams participating in the BioNLP shared task 2016 and achieve a gain of 5.2% (0.587 vs 0.558) in F1 over the winning team. We also release the crosssentence annotations for MUC6.Comment: AAAI201

    Adverse drug reaction extraction on electronic health records written in Spanish

    Get PDF
    148 p.This work focuses on the automatic extraction of Adverse Drug Reactions (ADRs) in Electronic HealthRecords (EHRs). That is, extracting a response to a medicine which is noxious and unintended and whichoccurs at doses normally used. From Natural Language Processing (NLP) perspective, this wasapproached as a relation extraction task in which the drug is the causative agent of a disease, sign orsymptom, that is, the adverse reaction.ADR extraction from EHRs involves major challenges. First, ADRs are rare events. That is, relationsbetween drugs and diseases found in an EHR are seldom ADRs (are often unrelated or, instead, related astreatment). This implies the inference from samples with skewed class distribution. Second, EHRs arewritten by experts often under time pressure, employing both rich medical jargon together with colloquialexpressions (not always grammatical) and it is not infrequent to find misspells and both standard andnon-standard abbreviations. All this leads to a high lexical variability.We explored several ADR detection algorithms and representations to characterize the ADR candidates.In addition, we have assessed the tolerance of the ADR detection model to external noise such as theincorrect detection of implied medical entities implied in the ADR extraction, i.e. drugs and diseases. Westtled the first steps on ADR extraction in Spanish using a corpus of real EHRs
    corecore