Search CORE

490 research outputs found

Recommended from our members

BADREX: In situ expansion and coreference of biomedical abbreviations using dynamic regular expressions

Author: Gooch P.
Publication venue: City University London
Publication date
Field of study

BADREX uses dynamically generated regular expressions to annotate term definition–term abbreviation pairs, and corefers unpaired acronyms and abbreviations back to their initial definition in the text. Against the Medstract corpus BADREX achieves precision and recall of 98% and 97%, and against a much larger corpus, 90% and 85%, respectively. BADREX yields improved performance over previous approaches, requires no training data and allows runtime customisation of its input parameters. BADREX is freely available from https://github.com/philgooch/BADREX-Biomedical-Abbreviation- Expander as a plugin for the General Architecture for Text Engineering (GATE) framework and is licensed under the GPLv3

City Research Online

Vagueness and referential ambiguity in a large-scale annotated corpus

Author: Versley Yannick
Publication venue
Publication date: 01/01/2008
Field of study

In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail, and we then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement and argue that a deeper understanding of the phenomena involved allows to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main

Coreference based event-argument relation extraction on biomedical text

Author: Asahara Masayuki
Hirao Tsutomu
Matsumoto Yuji
Riedel Sebastian
Yoshikawa Katsumasa
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

This paper presents a new approach to exploit coreference information for extracting event-argument (E-A) relations from biomedical documents. This approach has two advantages: (1) it can extract a large number of valuable E-A relations based on the concept of salience in discourse; (2) it enables us to identify E-A relations over sentence boundaries (cross-links) using transitivity of coreference relations. We propose two coreference-based models: a pipeline based on Support Vector Machine (SVM) classifiers, and a joint Markov Logic Network (MLN). We show the effectiveness of these models on a biomedical event corpus. Both models outperform the systems that do not use coreference information. When the two proposed models are compared to each other, joint MLN outperforms pipeline SVM with gold coreference information

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

A Mention-Ranking Model for Abstract Anaphora Resolution

Author: Born Leo
Frank Anette
Marasović Ana
Opitz Juri
Publication venue
Publication date: 01/01/2017
Field of study

Resolving abstract anaphora is an important, but difficult task for text understanding. Yet, with recent advances in representation learning this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence--antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We found model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and -- if disregarding syntax -- discriminates candidates using deeper features.Comment: In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen, Denmar

arXiv.org e-Print Archive

TUbiblio

Crossref