Search CORE

2,934 research outputs found

Structured Representations for Coreference Resolution

Author: Martschat Sebastian
Publication venue
Publication date: 01/01/2017
Field of study

Coreference resolution is the task of determining which expressions in a text are used to refer to the same entity. This task is one of the most fundamental problems of natural language understanding. Inherently, coreference resolution is a structured task, as the output consists of sets of coreferring expressions. This complex structure poses several challenges since it is not clear how to account for the structure in terms of error analysis and representation. In this thesis, we present a treatment of computational coreference resolution that accounts for the structure. Our treatment encompasses error analysis and the representation of approaches to coreference resolution. In particular, we propose two frameworks in this thesis. The first framework deals with error analysis. We gather requirements for an appropriate error analysis method and devise a framework that considers a structured graph-based representation of the reference annotation and the system output. Error extraction is performed by constructing linguistically motivated or data-driven spanning trees for the graph-based coreference representations. The second framework concerns the representation of approaches to coreference resolution. We show that approaches to coreference resolution can be understood as predictors of latent structures that are not annotated in the data. From these latent structures, the final output is derived during a post-processing step. We devise a machine learning framework for coreference resolution based on this insight. In this framework, we have a unified representation of approaches to coreference resolution. Individual approaches can be expressed as instantiations of a generic approach. We express many approaches from the literature as well as novel variants in our framework, ranging from simple pairwise classification approaches to complex entity-centric models. Using the uniform representation, we are able to analyze differences and similarities between the models transparently and in detail. Finally, we employ the error analysis framework to perform a qualitative analysis of differences in error profiles of the models on a benchmark dataset. We trace back differences in the error profiles to differences in the representation. Our analysis shows that a mention ranking model and a tree-based mention-entity model with left-to-right inference have the highest performance. We discuss reasons for the improved performance and analyze why more advanced approaches modeled in our framework cannot improve on these models. An implementation of the frameworks discussed in this thesis is publicly available

Heidelberger Dokumentenserver

Coreference-Based Summarization and Question Answering: a Case for High Precision Anaphor Resolution

Author: Stuckardt Roland
Publication venue
Publication date: 26/07/2005
Field of study

Approaches to Text Summarization and Question Answering are known to benefit from the availability of coreference information. Based on an analysis of its contributions, a more detailed look at coreference processing for these applications will be proposed: it should be considered as a task of anaphor resolution rather than coreference resolution. It will be further argued that high precision approaches to anaphor resolution optimally match the specific requirements. Three such approaches will be described and empirically evaluated, and the implications for Text Summarization and Question Answering will be discussed

Hochschulschriftenserver - Universität Frankfurt am Main

Use Generalized Representations, But Do Not Forget Surface Features

Author: Moosavi Nafise Sadat
Strube Michael
Publication venue
Publication date: 01/01/2017
Field of study

Only a year ago, all state-of-the-art coreference resolvers were using an extensive amount of surface features. Recently, there was a paradigm shift towards using word embeddings and deep neural networks, where the use of surface features is very limited. In this paper, we show that a simple SVM model with surface features outperforms more complex neural models for detecting anaphoric mentions. Our analysis suggests that using generalized representations and surface features have different strength that should be both taken into account for improving coreference resolution.Comment: CORBON workshop@EACL 201

arXiv.org e-Print Archive

TUbiblio

Crossref

White Rose Research Online

Lexical Features in Coreference Resolution: To be Used With Caution

Author: Moosavi Nafise Sadat
Strube Michael
Publication venue
Publication date: 01/01/2017
Field of study

Lexical features are a major source of information in state-of-the-art coreference resolvers. Lexical features implicitly model some of the linguistic phenomena at a fine granularity level. They are especially useful for representing the context of mentions. In this paper we investigate a drawback of using many lexical features in state-of-the-art coreference resolvers. We show that if coreference resolvers mainly rely on lexical features, they can hardly generalize to unseen domains. Furthermore, we show that the current coreference resolution evaluation is clearly flawed by only evaluating on a specific split of a specific dataset in which there is a notable overlap between the training, development and test sets.Comment: 6 pages, ACL 201

arXiv.org e-Print Archive

TUbiblio

Crossref

White Rose Research Online

Comparing knowledge sources for nominal anaphora resolution

Author: Markert K.
Nissim M.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2005
Field of study

We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora by means of shallow lexico-semantic patterns. As corpora we use the British National Corpus (BNC), as well as the Web, which has not been previously used for this task. Our results show that (a) the knowledge encoded in WordNet is often insufficient, especially for anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite NP coreference, the Web-based method yields results comparable to those obtained using WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the dataset; (d) in both case studies, the BNC-based method is worse than the other methods because of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge gap often encountered in anaphora resolution, and handled examples with context-dependent relations between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling of lexical knowledge, it is a promising knowledge source to integrate in anaphora resolution systems

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

White Rose Research Online

Recommended from our members

Lexical patterns, features and knowledge resources for coreference resolution in clinical notes

Author: Abdul Roudsari
D’Avolio
Miller
Phil Gooch
Rahman
Recasens
Rosse
Savova
Savova
Uzuner
van Deemter
Zheng
Zheng
Publication venue: 'Elsevier BV'
Publication date: 01/10/2012
Field of study

Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general- purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show an F-measure for each corpus of 79.2% and 87.5%, respectively, which offers performance at least as good as human annotators, greatly increased performance over general- purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download

City Research Online

Elsevier - Publisher Connector

Crossref