Conundrums in noun phrase coreference resolution: making sense of the state-of-the-art
Journal Article. We aim to shed light on the state-of-the-art in NP coreference resolution by teasing apart the differences in the MUC and ACE task definitions, the assumptions made in evaluation methodologies, and inherent differences in text corpora. First, we examine three subproblems that play a role in coreference resolution: named entity recognition, anaphoricity determination, and coreference element detection. We measure the impact of each subproblem on coreference resolution and confirm that certain assumptions regarding these subproblems in the evaluation methodology can dramatically simplify the overall task. Second, we measure the performance of a state-of-the-art coreference resolver on several classes of anaphora and use these results to develop a quantitative measure for estimating coreference resolution performance on new data sets.
Comparing knowledge sources for nominal anaphora resolution
We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora
and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links
encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora
by means of shallow lexico-semantic patterns. As corpora we use the British National
Corpus (BNC), as well as the Web, which has not been previously used for this task. Our
results show that (a) the knowledge encoded in WordNet is often insufficient, especially for
anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for
other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite
NP coreference, the Web-based method yields results comparable to those obtained using
WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the
dataset; (d) in both case studies, the BNC-based method is worse than the other methods because
of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge
gap often encountered in anaphora resolution, and handled examples with context-dependent relations
between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling
of lexical knowledge, it is a promising knowledge source to integrate in anaphora resolution systems.
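To make the pattern-mining idea concrete, here is a minimal sketch of ranking antecedent candidates for an other-anaphor by counting a shallow lexico-semantic pattern in a corpus. The abstract does not list the patterns used, so the "X and other Y" pattern, the toy corpus, and the function names `pattern_count` and `rank_candidates` are all assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical toy corpus standing in for the BNC or Web hit counts.
CORPUS = [
    "Doctors recommend apples and other fruits daily.",
    "She bought apples and other fruits at the market.",
    "Tractors and other vehicles blocked the road.",
]


def pattern_count(antecedent, anaphor_head, corpus):
    """Count matches of the assumed pattern '<antecedent> and other <anaphor_head>'."""
    pattern = re.compile(
        rf"\b{re.escape(antecedent)}\s+and\s+other\s+{re.escape(anaphor_head)}\b",
        re.IGNORECASE,
    )
    return sum(len(pattern.findall(sentence)) for sentence in corpus)


def rank_candidates(candidates, anaphor_head, corpus):
    """Order candidate antecedents by descending pattern frequency."""
    scored = [(pattern_count(c, anaphor_head, corpus), c) for c in candidates]
    # sorted() is stable, so ties keep their original candidate order.
    return [c for _, c in sorted(scored, key=lambda pair: -pair[0])]


# Resolve the anaphor "other fruits" against three candidate antecedents.
ranking = rank_candidates(["tractors", "apples", "market"], "fruits", CORPUS)
print(ranking[0])
```

A real system would replace the toy corpus with corpus or Web frequency counts and normalise for how frequent each candidate is on its own.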
Anaphora resolution for Bengali: An experiment with domain adaptation
In this paper we present our first attempt at anaphora resolution for a resource-poor language, namely Bengali. We address the issue of adapting a state-of-the-art system, BART, which was originally developed for English. The overall performance of co-reference resolution depends greatly on highly accurate mention detectors. We develop a number of models based on the heuristics used as well as on the particular machine learning method employed. Thereafter we perform a series of experiments for adapting BART to Bengali. Our evaluation shows that a language-dependent system (designed primarily for English) can achieve a good level of performance when re-trained and tested on a new language with proper subsets of features. The system achieves recall, precision and F-measure values of 56.00%, 46.50% and 50.80%, respectively. The contribution of this work is two-fold: (i) an attempt to build a machine-learning-based anaphora resolution system for a resource-poor Indian language; and (ii) domain adaptation of a state-of-the-art English co-reference resolution system to Bengali, which has a completely different orthography and characteristics.
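As a quick consistency check on the reported figures, the sketch below recomputes the F-measure from the stated recall and precision. It assumes the balanced F-measure (the harmonic mean of precision and recall, beta = 1); the abstract does not say which variant was used, and the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta = 1 gives the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, expressed as fractions.
recall = 0.5600
precision = 0.4650

f1 = f_measure(precision, recall)
print(f"{f1 * 100:.2f}%")  # prints 50.81%
```

Under this assumption the recomputed value agrees with the reported 50.80% to within rounding of the inputs.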
Anaphora Resolution in Business Process Requirement Engineering
Anaphora resolution (AR) is one of the most important tasks in natural language processing; it focuses on the problem of resolving what a pronoun or a noun phrase refers to. Moreover, AR plays an essential role when dealing with textual descriptions of business processes, either when trying to discover the process model from the text, or when validating an existing model. It helps these systems discover the core components of any process model (actors and objects). In this paper, we propose a domain-specific AR system. The approach starts by automatically generating the concept map of the text; the system then uses this map to resolve references using the syntactic and semantic relations in the concept map. The approach outperforms the state of the art in the domain of business process texts, with more than 73% accuracy. In addition, this approach could be easily adapted to resolve references in other domains.
Coreference resolution on entities and events for hospital discharge summaries
Thesis (M. Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007, by Tian Ye He. Includes bibliographical references (p. 76-80). The wealth of medical information contained in electronic medical records (EMRs) and Natural Language Processing (NLP) technologies that can automatically extract information from them have opened the doors to automatic patient-care quality monitoring and medical-assist question answering systems. This thesis studies coreference resolution, an information extraction (IE) subtask that links each specific mention to the entity it refers to. Coreference resolution enables us to find changes in the state of entities and makes it possible to answer questions regarding the information thus obtained. We perform coreference resolution on a specific type of EMR, the hospital discharge summary. We treat coreference resolution as a binary classification problem. Our approach yields insights into the critical features for coreference resolution for entities that fall into five medical semantic categories that commonly appear in discharge summaries.
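Framing coreference resolution as binary classification can be sketched as a mention-pair model: every pair of mentions becomes one example labelled coreferent or not. The sketch below is a toy illustration only. The mentions, the three features, the perceptron learner, and the final transitive-closure clustering step are all assumptions and are not taken from the thesis.

```python
from itertools import combinations

# Hypothetical mentions: (id, text, sentence index, gold entity label).
MENTIONS = [
    (0, "the patient", 0, "E1"),
    (1, "chest pain", 0, "E2"),
    (2, "the patient", 1, "E1"),
    (3, "the chest pain", 2, "E2"),
    (4, "aspirin", 2, "E3"),
]


def pair_features(a, b):
    """Illustrative pairwise features (not the thesis's feature set)."""
    ta, tb = a[1].lower().split(), b[1].lower().split()
    return [
        1.0,                               # bias term
        1.0 if ta == tb else 0.0,          # exact string match
        1.0 if ta[-1] == tb[-1] else 0.0,  # same final (head-like) token
        float(abs(a[2] - b[2])),           # sentence distance
    ]


def build_pairs(mentions):
    """One binary example per mention pair: label 1 if same gold entity."""
    return [(pair_features(a, b), 1 if a[3] == b[3] else 0)
            for a, b in combinations(mentions, 2)]


def train_perceptron(examples, epochs=20):
    """Plain perceptron; stands in for whichever classifier is preferred."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if pred != y:
                for i, xi in enumerate(x):
                    w[i] += (y - pred) * xi
    return w


def is_coreferent(w, a, b):
    return sum(wi * xi for wi, xi in zip(w, pair_features(a, b))) > 0


def cluster(mentions, w):
    """Merge positively classified pairs into chains (transitive closure)."""
    parent = {m[0]: m[0] for m in mentions}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b in combinations(mentions, 2):
        if is_coreferent(w, a, b):
            parent[find(b[0])] = find(a[0])
    groups = {}
    for m in mentions:
        groups.setdefault(find(m[0]), []).append(m[0])
    return sorted(groups.values())


weights = train_perceptron(build_pairs(MENTIONS))
chains = cluster(MENTIONS, weights)
print(chains)
```

On this toy data the pairwise decisions group the two "patient" mentions and the two "chest pain" mentions, leaving "aspirin" on its own. The model is trained and applied on the same five mentions, so this shows the mechanics only and says nothing about accuracy.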