Search CORE


Corpus-based identification of non-anaphoric noun phrases

Author: Bean David L.
Riloff Ellen M.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/1999

Journal Article
Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphoric, which has the potential to improve the efficiency and accuracy of coreference resolution systems. Our algorithm generates lists of non-anaphoric noun phrases and noun phrase patterns from a training corpus and uses them to recognize non-anaphoric noun phrases in new texts. Using 1600 MUC-4 terrorism news articles as the training corpus, our approach achieved 78% recall and 87% precision at identifying such noun phrases in 50 test documents.

The University of Utah: J. Willard Marriott Digital Library
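The list-and-pattern idea in "Corpus-based identification of non-anaphoric noun phrases" can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the contents of `NON_ANAPHORIC_NPS` and `NON_ANAPHORIC_PATTERNS`, and the function names, are hypothetical placeholders for the lists the method learns from its training corpus. The `precision_recall` helper shows how recall and precision figures such as 78% and 87% are defined, under the simplifying assumption that each noun phrase is identified by its string.

```python
import re
from typing import Iterable, Tuple

# Hypothetical learned resources: placeholders for the lists of non-anaphoric
# NPs and NP patterns that the method extracts from a training corpus.
NON_ANAPHORIC_NPS = {"the white house", "the news media"}
NON_ANAPHORIC_PATTERNS = [re.compile(r"the .+ government")]


def is_non_anaphoric(np_text: str) -> bool:
    """True if a definite NP matches a listed phrase or a listed pattern."""
    key = " ".join(np_text.lower().split())  # normalise case and whitespace
    if key in NON_ANAPHORIC_NPS:
        return True
    return any(p.fullmatch(key) for p in NON_ANAPHORIC_PATTERNS)


def precision_recall(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of predicted non-anaphoric NPs against a gold set."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    phrases = ["the White House", "the Salvadoran government", "the man"]
    print([p for p in phrases if is_non_anaphoric(p)])
    # ['the White House', 'the Salvadoran government']
```

Patterns let such a classifier fire on phrases never seen verbatim in training; the single `government` pattern here is an invented example.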

A Corpus-Based Investigation of Definite Description Use

Author: Poesio Massimo
Vieira Renata
Publication date: 24/10/1997

We present the results of a study of definite description use in written texts aimed at assessing the feasibility of annotating corpora with information about definite description interpretation. We ran two experiments, in which subjects were asked to classify the uses of definite descriptions in a corpus of 33 newspaper articles, containing a total of 1412 definite descriptions. We measured the agreement among annotators about the classes assigned to definite descriptions, as well as the agreement about the antecedent assigned to those definites that the annotators classified as being related to an antecedent in the text. The most interesting result of this study from a corpus annotation perspective was the rather low agreement (K=0.63) that we obtained using versions of Hawkins' and Prince's classification schemes; better results (K=0.76) were obtained using the simplified scheme proposed by Fraurud that includes only two classes, first-mention and subsequent-mention. The agreement about antecedents was also not complete. These findings raise questions concerning the strategy of evaluating systems for definite description interpretation by comparing their results with a standardized annotation. From a linguistic point of view, the most interesting observations were the great number of discourse-new definites in our corpus (in one of our experiments, about 50% of the definites in the collection were classified as discourse-new, 30% as anaphoric, and 18% as associative/bridging) and the presence of definites which did not seem to require a complete disambiguation.
Comment: 47 pages, uses fullname.sty and palatino.st

arXiv.org e-Print Archive

CiteSeerX
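The K values in "A Corpus-Based Investigation of Definite Description Use" (0.63 and 0.76) are chance-corrected agreement scores. A standard form of the kappa statistic is shown below as background; the abstract does not say which variant the study computed. Here P(A) is the observed proportion of agreement among annotators and P(E) the proportion expected by chance.

```latex
K = \frac{P(A) - P(E)}{1 - P(E)}
```

For instance, with hypothetical values P(A) = 0.85 and P(E) = 0.60, K = 0.25 / 0.40 = 0.625.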


Coreference resolution in clinical discharge summaries, progress notes, surgical and pathology reports: a unified lexical approach

Author: Gooch P.
Roudsari A.
Publication date: 01/01/2011

We developed a lexical rule-based system that uses a unified approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA) provided for the fifth i2b2/VA shared task. Taking the unweighted mean of four coreference metrics, validation of the system against the i2b2/VA corpus attained an overall F-score of 87.7% across all mention classes, with a maximum of 93.1% for coreference of persons, and a minimum of 77.2% for coreference of tests. For the ODIE corpus the overall F-score across all mention classes was 79.4%, with a maximum of 82.0% for coreference of persons and a minimum of 13.1% for coreference of diagnostic reagents. For the ODIE corpus our results are comparable to the mean reported inter-annotator agreement with the gold standard. We discuss the four categories of errors we identified, and how these might be addressed. The system uses a number of reusable modules and techniques that may be of benefit to the research community.

City Research Online

Comparing knowledge sources for nominal anaphora resolution

Author: Markert K.
Nissim M.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2005

We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora by means of shallow lexico-semantic patterns. As corpora we use the British National Corpus (BNC), as well as the Web, which has not been previously used for this task. Our results show that (a) the knowledge encoded in WordNet is often insufficient, especially for anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite NP coreference, the Web-based method yields results comparable to those obtained using WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the dataset; (d) in both case studies, the BNC-based method is worse than the other methods because of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge gap often encountered in anaphora resolution, and handled examples with context-dependent relations between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling of lexical knowledge, it is a promising knowledge source to integrate into anaphora resolution systems.

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

White Rose Research Online
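The corpus-mining approach in "Comparing knowledge sources for nominal anaphora resolution" can be sketched as instantiating a lexico-semantic pattern with each antecedent candidate and ranking candidates by match counts. This is a simplified illustration, not the authors' method: `hit_count` is a hypothetical function returning the number of corpus or Web matches for a quoted query, and the single pattern "X and other Ys", the naive pluralisation, and the highest-raw-count rule are assumptions; the abstract does not give the exact patterns or scoring.

```python
from typing import Callable, Optional, Sequence


def build_query(candidate: str, anaphor_head: str) -> str:
    """Instantiate the pattern "<candidate> and other <anaphor_head>s" as a quoted query."""
    # Naive pluralisation (append "s"); irregular plurals are not handled.
    return f'"{candidate} and other {anaphor_head}s"'


def select_antecedent(
    candidates: Sequence[str],
    anaphor_head: str,
    hit_count: Callable[[str], int],
) -> Optional[str]:
    """Return the candidate whose instantiated pattern has the most hits, or None if all are zero."""
    best: Optional[str] = None
    best_score = 0
    for cand in candidates:
        score = hit_count(build_query(cand, anaphor_head))
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    fake_counts = {'"Citibank and other banks"': 1200}  # invented counts for illustration
    print(select_antecedent(["the weather", "Citibank"], "bank", lambda q: fake_counts.get(q, 0)))
    # Citibank
```

Passing `hit_count` in as a parameter keeps the sketch independent of any particular search engine or corpus index.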


Lexical patterns, features and knowledge resources for coreference resolution in clinical notes

Author: Abdul Roudsari
D’Avolio
Miller
Phil Gooch
Rahman
Recasens
Rosse
Savova
Uzuner
van Deemter
Zheng
Publication venue: 'Elsevier BV'
Publication date: 01/10/2012

Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general-purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show F-measures of 79.2% and 87.5% for the two corpora, respectively, which offers performance at least as good as human annotators, greatly increased performance over general-purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download.

City Research Online

Elsevier - Publisher Connector

Crossref

Non-situational functions of demonstrative noun phrases in Lingala (Bantu)

Author: Meeuwis Michael
Stroeken Koenraad
Publication date: 01/01/2012

This paper examines the non-situational (i.e., non-exophoric) pragmatic functions of the three adnominal demonstratives, oyo, wana, and yango in the Bantu language Lingala. An examination of natural language corpora reveals that, although native-speaker intuitions sanction the use of oyo as an anaphor in demonstrative NPs, this demonstrative is hardly ever used in that role. It also reveals that wana, which has both situational and discourse-referential capacities, is used more frequently than the exclusively anaphoric demonstrative yango. It is explained that wana appears in a wide range of non-coreferential expression types, in coreferential expression types involving low-salience referents, and in coreferential expression types that both involve highly salient referents and include the speaker's desire to signal a shift in the mental representation of the referent towards a pejorative reading. The use of yango, on the other hand, is only licensed in cases of coreferentiality involving highly salient referents and implying continuation of the same mental representation of the referent. A specific section is devoted to charting the possible grammaticalization paths followed by the demonstratives. Conclusions are drawn for pragmatic theory formation in terms of the relation between form (yango vs. wana) and function (coreferentiality vs. non-coreferentiality).

Ghent University Academic Bibliography

Reference tracking and non-canonical referring expressions in Indonesian

Author: Adams Nikki B.
Brugman Claudia M.
Conners Thomas J.
Publication date: 30/03/2016

Prometheus-Academic Collections

A Framework for Interpreting Bridging Anaphora

Author: C. Butnariu
D. Bean
D. Ó Séaghdha
I. Hendrickx
J. Levi
J.N. Levi
J.R. Hobbs
K. Fraurud
M. Lauer
M. Poesio
P. Downing
R. Girju
R. Vieira
S. Tratz
S.-N. Kim
S.N. Kim
T. Sanders
Publication venue: Springer
Publication date: 01/01/2013

In this paper we present a novel framework for resolving bridging anaphora. We argue that anaphora, particularly bridging anaphora, is used as a shortcut device similar to the use of compound nouns. Hence, the two natural language usage phenomena would have to be based on the same theoretical framework. We use an existing theory on compound nouns to test its validity for anaphora. To do this, we used human annotators to interpret indirect anaphora from naturally occurring discourses. The annotators were asked to classify the relations between anaphor-antecedent pairs into relation types that have been previously used to describe the relations between a modifier and the head noun of a compound noun. We obtained very encouraging results, with an average Fleiss's kappa value of 0.66 for inter-annotator agreement. The results were evaluated against other similar natural language interpretation annotation experiments and were found to compare well. In order to determine the prevalence of the proposed set of anaphora relations, we did a detailed analysis of a subset of 20 newspaper articles. The results obtained from this also indicated that a majority (98%) of the relations could be described by the relations in the framework. The results from this analysis also showed the distribution of the relation types in the genre of newspaper article discourses.

Crossref

AUT Scholarly Commons
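The average inter-annotator agreement of 0.66 reported for "A Framework for Interpreting Bridging Anaphora" is a Fleiss's kappa score. Below is a minimal Python sketch of the standard Fleiss's kappa computation, assuming every item (for example, an anaphor-antecedent pair) is labelled by the same number of annotators; the input layout and function name are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def fleiss_kappa(table: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for items that are each rated by the same number of annotators.

    table[i][j] is the number of annotators who put item i in category j.
    """
    if not table:
        raise ValueError("table must contain at least one item")
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_cats = len(table[0])
    total = n_items * n_raters
    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in table) / total for j in range(n_cats)]
    # Per-item agreement: proportion of annotator pairs that agree on the item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in table]
    p_bar = sum(p_i) / n_items     # mean observed agreement
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Two items, three annotators, two categories, perfect agreement.
    print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```

Unlike pairwise Cohen's kappa, this statistic pools all annotators at once, which suits studies where more than two people label each anaphor-antecedent pair.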