Comparing knowledge sources for nominal anaphora resolution
We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora
and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links
encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora
by means of shallow lexico-semantic patterns. As corpora we use the British National
Corpus (BNC), as well as the Web, which has not been previously used for this task. Our
results show that (a) the knowledge encoded in WordNet is often insufficient, especially for
anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for
other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite
NP coreference, the Web-based method yields results comparable to those obtained using
WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the
dataset; (d) in both case studies, the BNC-based method is worse than the other methods because
of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge
gap often encountered in anaphora resolution, and handled examples with context-dependent relations
between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling
of lexical knowledge, it is a promising knowledge source to integrate into anaphora resolution systems.
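The Web-based method above mines corpora with shallow lexico-semantic patterns. The following is a minimal sketch of that idea for other-anaphora. It assumes a single hypothetical pattern "Xs and other Ys" and a caller-supplied `hit_count` function that stands in for Web or corpus frequencies. `pattern_query`, `select_antecedent` and `hit_count` are illustrative names, not the authors' code. Each candidate antecedent is slotted into the pattern, and the candidate with the most hits is selected.

```python
from typing import Callable, List


def pattern_query(anaphor_head: str, candidate_head: str) -> str:
    """Instantiate the hypothetical pattern 'Xs and other Ys'.

    Pluralization is naively approximated by appending 's'.
    """
    return f"{candidate_head}s and other {anaphor_head}s"


def select_antecedent(anaphor_head: str,
                      candidates: List[str],
                      hit_count: Callable[[str], int]) -> str:
    """Return the candidate whose instantiated pattern gets the most hits.

    max() keeps the first candidate among ties, so list order breaks ties.
    Raw counts are used here; a real system would likely normalize them
    by the candidates' individual frequencies.
    """
    return max(candidates,
               key=lambda cand: hit_count(pattern_query(anaphor_head, cand)))


# Toy frequencies standing in for Web hit counts (invented for illustration).
toy_counts = {"koalas and other animals": 120, "zoos and other animals": 3}
print(select_antecedent("animal", ["zoo", "koala"],
                        lambda q: toy_counts.get(q, 0)))  # koala
```

Raw counts favor frequent nouns regardless of their fit with the anaphor. This is why some form of normalization is usually worth adding before relying on such scores.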
Using the web to resolve coreferent bridging in German newspaper text
We adopt Markert and Nissim's (2005) approach of using the World Wide Web to resolve cases of coreferent bridging, apply it to German, and discuss the strengths and weaknesses of this approach. Since the general approach of using surface patterns to obtain information on ontological relations between lexical items has so far only been tried on English, it is also interesting to see whether the approach works as well for German as it does for English, and which differences between these languages need to be accounted for. We also present a novel approach for combining several patterns that yields an ensemble that outperforms the best-performing single patterns in terms of both precision and recall.
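The abstract does not spell out how the patterns are combined. The sketch below is one hedged illustration of a pattern ensemble. It uses a generic weighted-vote scheme, which is not necessarily the authors' method. Each pattern votes for its highest-scoring antecedent and abstains when it finds no hits. This is one way an ensemble can gain recall over any single pattern. The pattern names, candidates, weights and counts below are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def ensemble_vote(pattern_hits: Dict[str, Dict[str, int]],
                  weights: Optional[Dict[str, float]] = None) -> Optional[str]:
    """Combine several surface patterns by weighted voting.

    pattern_hits maps a pattern name to {candidate antecedent: hit count}.
    Each pattern votes for its top candidate; a pattern with no hits abstains.
    Patterns without an explicit weight count 1.0.
    Returns None if every pattern abstains.
    """
    weights = weights or {}
    votes: Counter = Counter()
    for pattern, hits in pattern_hits.items():
        found = {cand: n for cand, n in hits.items() if n > 0}
        if not found:
            continue  # this pattern abstains
        top = max(found, key=found.get)
        votes[top] += weights.get(pattern, 1.0)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical counts for the anaphor "die Stadt" and two candidate antecedents.
hits = {
    "pattern_a": {"Berlin": 40, "Senat": 2},
    "pattern_b": {"Berlin": 0, "Senat": 0},
    "pattern_c": {"Berlin": 7, "Senat": 9},
    "pattern_d": {"Berlin": 15, "Senat": 1},
}
print(ensemble_vote(hits))  # Berlin
```

Letting a pattern abstain, rather than forcing it to vote, keeps low-coverage patterns from adding noise. Per-pattern weights (for example, estimated from each pattern's precision on held-out data) are a natural extension.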
Message-Passing Protocols for Real-World Parsing -- An Object-Oriented Model and its Preliminary Evaluation
We argue for a performance-based design of natural language grammars and
their associated parsers in order to meet the constraints imposed by real-world
NLP. Our approach incorporates declarative and procedural knowledge about
language and language use within an object-oriented specification framework. We
discuss several message-passing protocols for parsing and provide reasons for
sacrificing completeness of the parse in favor of efficiency based on a
preliminary empirical evaluation. Comment: 12 pages, uses epsfig.st
Improving an Anaphora Resolution System for Norwegian
Proceedings of the Second Workshop on Anaphora Resolution
(WAR II).
Editor: Christer Johansson.
NEALT Proceedings Series, Vol. 2 (2008), 27-30.
© 2008 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/7129
AnaPro, Tool for Identification and Resolution of Direct Anaphora in Spanish
Anaphora is a relation of coreference between linguistic terms. According to Webster’s dictionary, it is “the use of a grammatical substitute (as a pronoun or a pro-verb) to refer to the denotation of a preceding word or group of words; also : the relation between a grammatical substitute and its antecedent.” Therefore, anaphora is a discourse relation. Anaphora resolution is very important in Natural Language Processing (NLP). This work is part of Project OM* (Ontology Merging), which seeks to build a large ontology by fusing smaller ontologies extracted from textual documents. An important part of the project is to analyze the sentences in a document in order to transform the text into an ontology that captures its contents. A brief description of Project OM* follows. AnaPro is software that resolves direct anaphora in Spanish, specifically pronouns. It finds the noun or group of words to which a pronoun refers, locating in the preceding sentences the referent (antecedent) that the pronoun replaces. An example of a direct anaphora to be resolved is the pronoun “he” in the sentence “He is sad.” Much of the work on anaphora has been done for English texts; we therefore focus specifically on Spanish documents. AnaPro directly supports text analysis (understanding what a document says). This is a nontrivial task because of differing writing styles, references, idiomatic expressions, etc. The problem grows when the analyzer is a computer, because computers lack the “common sense” that people possess. Hence, text must be preprocessed before it is analyzed. Among other tasks, each word must be assigned a tag (noun, verb, ...), stems must be found, and nouns, verbs, and prepositions must be disambiguated. Colloquial expressions must be identified, and anaphora must be identified and resolved. AnaPro works on Spanish sentences. It is a novel procedure, since it is automatic (no user intervenes during resolution) and it does not need dictionaries.
It employs heuristic procedures to discover the semantics and to help in decision-making; these heuristics are rather easy to implement and use limited knowledge. Nevertheless, its results are good (at least 81% correct answers), although more tests are needed to give a better idea of its performance. Authors I.T. and E.V. would like to acknowledge ESCOM-IPN, where they defended their thesis (#20110083), which gives a more detailed description of AnaPro. The work reported herein was partially sponsored by CONACYT Grant #128163 (Project OM*), by IPN, and by SNI and UAEM.
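The abstract does not list AnaPro's actual rules. As a hedged illustration of the kind of lightweight heuristic it describes, a pronoun can be resolved by keeping only the preceding mentions that agree with it in gender and number, then choosing the most recent one. The `Mention` type, the `PRONOUN_FEATURES` table and the recency rule are assumptions for this sketch. Mention detection and feature tagging are taken as already done.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical feature table for a few Spanish subject pronouns.
PRONOUN_FEATURES = {
    "él": ("m", "sg"),
    "ella": ("f", "sg"),
    "ellos": ("m", "pl"),
    "ellas": ("f", "pl"),
}


@dataclass
class Mention:
    text: str
    gender: str    # "m" or "f"
    number: str    # "sg" or "pl"
    position: int  # token offset in the document; larger means more recent


def resolve_pronoun(pronoun: str, candidates: List[Mention]) -> Optional[Mention]:
    """Return the most recent candidate agreeing with the pronoun in gender
    and number, or None if the pronoun is unknown or nothing agrees."""
    features = PRONOUN_FEATURES.get(pronoun.lower())
    if features is None:
        return None
    agreeing = [m for m in candidates if (m.gender, m.number) == features]
    return max(agreeing, key=lambda m: m.position, default=None)


# "Juan visitó a María. Él está triste."
candidates = [Mention("Juan", "m", "sg", 0), Mention("María", "f", "sg", 3)]
print(resolve_pronoun("Él", candidates).text)  # Juan
```

Agreement filtering alone leaves ties whenever several compatible mentions precede the pronoun. Recency is the simplest tie-breaker; syntactic role or parallelism are common alternatives.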
A Corpus-Based Investigation of Definite Description Use
We present the results of a study of definite description use in written
texts aimed at assessing the feasibility of annotating corpora with information
about definite description interpretation. We ran two experiments, in which
subjects were asked to classify the uses of definite descriptions in a corpus
of 33 newspaper articles, containing a total of 1412 definite descriptions. We
measured the agreement among annotators about the classes assigned to definite
descriptions, as well as the agreement about the antecedent assigned to those
definites that the annotators classified as being related to an antecedent in
the text. The most interesting result of this study from a corpus annotation
perspective was the rather low agreement (K=0.63) that we obtained using
versions of Hawkins' and Prince's classification schemes; better results
(K=0.76) were obtained using the simplified scheme proposed by Fraurud that
includes only two classes, first-mention and subsequent-mention. The agreement
about antecedents was also not complete. These findings raise questions
concerning the strategy of evaluating systems for definite description
interpretation by comparing their results with a standardized annotation. From
a linguistic point of view, the most interesting observations were the great
number of discourse-new definites in our corpus (in one of our experiments,
about 50% of the definites in the collection were classified as discourse-new,
30% as anaphoric, and 18% as associative/bridging) and the presence of
definites that did not seem to require complete disambiguation. Comment: 47 pages, uses fullname.sty and palatino.st
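The agreement figures above (K=0.63 and K=0.76) are chance-corrected coefficients. The sketch below shows one such coefficient for several annotators, Fleiss' kappa. The abstract does not state the exact formulation used in the study, so this illustrates the idea rather than reproducing those numbers. Let P_bar be the mean observed pairwise agreement per item and P_e the agreement expected from the overall category proportions. Then kappa = (P_bar - P_e) / (1 - P_e).

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(labels: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that each received the same number of labels.

    labels[i] lists the classes that the n annotators assigned to item i.
    """
    n = len(labels[0])
    if n < 2 or any(len(item) != n for item in labels):
        raise ValueError("every item needs the same number (>= 2) of labels")
    num_items = len(labels)
    category_totals: Counter = Counter()
    observed = []
    for item in labels:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing annotator pairs on this item.
        observed.append((sum(c * c for c in counts.values()) - n) / (n * (n - 1)))
    p_bar = sum(observed) / num_items
    # Chance agreement from the overall proportion of each category.
    p_e = sum((t / (num_items * n)) ** 2 for t in category_totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Three annotators classifying four definite descriptions (invented labels).
labels = [
    ["first-mention", "first-mention", "first-mention"],
    ["first-mention", "first-mention", "subsequent-mention"],
    ["subsequent-mention", "subsequent-mention", "subsequent-mention"],
    ["first-mention", "subsequent-mention", "subsequent-mention"],
]
print(round(fleiss_kappa(labels), 3))  # 0.333
```

In this toy example the annotators agree on two thirds of annotator pairs, but half of that agreement is expected by chance with two equally frequent classes. The coefficient therefore drops to one third.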
Leveraging different meronym discovery methods for bridging resolution in French
This paper presents a statistical system for resolving bridging descriptions in French, a language for which current lexical resources have very low coverage. The system is similar to the one developed for English by Poesio, but it was enriched to integrate meronymic information extracted automatically from both web queries and raw text using syntactic patterns. Through various experiments on the DEDE corpus, we show that, although still mediocre, the performance of our system compares favorably with that obtained by Poesio for English. In addition, our evaluation indicates that the different meronym extraction methods have a cumulative effect, but that the text pattern-based extraction method is more robust and leads to higher accuracy than the web-based approach.
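The text-pattern method above extracts meronyms from raw text. The following is a rough sketch that assumes a single hypothetical French surface pattern, "DET N1 de la/de l'/du/des N2" (as in "la porte de la maison"). It matches this pattern with a regular expression rather than the syntactic patterns the paper uses. It will also capture many non-meronymic "de" constructions, so its output is a set of candidate pairs to be filtered, not a list of confirmed meronyms. `extract_part_whole_pairs` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical approximation of a syntactic pattern: DET N1 (de la|de l'|du|des) N2.
DET = r"(?:\b(?:les|le|la)\s+|\bl')"
PREP = r"(?:de\s+la\s+|de\s+l'|du\s+|des\s+)"
PATTERN = re.compile(DET + r"(\w+)\s+" + PREP + r"(\w+)", re.IGNORECASE)


def extract_part_whole_pairs(text: str) -> Counter:
    """Count (part, whole) candidate pairs such as ('porte', 'maison')."""
    return Counter((m.group(1).lower(), m.group(2).lower())
                   for m in PATTERN.finditer(text))


sample = ("La porte de la maison était ouverte. Le toit du garage fuit. "
          "La porte de la maison grince.")
print(extract_part_whole_pairs(sample))
```

Aggregating counts over a large corpus and keeping only frequent pairs is a simple way to trade recall for precision with such a noisy pattern.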
Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences
Selectional preferences have been used by word sense disambiguation (WSD) systems as one source of disambiguating information. We evaluate WSD using selectional preferences acquired for English adjective-noun, subject, and direct object grammatical relationships with respect to a standard test corpus. The selectional preferences are specific to verb or adjective classes, rather than individual word forms, so they can be used to disambiguate the co-occurring adjectives and verbs, rather than just the nominal argument heads. We also investigate use of the one-sense-per-discourse heuristic to propagate a sense tag for a word to other occurrences of the same word within the current document in order to increase coverage. Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage. In addition to quantifying performance, we analyze the results to investigate the situations in which the selectional preferences achieve the best precision and in which the one-sense-per-discourse heuristic increases performance.
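The one-sense-per-discourse heuristic mentioned above can be sketched as follows. The sketch assumes that tokens are (lemma, sense-or-None) pairs for a single document, and that the most frequent sense already assigned to a lemma is the one propagated. The abstract does not say how conflicting tags are resolved, so the majority rule (with ties going to the first-seen sense) is an assumption, and `propagate_one_sense_per_discourse` is an illustrative name.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, Optional[str]]  # (lemma, sense tag or None if untagged)


def propagate_one_sense_per_discourse(doc: List[Token]) -> List[Token]:
    """Give each untagged occurrence of a lemma the sense most often assigned
    to that lemma elsewhere in the same document.

    Lemmas with no tagged occurrence stay untagged; existing tags are kept.
    """
    votes: Dict[str, Counter] = {}
    for lemma, sense in doc:
        if sense is not None:
            votes.setdefault(lemma, Counter())[sense] += 1
    result: List[Token] = []
    for lemma, sense in doc:
        if sense is None and lemma in votes:
            sense = votes[lemma].most_common(1)[0][0]
        result.append((lemma, sense))
    return result


doc = [("bank", "bank#1"), ("river", None), ("bank", None),
       ("bank", "bank#1"), ("bank", "bank#2"), ("loan", None)]
print(propagate_one_sense_per_discourse(doc))
```

The heuristic raises coverage only for lemmas that already have at least one tagged occurrence in the document. Any tagging error on those occurrences is copied along with the correct tags.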
- …