    NP Animacy Identification for Anaphora Resolution

    In anaphora resolution for English, animacy identification can play an integral role in the application of agreement restrictions between pronouns and candidates and, as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic measures. The first method is rule-based and uses information about the unique beginners in WordNet to classify NPs on the basis of their animacy. The second method relies on a machine learning algorithm which exploits a WordNet enriched with animacy information for each sense. The effect of word sense disambiguation on the two methods is also assessed. The intrinsic evaluation reveals that the machine learning method reaches human levels of performance. The extrinsic evaluation demonstrates that animacy identification can be beneficial in anaphora resolution, especially in cases where animate entities are identified with high precision.
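
The rule-based idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the hypernym table is a hypothetical toy stand-in for WordNet, and the choice of which unique beginners count as animate (`ANIMATE_BEGINNERS`) is an assumption made for illustration. The names `unique_beginner` and `is_animate` are likewise invented here.

```python
# Toy hypernym links (child -> parent). Hypothetical stand-in for WordNet,
# which a real system would query instead.
HYPERNYM = {
    "teacher": "professional",
    "professional": "person",
    "dog": "canine",
    "canine": "animal",
    "hammer": "tool",
    "tool": "artifact",
}

# Assumption: top-level nodes ("unique beginners") treated as animate.
ANIMATE_BEGINNERS = {"person", "animal"}


def unique_beginner(noun):
    """Follow hypernym links upward until a node with no parent is reached."""
    seen = set()
    while noun in HYPERNYM and noun not in seen:  # 'seen' guards against cycles
        seen.add(noun)
        noun = HYPERNYM[noun]
    return noun


def is_animate(head_noun):
    """Classify an NP head as animate if its top-level ancestor is animate."""
    return unique_beginner(head_noun) in ANIMATE_BEGINNERS


for word in ("teacher", "dog", "hammer"):
    print(word, unique_beginner(word), is_animate(word))
```

A noun missing from the table is returned unchanged by `unique_beginner` and is therefore classified as not animate. This sketch also skips the sense ambiguity that the abstract addresses through word sense disambiguation.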

    Anaphora Resolution Using Named Entity and Ontology

    Proceedings of the Second Workshop on Anaphora Resolution (WAR II). Editor: Christer Johansson. NEALT Proceedings Series, Vol. 2 (2008), 91-96. © 2008 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/7129

    AnaPro, Tool for Identification and Resolution of Direct Anaphora in Spanish

    Introduction. Anaphora is a relation of coreference between linguistic terms. According to Webster’s dictionary: “It is the use of a grammatical substitute (as a pronoun or a pro-verb) to refer to the denotation of a preceding word or group of words; also: the relation between a grammatical substitute and its antecedent.” Anaphora is therefore a discourse relation, and its resolution is very important in Natural Language Processing (NLP). This work is part of Project OM* (Ontology Merging), which seeks to build a large ontology by fusing smaller ontologies extracted from textual documents. An important part of the project is to analyze the sentences in a document in order to transform the text into an ontology that captures its contents. A brief description of Project OM* follows. AnaPro is software that resolves direct anaphora in Spanish, specifically pronouns: it finds the noun or group of words to which the pronoun refers, locating in the previous sentences the referent or antecedent that the pronoun replaces. An example of a resolved direct anaphora is the pronoun “he” in the sentence “He is sad.” Much of the work on anaphora has been done for texts in English; thus, we specifically focus on Spanish documents. AnaPro directly supports text analysis (understanding what a document says), a nontrivial task since there are different writing styles, references, idiomatic expressions, etc. The problem grows if the analyzer is a computer, because computers lack the “common sense” that people possess. Hence, before text analysis, preprocessing is required in order to assign tags (noun, verb, ...) to each word, find the stems, disambiguate nouns, verbs and prepositions, identify colloquial expressions, and identify and resolve anaphora, among other chores. AnaPro works on Spanish sentences. It is a novel procedure, since it is automatic (no user intervenes during the resolution) and it does not need dictionaries. It employs heuristic procedures to discover the semantics and help in the decisions; they are rather easy to implement and use limited knowledge. Nevertheless, its results are good (at least 81% correct answers), although more tests will give a better idea of its quality. Authors I.T. and E.V. would like to acknowledge ESCOM-IPN, where they defended their thesis, #20110083, which gives a more detailed description of AnaPro. Work herein reported was partially sponsored by CONACYT Grant #128163 (Project OM*), by IPN and by SNI and UAEM.

    Anaphoric resolution of zero pronouns in Chinese in translation and reading comprehension

    The primary aim of the thesis is to investigate some of the processes of reading Chinese text by means of comparing and analysing approximately 100 parallel translations of four texts from Chinese to English. The translations are answers to A Level examination questions. The focus of the investigation is interpretation of the zero pronoun, a common phenomenon in Chinese, which often requires explicitation when translated into English. The secondary aim is to show how translation gives evidence of comprehension, as shown by the variation in interpretation of zero pronouns. The thesis reviews relevant psycholinguistic research into reading, particularly reading of Chinese text. This is followed by reviews of relevant research into translation as a reading activity, and a discussion of its role in language teaching and testing. The core of the thesis is the discussion of the zero pronoun in Chinese, including discussion of anaphoric choice - the writer's decision on when to use zero in preference to an explicit anaphoric form - and of anaphoric resolution - how a reader decides what a zero pronoun refers to. Anaphoric resolution may be problematic for less experienced readers of Chinese owing to its lack of rich morphological inflection which, in other languages, provides the reader with information. Some of the key ideas on anaphoric choice and resolution are then applied to the analysis of the data in the parallel translations. It would appear that factors in Chinese texts which have an effect on comprehending zero pronouns are antecedent distance, topic persistence, abstraction, multiplicity of arguments and the meaning of the verb. Characteristics of the reader which may affect comprehension of the zero pronoun include personal schemata which may lead to elaborative inferences. On the basis of the data I suggest that mark schemes could be devised on a scalar system encompassing optimal solution, proximal solution and nonsolution, which might help to solve the problem of variability in marking translation. A by-product of the thesis, and an avenue for further research, is the apparent close relationship between idea units, clause length, punctuation breaks and antecedent distance in Chinese texts and saccade length and working memory capacity in the reader of Chinese.

    A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature

    We introduce a modular, hybrid coreference resolution system that extends a rule-based baseline with three neural classifiers for the subtasks mention detection, mention attributes (gender, animacy, number), and pronoun resolution. The classifiers substantially increase coreference performance in our experiments with Dutch literature across all metrics on the development set: mention detection, LEA, CoNLL, and especially pronoun accuracy. However, on the test set, the best results are obtained with rule-based pronoun resolution. This inconsistent result highlights that the rule-based system is still a strong baseline, and more work is needed to improve pronoun resolution robustly for this dataset. While end-to-end neural systems require no feature engineering and achieve excellent performance in standard benchmarks with large training sets, our simple hybrid system scales well to long document coreference (>10k words) and attains superior results in our experiments on literature.