
    AnaPro, Tool for Identification and Resolution of Direct Anaphora in Spanish

    Introduction. Anaphora is a relation of coreference between linguistic terms. According to Webster’s dictionary: “It is the use of a grammatical substitute (as a pronoun or a pro-verb) to refer to the denotation of a preceding word or group of words; also : the relation between a grammatical substitute and its antecedent.” Therefore, anaphora is a discourse relation. Anaphora resolution is very important in Natural Language Processing (NLP). This work is part of Project OM* (Ontology Merging), which seeks to build a large ontology by fusing smaller ontologies extracted from textual documents. An important part of the project is to analyze the sentences in a document in order to transform that text into an ontology that captures its contents. A brief description of Project OM* follows. AnaPro is software that resolves direct anaphora in Spanish, specifically pronouns: it finds the noun or group of words to which the pronoun refers. It searches the preceding sentences for the referent, or antecedent, that the pronoun replaces. An example of a direct anaphor that it resolves is the pronoun “he” in the sentence “He is sad.” Much of the work on anaphora has been done for texts in English; thus, we focus specifically on Spanish documents. AnaPro directly supports text analysis (understanding what a document says), a nontrivial task since there are different writing styles, references, idiomatic expressions, etc. The problem grows if the analyzer is a computer, because computers lack the “common sense” that people possess. Hence, text analysis must be preceded by preprocessing that assigns tags (noun, verb, ...) to each word, finds the stems, disambiguates nouns, verbs, and prepositions, identifies colloquial expressions, and identifies and resolves anaphora, among other chores. AnaPro works on Spanish sentences. It is a novel procedure, since it is automatic (no user intervenes during the resolution) and it does not need dictionaries. It employs heuristic procedures to discover the semantics and to guide its decisions; they are rather easy to implement and use limited knowledge. Nevertheless, its results are good (at least 81% correct answers). However, more tests will give a better idea of its quality.
    Authors I.T. and E.V. would like to acknowledge ESCOM-IPN, where they defended their thesis, #20110083, which gives a more detailed description of AnaPro. The work reported herein was partially sponsored by CONACYT Grant #128163 (Project OM*), by IPN, by SNI, and by UAEM.

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.
    Egyptian Government

    Information structure and the referential status of linguistic expression: workshop as part of the 23rd annual meetings of the Deutsche Gesellschaft für Sprachwissenschaft in Leipzig, Leipzig, February 28 - March 2, 2001

    This volume comprises papers that were given at the workshop Information Structure and the Referential Status of Linguistic Expressions, which we organized during the Deutsche Gesellschaft für Sprachwissenschaft (DGfS) Conference in Leipzig in February 2001. At this workshop we discussed the connection between information structure and the referential interpretation of linguistic expressions, a topic mostly neglected in current linguistics research. One common aim of the papers is to find out to what extent the focus-background structuring and the topic-comment structuring determine the referential interpretation of simple arguments like definite and indefinite NPs on the one hand and sentences on the other.

    The Effect of Enhancing Learner Input via Computer Assisted Language Learning Tools: On the Acquisition of Clitics by Spanish Second Language Learners

    The current project contributes to the growing body of research in second language acquisition that investigates the facilitative effects of drawing learner attention to problematic aspects of linguistic input through input enhancement. Specifically, the research examines the extent to which input enhancement (Sharwood Smith 1991, 1993) via typographically altered texts facilitates the acquisition of third person dative and accusative clitic pronouns in Spanish for university level native English speakers enrolled in both beginner and advanced levels of Spanish second language courses. A number of past studies have indicated that verbal clitics in general are an obstacle to gaining L2 Spanish proficiency. Prior research has indicated that the most difficult pronominal system for L1 English speakers learning L2 Spanish is that of the third person dative and accusative clitic pronouns (VanPatten 1984). These students also tend to misinterpret preverbal clitics as subjects (VanPatten 1984). The significance of the project is twofold. First, it considers whether input enhancement used to facilitate the identification of referents of anaphoric pronouns aids in the acquisition of the pronouns. Second, it examines at which level, beginner or advanced, input enhancement could be most beneficial for the L2 Spanish student. It also takes into account local comprehension, whereas prior input enhancement studies have investigated global comprehension. Furthermore, the results may have a direct impact on future computer technology that could be developed for foreign language instruction, as the project investigates the effects of this external attention-drawing device in a Computer Assisted Language Learning setting. Results of the investigation indicated that input enhancement at the advanced level was successful in helping L2 Spanish students resolve anaphora involving third person dative and accusative clitic pronouns in Spanish.