    Using 'Low-cost' Learning Features for Pronoun Resolution

    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

    Analysis of Anaphora Resolution System for English Language

    Anaphora resolution is a complex problem in linguistics that has attracted the attention of many researchers. It is the problem of identifying referents in discourse, and it plays an important role in natural language processing tasks. This paper focuses entirely on pronominal anaphora resolution for English, in which pronouns refer to the intended noun in the discourse. Two computational models are proposed. Anaphora resolution depends on various factors, among which these models use the recency factor and animistic knowledge. Recency is implemented with the Lappin-Leass approach in the first model and with the Centering approach in the second. Information about animacy is obtained with a gazetteer method, and the identification of animate elements is used to improve the accuracy of the system. The paper reports experiments with both models on data sets from different domains, summarizes the comparative results, and concludes which model is most suitable.
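The combination of a recency factor with gazetteer-based animacy in the first model can be pictured with a minimal Python sketch. Everything here is a hypothetical illustration: the `ANIMATE_GAZETTEER` word list, the two weights and the `Candidate` fields are assumptions, and the real Lappin-Leass algorithm uses a richer set of salience factors. What the sketch keeps is the core idea: filter candidates by animacy, then pick the most salient one, with salience halved for each sentence of distance.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical gazetteer of nouns denoting animate entities.
ANIMATE_GAZETTEER = {"john", "mary", "man", "woman", "doctor", "teacher"}

# Hypothetical weights, loosely inspired by Lappin-Leass salience factors.
RECENCY_WEIGHT = 100
SUBJECT_WEIGHT = 80

ANIMATE_PRONOUNS = {"he", "she", "him", "her"}
INANIMATE_PRONOUNS = {"it"}


@dataclass
class Candidate:
    head: str         # head noun of the candidate noun phrase
    sentence: int     # index of the sentence containing the candidate
    is_subject: bool  # whether the candidate is a grammatical subject


def is_animate(head: str) -> bool:
    """Gazetteer lookup: a head noun is animate if it is listed."""
    return head.lower() in ANIMATE_GAZETTEER


def salience(cand: Candidate, pronoun_sentence: int) -> float:
    """Score that is halved for every sentence between candidate and pronoun."""
    score = RECENCY_WEIGHT + (SUBJECT_WEIGHT if cand.is_subject else 0)
    return score / 2 ** (pronoun_sentence - cand.sentence)


def resolve(pronoun: str, pronoun_sentence: int,
            candidates: list[Candidate]) -> Candidate | None:
    """Filter candidates by animacy, then return the most salient one."""
    p = pronoun.lower()
    # Only antecedents at or before the pronoun's sentence are considered.
    pool = [c for c in candidates if c.sentence <= pronoun_sentence]
    if p in ANIMATE_PRONOUNS:
        pool = [c for c in pool if is_animate(c.head)]
    elif p in INANIMATE_PRONOUNS:
        pool = [c for c in pool if not is_animate(c.head)]
    if not pool:
        return None
    return max(pool, key=lambda c: salience(c, pronoun_sentence))


cands = [
    Candidate("John", sentence=0, is_subject=True),
    Candidate("car", sentence=0, is_subject=False),
    Candidate("report", sentence=1, is_subject=True),
]
print(resolve("he", 1, cands).head)  # John
print(resolve("it", 1, cands).head)  # report
```

For "it", the animacy filter leaves "car" (100 / 2 = 50) and "report" (180), so the more recent subject wins; for "he", only "John" survives the gazetteer filter.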

    A Review of the Repeated Name Penalty: Implications for Null Subject Languages

    This is a critical review of the anaphoric processing delay known as the Repeated Name Penalty (RNP: Gordon, Grosz, & Gilliom, 1993). In this paper I argue that the RNP should be understood as an interaction effect between the anaphor type and the discourse prominence of the referent, and not merely as a pairwise comparison between sentences with repeated names and corresponding sentences with pronouns. I further propose that in null subject languages, the relevant anaphor that should be contrasted with the repeated name is the null pronoun, because this type of pronoun represents the least informative anaphor available.
    Affiliation: Gelormini Lezama, Carlos. Instituto de Neurología Cognitiva. Laboratorio de Psicología Experimental y Neurociencia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentina.

    Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation

    This paper evaluates the different tasks carried out in the translation of pronominal anaphora in a machine translation (MT) system. The MT interlingua approach named AGIR (Anaphora Generation with an Interlingua Representation) improves upon other proposals presented to date because it is able to translate intersentential anaphors, detect co-reference chains, and translate Spanish zero pronouns into English, issues that other systems have hardly considered. The paper presents the resolution and evaluation of these anaphora problems in AGIR using different kinds of knowledge (lexical, morphological, syntactic, and semantic). The translation of English and Spanish anaphoric third-person personal pronouns (including Spanish zero pronouns) into the target language has been evaluated on unrestricted corpora. We obtained a precision of 80.4% in the translation of Spanish pronouns and 84.8% in the translation of English pronouns. Although we have only studied Spanish and English, our approach can easily be extended to other languages such as Portuguese, Italian, or Japanese.

    Linguistics parameters for zero anaphora resolution

    Master's dissertation, Natural Language Processing and Human Language Technology, Univ. do Algarve, 2009.
    This dissertation describes and proposes a set of linguistically motivated rules for zero anaphora resolution in the context of a natural language processing chain developed for Portuguese. Some languages, like Portuguese, allow noun phrase (NP) deletion (or zeroing) in several syntactic contexts in order to avoid the redundancy that would result from repeating previously mentioned words. The co-reference relation between the zeroed element and its antecedent (or previous mention) in the discourse is here called zero anaphora (Mitkov, 2002). In Computational Linguistics, zero anaphora resolution may be viewed as a subtask of anaphora resolution, and it plays an essential role in various Natural Language Processing applications such as information extraction, automatic abstracting, dialog systems, machine translation and question answering. The main goal of this dissertation is to describe the grammatical rules governing subject NP deletion and the referential constraints in Brazilian Portuguese, so that the antecedent of the deleted subject NP can be correctly identified. Some of these rules were then formalized in the Xerox Incremental Parser, or XIP (Ait-Mokhtar et al., 2002: 121-144), to constitute a module of the Portuguese grammar (Mamede et al. 2010) developed at the Spoken Language Laboratory (L2F). With this rule-based approach we expected to improve the performance of the Portuguese grammar, namely by producing better dependency structures with (reconstructed) zeroed NPs for the syntactic-semantic interface.
Because of the complexity of the task, the scope of this dissertation had to be limited to (a) subject NP deletion, (b) within sentence boundaries, and (c) with an explicit antecedent; in addition, (d) the rules were formalized based solely on the output of the shallow parser (chunks), that is, with minimal syntactic (and no semantic) knowledge. A corpus of different text genres was manually annotated for zero anaphors and for other zero-shaped, usually indefinite, subjects. The rule-based approach is evaluated, and the results are presented and discussed.
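To make the scope restrictions concrete, here is a minimal Python sketch of one chunk-based rule in this spirit: within a single sentence, a verb chunk whose clause has no overt subject NP is linked to the subject NP of the nearest earlier clause that agrees with it in person and number. The chunk labels, the agreement features and the rule itself are illustrative assumptions, not the XIP rules formalized in the dissertation; the clause segmentation is taken as given rather than parsed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str                  # hypothetical labels: "NP", "VP", "CONJ", "ADV"
    text: str
    clause: int                # index of the clause within one sentence
    person: int | None = None  # agreement features carried by NPs and VPs
    number: str | None = None  # "sg" or "pl"


def agrees(np: Chunk, vp: Chunk) -> bool:
    """Person and number agreement between a subject NP and a verb chunk."""
    return np.person == vp.person and np.number == vp.number


def resolve_zero_subjects(chunks: list[Chunk]) -> dict[int, int]:
    """Link each subjectless verb chunk to an explicit antecedent NP.

    Works on one sentence only (scope limit b) and only proposes explicit
    NP antecedents (scope limit c). A clause's subject is approximated as
    the first NP chunk that precedes the clause's first VP chunk.
    """
    subject_of: dict[int, int] = {}  # clause index -> index of subject NP
    verb_seen: set[int] = set()      # clauses whose first VP has been read
    links: dict[int, int] = {}       # VP index -> antecedent NP index
    for i, ch in enumerate(chunks):
        if ch.kind == "NP" and ch.clause not in verb_seen:
            subject_of.setdefault(ch.clause, i)
        elif ch.kind == "VP":
            verb_seen.add(ch.clause)
            if ch.clause in subject_of:
                continue  # overt subject: nothing to resolve
            # Zero subject: search earlier clauses, nearest first,
            # for a subject NP that agrees with the verb.
            for prev in range(ch.clause - 1, -1, -1):
                j = subject_of.get(prev)
                if j is not None and agrees(chunks[j], ch):
                    links[i] = j
                    break
    return links


# "O João chegou e saiu logo depois." ("João arrived and [he] left soon after.")
example = [
    Chunk("NP", "O João", 0, 3, "sg"),
    Chunk("VP", "chegou", 0, 3, "sg"),
    Chunk("CONJ", "e", 1),
    Chunk("VP", "saiu", 1, 3, "sg"),
    Chunk("ADV", "logo depois", 1),
]
print(resolve_zero_subjects(example))  # {3: 0}
```

Because the rule only sees chunks and agreement features, it cannot distinguish a true zero anaphor from an indefinite zero subject, which is one reason such cases were annotated separately in the corpus.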

    Coreference resolution for Portuguese using parallel corpora word alignment

    The essential goal of the field of Information Extraction is to investigate methods and techniques for transforming the unstructured information present in natural language texts into structured data. An important step in this process is coreference resolution, the task of identifying different noun phrases that refer to the same entity in the discourse. Coreference resolution has been extensively researched for English (Ng, 2010, lists a number of studies in the area), but it has received less attention for other languages. This is because the vast majority of the approaches used in this research are based on machine learning and therefore require a large amount of annotated data.
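The abstract stops before describing the method, but the title points to word alignment over parallel corpora. One common way to exploit such alignments is annotation projection: coreference chains available on the English side are transferred to the aligned Portuguese tokens. The Python sketch below illustrates that general technique under stated assumptions, not necessarily the authors' procedure: alignments are given as (source index, target index) token pairs, each mention is projected to the smallest contiguous target span covering its aligned tokens, and chains left with fewer than two mentions are dropped.

```python
from __future__ import annotations

Span = tuple[int, int]             # inclusive token span (start, end)
Alignment = list[tuple[int, int]]  # (source index, target index) pairs


def project_mention(span: Span, alignment: Alignment) -> Span | None:
    """Project a source-side mention span onto the target side.

    Returns the smallest contiguous target span covering every target token
    aligned to a token inside the source span, or None if nothing is aligned.
    """
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    return (min(targets), max(targets))


def project_chains(chains: list[list[Span]], alignment: Alignment) -> list[list[Span]]:
    """Project every mention of every chain; drop unaligned mentions and
    chains left with fewer than two mentions."""
    projected = []
    for chain in chains:
        spans = [p for m in chain if (p := project_mention(m, alignment)) is not None]
        if len(spans) >= 2:
            projected.append(spans)
    return projected


# "Mary bought a car and she loves it" -> "Maria comprou um carro e ela o adora"
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 7), (7, 6)]
chains = [[(0, 0), (5, 5)], [(2, 3), (7, 7)]]
print(project_chains(chains, alignment))
# [[(0, 0), (5, 5)], [(2, 3), (6, 6)]]
```

Using min/max of the aligned indices tolerates word-order changes such as the preverbal clitic "o". A pronoun with no Portuguese counterpart (a null subject) simply disappears from the projected chain, which is one known weakness of projection for null subject languages.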

    Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

    This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium

    Pronominal Anaphora Resolution in Portuguese Language Documents

    The process of anaphora resolution is fundamental to understanding a text, and although humans do it easily, simulating it computationally is not a trivial task. The main goal of this work is to build a system that gives the computer the ability to infer the antecedents of pronominal anaphors. The system is based on the centering methodology, not only because of its core principles but also because of its adaptability to the Portuguese language. The evaluation of the results revealed some limitations common to this type of system, which led to the proposal and implementation of a modified algorithm with three extensions that allow one solution to be preferred over the others in case of a tie.
The new evaluation shows an efficiency improvement in the second version of the algorithm, which achieves an average critical success rate of 54%. This is considered quite positive, given that no corpora free of pre-processing errors were available.
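A centering-based resolver can be sketched in Python as follows: each candidate antecedent from the previous utterance is tried in turn, the resulting backward-looking center Cb is computed, and the candidate yielding the most preferred transition (Continue > Retain > Smooth-Shift > Rough-Shift) wins. The abstract does not describe the three tie-breaking extensions, so this sketch substitutes a single hypothetical tie-breaker (the candidate's rank in Cf(Ui-1)); the agreement dictionary and the ranked Cf lists are also assumptions supplied by the caller.

```python
from __future__ import annotations

# Standard centering preference over transitions (lower is better).
TRANSITION_RANK = {"CONTINUE": 0, "RETAIN": 1, "SMOOTH-SHIFT": 2, "ROUGH-SHIFT": 3}


def backward_center(cf_prev: list[str], cf_cur: list[str]) -> str | None:
    """Cb(Ui): the highest-ranked element of Cf(Ui-1) realized in Ui."""
    for entity in cf_prev:
        if entity in cf_cur:
            return entity
    return None


def transition(cb_prev: str | None, cb_cur: str | None, cp_cur: str) -> str:
    """Classify the transition from Ui-1 to Ui; cp_cur is Cp(Ui)."""
    same_cb = cb_prev is None or cb_cur == cb_prev
    if same_cb:
        return "CONTINUE" if cb_cur == cp_cur else "RETAIN"
    return "SMOOTH-SHIFT" if cb_cur == cp_cur else "ROUGH-SHIFT"


def resolve_pronoun(cf_prev: list[str], cb_prev: str | None,
                    cf_cur: list[str], pronoun: str,
                    compatible: dict[str, set[str]]) -> str | None:
    """Choose an antecedent for `pronoun` in the current utterance.

    cf_prev: ranked forward-looking centers of Ui-1 (e.g. subject > object).
    cf_cur:  ranked mentions of Ui, with the pronoun still unresolved.
    compatible: pronoun -> entities it agrees with (gender/number filter).
    """
    best, best_key = None, None
    for rank, cand in enumerate(cf_prev):
        if cand not in compatible.get(pronoun, set()):
            continue
        filled = [cand if m == pronoun else m for m in cf_cur]
        cb_cur = backward_center(cf_prev, filled)
        kind = transition(cb_prev, cb_cur, filled[0])
        # Prefer the better transition; on a tie, prefer the candidate
        # ranked higher in Cf(Ui-1) (hypothetical tie-breaker).
        key = (TRANSITION_RANK[kind], rank)
        if best_key is None or key < best_key:
            best, best_key = cand, key
    return best


# U1: "John gave Bill a book."  U2: "He went home."
cf_u1 = ["John", "Bill", "book"]
agreement = {"he": {"John", "Bill"}, "it": {"book"}}
print(resolve_pronoun(cf_u1, None, ["he", "home"], "he", agreement))    # John
print(resolve_pronoun(cf_u1, "Bill", ["he", "home"], "he", agreement))  # Bill
```

In the first call both readings give a Continue transition, so the tie-breaker decides; in the second, Cb(U1) is assumed to be Bill, so "Bill" keeps the center (Continue) and beats "John" (Smooth-Shift). Ties like the first case are exactly where the thesis's extensions would apply.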

    Extracting and Visualizing Quotations from News Wires

    We introduce SAPIENS, a platform for extracting quotations from news wires, associated with their author and context. The originality of SAPIENS is that it relies on a deep linguistic processing chain, which allows it to extract quotations with wide coverage and under an extended definition, including quotations that are only partially quote-delimited verbatim transcripts. We describe the architecture of SAPIENS and how it was applied to process a corpus of French news wires from the AFP news agency.