Search CORE

32 research outputs found

Using 'Low-cost' Learning Features for Pronoun Resolution

Author: Cuevas Ramon Re Moya
Paraboni Ivandre
Publication venue: De La Salle University - Dasmarinas
Publication date: 01/01/2008

PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

Waseda University Repository

Analysis of Anaphora Resolution System for English Language

Author: Dr Pratistha Mathur
Dr Priya Lakhmani
Smita Singh
Sudha Morwal
Publication date: 06/03/2020

ABSTRACT Anaphora resolution is a complex problem in linguistics and has attracted the attention of many researchers. It is the problem of identifying referents in the discourse. Anaphora resolution plays an important role in natural language processing tasks. This paper focuses entirely on pronominal anaphora resolution for the English language, in which pronouns refer to the intended noun in the discourse. Two computational models are proposed for anaphora resolution. Resolution of anaphora is based on various factors, among which these models use the recency factor and animistic knowledge. The recency factor is implemented using the Lappin-Leass approach in the first model and the Centering approach in the second model. Information about animacy is obtained by a gazetteer method. The identification of animistic elements is employed to improve the accuracy of the system. The paper reports experiments conducted with both models on data sets from different domains. The results of the two models are compared, and a conclusion is drawn about the most suitable model.

CiteSeerX

A Review of the Repeated Name Penalty: Implications for Null Subject Languages

Author: Gelormini Lezama Carlos
Publication venue: Universidade Federal do Rio de Janeiro
Publication date: 01/12/2012

This is a critical review of the anaphoric processing delay known as the Repeated Name Penalty (RNP: Gordon, Grosz, & Gilliom, 1993). In this paper I argue that the RNP should be understood as an interaction effect between the anaphor type and the discourse prominence of the referent, and not merely as a pairwise comparison between sentences with repeated names and corresponding sentences with pronouns. I further propose that in null subject languages, the relevant anaphor that should be contrasted with the repeated name is the null pronoun because this type of pronoun represents the least informative anaphor available. Affiliation: Gelormini Lezama, Carlos. Instituto de Neurología Cognitiva. Laboratorio de Psicología Experimental y Neurociencia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentin

CONICET Digital

Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation

Author: Ferrandez A.
Peral J.
Publication venue: 'AI Access Foundation'
Publication date: 23/06/2011

This paper evaluates the different tasks carried out in the translation of pronominal anaphora in a machine translation (MT) system. The MT interlingua approach named AGIR (Anaphora Generation with an Interlingua Representation) improves upon other proposals presented to date because it is able to translate intersentential anaphors, detect co-reference chains, and translate Spanish zero pronouns into English, issues hardly considered by other systems. The paper presents the resolution and evaluation of these anaphora problems in AGIR with the use of different kinds of knowledge (lexical, morphological, syntactic, and semantic). The translation of English and Spanish anaphoric third-person personal pronouns (including Spanish zero pronouns) into the target language has been evaluated on unrestricted corpora. We have obtained a precision of 80.4% and 84.8% in the translation of Spanish and English pronouns, respectively. Although we have only studied the Spanish and English languages, our approach can be easily extended to other languages such as Portuguese, Italian, or Japanese.

arXiv.org e-Print Archive

Crossref

Linguistics parameters for zero anaphora resolution

Author: Pereira Simone Cristina
Publication date: 01/01/2010

Master's dissertation, Natural Language Processing and Human Language Technology, Univ. do Algarve, 2009. This dissertation describes and proposes a set of linguistically motivated rules for zero anaphora resolution in the context of a natural language processing chain developed for Portuguese. Some languages, like Portuguese, allow noun phrase (NP) deletion (or zeroing) in several syntactic contexts in order to avoid the redundancy that would result from repetition of previously mentioned words. The co-reference relation between the zeroed element and its antecedent (or previous mention) in the discourse is here called zero anaphora (Mitkov, 2002). In Computational Linguistics, zero anaphora resolution may be viewed as a subtask of anaphora resolution and has an essential role in various Natural Language Processing applications such as information extraction, automatic abstracting, dialog systems, machine translation and question answering. The main goal of this dissertation is to describe the grammatical rules imposing subject NP deletion and referential constraints in Brazilian Portuguese, in order to allow a correct identification of the antecedent of the deleted subject NP. Some of these rules were then formalized into the Xerox Incremental Parser or XIP (Ait-Mokhtar et al., 2002: 121-144) in order to constitute a module of the Portuguese grammar (Mamede et al. 2010) developed at the Spoken Language Laboratory (L2F). Using this rule-based approach we expected to improve the performance of the Portuguese grammar, namely by producing better dependency structures with (reconstructed) zeroed NPs for the syntactic-semantic interface. Because of the complexity of the task, the scope of this dissertation had to be limited to: (a) subject NP deletion; (b) within sentence boundaries; and (c) with an explicit antecedent. In addition, (d) rules were formalized based solely on the results of the shallow parser (or chunks), that is, with minimal syntactic (and no semantic) knowledge.
A corpus of different text genres was manually annotated for zero anaphors and other zero-shaped, usually indefinite, subjects. The rule-based approach is evaluated and results are presented and discussed.

Sapientia

A review of the repeated name penalty: implications for null subject languages

Author: Lezama Carlos Gelormini
Publication venue: 'Revista Linguistica'
Publication date: 01/05/2015

This is a critical review of the anaphoric processing delay known as the Repeated Name Penalty (RNP: Gordon, Grosz, Gilliom, 1993). In this paper I argue that the RNP should be understood as an interaction effect between the anaphor type and the discourse prominence of the referent, and not merely as a pairwise comparison between sentences with repeated names and corresponding sentences with pronouns. I further propose that in null subject languages, the relevant anaphor that should be contrasted with the repeated name is the null pronoun because this type of pronoun represents the least informative anaphor available

Directory of Open Access Journals

Portal de Periódicos da UFRJ

Coreference resolution for Portuguese using parallel corpora word alignment

Author: Souza José Guilherme Camargo de
Publication date: 01/01/2011

The essential goal of the field of Information Extraction is to investigate methods and techniques for transforming the unstructured information present in natural language texts into structured data. An important step in this process is coreference resolution, the task of identifying different noun phrases that refer to the same entity in the discourse. Coreference resolution has been researched extensively for English (Ng, 2010, lists a series of studies in the area), but it has received less attention for other languages. This is because the great majority of the approaches used in this research are based on machine learning and therefore require a large amount of annotated data.

Sapientia

Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Author: Alos i Font Héctor
Bayatlı Sevilay
Khanna Tanmai
Pirinen Flammie
Swanson Daniel
Tang Irene
Tyers Francis Morton
Washington Jonathan North
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Resolução de anáforas pronominais em documentos em língua portuguesa

Author: Aires Ana Margarida Pereira dos Santos
Publication venue: 'Universidade de Evora'
Publication date: 01/10/2006

Abstract: Pronominal Anaphora Resolution in Portuguese Language Documents. The process of anaphora resolution is fundamental to understanding a text, and although a human can do it easily, simulating it on a computer is not a trivial task. The main goal of this work is to build a system that gives the computer the capacity to infer the antecedents of pronominal anaphors. The system developed is based on the methodology known as centering, not only because of its core ideas but also because of its possible suitability for the Portuguese language. The evaluation of the results obtained revealed some limitations common to this type of system, which led to the proposal and implementation of a change to the initial algorithm, adding three extensions that make it possible to prefer one solution over the others in the case of a tie.
The new evaluation shows an improvement in efficiency in the second version of the algorithm, which has a critical success rate of 54% on average. This is considered quite positive, given that no corpora free of pre-processing errors were available.

Repositório Científico da Universidade de Évora

Extracting and Visualizing Quotations from News Wires

Author: Denis Pascal
Mignot Victor
Recourcé Gaëlle
Sagot Benoît
Stern Rosa
Villemonte de La Clergerie Éric
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We introduce SAPIENS, a platform for extracting quotations from news wires, associated with their author and context. The originality of SAPIENS is that it relies on a deep linguistic processing chain, which allows for extracting quotations with a wide coverage and an extended definition, including quotations which are only partially quotes-delimited verbatim transcripts. We describe the architecture of SAPIENS and how it was applied to process a corpus of French news wires from the AFP news agency.

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot