Using 'Low-cost' Learning Features for Pronoun Resolution
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
Analysis of Anaphora Resolution System for English Language
Anaphora resolution is a complex problem in linguistics that has attracted the attention of many researchers. It is the problem of identifying referents in discourse, and it plays an important role in natural language processing tasks. This paper focuses entirely on pronominal anaphora resolution for English, in which a pronoun refers to its intended noun in the discourse. Two computational models are proposed for anaphora resolution. Resolution depends on various factors, of which these models use the recency factor and animacy knowledge. The recency factor is implemented using the Lappin and Leass approach in the first model and the Centering approach in the second. Information about animacy is obtained with a gazetteer method, and the identification of animate elements is used to improve the accuracy of the system. The paper reports experiments conducted with both models on data sets from different domains. The results of the two models are compared, and a conclusion is drawn as to the most suitable model
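The combination of a recency factor with gazetteer-based animacy filtering described in this abstract can be illustrated with a minimal sketch. This is not the authors' system: the gazetteer contents, the pronoun lists, the initial salience of 100, and the halving of salience per sentence (loosely modeled on the Lappin and Leass style of salience decay) are all assumptions chosen for illustration, as are the names `Candidate` and `resolve`.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical gazetteer of animate nouns (assumption; a real list would be far larger).
ANIMATE_GAZETTEER = {"john", "mary", "doctor", "teacher", "dog"}

# Simplified pronoun classes (assumption).
ANIMATE_PRONOUNS = {"he", "she", "him", "her"}
INANIMATE_PRONOUNS = {"it"}


@dataclass
class Candidate:
    text: str            # surface form of the candidate noun
    sentence_index: int  # sentence in which the candidate occurs


def is_animate(noun: str) -> bool:
    """Gazetteer lookup: a noun counts as animate only if it is listed."""
    return noun.lower() in ANIMATE_GAZETTEER


def resolve(pronoun: str, pronoun_sentence: int,
            candidates: List[Candidate],
            initial_salience: float = 100.0) -> Optional[Candidate]:
    """Return the animacy-compatible candidate with the highest recency score."""
    p = pronoun.lower()
    best, best_score = None, float("-inf")
    for cand in candidates:
        distance = pronoun_sentence - cand.sentence_index
        if distance < 0:
            continue  # ignore candidates that follow the pronoun
        if p in ANIMATE_PRONOUNS and not is_animate(cand.text):
            continue  # animacy mismatch
        if p in INANIMATE_PRONOUNS and is_animate(cand.text):
            continue  # animacy mismatch
        # Recency: salience is halved for each intervening sentence.
        score = initial_salience / (2 ** distance)
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    cands = [Candidate("John", 0), Candidate("book", 1)]
    print(resolve("he", 1, cands))
    print(resolve("it", 1, cands))
```

In this sketch the animacy check acts as a hard filter before recency is applied, so a more recent but incompatible noun never outranks an older compatible one. A fuller system would add agreement in number and gender and syntactic salience factors.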
A Review of the Repeated Name Penalty: Implications for Null Subject Languages
This is a critical review of the anaphoric processing delay known as the Repeated Name Penalty (RNP: Gordon, Grosz, & Gilliom, 1993). In this paper I argue that the RNP should be understood as an interaction effect between the anaphor type and the discourse prominence of the referent, and not merely as a pairwise comparison between sentences with repeated names and corresponding sentences with pronouns. I further propose that in null subject languages, the relevant anaphor that should be contrasted with the repeated name is the null pronoun because this type of pronoun represents the least informative anaphor available.
Affiliation: Gelormini Lezama, Carlos. Instituto de Neurología Cognitiva. Laboratorio de Psicología Experimental y Neurociencia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentin
Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation
This paper evaluates the different tasks carried out in the translation of
pronominal anaphora in a machine translation (MT) system. The MT interlingua
approach named AGIR (Anaphora Generation with an Interlingua Representation)
improves upon other proposals presented to date because it is able to translate
intersentential anaphors, detect co-reference chains, and translate Spanish
zero pronouns into English---issues hardly considered by other systems. The
paper presents the resolution and evaluation of these anaphora problems in AGIR
with the use of different kinds of knowledge (lexical, morphological,
syntactic, and semantic). The translation of English and Spanish anaphoric
third-person personal pronouns (including Spanish zero pronouns) into the
target language has been evaluated on unrestricted corpora. We have obtained a
precision of 80.4% and 84.8% in the translation of Spanish and English
pronouns, respectively. Although we have only studied the Spanish and English
languages, our approach can be easily extended to other languages such as
Portuguese, Italian, or Japanese
Linguistics parameters for zero anaphora resolution
Master's dissertation, Natural Language Processing and Human Language Technology, Univ. do Algarve, 2009.
This dissertation describes and proposes a set of linguistically motivated rules for zero
anaphora resolution in the context of a natural language processing chain developed for
Portuguese. Some languages, like Portuguese, allow noun phrase (NP) deletion (or zeroing)
in several syntactic contexts in order to avoid the redundancy that would result from
repetition of previously mentioned words. The co-reference relation between the zeroed
element and its antecedent (or previous mention) in the discourse is here called zero
anaphora (Mitkov, 2002). In Computational Linguistics, zero anaphora resolution may be
viewed as a subtask of anaphora resolution and has an essential role in various Natural
Language Processing applications such as information extraction, automatic abstracting,
dialog systems, machine translation and question answering. The main goal of this
dissertation is to describe the grammatical rules imposing subject NP deletion and referential
constraints in Brazilian Portuguese, in order to allow a correct identification of the
antecedent of the deleted subject NP. Some of these rules were then formalized into the
Xerox Incremental Parser or XIP (Ait-Mokhtar et al., 2002: 121-144) in order to constitute a
module of the Portuguese grammar (Mamede et al. 2010) developed at Spoken Language
Laboratory (L2F). Using this rule-based approach we expected to improve the performance
of the Portuguese grammar namely by producing better dependency structures with
(reconstructed) zeroed NPs for the syntactic-semantic interface. Because of the complexity
of the task, the scope of this dissertation had to be limited: (a) subject NP deletion; (b) within
sentence boundaries and (c) with an explicit antecedent; besides, (d) rules were formalized
based solely on the results of the shallow parser (or chunks), that is, with minimal syntactic
(and no semantic) knowledge. A corpus of different text genres was manually annotated for
zero anaphors and other zero-shaped, usually indefinite, subjects. The rule-based
approach is evaluated and results are presented and discussed
A review of the repeated name penalty: implications for null subject languages
This is a critical review of the anaphoric processing delay known as the Repeated Name Penalty (RNP: Gordon, Grosz, Gilliom, 1993). In this paper I argue that the RNP should be understood as an interaction effect between the anaphor type and the discourse prominence of the referent, and not merely as a pairwise comparison between sentences with repeated names and corresponding sentences with pronouns. I further propose that in null subject languages, the relevant anaphor that should be contrasted with the repeated name is the null pronoun because this type of pronoun represents the least informative anaphor available
Coreference resolution for Portuguese using parallel corpora word alignment
The essential goal of the field of Information Extraction is to investigate
methods and techniques for transforming the unstructured information found in
natural language texts into structured data. An important step in this
process is coreference resolution, the task of identifying different noun
phrases that refer to the same entity in the discourse. Coreference
resolution has been extensively researched for English (Ng, 2010, lists a
number of studies in the area), but it has received less attention in other
languages. This is because the great majority of the approaches used in this
research are based on machine learning and therefore require a large amount
of annotated data
Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages
This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium
Resolução de anáforas pronominais em documentos em língua portuguesa
Pronominal Anaphora Resolution in Portuguese Language Documents. The process of anaphora resolution is fundamental to understanding a text, and although a human does it easily, simulating it on a computer is not a trivial task. The main goal of this work is to develop a system that gives the computer the capacity to associate pronominal anaphors with the expressions they refer to. The developed system is based on the methodology known as centering, not only because of its core ideas, but also because of its adaptability to the Portuguese language. The evaluation of the results showed some limitations common to this type of system, which led to the proposal and implementation of a change to the initial algorithm, adding three extensions that allow one solution to be preferred over the others in the event of a tie.
The new evaluation shows improved performance in the second version of the algorithm, which has an average critical success rate of 54%; this is considered quite positive, given that no corpora free of pre-processing errors were available
Extracting and Visualizing Quotations from News Wires
We introduce SAPIENS, a platform for extracting quotations from news wires, associated with their author and context. The originality of SAPIENS is that it relies on a deep linguistic processing chain, which allows for extracting quotations with a wide coverage and an extended definition, including quotations which are only partially quotes-delimited verbatim transcripts. We describe the architecture of SAPIENS and how it was applied to process a corpus of French news wires from the AFP news agency