143 research outputs found
Using 'Low-cost' Learning Features for Pronoun Resolution
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation
This paper evaluates the different tasks carried out in the translation of
pronominal anaphora in a machine translation (MT) system. The MT interlingua
approach named AGIR (Anaphora Generation with an Interlingua Representation)
improves upon other proposals presented to date because it is able to translate
intersentential anaphors, detect co-reference chains, and translate Spanish
zero pronouns into English---issues hardly considered by other systems. The
paper presents the resolution and evaluation of these anaphora problems in AGIR
with the use of different kinds of knowledge (lexical, morphological,
syntactic, and semantic). The translation of English and Spanish anaphoric
third-person personal pronouns (including Spanish zero pronouns) into the
target language has been evaluated on unrestricted corpora. We have obtained a
precision of 80.4% and 84.8% in the translation of Spanish and English
pronouns, respectively. Although we have only studied the Spanish and English
languages, our approach can be easily extended to other languages such as
Portuguese, Italian, or Japanese
Cross-Lingual Zero Pronoun Resolution
In languages like Arabic, Chinese, Italian, Japanese, Korean, Portuguese, Spanish, and many others, predicate arguments in certain syntactic positions are not realized instead of being realized as overt pronouns, and are thus called zero- or null-pronouns. Identifying and resolving such omitted arguments is crucial to machine translation, information extraction and other NLP tasks, but depends heavily on semantic coherence and lexical relationships. We propose a BERT-based cross-lingual model for zero pronoun resolution, and evaluate it on the Arabic and Chinese portions of OntoNotes 5.0. As far as we know, ours is the first neural model of zero-pronoun resolution for Arabic; and our model also outperforms the state-of-the-art for Chinese. In the paper we also evaluate BERT feature extraction and fine-tune models on the task, and compare them with our model. We also report on an investigation of BERT layers indicating which layer encodes the most suitable representation for the task. Our code is available at https://github.com/amaloraini/cross-lingual-Z
Analysis of Anaphora Resolution System for English Language
Anaphora resolution is a complex problem in linguistics and has attracted the attention of many researchers. It is the problem of identifying referents in the discourse. Anaphora resolution plays an important role in natural language processing tasks. This paper focuses on pronominal anaphora resolution for English, in which a pronoun refers to the intended noun in the discourse. Two computational models are proposed for anaphora resolution. Resolution of anaphora is based on various factors, among which these models use the recency factor and animistic knowledge. The recency factor is implemented using the Lappin and Leass approach in the first model and the Centering approach in the second. Information about animacy is obtained by a gazetteer method. The identification of animistic elements is employed to improve the accuracy of the system. The paper reports experiments conducted with both models on different data sets from different domains. The results of the two models are compared and a conclusion is drawn about the most suitable model
Linguistics parameters for zero anaphora resolution
Master's dissertation, Natural Language Processing and Human Language Technology, Univ. do Algarve, 2009
This dissertation describes and proposes a set of linguistically motivated rules for zero
anaphora resolution in the context of a natural language processing chain developed for
Portuguese. Some languages, like Portuguese, allow noun phrase (NP) deletion (or zeroing)
in several syntactic contexts in order to avoid the redundancy that would result from
repetition of previously mentioned words. The co-reference relation between the zeroed
element and its antecedent (or previous mention) in the discourse is here called zero
anaphora (Mitkov, 2002). In Computational Linguistics, zero anaphora resolution may be
viewed as a subtask of anaphora resolution and has an essential role in various Natural
Language Processing applications such as information extraction, automatic abstracting,
dialog systems, machine translation and question answering. The main goal of this
dissertation is to describe the grammatical rules imposing subject NP deletion and referential
constraints in Brazilian Portuguese, in order to allow a correct identification of the
antecedent of the deleted subject NP. Some of these rules were then formalized into the
Xerox Incremental Parser or XIP (Ait-Mokhtar et al., 2002: 121-144) in order to constitute a
module of the Portuguese grammar (Mamede et al. 2010) developed at Spoken Language
Laboratory (L2F). Using this rule-based approach we expected to improve the performance
of the Portuguese grammar namely by producing better dependency structures with
(reconstructed) zeroed NPs for the syntactic-semantic interface. Because of the complexity
of the task, the scope of this dissertation had to be limited: (a) subject NP deletion; (b) within
sentence boundaries and (c) with an explicit antecedent; besides, (d) rules were formalized
based solely on the results of the shallow parser (or chunks), that is, with minimal syntactic
(and no semantic) knowledge. A corpus of different text genres was manually annotated for
zero anaphors and other zero-shaped, usually indefinite, subjects. The rule-based
approach is evaluated and results are presented and discussed
Coreference resolution for Portuguese using parallel corpora word alignment
The essential goal of the field of Information Extraction is to investigate
methods and techniques for transforming the unstructured information present in
natural language texts into structured data. An important step in this
process is coreference resolution, the task of identifying different noun
phrases that refer to the same entity in the discourse. Coreference resolution
has been extensively researched for English (Ng, 2010, lists a series of
studies in the area), but it has received less attention in other languages.
This is because the great majority of the approaches used in this research are
based on machine learning and therefore require a large amount of annotated data
Coreference resolution: maximum metric score training, domain adaptation, and zero pronoun resolution
Ph.D. (Doctor of Philosophy)
ELECTRA for Neural Coreference Resolution in Italian
In recent years, the impact of Neural Language Models has changed every field of Natural Language Processing. In this scenario, coreference resolution has been among the least considered tasks, especially in languages other than English. This work proposes a coreference resolution system for Italian, based on a neural end-to-end architecture integrating the ELECTRA language model and trained on OntoCorefIT, a novel Italian dataset built starting from OntoNotes. Even if some approaches for Italian have been proposed in the last decade, to the best of our knowledge, this is the first neural coreference resolver aimed specifically at Italian. The performance of the system is evaluated with respect to three different metrics and also assessed by replacing ELECTRA with the widely used BERT language model, since its usage has proven to be effective in the coreference resolution task in English. A qualitative analysis has also been conducted, showing how different grammatical categories affect performance in an inflectional and morphologically rich language like Italian. The overall results have shown the effectiveness of the proposed solution, providing a baseline for future developments of this line of research in Italian
Coreference Resolution for Arabic
Recently, there has been enormous progress in coreference resolution. These recent developments were applied to Chinese, English and other languages, with outstanding results. However, languages with a rich morphology or fewer resources, such as Arabic, have not received as much attention. In fact, when this PhD work started there was no neural coreference resolver for Arabic, and we were not aware of any learning-based coreference resolver for Arabic since [Björkelund and Kuhn, 2014]. In addition, as far as we know, whereas lots of attention had been devoted to the phenomenon of zero anaphora in languages such as Chinese or Japanese, no neural model for Arabic zero-pronoun anaphora had been developed. In this thesis, we report on a series of experiments on Arabic coreference resolution in general and on zero anaphora in particular. We propose a new neural coreference resolver for Arabic, and we present a series of models for identifying and resolving Arabic zero pronouns. Our approach for zero-pronoun identification and resolution is applicable to other languages, and was also evaluated on Chinese, with results surpassing the state of the art at the time. This research also involved producing revised versions of standard datasets for Arabic coreference