Pronoun-Focused MT and Cross-Lingual Pronoun Prediction: Findings of the 2015 DiscoMT Shared Task on Pronoun Translation
We describe the design, the evaluation setup, and the results of the DiscoMT 2015 shared task, which included two subtasks, relevant to both the machine translation (MT) and the discourse communities: (i) pronoun-focused translation, a practical MT task, and (ii) cross-lingual pronoun prediction, a classification task that requires no specific MT expertise and is interesting as a machine learning task in its own right. We focused on the English–French language pair, for which MT output is generally of high quality, but has visible issues with pronoun translation due to differences in the pronoun systems of the two languages. Six groups participated in the pronoun-focused translation task and eight groups in the cross-lingual pronoun prediction task.
Findings of the 2017 DiscoMT Shared Task on Cross-lingual Pronoun Prediction
We describe the design, the setup, and the evaluation results of the DiscoMT 2017 shared task on cross-lingual pronoun prediction. The task asked participants to predict a target-language pronoun given a source-language pronoun in the context of a sentence. We further provided a lemmatized target-language human-authored translation of the source sentence, and automatic word alignments between the source sentence words and the target-language lemmata. The aim of the task was to predict, for each target-language pronoun placeholder, the word that should replace it from a small, closed set of classes, using any type of information that can be extracted from the entire document. We offered four subtasks, each for a different language pair and translation direction: English-to-French, English-to-German, German-to-English, and Spanish-to-English. Five teams participated in the shared task, making submissions for all language pairs. The evaluation results show that all participating teams outperformed two strong n-gram-based language model baseline systems by a sizable margin.
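The classification setup can be made concrete with a toy sketch. The class inventory and the majority-class baseline below are illustrative assumptions for English-to-French, not the shared task's actual data or baselines (which were n-gram language models):

```python
from collections import Counter

# Illustrative closed class set for English-to-French (an assumption for
# this sketch; the real task defines the exact inventory of classes).
CLASSES = ["ce", "elle", "elles", "il", "ils", "cela", "on", "OTHER"]

def majority_class_baseline(training_examples):
    """Map each source pronoun to its most frequent target class.

    `training_examples` is a list of (source_pronoun, target_class)
    pairs. This toy baseline is far weaker than the n-gram language
    model baselines used in the shared task; it is shown only to make
    the classification setup concrete.
    """
    counts = {}
    for src, tgt in training_examples:
        counts.setdefault(src, Counter())[tgt] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}

# Toy training data: (source pronoun, observed target class).
model = majority_class_baseline(
    [("it", "il"), ("it", "elle"), ("it", "il"), ("they", "ils")]
)
```

In the real task, systems could draw on the full document context and the word alignments rather than the source pronoun alone.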
Anaphora Resolution in Business Process Requirement Engineering
Anaphora resolution (AR) is one of the most important tasks in natural language processing; it focuses on the problem of resolving what a pronoun or a noun phrase refers to. Moreover, AR plays an essential role when dealing with textual descriptions of business processes, either when trying to discover the process model from the text or when validating an existing model: it helps these systems discover the core components of any process model (actors and objects). In this paper, we propose a domain-specific AR system. The approach starts by automatically generating the concept map of the text; the system then uses this map to resolve references using the syntactic and semantic relations in the concept map. The approach outperforms the state-of-the-art performance in the domain of business process texts, with more than 73% accuracy. In addition, this approach could easily be adapted to resolve references in other domains.
Incorporating pronoun function into statistical machine translation
Pronouns are used frequently in language, and perform a range of functions. Some pronouns are used to express coreference, and others are not. Languages and genres differ in how and when they use pronouns, and this poses a problem for Statistical Machine Translation (SMT) systems (Le Nagard and Koehn, 2010; Hardmeier and Federico, 2010; Novák, 2011; Guillou, 2012; Weiner, 2014; Hardmeier, 2014). Attention to date has focussed on coreferential (anaphoric) pronouns with NP antecedents, which, when translated from English into a language with grammatical gender, must agree with the translation of the head of the antecedent. Despite growing attention to this problem, little progress has been made, and little attention has been given to other pronouns.
The central claim of this thesis is that pronouns performing different functions in text should be handled differently by SMT systems and when evaluating pronoun translation. This motivates the introduction of a new framework to categorise pronouns according to their function: anaphoric/cataphoric reference, event reference, extra-textual reference, pleonastic, addressee reference, speaker reference, generic reference, or other function. Labelling pronouns according to their function also helps to resolve instances of functional ambiguity arising from the same pronoun in the source language having multiple functions, each with different translation requirements in the target language. The categorisation framework is used in corpus annotation, corpus analysis, SMT system development and evaluation.
I have directed the annotation and conducted analyses of a parallel corpus of English-German texts called ParCor (Guillou et al., 2014), in which pronouns are manually annotated according to their function. This provides a first step toward understanding the problems that SMT systems face when translating pronouns. In the thesis, I show how analysis of manual translation can prove useful in identifying and understanding systematic differences in pronoun use between two languages and can help inform the design of SMT systems. In particular, the analysis revealed that the German translations in ParCor contain more anaphoric and pleonastic pronouns than their English originals, reflecting differences in pronoun use. This raises a particular problem for the evaluation of pronoun translation: automatic evaluation methods that rely on reference translations to assess pronoun translation will not be able to provide an adequate evaluation when the reference translation departs from the original source-language text. I also show how analysis of the output of state-of-the-art SMT systems can reveal how well current systems perform in translating different types of pronouns and indicate where future efforts would be best directed. The analysis revealed that biases in the training data, for example arising from the use of “it” and “es” as both anaphoric and pleonastic pronouns in English and German respectively, are a problem that SMT systems must overcome. SMT systems also need to disambiguate the function of those pronouns with ambiguous surface forms so that each pronoun may be translated in an appropriate way.
To demonstrate the value of this work, I have developed an automated post-editing system in which automated tools are used to construct ParCor-style annotations over the source-language pronouns. The annotations are then used to resolve functional ambiguity for the pronoun “it”, with separate rules applied to the output of a baseline SMT system for anaphoric vs. non-anaphoric instances. The system was submitted to the DiscoMT 2015 shared task on pronoun translation for English-French. As with all other participating systems, the automated post-editing system failed to beat a simple phrase-based baseline. A detailed analysis, including an oracle experiment in which manual annotation replaces the automated tools, was conducted to discover the causes of poor system performance. The analysis revealed that the design of the rules and their strict application to the SMT output are the biggest factors in the failure of the system.
The lack of automatic evaluation metrics for pronoun translation is a limiting factor in SMT system development. To alleviate this problem, Christian Hardmeier and I have developed a testing regimen called PROTEST, comprising (1) a hand-selected set of pronoun tokens categorised according to the different problems that SMT systems face and (2) an automated evaluation script. Pronoun translations can then be automatically compared against a reference translation, with mismatches referred for manual evaluation. The automatic evaluation was applied to the output of systems submitted to the DiscoMT 2015 shared task on pronoun translation. This again highlighted the weakness of the post-editing system, which performs poorly due to its focus on producing gendered pronoun translations and its inability to distinguish between pleonastic and event reference pronouns.
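The accept-or-refer logic of PROTEST can be sketched in a few lines. Function name and token format here are hypothetical illustrations, not the actual PROTEST tooling, which operates over categorised pronoun test suites:

```python
def compare_pronouns(system_tokens, reference_tokens):
    """Compare a system's pronoun translations against the reference.

    Matches are accepted automatically; mismatches are NOT counted as
    errors but are referred to a human judge, since a pronoun that
    differs from the reference may still be a valid translation.
    (A sketch of the PROTEST idea, not its actual implementation.)
    """
    accepted, referred = [], []
    for sys_tok, ref_tok in zip(system_tokens, reference_tokens):
        if sys_tok.lower() == ref_tok.lower():
            accepted.append(sys_tok)
        else:
            referred.append((sys_tok, ref_tok))
    return accepted, referred

accepted, referred = compare_pronouns(["Il", "elle"], ["il", "elles"])
```

Deferring mismatches to manual judgement avoids the central flaw of purely reference-based pronoun metrics noted above: a translation that departs from the reference is not necessarily wrong.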
Coherence in Machine Translation
Coherence ensures individual sentences work together to form a meaningful document. When properly translated, a coherent document in one language should result in a coherent document in another language. In Machine Translation, however, due to reasons of modeling and computational complexity, sentences are pieced together from words or phrases based on short context windows and with no access to extra-sentential context.
In this thesis I propose ways to automatically assess the coherence of machine translation output. The work is structured around three dimensions: entity-based coherence, coherence as evidenced via syntactic patterns, and coherence as evidenced via discourse relations.
For the first time, I evaluate existing monolingual coherence models on this new task, identifying issues and challenges that are specific to the machine translation setting. In order to address these issues, I adapt a state-of-the-art syntax model, which also results in improved performance for the monolingual task. The results clearly indicate how much more difficult the new task is than the task of detecting shuffled texts. I propose a new coherence model, exploring the cross-lingual transfer of discourse relations in machine translation. This model is novel in that it measures the correctness of the discourse relation by comparison to the source text rather than to a reference translation. I identify patterns of incoherence common across different language pairs, and create a corpus of machine-translated output annotated with coherence errors for evaluation purposes. I then examine lexical coherence in a multilingual context, as a preliminary study for cross-lingual transfer. Finally, I determine how the new and adapted models correlate with human judgements of translation quality and suggest that improvements in general evaluation within machine translation would benefit from having a coherence component that evaluates the translation output with respect to the source text.
Modeling contextual information in neural machine translation
Machine translation has provided impressive translation quality for many language pairs. The improvements over the past few years are largely due to the introduction of neural networks to the field, resulting in the modern sequence-to-sequence neural machine translation (NMT) models. NMT is at the core of many large-scale industrial tools for automatic translation such as Google Translate, Microsoft Translator, Amazon Translate and many others.
Current NMT models work at the sentence level, meaning they are used to translate individual sentences. However, for most practical use cases, a user is interested in translating a document. In these cases, an MT tool splits a document into individual sentences and translates them independently. As a result, any dependencies between the sentences are ignored. This is likely to result in an incoherent document translation, mainly because of inconsistent translation of ambiguous source words or wrong translation of anaphoric pronouns. For example, it is undesirable to translate “bank” as a “financial bank” in one sentence and then later as a “river bank”. Furthermore, the translation of, e.g., the English third-person pronoun “it” into German depends on the grammatical gender of the English antecedent’s German translation.
NMT has shown impressive modeling capabilities, but it is nevertheless unable to model discourse-level phenomena, as it lacks access to contextual information. In this work, we study discourse-level phenomena in context-aware NMT. To facilitate the particular studies of interest, we propose several models capable of incorporating contextual information into standard sentence-level NMT models. We direct our focus to several discourse phenomena, namely coreference (anaphora) resolution, coherence and cohesion. We discuss these phenomena in terms of how well they can be modeled by context-aware NMT, how we can improve upon the current state of the art, as well as the optimal granularity at which these phenomena should be modeled. We further investigate domain as a factor in context-aware NMT. Finally, we investigate existing challenge sets for anaphora resolution evaluation and provide a robust alternative.
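One widely used way to give a sentence-level NMT model access to extra-sentential context is simple concatenation: prepend the previous k source sentences, separated by a break token. The sketch below illustrates that general idea only; the separator token, function name and parameters are assumptions, and the thesis's own context-aware models are more elaborate:

```python
def concat_context(prev_sentences, current, sep="<BRK>", k=1):
    """Prepend the last k context sentences to the current source
    sentence, joined by a special break token, so that a standard
    sentence-level NMT model can condition on local document context."""
    context = prev_sentences[-k:] if k > 0 else []
    return f" {sep} ".join(context + [current])

extended = concat_context(["The cat slept .", "It woke up ."], "It was hungry .")
```

The choice of k reflects the local-vs-global context trade-off discussed in the contributions below: more context can help resolve anaphora, but very large contexts strain standard sequence-to-sequence models.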
We make the following contributions:
i) We study the importance of coreference (anaphora) resolution and coherence for context-aware NMT by making use of oracle information specific to these phenomena.
ii) We propose a method for improving performance on anaphora resolution based on curriculum learning which is inspired by the way humans organize learning.
iii) We investigate the use of contextual information for better handling of domain information, in particular in the case of modeling multiple domains at once and when applied to zero-resource domains.
iv) We present several context-aware models to enable us to examine the specific phenomena of interest we already mentioned.
v) We study the optimal way of modeling local and global context and present a model theoretically capable of using very large document context.
vi) We study the robustness of challenge sets for evaluation of anaphora resolution in MT by means of adversarial attacks and provide a template test set that robustly evaluates specific steps of an idealized coreference resolution pipeline for MT.
Inducing Discourse Resources Using Annotation Projection
An important aspect of natural language understanding and generation involves the recognition and processing of discourse relations. Building applications such as text summarization, question answering and natural language generation requires human language technology beyond the level of the sentence. To address this need, large-scale discourse-annotated corpora such as the Penn Discourse Treebank (PDTB; Prasad et al., 2008a) have been developed.
Manually constructing discourse resources (e.g. discourse annotated corpora) is expensive, both in terms of time and expertise. As a consequence, such resources are only available for a few languages. In this thesis, we propose an approach that automatically creates two types of discourse resources from parallel texts: 1) PDTB-style discourse annotated corpora and 2) lexicons of discourse connectives. Our approach is based on annotation projection where linguistic annotations are projected from a source language to a target language in parallel texts.
Our work has made several theoretical as well as practical contributions to the field of discourse analysis. From a theoretical perspective, we have proposed a method to refine the naive method of discourse annotation projection by filtering out annotations that are not supported by parallel texts. Our approach is based on the intersection of statistical word-alignment models and can automatically identify 65% of unsupported projected annotations. We have also proposed a novel approach for annotation projection that is independent of statistical word-alignment models. This approach is more robust to longer discourse connectives than approaches based on statistical word-alignment models.
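The filtering idea can be sketched as follows. Intersecting the link sets produced by two directional alignment models is a standard symmetrization heuristic; the function names and the rule of dropping any projection with an unaligned token are illustrative assumptions, not the thesis's exact procedure:

```python
def intersect_alignments(src2tgt, tgt2src):
    """Keep only alignment links found by both directional models.

    `src2tgt` is a set of (source_index, target_index) links;
    `tgt2src` holds (target_index, source_index) links and is flipped
    before intersecting. The intersection is high-precision, which is
    what a projection filter needs.
    """
    return src2tgt & {(s, t) for (t, s) in tgt2src}

def project_connective(src_indices, links):
    """Project a source discourse connective (given as token indices)
    to the target side; drop the projection entirely when any token
    lacks a surviving link (an unsupported projection)."""
    tgt = []
    for i in src_indices:
        hits = [t for (s, t) in links if s == i]
        if not hits:
            return None  # unsupported: filter this annotation out
        tgt.extend(hits)
    return sorted(tgt)

links = intersect_alignments({(0, 0), (1, 2), (2, 1)}, {(0, 0), (2, 1)})
```

Here the link (2, 1) survives only in one direction, so a connective covering source token 2 would be filtered rather than projected with an unreliable alignment.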
From a practical perspective, we have automatically created the Europarl ConcoDisco corpora from English-French parallel texts of the Europarl corpus (Koehn, 2009). In the Europarl ConcoDisco corpora, around 1 million occurrences of French discourse connectives are automatically aligned to their translations. From the French side of this parallel corpus, we have extracted our first significant resource, the FrConcoDisco corpora. To our knowledge, the FrConcoDisco corpora are the first PDTB-style discourse-annotated corpora for French in which French discourse connectives are annotated with the discourse relations that they signal. The FrConcoDisco corpora are significant in size, as they contain more than 25 times more annotations than the PDTB. To evaluate the FrConcoDisco corpora, we showed how they can be used to train a classifier for the disambiguation of French discourse connectives with high performance. The second significant resource that we automatically extracted from parallel texts is ConcoLeDisCo, a lexicon of French discourse connectives mapped to PDTB discourse relations. While ConcoLeDisCo is useful by itself, as we showed in this thesis, it can be used to improve the coverage of manually constructed lexicons of discourse connectives such as LEXCONN (Roze et al., 2012).