184 research outputs found

    Pronoun Translation and Prediction with or without Coreference Links

    The Idiap NLP Group has participated in both DiscoMT 2015 sub-tasks: pronoun-focused translation and pronoun prediction. The system for the first sub-task combines two knowledge sources: grammatical constraints from the hypothesized coreference links, and candidate translations from an SMT decoder. The system for the second sub-task avoids hypothesizing a coreference link and instead uses a large set of source-side and target-side features from the noun phrases surrounding the pronoun to train a pronoun predictor.
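    A minimal sketch of the feature-based prediction idea behind the second sub-task is given below; the feature names and the classifier choice are illustrative assumptions, not the actual Idiap feature set or model.

```python
# Illustrative feature-based pronoun predictor: classify the target-language
# pronoun from source- and target-side features of the surrounding noun
# phrases. Feature names below are hypothetical placeholders.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def extract_features(example):
    """Build a feature dict for one pronoun occurrence (fields are assumed)."""
    return {
        "src_pronoun": example["src_pronoun"],                # e.g. "it"
        "prev_src_np_head": example["prev_src_np_head"],      # head of preceding source NP
        "prev_tgt_np_gender": example["prev_tgt_np_gender"],  # gender of its translation
        "tgt_following_word": example["tgt_following_word"],  # word after the pronoun slot
    }

def train_pronoun_predictor(examples, labels):
    """Fit a classifier mapping context features to the target pronoun
    (e.g. German 'er', 'sie', 'es'), with labels taken from references."""
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit([extract_features(ex) for ex in examples], labels)
    return model
```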

    Modeling contextual information in neural machine translation

    Machine translation has provided impressive translation quality for many language pairs. The improvements over the past few years are largely due to the introduction of neural networks to the field, resulting in the modern sequence-to-sequence neural machine translation (NMT) models. NMT is at the core of many large-scale industrial tools for automatic translation, such as Google Translate, Microsoft Translator and Amazon Translate. Current NMT models work at the sentence level, meaning they translate individual sentences. However, for most practical use cases, a user is interested in translating a document. In these cases, an MT tool splits the document into individual sentences and translates them independently, so any dependencies between the sentences are ignored. This is likely to result in an incoherent document translation, mainly because of inconsistent translation of ambiguous source words or wrong translation of anaphoric pronouns. For example, it is undesirable to translate “bank” as a financial institution in one sentence and as a river bank later in the same document. Furthermore, the translation of, e.g., the English third-person pronoun “it” into German depends on the grammatical gender of the German translation of its English antecedent.

    NMT has impressive modeling capabilities, but it is nevertheless unable to model such discourse-level phenomena because it lacks access to contextual information. In this work, we study discourse-level phenomena in context-aware NMT. To facilitate the particular studies of interest, we propose several models capable of incorporating contextual information into standard sentence-level NMT models. We focus on several discourse phenomena, namely coreference (anaphora) resolution, coherence and cohesion. We discuss these phenomena in terms of how well they can be modeled by context-aware NMT, how we can improve upon the current state of the art, and the optimal granularity at which they should be modeled. We further investigate domain as a factor in context-aware NMT. Finally, we examine existing challenge sets for evaluating anaphora resolution and provide a robust alternative.

    We make the following contributions: i) We study the importance of coreference (anaphora) resolution and coherence for context-aware NMT by making use of oracle information specific to these phenomena. ii) We propose a method for improving performance on anaphora resolution based on curriculum learning, inspired by the way humans organize learning. iii) We investigate the use of contextual information for better handling of domain information, in particular when modeling multiple domains at once and when applied to zero-resource domains. iv) We present several context-aware models that enable us to examine the specific phenomena of interest mentioned above. v) We study the optimal way of modeling local and global context and present a model theoretically capable of using very large document context. vi) We study the robustness of challenge sets for evaluating anaphora resolution in MT by means of adversarial attacks and provide a template test set that robustly evaluates specific steps of an idealized coreference resolution pipeline for MT.
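    As one concrete illustration of incorporating context into a sentence-level system, the sketch below prepends the previous source sentence(s) to the current one behind a separator token; this concatenation strategy is a common baseline used here only as an assumed example, not necessarily one of the models proposed in the thesis.

```python
# Concatenation-based context-aware input: prepend up to `context_size`
# previous source sentences, separated by a special token, so a standard
# sentence-level NMT model can condition on local document context.
SEP = "<SEP>"  # assumed separator token added to the model's vocabulary

def build_contextual_inputs(src_sentences, context_size=1):
    """Turn a source document (list of sentences) into context-augmented inputs."""
    inputs = []
    for i, sentence in enumerate(src_sentences):
        context = src_sentences[max(0, i - context_size):i]
        if context:
            inputs.append(f" {SEP} ".join(context + [sentence]))
        else:
            inputs.append(sentence)
    return inputs

doc = [
    "I deposited the money at the bank.",
    "It was about to close.",  # translating "It" into German needs the antecedent's gender
]
for line in build_contextual_inputs(doc):
    print(line)
```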

    Evaluating and improving lexical language understanding in neural machine translation

    Lexical understanding is an inalienable component of the translation process. In order to correctly map the meaning of a linguistic unit to the appropriate target language expression, the meaning of its constituent words must first be identified and disambiguated, followed by the application of compositional operations. This thesis examines the competency of contemporary neural machine translation (NMT) models on two core aspects of lexical understanding: word sense disambiguation (WSD) and coreference resolution (CoR), both of which are well-established and much-studied natural language processing (NLP) tasks. Certain linguistic properties that are under-specified in a source language (e.g. the grammatical gender of a noun in English) may need to be stated explicitly in the chosen target language (e.g. German). Doing so correctly requires the accurate resolution of the associated ambiguities. While recent modeling advances appear to suggest that both WSD and CoR are largely solved challenges in machine translation, the work conducted within the scope of this thesis demonstrates that this is not yet the case. In particular, we show that NMT systems are prone to relying on surface-level heuristics and data biases to guide their lexical disambiguation decisions, rather than engaging in deep language understanding by correctly recognizing and leveraging contextual disambiguation triggers. As part of our investigation, we introduce a novel methodology for predicting the WSD errors a translation model is likely to make and utilize this knowledge to craft adversarial attacks aimed at eliciting disambiguation errors in model translations. Additionally, we create a set of challenging CoR benchmarks that uncover the inability of translation systems to identify referents of pronouns in contexts that presuppose commonsense reasoning, caused by their pathological over-reliance on data biases.

    At the same time, we develop initial solutions for the identified model deficiencies. We show that fine-tuning on de-biased data and modifying the learning objective of a model can significantly improve disambiguation performance by counteracting the harmful impact of data biases. We furthermore propose a novel extension to the popular transformer architecture that is found to strengthen its WSD capabilities and its robustness to adversarial WSD attacks by facilitating the accessibility of lexical features across all layers of the model and increasing the extent to which contextual information is encapsulated within its latent representations. Despite these improvements to WSD and CoR, both tasks remain far from solved, posing a veritable challenge for the current generation of NMT models, as well as for the large language models that have risen to prominence within NLP in recent years.
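    The architectural extension is described only at a high level, so the sketch below illustrates one plausible reading: a gated shortcut that re-injects token embeddings into every layer so that lexical features remain accessible at all depths. The module name, gating formulation and integration point are assumptions, not the thesis architecture.

```python
# Sketch of a gated "lexical shortcut": each encoder layer can mix the original
# token embeddings back into its input, keeping lexical features accessible
# across all layers. Sizes and gating are illustrative assumptions.
import torch
import torch.nn as nn

class LexicalShortcut(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, hidden, embeddings):
        # Per-position, per-dimension gate deciding how much of the embedding
        # to re-inject into the current hidden state.
        g = torch.sigmoid(self.gate(torch.cat([hidden, embeddings], dim=-1)))
        return g * embeddings + (1.0 - g) * hidden

# Hypothetical use inside an encoder stack:
#   h = embeddings
#   for layer, shortcut in zip(encoder_layers, shortcuts):
#       h = layer(shortcut(h, embeddings))
```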