396 research outputs found

    Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation

    Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift. Recent work has tied these shortcomings to beam search – the de facto standard inference algorithm in NMT – and Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead. In this paper, we empirically investigate the properties of MBR decoding on a number of previously reported biases and failure cases of beam search. We find that MBR still exhibits a length and token frequency bias, owing to the MT metrics used as utility functions, but that MBR also increases robustness against copy noise in the training data and domain shift.
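    Sampling-based MBR decoding is easy to state concretely. Below is a minimal sketch, assuming the candidate samples have already been drawn from the model, and using a toy unigram-overlap F1 as a stand-in utility for the MT metrics (e.g. BLEU, METEOR) that the paper studies:

    from collections import Counter

    def unigram_overlap(hyp: str, ref: str) -> float:
        """Toy utility: unigram F1 between two strings. A stand-in for
        the MT metrics used as utility functions in the paper."""
        h, r = Counter(hyp.split()), Counter(ref.split())
        overlap = sum((h & r).values())
        if overlap == 0:
            return 0.0
        p, q = overlap / sum(h.values()), overlap / sum(r.values())
        return 2 * p * q / (p + q)

    def mbr_decode(samples: list[str], utility=unigram_overlap) -> str:
        """Return the sample with the highest expected utility against the
        whole pool, which serves as a Monte Carlo proxy for the model's
        posterior over translations (self-comparison included, as is common)."""
        def expected_utility(cand: str) -> float:
            return sum(utility(cand, other) for other in samples) / len(samples)
        return max(samples, key=expected_utility)

    # Usage: `samples` would come from unbiased ancestral sampling of an NMT model.
    samples = ["the cat sat on the mat", "a cat sat on the mat", "the the the"]
    print(mbr_decode(samples))  # degenerate outputs score low against the pool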

    Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation

    Zero-shot neural machine translation is an attractive goal because of the high cost of obtaining data and building translation systems for new translation directions. However, previous papers have reported mixed success in zero-shot translation. It is hard to predict in which settings it will be effective, and what limits performance compared to a fully supervised system. In this paper, we investigate zero-shot performance of a multilingual EN↔{FR,CS,DE,FI} system trained on WMT data. We find that zero-shot performance is highly unstable and can vary by more than 6 BLEU between training runs, making it difficult to reliably track improvements. We observe a bias towards copying the source in zero-shot translation, and investigate how the choice of subword segmentation affects this bias. We find that language-specific subword segmentation results in less subword copying at training time, and leads to better zero-shot performance compared to jointly trained segmentation. A recent trend in multilingual models is to not train on parallel data between all language pairs, but have a single bridge language, e.g. English. We find that this negatively affects zero-shot translation and leads to a failure mode where the model ignores the language tag and instead produces English output in zero-shot directions. We show that this bias towards English can be effectively reduced with even a small amount of parallel data in some of the non-English pairs. Comment: Accepted at WMT 2020
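    The failure mode described above hinges on the target-language tag. A minimal sketch of the usual tagging scheme for a single multilingual model, where the <2xx> token format is illustrative rather than necessarily this system's:

    def tag_example(src: str, tgt: str, tgt_lang: str) -> tuple[str, str]:
        """Prepend a target-language tag to the source, the common way a
        single multilingual model is told which direction to translate.
        In the failure mode above, the model learns to ignore this tag in
        zero-shot directions and emits bridge-language (English) output."""
        return f"<2{tgt_lang}> {src}", tgt

    # Supervised pair seen in training (bridge setup: all pairs involve EN):
    print(tag_example("the house is small", "das Haus ist klein", "de"))
    # Zero-shot at test time: a FR->DE request the model never saw in training:
    print(tag_example("la maison est petite", "das Haus ist klein", "de"))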

    Domain Robustness in Neural Machine Translation

    Translating text that diverges from the training domain is a key challenge for machine translation. Domain robustness---the generalization of models to unseen test domains---is low for both statistical (SMT) and neural machine translation (NMT). In this paper, we study the performance of SMT and NMT models on out-of-domain test sets. We find that in unknown domains, SMT and NMT suffer from very different problems: SMT systems are mostly adequate but not fluent, while NMT systems are mostly fluent, but not adequate. For NMT, we identify such hallucinations (translations that are fluent but unrelated to the source) as a key reason for low domain robustness. To mitigate this problem, we empirically compare methods that are reported to improve adequacy or in-domain robustness in terms of their effectiveness at improving domain robustness. In experiments on German to English OPUS data, and German to Romansh (a low-resource setting), we find that several methods improve domain robustness. While those methods do lead to higher BLEU scores overall, they only slightly increase the adequacy of translations compared to SMT. Comment: V2: AMTA camera-ready
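    Domain robustness can be operationalized very simply. The sketch below expresses it as average out-of-domain BLEU relative to in-domain BLEU; this ratio is an assumption for illustration, not necessarily the paper's exact metric, and the numbers are invented:

    def domain_robustness(in_domain_bleu: float, out_of_domain_bleus: list[float]) -> float:
        """One simple operationalization: average out-of-domain BLEU as a
        fraction of in-domain BLEU (1.0 = no degradation on unseen domains).
        The paper's analysis is richer (fluency vs. adequacy), but a ratio
        like this captures the headline generalization gap."""
        avg_ood = sum(out_of_domain_bleus) / len(out_of_domain_bleus)
        return avg_ood / in_domain_bleu

    # Illustrative numbers only (not from the paper):
    print(domain_robustness(35.0, [12.0, 9.5, 15.2]))  # ~0.35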

    Atomic Quantum Simulation of Dynamical Gauge Fields coupled to Fermionic Matter: From String Breaking to Evolution after a Quench

    Using a Fermi-Bose mixture of ultra-cold atoms in an optical lattice, we construct a quantum simulator for a U(1) gauge theory coupled to fermionic matter. The construction is based on quantum links which realize continuous gauge symmetry with discrete quantum variables. At low energies, quantum link models with staggered fermions emerge from a Hubbard-type model which can be quantum simulated. This allows us to investigate string breaking as well as the real-time evolution after a quench in gauge theories, which are inaccessible to classical simulation methods. Comment: 14 pages, 5 figures. Main text plus one general supplementary material and one basic introduction to the topic. Published version
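    For readers unfamiliar with quantum link models, the schematic Hamiltonian below is the standard (1+1)-dimensional U(1) quantum link model with staggered fermions from the literature; the notation is assumed here, and this is a sketch rather than the paper's exact construction:

    % (1+1)-d U(1) quantum link model with staggered fermions (schematic;
    % notation follows the standard literature, not necessarily this paper):
    %   \psi_x                : staggered fermion at site x
    %   U_{x,x+1}, E_{x,x+1}  : gauge link and electric field operators
    \begin{equation*}
    H = -t \sum_{x} \left( \psi_x^{\dagger}\, U_{x,x+1}\, \psi_{x+1} + \mathrm{h.c.} \right)
        + m \sum_{x} (-1)^{x}\, \psi_x^{\dagger} \psi_x
        + \frac{g^{2}}{2} \sum_{x} E_{x,x+1}^{2}
    \end{equation*}
    % The "quantum link" realization uses a spin S per link, with
    % U_{x,x+1} = S^{+}_{x,x+1} and E_{x,x+1} = S^{z}_{x,x+1}, so the gauge
    % field has a finite-dimensional Hilbert space that cold atoms can encode.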

    The Word Sense Disambiguation Test Suite at WMT18



    Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

    Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation. Comment: 11 pages, 5 figures, accepted by EMNLP 2018 (v2: corrected author names; v3: fix to CNN context-window size, and new post-publication experiments in section 6)
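    The "shorter network paths" argument that the abstract tests can be made concrete. Below is a minimal sketch of the usual path-length comparison; the CNN receptive-field formula (stride 1, no dilation) and the constants are simplifying assumptions:

    import math

    def max_path_length(architecture: str, n: int, kernel: int = 3) -> int:
        """Number of computation steps connecting the two most distant
        positions in a length-n sequence, per the common analysis. Shorter
        paths were speculated to ease learning long-range dependencies;
        the abstract above tests that claim empirically."""
        if architecture == "rnn":
            return n - 1                  # one recurrent step per token
        if architecture == "cnn":
            # each conv layer widens the receptive field by (kernel - 1)
            return math.ceil((n - 1) / (kernel - 1))
        if architecture == "self-attention":
            return 1                      # any two positions attend directly
        raise ValueError(architecture)

    for arch in ("rnn", "cnn", "self-attention"):
        print(arch, max_path_length(arch, n=50))  # 49, 25, 1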

    A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation

    The translation of pronouns presents a special challenge to machine translation to this day, since it often requires context outside the current sentence. Recent work on models that have access to information across sentence boundaries has seen only moderate improvements in terms of automatic evaluation metrics such as BLEU. However, metrics that quantify the overall translation quality are ill-equipped to measure gains from additional context. We argue that a different kind of evaluation is needed to assess how well models translate inter-sentential phenomena such as pronouns. This paper therefore presents a test suite of contrastive translations focused specifically on the translation of pronouns. Furthermore, we perform experiments with several context-aware models. We show that, while gains in BLEU are moderate for those systems, they outperform baselines by a large margin in terms of accuracy on our contrastive test set. Our experiments also show the effectiveness of parameter tying for multi-encoder architectures. Comment: Accepted at WMT 2018
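    Contrastive evaluation itself is simple to sketch. In the code below, `score` is a hypothetical placeholder for the NMT model's scoring function (e.g. log-probability of the target given the source), and the toy example and scorer are invented for illustration:

    def contrastive_accuracy(examples, score) -> float:
        """An example counts as correct if the model assigns a higher score
        to the reference translation than to every contrastive variant with
        a wrong pronoun. `score(src, tgt)` stands in for the model."""
        correct = 0
        for src, ref, contrastive_variants in examples:
            ref_score = score(src, ref)
            if all(ref_score > score(src, bad) for bad in contrastive_variants):
                correct += 1
        return correct / len(examples)

    # Toy scorer so the sketch runs: prefers 'sie' for English 'it' -> German.
    toy = [("I saw the cat. It slept.", "Ich sah die Katze. Sie schlief.",
            ["Ich sah die Katze. Er schlief.", "Ich sah die Katze. Es schlief."])]
    score = lambda src, tgt: 1.0 if " Sie " in f" {tgt} " else 0.0
    print(contrastive_accuracy(toy, score))  # 1.0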