Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation
Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift. Recent work has tied these shortcomings to beam search – the de facto standard inference algorithm in NMT – and Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead. In this paper, we empirically investigate the properties of MBR decoding on a number of previously reported biases and failure cases of beam search. We find that MBR still exhibits a length and token frequency bias, owing to the MT metrics used as utility functions, but that MBR also increases robustness against copy noise in the training data and domain shift.
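MBR decoding as described above selects, from a pool of model samples, the candidate with the highest expected utility against the other samples, which act as pseudo-references. A minimal sketch, using a toy unigram-F1 utility as a stand-in for the real MT metrics (e.g. sentence BLEU or ChrF) used as utility functions; the sample sentences are illustrative only:

```python
# Minimum Bayes Risk (MBR) decoding, minimal sketch.
# `utility` is a hypothetical stand-in for a sentence-level MT metric.

def utility(hyp: str, ref: str) -> float:
    # Toy utility: unigram F1 overlap between hypothesis and reference.
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    overlap = sum(min(h.count(w), r.count(w)) for w in set(h))
    p, rec = overlap / len(h), overlap / len(r)
    return 2 * p * rec / (p + rec) if p + rec else 0.0

def mbr_decode(candidates: list[str]) -> str:
    # Score each candidate by its expected utility against all other
    # samples (the pseudo-references); return the argmax.
    def expected_utility(hyp: str) -> float:
        refs = [c for c in candidates if c is not hyp]
        return sum(utility(hyp, r) for r in refs) / max(len(refs), 1)
    return max(candidates, key=expected_utility)

samples = ["the cat sat on the mat",
           "the cat sat on a mat",
           "a feline sat upon the mat",
           "cat cat cat"]
print(mbr_decode(samples))  # → "the cat sat on a mat"
```

Note how the degenerate candidate "cat cat cat" is penalised because it agrees poorly with the other samples; the paper's finding is that biases of the chosen utility metric (length, token frequency) still carry over into the selected translation.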
Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation
Zero-shot neural machine translation is an attractive goal because of the
high cost of obtaining data and building translation systems for new
translation directions. However, previous papers have reported mixed success in
zero-shot translation. It is hard to predict in which settings it will be
effective, and what limits performance compared to a fully supervised system.
In this paper, we investigate zero-shot performance of a multilingual
EN↔{FR,CS,DE,FI} system trained on WMT data. We find that
zero-shot performance is highly unstable and can vary by more than 6 BLEU
between training runs, making it difficult to reliably track improvements. We
observe a bias towards copying the source in zero-shot translation, and
investigate how the choice of subword segmentation affects this bias. We find
that language-specific subword segmentation results in less subword copying at
training time, and leads to better zero-shot performance compared to jointly
trained segmentation. A recent trend in multilingual models is to not train on
parallel data between all language pairs, but have a single bridge language,
e.g. English. We find that this negatively affects zero-shot translation and
leads to a failure mode where the model ignores the language tag and instead
produces English output in zero-shot directions. We show that this bias towards
English can be effectively reduced with even a small amount of parallel data in
some of the non-English pairs.

Comment: Accepted at WMT 2020
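Multilingual systems of this kind typically signal the desired target language with a tag token prepended to the source; the failure mode described above corresponds to the model ignoring that tag in zero-shot directions and emitting English anyway. A minimal illustration (the tag format and sentences are assumptions for illustration, not taken from the paper):

```python
# Target-language tagging for multilingual NMT, minimal sketch.
# One model is trained on all directions; the desired target language is
# signalled by a tag token prepended to the source tokens.

def tag_source(src_tokens: list[str], tgt_lang: str) -> list[str]:
    # Hypothetical tag format "<2xx>"; real systems vary.
    return [f"<2{tgt_lang}>"] + src_tokens

# Supervised direction seen with parallel data at training time: EN->DE
print(tag_source("a cold day".split(), "de"))
# Zero-shot direction, never seen with parallel data: FR->CS
print(tag_source("une journée froide".split(), "cs"))
```

In the bridge-language setup, only English-centric pairs are observed during training, so nothing forces the model to respect the tag for non-English target languages, which is consistent with the off-target English output reported above.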
Domain Robustness in Neural Machine Translation
Translating text that diverges from the training domain is a key challenge
for machine translation. Domain robustness---the generalization of models to
unseen test domains---is low for both statistical (SMT) and neural machine
translation (NMT). In this paper, we study the performance of SMT and NMT
models on out-of-domain test sets. We find that in unknown domains, SMT and NMT
suffer from very different problems: SMT systems are mostly adequate but not
fluent, while NMT systems are mostly fluent, but not adequate. For NMT, we
identify such hallucinations (translations that are fluent but unrelated to the
source) as a key reason for low domain robustness. To mitigate this problem, we
empirically compare methods that are reported to improve adequacy or in-domain
robustness in terms of their effectiveness at improving domain robustness. In
experiments on German to English OPUS data, and German to Romansh (a
low-resource setting) we find that several methods improve domain robustness.
While those methods do lead to higher BLEU scores overall, they only slightly
increase the adequacy of translations compared to SMT.

Comment: V2: AMTA camera-ready
Atomic Quantum Simulation of Dynamical Gauge Fields coupled to Fermionic Matter: From String Breaking to Evolution after a Quench
Using a Fermi-Bose mixture of ultra-cold atoms in an optical lattice, we
construct a quantum simulator for a U(1) gauge theory coupled to fermionic
matter. The construction is based on quantum links which realize continuous
gauge symmetry with discrete quantum variables. At low energies, quantum link
models with staggered fermions emerge from a Hubbard-type model which can be
quantum simulated. This allows us to investigate string breaking as well as the
real-time evolution after a quench in gauge theories, which are inaccessible to
classical simulation methods.

Comment: 14 pages, 5 figures. Main text plus one general supplementary material and one basic introduction to the topic. Published version
Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
Recently, non-recurrent architectures (convolutional, self-attentional) have
outperformed RNNs in neural machine translation. CNNs and self-attentional
networks can connect distant words via shorter network paths than RNNs, and it
has been speculated that this improves their ability to model long-range
dependencies. However, this theoretical argument has not been tested
empirically, nor have alternative explanations for their strong performance
been explored in-depth. We hypothesize that the strong performance of CNNs and
self-attentional networks could also be due to their ability to extract
semantic features from the source text, and we evaluate RNNs, CNNs and
self-attention networks on two tasks: subject-verb agreement (where capturing
long-range dependencies is required) and word sense disambiguation (where
semantic feature extraction is required). Our experimental results show that:
1) self-attentional networks and CNNs do not outperform RNNs in modeling
subject-verb agreement over long distances; 2) self-attentional networks
perform distinctly better than RNNs and CNNs on word sense disambiguation.

Comment: 11 pages, 5 figures, accepted by EMNLP 2018 (v2: corrected author names; v3: fix to CNN context-window size, and new post-publication experiments in section 6)
A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation
The translation of pronouns presents a special challenge to machine
translation to this day, since it often requires context outside the current
sentence. Recent work on models that have access to information across sentence
boundaries has seen only moderate improvements in terms of automatic evaluation
metrics such as BLEU. However, metrics that quantify the overall translation
quality are ill-equipped to measure gains from additional context. We argue
that a different kind of evaluation is needed to assess how well models
translate inter-sentential phenomena such as pronouns. This paper therefore
presents a test suite of contrastive translations focused specifically on the
translation of pronouns. Furthermore, we perform experiments with several
context-aware models. We show that, while gains in BLEU are moderate for those
systems, they outperform baselines by a large margin in terms of accuracy on
our contrastive test set. Our experiments also show the effectiveness of
parameter tying for multi-encoder architectures.

Comment: Accepted at WMT 2018
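Contrastive evaluation of the kind described above scores the reference translation against minimally different incorrect variants (e.g. with the wrong pronoun) and counts an instance as correct when the model prefers the reference. A minimal sketch with a hypothetical scoring interface and a toy scorer standing in for a context-aware NMT model; the example sentences are illustrative, not drawn from the test suite:

```python
# Contrastive-pair evaluation, minimal sketch.
# `score(src, ctx, tgt)` is a hypothetical interface returning a model's
# (log-)score for a target sentence given the source and preceding context.

from typing import Callable

def contrastive_accuracy(
    instances: list[tuple[str, str, str, list[str]]],
    score: Callable[[str, str, str], float],
) -> float:
    # An instance counts as correct only if the reference outscores
    # every contrastive variant.
    correct = 0
    for src, ctx, ref, variants in instances:
        if all(score(src, ctx, ref) > score(src, ctx, v) for v in variants):
            correct += 1
    return correct / len(instances)

def toy_score(src: str, ctx: str, tgt: str) -> float:
    # Toy context-aware scorer: rewards the pronoun "Sie" when the
    # context mentions the feminine antecedent "Tanne".
    return 1.0 if "Tanne" in ctx and tgt.startswith("Sie") else 0.0

data = [("It is beautiful .",          # source
         "Die Tanne ist gross .",      # preceding target-side context
         "Sie ist schön .",            # reference with correct pronoun
         ["Er ist schön .", "Es ist schön ."])]  # contrastive variants
print(contrastive_accuracy(data, toy_score))  # → 1.0
```

Because accuracy is computed per phenomenon rather than over whole translations, this kind of evaluation surfaces the pronoun-specific gains that corpus-level metrics such as BLEU average away.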