Domain Robustness in Neural Machine Translation
Translating text that diverges from the training domain is a key challenge
for machine translation. Domain robustness---the generalization of models to
unseen test domains---is low for both statistical (SMT) and neural machine
translation (NMT). In this paper, we study the performance of SMT and NMT
models on out-of-domain test sets. We find that in unknown domains, SMT and NMT
suffer from very different problems: SMT systems are mostly adequate but not
fluent, while NMT systems are mostly fluent, but not adequate. For NMT, we
identify such hallucinations (translations that are fluent but unrelated to the
source) as a key reason for low domain robustness. To mitigate this problem, we
empirically compare methods that are reported to improve adequacy or in-domain
robustness in terms of their effectiveness at improving domain robustness. In
experiments on German-to-English OPUS data and German-to-Romansh (a
low-resource setting), we find that several methods improve domain robustness.
While those methods do lead to higher BLEU scores overall, they only slightly
increase the adequacy of translations compared to SMT.
Comment: V2: AMTA camera-ready
The University of Edinburgh’s Neural MT Systems for WMT17
This paper describes the University of Edinburgh's submissions to the WMT17
shared news translation and biomedical translation tasks. We participated in 12
translation directions for news, translating between English and Czech, German,
Latvian, Russian, Turkish and Chinese. For the biomedical task we submitted
systems for English to Czech, German, Polish and Romanian. Our systems are
neural machine translation systems trained with Nematus, an attentional
encoder-decoder. We follow our setup from last year and build BPE-based models
with parallel and back-translated monolingual training data. Novelties this
year include the use of deep architectures, layer normalization, and more
compact models due to weight tying and improvements in BPE segmentations. We
perform extensive ablative experiments, reporting on the effectiveness of layer
normalization, deep architectures, and different ensembling techniques.
Comment: WMT 2017 shared task track; for BibTeX, see
http://homepages.inf.ed.ac.uk/rsennric/bib.html#uedin-nmt:201
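The BPE segmentation mentioned above builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair in the training corpus. The following is a minimal sketch of the merge-learning step; the function name and toy word counts are my own illustration, not the Nematus or subword-nmt implementation:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merge operations from a word-frequency dict.

    `words` maps whitespace-tokenized words to corpus counts,
    e.g. {"lower": 5, "newer": 6}. Returns the ordered list of
    learned merges as (symbol_a, symbol_b) pairs.
    """
    # Represent each word as a tuple of symbols (initially characters).
    vocab = {tuple(w): c for w, c in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the most frequent merge everywhere it occurs.
        merged_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_vocab[tuple(out)] = count
        vocab = merged_vocab
    return merges

merges = learn_bpe({"lower": 5, "lowest": 2, "newer": 6, "wider": 3}, 10)
```

At test time, the learned merges are replayed in order on each new word, so rare and unseen words decompose into known subword units rather than a single out-of-vocabulary token.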
Aligning Neural Machine Translation Models: Human Feedback in Training and Inference
Reinforcement learning from human feedback (RLHF) is a recent technique to
improve the quality of the text generated by a language model, making it closer
to what humans would generate. A core ingredient in RLHF's success in aligning
and improving large language models (LLMs) is its reward model, trained using
human feedback on model outputs. In machine translation (MT), where metrics
trained from human annotations can readily be used as reward models, recent
methods using minimum Bayes risk decoding and reranking have succeeded in
improving the final quality of translation. In this study, we comprehensively
explore and compare techniques for integrating quality metrics as reward models
into the MT pipeline. This includes using the reward model for data filtering,
during training through RL, and at inference time via reranking; we also assess
the effects of combining these in a unified
approach. Our experimental results, conducted across multiple translation
tasks, underscore the crucial role of effective data filtering, based on
estimated quality, in harnessing the full potential of RL in enhancing MT
quality. Furthermore, our findings demonstrate the effectiveness of combining
RL training with reranking techniques, showcasing substantial improvements in
translation quality.
Comment: 14 pages, work-in-progress
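The minimum Bayes risk decoding mentioned above selects, from a set of sampled translations, the candidate with the highest expected utility under the model's own distribution, using a quality metric as the utility. A minimal sketch under uniform sample weights; `token_f1` is a toy stand-in for a learned metric such as COMET, and all names here are illustrative assumptions:

```python
def mbr_decode(candidates, utility):
    """Return the candidate with highest total utility against the
    other samples: a minimal MBR sketch with uniform weights.
    `utility(hyp, ref)` is any reference-based metric, higher = better."""
    best, best_score = None, float("-inf")
    for hyp in candidates:
        score = sum(utility(hyp, ref) for ref in candidates if ref is not hyp)
        if score > best_score:
            best, best_score = hyp, score
    return best

def token_f1(hyp, ref):
    # Toy utility: F1 over unique tokens (stand-in for a trained metric).
    h, r = set(hyp.split()), set(ref.split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    p, rec = overlap / len(h), overlap / len(r)
    return 2 * p * rec / (p + rec)

samples = ["the cat sat", "a cat sat", "the dog ran"]
choice = mbr_decode(samples, token_f1)  # "the cat sat": closest to the others
```

The consensus effect is the point: a fluent but hallucinated sample tends to disagree with the other samples and therefore scores a low expected utility, which is why metric-based MBR and reranking can improve final translation quality.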