Meta-Learning for Low-Resource Neural Machine Translation
In this paper, we propose to extend the recently introduced model-agnostic
meta-learning algorithm (MAML) for low-resource neural machine translation
(NMT). We frame low-resource translation as a meta-learning problem, and we
learn to adapt to low-resource languages based on multilingual high-resource
language tasks. We use the universal lexical
representation~\citep{gu2018universal} to overcome the input-output mismatch
across different languages. We evaluate the proposed meta-learning strategy
using eighteen European languages (Bg, Cs, Da, De, El, Es, Et, Fr, Hu, It, Lt,
Nl, Pl, Pt, Sk, Sl, Sv and Ru) as source tasks and five diverse languages (Ro,
Lv, Fi, Tr and Ko) as target tasks. We show that the proposed approach
significantly outperforms the multilingual, transfer learning based
approach~\citep{zoph2016transfer} and enables us to train a competitive NMT
system with only a fraction of training examples. For instance, the proposed
approach can achieve as high as 22.04 BLEU on Romanian-English WMT'16 by seeing
only 16,000 translated words (~600 parallel sentences).
Comment: Accepted as a full paper at EMNLP 2018
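The underlying training loop can be sketched briefly. Below is a minimal, first-order MAML-style meta-update written in PyTorch; the task.sample_support(), task.sample_query(), and model.loss() interfaces are hypothetical placeholders, and the paper's actual setup (universal lexical representation, Transformer NMT, exact MAML variant) is not reproduced here.

import copy
import torch

def maml_outer_step(model, tasks, inner_lr=1e-3, outer_lr=1e-4, inner_steps=1):
    """One meta-update: adapt a copy of the model on each task's support set,
    then apply the averaged query-set gradients to the original parameters
    (first-order approximation of MAML)."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for task in tasks:                      # each task = one high-resource language pair
        learner = copy.deepcopy(model)      # fast weights for the inner loop
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            src, tgt = task.sample_support()     # hypothetical task API
            loss = learner.loss(src, tgt)        # hypothetical NMT loss
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        # Gradient of the adapted learner on the query set, used as the
        # first-order meta-gradient for the shared initialization.
        src_q, tgt_q = task.sample_query()
        q_loss = learner.loss(src_q, tgt_q)
        grads = torch.autograd.grad(q_loss, list(learner.parameters()))
        for g_acc, g in zip(meta_grads, grads):
            g_acc += g / len(tasks)
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g

At test time, the meta-learned initialization would be fine-tuned on the small low-resource corpus with the same inner-loop procedure.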
Improving Zero-shot Translation with Language-Independent Constraints
An important concern in training multilingual neural machine translation
(NMT) is to translate between language pairs unseen during training, i.e.,
zero-shot translation. Improving this ability kills two birds with one stone:
it provides an alternative to pivot translation and allows us to better
understand how the model captures information between languages.
In this work, we investigate this capability of multilingual NMT models.
First, we intentionally create an encoder architecture that is independent of
the source language. Such experiments shed light on the general ability of NMT
encoders to learn multilingual representations. Building on this proof of
concept, we design regularization methods for the standard Transformer model
so that the whole architecture becomes more robust in zero-shot conditions. We
investigated the behaviour of
such models on the standard IWSLT 2017 multilingual dataset. We achieved an
average improvement of 2.23 BLEU points across 12 language pairs compared to
the zero-shot performance of a state-of-the-art multilingual system.
Additionally, we carried out further experiments in which the effect is
confirmed even for language pairs with multiple intermediate pivots.
Comment: 10-page version accepted at WMT 2019
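One way to picture such a constraint is as an auxiliary loss that pulls the encoder representations of a sentence and its reference translation together, so that the encoder output carries less source-language-specific information. The sketch below (PyTorch) is only an illustration under assumed interfaces: a generic encoder returning per-token states, plus padding masks; it is not the paper's exact regularizer, and the weighting of the term is an assumption.

import torch
import torch.nn.functional as F

def language_independence_loss(encoder, src_tokens, tgt_tokens, src_mask, tgt_mask):
    """Encourage encoder(src) and encoder(tgt) to agree for a parallel sentence pair."""
    def mean_pool(states, mask):
        # states: (batch, length, dim); mask: (batch, length), 1 for real tokens
        mask = mask.unsqueeze(-1).float()
        return (states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

    h_src = mean_pool(encoder(src_tokens), src_mask)   # hypothetical encoder call
    h_tgt = mean_pool(encoder(tgt_tokens), tgt_mask)
    return F.mse_loss(h_src, h_tgt)

# Hypothetical use: add the term to the usual translation loss, e.g.
# total_loss = nmt_loss + reg_weight * language_independence_loss(...)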