An Effective Approach to Unsupervised Machine Translation
While machine translation has traditionally relied on large amounts of
parallel corpora, a recent research line has managed to train both Neural
Machine Translation (NMT) and Statistical Machine Translation (SMT) systems
using monolingual corpora only. In this paper, we identify and address several
deficiencies of existing unsupervised SMT approaches by exploiting subword
information, developing a theoretically well-founded unsupervised tuning
method, and incorporating a joint refinement procedure. Moreover, we use our
improved SMT system to initialize a dual NMT model, which is further fine-tuned
through on-the-fly back-translation. Together, we obtain large improvements
over the previous state-of-the-art in unsupervised machine translation. For
instance, we get 22.5 BLEU points in English-to-German WMT 2014, 5.5 points
more than the previous best unsupervised system, and 0.5 points more than the
(supervised) shared task winner back in 2014.
Comment: ACL 201
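To make the back-translation step mentioned in the abstract above more concrete, here is a minimal sketch. It is not the authors' system: the `WordModel` class, the seed dictionaries, and the count-based "training" are all hypothetical stand-ins for the SMT-initialized NMT models. It only illustrates the data flow, in which the reverse model turns a monolingual target sentence into a synthetic source sentence and the forward model is then trained on that synthetic pair.

```python
from collections import Counter, defaultdict


class WordModel:
    """Toy word-by-word translator backed by co-occurrence counts (hypothetical)."""

    def __init__(self, seed_pairs):
        self.counts = defaultdict(Counter)
        for src, tgt in seed_pairs:
            self.counts[src][tgt] += 1

    def translate(self, sentence):
        # Pick the most frequent translation; copy unknown words unchanged.
        return [
            self.counts[w].most_common(1)[0][0] if w in self.counts else w
            for w in sentence
        ]

    def update(self, src_sentence, tgt_sentence):
        # "Training" in this toy is just counting position-aligned word pairs.
        for s, t in zip(src_sentence, tgt_sentence):
            self.counts[s][t] += 1


def back_translation_step(forward, backward, mono_tgt):
    """Train `forward` (src->tgt) on synthetic pairs produced by `backward` (tgt->src)."""
    for tgt_sentence in mono_tgt:
        # Synthetic source is generated with the current reverse model.
        synthetic_src = backward.translate(tgt_sentence)
        # The real monolingual sentence serves as the training target.
        forward.update(synthetic_src, tgt_sentence)


# Assumed seed lexicons standing in for the initial models.
en2de = WordModel([("the", "das")])
de2en = WordModel([("das", "the"), ("haus", "house")])
mono_de = [["das", "haus"]]  # monolingual German data only

before = en2de.translate(["the", "house"])
back_translation_step(en2de, de2en, mono_de)
after = en2de.translate(["the", "house"])
```

In a dual setup the same step would also be run in the opposite direction, with the two models alternating roles, and the synthetic sentences would be regenerated from the current models at every iteration rather than produced once in advance.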
Semi-Supervised Learning for Neural Machine Translation
While end-to-end neural machine translation (NMT) has made remarkable
progress recently, NMT systems rely only on parallel corpora for parameter
estimation. Since parallel corpora are usually limited in quantity, quality,
and coverage, especially for low-resource languages, it is appealing to exploit
monolingual corpora to improve NMT. We propose a semi-supervised approach for
training NMT models on the concatenation of labeled (parallel corpora) and
unlabeled (monolingual corpora) data. The central idea is to reconstruct the
monolingual corpora using an autoencoder, in which the source-to-target and
target-to-source translation models serve as the encoder and decoder,
respectively. Our approach can exploit the monolingual corpora not only of the
target language but also of the source language. Experiments on the
Chinese-English dataset show that our approach achieves significant
improvements over state-of-the-art SMT and NMT systems.
Comment: Corrected a typ
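The autoencoder idea in the abstract above can be illustrated with a small numeric sketch. The probability tables below are made-up toy values, not results from the paper, and the function name `reconstruction_prob` is an assumption. The sketch treats the translation as a latent variable: the source-to-target model encodes a source sentence into candidate translations, the target-to-source model decodes each candidate back, and the reconstruction probability sums over all candidates.

```python
# Toy translation tables with hypothetical probabilities.
# s2t[x][y] = P(y | x): source-to-target model, acting as the encoder.
s2t = {
    "chat": {"cat": 0.9, "dog": 0.1},
}
# t2s[y][x] = P(x | y): target-to-source model, acting as the decoder.
t2s = {
    "cat": {"chat": 0.8, "chien": 0.2},
    "dog": {"chat": 0.1, "chien": 0.9},
}


def reconstruction_prob(x, s2t, t2s):
    """Probability of reconstructing x: sum over latent y of P(y | x) * P(x | y)."""
    return sum(p_y * t2s[y].get(x, 0.0) for y, p_y in s2t[x].items())


p = reconstruction_prob("chat", s2t, t2s)
print(round(p, 2))  # 0.9 * 0.8 + 0.1 * 0.1 = 0.73
```

With real models the sum over all possible translations is intractable, so a practical system would have to approximate it, for example with a small set of candidate translations.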