Semi-Autoregressive Neural Machine Translation
Existing approaches to neural machine translation are typically
autoregressive models. While these models attain state-of-the-art translation
quality, they suffer from low parallelizability and are thus slow at
decoding long sequences. In this paper, we propose a novel model for fast
sequence generation --- the semi-autoregressive Transformer (SAT). The SAT
keeps the autoregressive property globally but relaxes it locally, and is thus
able to produce multiple successive words in parallel at each time step.
Experiments conducted on English-German and Chinese-English translation tasks
show that the SAT achieves a good balance between translation quality and
decoding speed. On WMT'14 English-German translation, the SAT achieves a
5.58x speedup while maintaining 88% of the translation quality, significantly
better than previous non-autoregressive methods. When producing two words at
each time step, the SAT is almost lossless (only 1% degradation in BLEU
score).
Comment: EMNLP 201
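The decoding scheme described above can be sketched as follows. This is a minimal toy illustration, not the paper's Transformer implementation: `toy_model` is a stand-in that deterministically emits the next k tokens, whereas the real SAT predicts k successive words in parallel from learned distributions.

```python
def semi_autoregressive_decode(model, bos, eos, k=2, max_len=20):
    """Generate a sequence k tokens at a time: autoregressive across
    steps (each step conditions on everything generated so far), but
    parallel within a step (k tokens are emitted at once)."""
    seq = [bos]
    while len(seq) < max_len:
        chunk = model(seq, k)  # k tokens produced in one step
        seq.extend(chunk)
        if eos in chunk:
            return seq[: seq.index(eos) + 1]
    return seq

# Hypothetical stand-in for a trained model: emits increasing integers
# up to 5, then the end-of-sequence marker (-1).
def toy_model(prefix, k):
    nxt = prefix[-1] + 1
    return [nxt + i if nxt + i <= 5 else -1 for i in range(k)]

print(semi_autoregressive_decode(toy_model, bos=0, eos=-1, k=2))
# -> [0, 1, 2, 3, 4, 5, -1]
```

With k=1 this reduces to ordinary autoregressive decoding; larger k trades quality for fewer sequential steps, which is the balance the abstract reports.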
Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation
Non-Autoregressive Neural Machine Translation (NAT) achieves significant
decoding speedup by generating target words independently and
simultaneously. However, in the context of non-autoregressive translation, the
word-level cross-entropy loss cannot properly model the target-side sequential
dependency, leading to its weak correlation with translation
quality. As a result, NAT tends to generate disfluent translations with
over-translation and under-translation errors. In this paper, we propose to
train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model
output and the reference sentence. The bag-of-ngrams training objective is
differentiable and can be calculated efficiently; it encourages NAT to
capture the target-side sequential dependency and correlates well with
translation quality. We validate our approach on three translation tasks and
show that our approach outperforms the NAT baseline by about 5.0 BLEU
points on WMT14 En-De and about 2.5 BLEU points on WMT16 En-Ro.
Comment: AAAI 202
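The discrete quantity behind the training objective can be sketched as below. Note this computes the BoN difference between two fixed token sequences for illustration only; the paper's actual objective is a differentiable relaxation that uses expected n-gram counts under the model's word distributions rather than a decoded output.

```python
from collections import Counter

def bag_of_ngrams(tokens, n):
    """Multiset of n-grams occurring in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bon_distance(hyp, ref, n=2):
    """L1 distance between the two bags of n-grams: counts n-grams the
    hypothesis over-generates plus those it misses from the reference."""
    h, r = bag_of_ngrams(hyp, n), bag_of_ngrams(ref, n)
    return sum((h - r).values()) + sum((r - h).values())

# Over-translation: the repeated "b" creates a spurious ("b", "b") bigram.
print(bon_distance("a b b c".split(), "a b c".split()))  # -> 1
```

Because the bag ignores position but keeps counts, repeated words (over-translation) and dropped words (under-translation) both increase the distance, which is why the objective correlates with translation quality better than word-level cross-entropy.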