Semi-Autoregressive Neural Machine Translation
Existing approaches to neural machine translation are typically
autoregressive models. While these models attain state-of-the-art translation
quality, they suffer from low parallelizability and are thus slow at
decoding long sequences. In this paper, we propose a novel model for fast
sequence generation --- the semi-autoregressive Transformer (SAT). The SAT
keeps the autoregressive property globally but relaxes it locally, and is thus
able to produce multiple successive words in parallel at each time step.
Experiments conducted on English-German and Chinese-English translation tasks
show that the SAT achieves a good balance between translation quality and
decoding speed. On WMT'14 English-German translation, the SAT achieves a
5.58x speedup while maintaining 88% of the translation quality, significantly
better than previous non-autoregressive methods. When producing two words at
each time step, the SAT is almost lossless (only 1% degradation in BLEU
score).
Comment: EMNLP 201
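The decoding scheme described above can be sketched as follows. This is a minimal toy illustration, not the paper's Transformer implementation: `toy_model` is a stand-in that deterministically emits the next k tokens, whereas the real SAT predicts k successive words in parallel from learned distributions.

```python
def semi_autoregressive_decode(model, bos, eos, k=2, max_len=20):
    """Generate a sequence k tokens at a time: autoregressive across
    steps (each step conditions on everything generated so far), but
    parallel within a step (k tokens are emitted at once)."""
    seq = [bos]
    while len(seq) < max_len:
        chunk = model(seq, k)  # k tokens produced in one step
        seq.extend(chunk)
        if eos in chunk:
            return seq[: seq.index(eos) + 1]
    return seq

# Hypothetical stand-in for a trained model: emits increasing integers
# up to 5, then the end-of-sequence marker (-1).
def toy_model(prefix, k):
    nxt = prefix[-1] + 1
    return [nxt + i if nxt + i <= 5 else -1 for i in range(k)]

print(semi_autoregressive_decode(toy_model, bos=0, eos=-1, k=2))
# -> [0, 1, 2, 3, 4, 5, -1]
```

With k=1 this reduces to ordinary autoregressive decoding; larger k trades quality for fewer sequential steps, which is the balance the abstract reports.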
Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation
Non-Autoregressive Neural Machine Translation (NAT) achieves significant
decoding speedup by generating target words independently and
simultaneously. However, in the context of non-autoregressive translation, the
word-level cross-entropy loss cannot properly model the target-side sequential
dependency, leading to its weak correlation with translation
quality. As a result, NAT tends to generate disfluent translations with
over-translation and under-translation errors. In this paper, we propose to
train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model
output and the reference sentence. The bag-of-ngrams training objective is
differentiable and can be calculated efficiently; it encourages NAT to
capture the target-side sequential dependency and correlates well with
translation quality. We validate our approach on three translation tasks and
show that our approach outperforms the NAT baseline by about 5.0 BLEU
points on WMT14 En-De and about 2.5 BLEU points on WMT16 En-Ro.
Comment: AAAI 202
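The discrete quantity behind the training objective can be sketched as below. Note this computes the BoN difference between two fixed token sequences for illustration only; the paper's actual objective is a differentiable relaxation that uses expected n-gram counts under the model's word distributions rather than a decoded output.

```python
from collections import Counter

def bag_of_ngrams(tokens, n):
    """Multiset of n-grams occurring in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bon_distance(hyp, ref, n=2):
    """L1 distance between the two bags of n-grams: counts n-grams the
    hypothesis over-generates plus those it misses from the reference."""
    h, r = bag_of_ngrams(hyp, n), bag_of_ngrams(ref, n)
    return sum((h - r).values()) + sum((r - h).values())

# Over-translation: the repeated "b" creates a spurious ("b", "b") bigram.
print(bon_distance("a b b c".split(), "a b c".split()))  # -> 1
```

Because the bag ignores position but keeps counts, repeated words (over-translation) and dropped words (under-translation) both increase the distance, which is why the objective correlates with translation quality better than word-level cross-entropy.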