Non-autoregressive approaches aim to improve the inference speed of
translation models by generating the entire output in a single forward
pass rather than producing tokens one at a time. However, these
approaches often suffer from a significant drop
in translation quality compared to autoregressive models. This paper introduces
a series of innovative techniques to enhance the translation quality of
Non-Autoregressive Translation (NAT) models while maintaining a substantial
acceleration in inference speed. We propose fine-tuning Pretrained Multilingual
Language Models (PMLMs) with the Connectionist Temporal Classification
(CTC) loss to train NAT models effectively.
Furthermore, we adopt the MASK insertion scheme for up-sampling instead of
token duplication, and we present an embedding distillation method to further
enhance performance. In our experiments, our model outperforms the baseline
autoregressive model (Transformer \textit{base}) on multiple datasets,
including WMT'14 DE$\rightarrow$EN, WMT'16 RO$\rightarrow$EN, and
IWSLT'14 DE$\rightarrow$EN. Notably, our model achieves better performance
than the baseline autoregressive model on the IWSLT'14 En$\rightarrow$De
and WMT'16 En$\rightarrow$Ro datasets, even without using distillation data
during training. It is worth highlighting that on the IWSLT'14
DE$\rightarrow$EN dataset, our model achieves an impressive BLEU score of
39.59, setting a new state of the art. Additionally, our model achieves a
speed-up of 16.35 times over the
autoregressive model.
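To make the up-sampling choice concrete, below is a minimal PyTorch-style sketch contrasting the conventional token-duplication input construction for CTC with the MASK-insertion scheme described above, followed by a CTC training step. The mask token id, blank index, and up-sampling ratio are illustrative placeholders, not the configuration reported in the paper.

\begin{verbatim}
import torch
import torch.nn.functional as F

# Hypothetical constants -- placeholders, not the paper's actual settings.
MASK_ID = 4    # [MASK] token id in the PMLM vocabulary (assumed)
BLANK_ID = 0   # CTC blank index (assumed)
RATIO = 2      # up-sampling ratio (assumed)


def upsample_duplicate(src_ids: torch.Tensor, ratio: int = RATIO) -> torch.Tensor:
    """Conventional CTC up-sampling: repeat each source token `ratio` times."""
    return src_ids.repeat_interleave(ratio, dim=-1)


def upsample_mask_insert(src_ids: torch.Tensor, ratio: int = RATIO) -> torch.Tensor:
    """MASK-insertion up-sampling: keep each source token once and fill the
    remaining positions with [MASK], giving the PMLM a masked-LM-like input."""
    bsz, src_len = src_ids.shape
    out = torch.full((bsz, src_len * ratio), MASK_ID, dtype=src_ids.dtype)
    out[:, ::ratio] = src_ids  # source tokens at every `ratio`-th position
    return out


def ctc_step(logits: torch.Tensor, tgt_ids: torch.Tensor,
             tgt_lens: torch.Tensor) -> torch.Tensor:
    """One CTC training step. `logits` has shape (T, B, V), where T is the
    up-sampled input length; `tgt_ids` is a padded (B, S) reference batch."""
    log_probs = F.log_softmax(logits, dim=-1)
    input_lens = torch.full((logits.size(1),), logits.size(0), dtype=torch.long)
    return F.ctc_loss(log_probs, tgt_ids, input_lens, tgt_lens,
                      blank=BLANK_ID, zero_infinity=True)
\end{verbatim}

The intuition behind the MASK-insertion variant is that the inserted positions look like masked-LM in-filling targets, which is closer to the pretraining objective of the PMLM than seeing every source token duplicated.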