92 research outputs found
Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information
Non-autoregressive neural machine translation (NAT) generates each target
word in parallel and has achieved promising inference acceleration. However,
existing NAT models still show a substantial gap in translation quality compared to
autoregressive neural machine translation models due to the enormous decoding
space. To address this problem, we propose a novel NAT framework named
ReorderNAT which explicitly models the reordering information in the decoding
procedure. We further introduce deterministic and non-deterministic decoding
strategies that utilize reordering information to narrow the decoding search
space in our proposed ReorderNAT. Experimental results on various widely-used
datasets show that our proposed model achieves better performance compared to
existing NAT models, and even achieves comparable translation quality as
autoregressive translation models with a significant speedup.
Comment: Accepted by AAAI 2021
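
A minimal sketch of this two-step guided decoding, assuming hypothetical module names (encoder, reorder_module, and nat_decoder are placeholders, not the paper's code): a reordering module first predicts a "pseudo-translation" that arranges source words in target order, then a NAT decoder emits all target words in parallel conditioned on that guidance. The deterministic strategy commits to the argmax pseudo-translation, while the non-deterministic one passes the soft distribution through.

    # Hypothetical sketch of ReorderNAT-style guided decoding; the
    # component modules are stand-ins, not the paper's actual code.
    import torch
    import torch.nn as nn

    class ReorderNATSketch(nn.Module):
        def __init__(self, encoder, reorder_module, nat_decoder):
            super().__init__()
            self.encoder = encoder         # standard Transformer encoder
            self.reorder = reorder_module  # predicts the pseudo-translation
            self.decoder = nat_decoder     # parallel (NAT) decoder

        @torch.no_grad()
        def translate(self, src_tokens, deterministic=True):
            enc = self.encoder(src_tokens)
            # Step 1: distribution over the pseudo-translation, i.e.
            # source words rearranged into target-language order.
            pseudo_logits = self.reorder(enc)            # (B, T, V)
            if deterministic:
                # Deterministic strategy: commit to the argmax sequence.
                guide = pseudo_logits.argmax(-1)
            else:
                # Non-deterministic strategy: keep the soft distribution,
                # narrowing the search space without a hard commitment.
                guide = pseudo_logits.softmax(-1)
            # Step 2: emit every target word in parallel, conditioned on
            # the encoder states and the reordering guidance.
            return self.decoder(enc, guide).argmax(-1)
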
Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation
Non-autoregressive translation (NAT) models are typically trained with the
cross-entropy loss, which forces the model outputs to be aligned verbatim with
the target sentence and heavily penalizes small shifts in word positions.
Latent alignment models relax the explicit alignment by marginalizing out all
monotonic latent alignments with the CTC loss. However, they cannot handle
non-monotonic alignments, a non-negligible limitation given the global word
reordering typical of machine translation. In this work, we explore non-monotonic
latent alignments for NAT. We extend the alignment space to non-monotonic
alignments to allow for global word reordering and further consider all
alignments that overlap with the target sentence. We non-monotonically match
the alignments to the target sentence and train the latent alignment model to
maximize the F1 score of non-monotonic matching. Extensive experiments on major
WMT benchmarks show that our method substantially improves the translation
performance of CTC-based models. Our best model achieves 30.06 BLEU on WMT14
En-De with only one-iteration decoding, closing the gap between
non-autoregressive and autoregressive models.
Comment: NeurIPS 2022
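
The matching criterion behind this objective can be illustrated with a plain, non-differentiable sketch; the paper's actual loss is an expectation of such a score over all non-monotonic latent alignments, computed through the CTC lattice.

    from collections import Counter

    def ngrams(seq, n):
        return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

    def ngram_f1(hyp, ref, n=2):
        # Clipped n-gram overlap between a collapsed alignment and the
        # target, scored as F1 so precision and recall both matter.
        hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
        match = sum((hyp_ng & ref_ng).values())
        total = sum(hyp_ng.values()) + sum(ref_ng.values())
        return 2 * match / total if total else 0.0

    # A hypothesis shifted by one position keeps credit for every bigram
    # it shares with the target, unlike a position-wise cross-entropy.
    print(ngram_f1([1, 2, 3, 4], [2, 3, 4, 5]))  # 0.666...
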
CTC-based Non-autoregressive Speech Translation
Combining end-to-end speech translation (ST) and non-autoregressive (NAR)
generation is promising in language and speech processing, offering reduced
error propagation and low latency. In this paper, we investigate the
potential of connectionist temporal classification (CTC) for non-autoregressive
speech translation (NAST). In particular, we develop a model consisting of two
encoders that are guided by CTC to predict the source and target texts,
respectively. Introducing CTC into NAST on both language sides poses two clear
challenges: 1) conditionally independent generation somewhat breaks the
interdependency among tokens, and 2) the monotonic alignment assumption in
standard CTC does not hold in translation tasks. In response, we develop a
prediction-aware encoding approach and a cross-layer attention approach to
address these issues. We also use curriculum learning to improve training
convergence. Experiments on the MuST-C ST benchmarks show that our NAST model
achieves an average BLEU score of 29.5 with a 5.67x speed-up, which
is comparable to the autoregressive counterpart and even outperforms the
previous best result by 0.9 BLEU points.
Comment: ACL 2023 Main Conference
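
The CTC supervision each encoder receives can be shown with PyTorch's built-in loss (the two-encoder layout, prediction-aware encoding, and cross-layer attention are the paper's contributions and are not reproduced here; the shapes below are toy values).

    import torch
    import torch.nn.functional as F

    # Toy shapes: T encoder frames, batch B, vocab V (index 0 = CTC blank).
    T, B, V, S = 50, 2, 100, 12
    log_probs = torch.randn(T, B, V).log_softmax(-1)  # per-frame predictions
    targets = torch.randint(1, V, (B, S))             # text tokens to predict
    input_lengths = torch.full((B,), T)
    target_lengths = torch.full((B,), S)

    # Standard (monotonic) CTC loss; in the paper's setup one such loss
    # supervises the source transcript and another the target translation.
    loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                      blank=0, zero_infinity=True)
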
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
Non-autoregressive (NAR) generation, which was first proposed in neural
machine translation (NMT) to speed up inference, has attracted much attention
in both machine learning and natural language processing communities. While NAR
generation can significantly accelerate inference for machine
translation, the speedup comes at the cost of translation accuracy
compared to its counterpart, auto-regressive (AR) generation. In recent years,
many new models and algorithms have been proposed to bridge the
accuracy gap between NAR generation and AR generation. In this paper, we
conduct a systematic survey with comparisons and discussions of various
non-autoregressive translation (NAT) models from different aspects.
Specifically, we categorize the efforts of NAT into several groups, including
data manipulation, modeling methods, training criterion, decoding algorithms,
and the benefit from pre-trained models. Furthermore, we briefly review other
applications of NAR models beyond machine translation, such as dialogue
generation, text summarization, grammar error correction, semantic parsing,
speech synthesis, and automatic speech recognition. In addition, we
discuss potential directions for future exploration, including removing the
dependency on knowledge distillation (KD), dynamic length prediction,
pre-training for NAR, and wider applications. We hope this survey can help researchers capture the latest
progress in NAR generation, inspire the design of advanced NAR models and
algorithms, and enable industry practitioners to choose appropriate solutions
for their applications. The web page of this survey is at
\url{https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications}.
Comment: 25 pages, 11 figures, 4 tables
Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation
Non-Autoregressive Neural Machine Translation (NAT) achieves significant
decoding speedup by generating target words independently and
simultaneously. However, in the context of non-autoregressive translation, the
word-level cross-entropy loss cannot model the target-side sequential
dependency properly, leading to its weak correlation with the translation
quality. As a result, NAT tends to generate disfluent translations with
over-translation and under-translation errors. In this paper, we propose to
train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model
output and the reference sentence. The bag-of-ngrams training objective is
differentiable and can be efficiently calculated, which encourages NAT to
capture the target-side sequential dependency and correlates well with the
translation quality. We validate our approach on three translation tasks and
show that it outperforms the NAT baseline by about 5.0 BLEU points
on WMT14 En↔De and about 2.5 BLEU points on WMT16 En↔Ro.
Comment: AAAI 2020
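
To see why this objective is differentiable, note that the expected count of an n-gram under a NAT model's position-wise distributions is just a sum of probability products, and the L1 distance between the expected and reference bags reduces to a clipped match. A small sketch under these assumptions, restricted to bigrams (not the paper's code):

    import torch

    def expected_bigram_count(probs, bigram):
        # probs: (T, V) per-position word distributions from a NAT decoder.
        a, b = bigram
        return (probs[:-1, a] * probs[1:, b]).sum()

    def bon_l1_loss(probs, ref):
        # |BoN_model - BoN_ref|_1 = total_model + total_ref - 2 * match,
        # where match clips each expected count at the reference count.
        T = probs.shape[0]
        ref_counts = {}
        for g in zip(ref, ref[1:]):
            ref_counts[g] = ref_counts.get(g, 0) + 1
        match = sum(torch.minimum(expected_bigram_count(probs, g),
                                  torch.tensor(float(c)))
                    for g, c in ref_counts.items())
        return (T - 1) + (len(ref) - 1) - 2 * match

    logits = torch.randn(6, 50, requires_grad=True)
    loss = bon_l1_loss(logits.softmax(-1), [3, 7, 7, 2, 9])
    loss.backward()  # gradients flow, unlike with discrete BLEU
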