Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation
Non-autoregressive translation (NAT) models are typically trained with the
cross-entropy loss, which forces the model outputs to be aligned verbatim with
the target sentence and heavily penalizes small shifts in word positions.
Latent alignment models relax the explicit alignment by marginalizing out all
monotonic latent alignments with the CTC loss. However, they cannot handle
non-monotonic alignments, which are non-negligible since machine translation
typically involves global word reordering. In this work, we explore non-monotonic
latent alignments for NAT. We extend the alignment space to non-monotonic
alignments to allow for global word reordering, and we further consider all
alignments that overlap with the target sentence. We non-monotonically match
the alignments to the target sentence and train the latent alignment model to
maximize the F1 score of non-monotonic matching. Extensive experiments on major
WMT benchmarks show that our method substantially improves the translation
performance of CTC-based models. Our best model achieves 30.06 BLEU on WMT14
En-De with only one-iteration decoding, closing the gap between
non-autoregressive and autoregressive models.
Comment: NeurIPS 2022
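To make the contrast with standard CTC training concrete, here is a minimal PyTorch sketch (illustrative only) that combines the monotonic CTC loss with a soft unigram-F1 term computed from the model's expected token counts. The toy dimensions, the unigram stand-in for n-gram matching, and the way the two terms are combined are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

# Toy dimensions (hypothetical): batch of 2, output length 10, vocabulary of 6 (index 0 = blank).
B, T, V = 2, 10, 6
log_probs = F.log_softmax(torch.randn(T, B, V), dim=-1)   # (T, B, V), the layout nn.CTCLoss expects
targets = torch.tensor([[1, 2, 3, 4], [2, 3, 4, 5]])       # reference token ids (no blanks)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 4, dtype=torch.long)

# Monotonic latent alignments: CTC marginalizes over all alignments that collapse to the target.
ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)

# Illustrative non-monotonic term: compare expected non-blank token counts with reference counts,
# ignoring word order entirely, and maximize a soft F1 over the overlap (a unigram stand-in for
# the paper's non-monotonic n-gram matching).
probs = log_probs.exp().transpose(0, 1)                    # (B, T, V)
expected_counts = probs[:, :, 1:].sum(dim=1)               # expected count of each non-blank token
ref_counts = torch.zeros(B, V - 1)
for b in range(B):
    for tok in targets[b]:
        ref_counts[b, tok - 1] += 1
match = torch.minimum(expected_counts, ref_counts).sum(dim=1)
soft_f1 = 2 * match / (expected_counts.sum(dim=1) + ref_counts.sum(dim=1))

loss = ctc - soft_f1.mean()                                # reward high non-monotonic overlap
```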
Beyond MLE: Convex Learning for Text Generation
Maximum likelihood estimation (MLE) is a statistical method used to estimate
the parameters of a probability distribution that best explain the observed
data. In the context of text generation, MLE is often used to train generative
language models, which can then be used to generate new text. However, we argue
that MLE is not always necessary and optimal, especially for closed-ended text
generation tasks like machine translation. In these tasks, the goal of the model is
to generate the most appropriate response, which does not necessarily require
it to estimate the entire data distribution with MLE. To this end, we propose a
novel class of training objectives based on convex functions, which enables
text generation models to focus on highly probable outputs without having to
estimate the entire data distribution. We investigate the theoretical
properties of the optimal predicted distribution when applying convex functions
to the loss, demonstrating that convex functions can sharpen the optimal
distribution, thereby enabling the model to better capture outputs with high
probabilities. Experiments on various text generation tasks and models show the
effectiveness of our approach. It enables autoregressive models to bridge the
gap between greedy and beam search, and facilitates the learning of
non-autoregressive models with a maximum improvement of 9+ BLEU points.
Moreover, our approach also has a significant impact on large language
models (LLMs), substantially enhancing their generative capability on various
tasks. Source code is available at https://github.com/ictnlp/Convex-Learning.
Comment: NeurIPS 2023
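As a rough illustration of replacing the log in MLE with a convex transform of the token probability, the sketch below uses a simple power function p**gamma as the surrogate loss; the power transform and the value of gamma are assumptions for illustration, not necessarily the family of convex objectives studied in the paper.

```python
import torch
import torch.nn.functional as F

def convex_surrogate_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Illustrative convex-learning surrogate: instead of maximizing log p(y|x) (MLE),
    maximize a convex transform of the token probability, here p(y|x) ** gamma.
    Because the transform is convex, the optimal predicted distribution concentrates on
    high-probability outputs rather than matching the full data distribution."""
    log_p = F.log_softmax(logits, dim=-1)
    log_p_target = log_p.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (batch, length)
    return -(gamma * log_p_target).exp().mean()                         # negative mean of p ** gamma

# Toy usage:
logits = torch.randn(4, 7, 100, requires_grad=True)   # (batch, length, vocab)
targets = torch.randint(0, 100, (4, 7))
loss = convex_surrogate_loss(logits, targets)
loss.backward()
```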
Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation
Non-Autoregressive Neural Machine Translation (NAT) achieves a significant
decoding speedup by generating target words independently and
simultaneously. However, in the context of non-autoregressive translation, the
word-level cross-entropy loss cannot model the target-side sequential
dependency properly, leading to its weak correlation with the translation
quality. As a result, NAT tends to generate disfluent translations with
over-translation and under-translation errors. In this paper, we propose to
train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model
output and the reference sentence. The bag-of-ngrams training objective is
differentiable and can be efficiently calculated, which encourages NAT to
capture the target-side sequential dependency and correlates well with the
translation quality. We validate our approach on three translation tasks and
show that our approach outperforms the NAT baseline by a large margin of about
5.0 BLEU points on WMT14 En-De and about 2.5 BLEU points on WMT16 En-Ro.
Comment: AAAI 2020
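For intuition, the sketch below computes a differentiable bag-of-bigrams L1 difference between the expected bigram counts of a NAT model's output distributions and the reference bigram counts. Materializing the full V x V count table is only feasible for toy vocabularies and is not how the paper computes its objective efficiently; the bigram order and the L1 distance are illustrative choices.

```python
import torch

def expected_bigram_counts(probs: torch.Tensor) -> torch.Tensor:
    """probs: (T, V) per-position output distributions of a NAT model.
    Returns a (V, V) matrix of expected bigram counts, using the NAT assumption
    that output positions are conditionally independent."""
    return torch.einsum('tu,tv->uv', probs[:-1], probs[1:])

def bon_l1_difference(probs: torch.Tensor, reference: torch.Tensor, vocab: int) -> torch.Tensor:
    """Simplified bag-of-ngrams difference (n = 2): L1 distance between the model's
    expected bigram counts and the reference bigram counts."""
    model_bon = expected_bigram_counts(probs)
    ref_bon = torch.zeros(vocab, vocab)
    for u, v in zip(reference[:-1], reference[1:]):
        ref_bon[u, v] += 1
    return (model_bon - ref_bon).abs().sum()

# Toy usage: a length-5 output over a vocabulary of 8 tokens.
logits = torch.randn(5, 8, requires_grad=True)
probs = torch.softmax(logits, dim=-1)
reference = torch.tensor([1, 2, 3, 2, 4])
loss = bon_l1_difference(probs, reference, vocab=8)
loss.backward()
```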
Non-autoregressive Machine Translation with Probabilistic Context-free Grammar
Non-autoregressive Transformer (NAT) significantly accelerates the inference
of neural machine translation. However, conventional NAT models suffer from
limited expressive power and performance degradation compared to autoregressive
(AT) models due to the assumption of conditional independence among target
tokens. To address these limitations, we propose a novel approach called
PCFG-NAT, which leverages a specially designed Probabilistic Context-Free
Grammar (PCFG) to enhance the ability of NAT models to capture complex
dependencies among output tokens. Experimental results on major machine
translation benchmarks demonstrate that PCFG-NAT further narrows the gap in
translation quality between NAT and AT models. Moreover, PCFG-NAT facilitates a
deeper understanding of the generated sentences, addressing the lack of
satisfactory explainability in neural machine translation. Code is publicly
available at https://github.com/ictnlp/PCFG-NAT.
Comment: NeurIPS 2023
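To ground the PCFG component, here is a generic inside-algorithm sketch for a grammar in Chomsky normal form, the textbook way to marginalize over all parse trees of a sentence. It illustrates the kind of structured scoring PCFG-NAT builds on; the model's actual grammar, parameterization, and decoding are specific to the paper.

```python
import torch

def inside_log_prob(unary: torch.Tensor, binary: torch.Tensor, root: int) -> torch.Tensor:
    """Inside algorithm for a PCFG in Chomsky normal form.
    unary:  (T, N)    log score of rule A -> w_t for each position t and nonterminal A
    binary: (N, N, N) log score of rule A -> B C
    Returns the log score of the sentence, marginalized over all parse trees."""
    T, N = unary.shape
    inside = torch.full((T, T, N), float('-inf'))      # inside[i, j, A]: A derives words i..j
    inside[torch.arange(T), torch.arange(T)] = unary   # width-1 spans
    for width in range(2, T + 1):
        for i in range(T - width + 1):
            j = i + width - 1
            split_scores = []
            for k in range(i, j):                      # split into (i..k) and (k+1..j)
                left = inside[i, k].view(1, N, 1)      # scores for B
                right = inside[k + 1, j].view(1, 1, N) # scores for C
                split_scores.append(torch.logsumexp(binary + left + right, dim=(1, 2)))
            inside[i, j] = torch.logsumexp(torch.stack(split_scores), dim=0)
    return inside[0, T - 1, root]

# Toy usage with random (unnormalized) log-potentials:
T, N = 4, 3
score = inside_log_prob(torch.randn(T, N), torch.randn(N, N, N), root=0)
```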
Non-autoregressive Streaming Transformer for Simultaneous Translation
Simultaneous machine translation (SiMT) models are trained to strike a
balance between latency and translation quality. However, training these models
to achieve high quality while maintaining low latency often leads to a tendency
for aggressive anticipation. We argue that this issue stems from the
autoregressive architecture upon which most existing SiMT models are built. To
address it, we propose the non-autoregressive streaming Transformer
(NAST), which comprises a unidirectional encoder and a non-autoregressive
decoder with intra-chunk parallelism. We enable NAST to generate the blank
token or repetitive tokens to adjust its READ/WRITE strategy flexibly, and
train it to maximize the non-monotonic latent alignment with an alignment-based
latency loss. Experiments on various SiMT benchmarks demonstrate that NAST
outperforms previous strong autoregressive SiMT baselines.
Comment: EMNLP 2023 main conference; Source code is available at https://github.com/ictnlp/NAST
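A schematic of the chunk-wise READ/WRITE behavior described above: the decoder writes a block of tokens in parallel for each source chunk it reads, and blanks/repeats are collapsed CTC-style, so emitting blanks effectively defers output until more source has arrived. The encode_chunk and decode_chunk interfaces are hypothetical placeholders, not the released NAST API.

```python
from typing import Callable, Iterable, Iterator, List

BLANK = "<blank>"

def ctc_collapse(tokens: List[str]) -> List[str]:
    """Remove repeated tokens and blanks, as in CTC decoding."""
    out: List[str] = []
    prev = None
    for tok in tokens:
        if tok != prev and tok != BLANK:
            out.append(tok)
        prev = tok
    return out

def stream_translate(source_chunks: Iterable[List[str]],
                     encode_chunk: Callable[[List[str]], object],
                     decode_chunk: Callable[[object], List[str]]) -> Iterator[List[str]]:
    """Schematic simultaneous decoding loop (not the released implementation):
    READ one source chunk, encode it with a unidirectional (incremental) encoder,
    let the non-autoregressive decoder WRITE a block of tokens in parallel, then
    collapse blanks/repeats so the model can choose to emit nothing and wait."""
    for chunk in source_chunks:
        state = encode_chunk(chunk)     # READ: consume the next source chunk
        block = decode_chunk(state)     # WRITE: intra-chunk parallel generation
        yield ctc_collapse(block)       # blanks and repeats act as a flexible WRITE policy

# Toy usage with stub encoder/decoder:
chunks = [["Guten", "Morgen"], ["wie", "geht", "es"]]
outputs = list(stream_translate(
    chunks,
    encode_chunk=lambda c: c,
    decode_chunk=lambda s: ["good", BLANK, "morning"] if "Guten" in s else ["how", "how", "are", "you"],
))
# outputs == [['good', 'morning'], ['how', 'are', 'you']]
```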
Rephrasing the Reference for Non-autoregressive Machine Translation
Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem: a source sentence may have multiple possible translations, so the reference sentence may be an inappropriate training target when the NAT output is closer to another translation. In response to this problem, we introduce a rephraser that provides a better training target for NAT by rephrasing the reference sentence according to the NAT output. Since we train NAT on the rephraser output rather than the reference sentence, the rephraser output should fit the NAT output well while not deviating too far from the reference; both requirements can be quantified as reward functions and optimized with reinforcement learning. Experiments on major WMT benchmarks and NAT baselines show that our approach consistently improves the translation quality of NAT. Specifically, our best variant achieves performance comparable to the autoregressive Transformer while being 14.7 times more efficient at inference.
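As a sketch of the reward structure described above, the snippet below scores a candidate rephrased reference by combining token-level F1 overlap with the NAT output (so the training target fits what the model can produce) and with the original reference (so it does not drift). The token-F1 reward and the weighting alpha are illustrative assumptions, not the paper's exact reward functions.

```python
from collections import Counter
from typing import List

def token_f1(hyp: List[str], ref: List[str]) -> float:
    """Clipped unigram F1 between two token sequences."""
    if not hyp or not ref:
        return 0.0
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    return 2.0 * overlap / (len(hyp) + len(ref))

def rephraser_reward(rephrased: List[str], nat_output: List[str],
                     reference: List[str], alpha: float = 0.5) -> float:
    """Illustrative reinforcement-learning reward for the rephraser: stay close to the
    NAT output (easy for NAT to fit) while staying close to the reference (faithful)."""
    return alpha * token_f1(rephrased, nat_output) + (1.0 - alpha) * token_f1(rephrased, reference)

# Toy usage:
reference = "the cat sat on the mat".split()
nat_output = "a cat is sitting on the mat".split()
rephrased = "a cat sat on the mat".split()
print(rephraser_reward(rephrased, nat_output, reference))
```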