8,960 research outputs found
Regularizing Neural Machine Translation by Target-bidirectional Agreement
Although Neural Machine Translation (NMT) has achieved remarkable progress in
the past several years, most NMT systems still suffer from a fundamental
shortcoming as in other sequence generation tasks: errors made early in
generation process are fed as inputs to the model and can be quickly amplified,
harming subsequent sequence generation. To address this issue, we propose a
novel model regularization method for NMT training, which aims to improve the
agreement between translations generated by left-to-right (L2R) and
right-to-left (R2L) NMT decoders. This goal is achieved by introducing two
Kullback-Leibler divergence regularization terms into the NMT training
objective to reduce the mismatch between output probabilities of L2R and R2L
models. In addition, we also employ a joint training strategy to allow L2R and
R2L models to improve each other in an interactive update process. Experimental
results show that our proposed method significantly outperforms
state-of-the-art baselines on Chinese-English and English-German translation
tasks.Comment: Accepted by AAAI 201
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
Encoder-decoder models provide a generic architecture for
sequence-to-sequence tasks such as speech recognition and translation. While
offline systems are often evaluated on quality metrics like word error rates
(WER) and BLEU, latency is also a crucial factor in many practical use-cases.
We propose three latency reduction techniques for chunk-based incremental
inference and evaluate their efficiency in terms of accuracy-latency trade-off.
On the 300-hour How2 dataset, we reduce latency by 83% to 0.8 second by
sacrificing 1% WER (6% rel.) compared to offline transcription. Although our
experiments use the Transformer, the hypothesis selection strategies are
applicable to other encoder-decoder models. To avoid expensive re-computation,
we use a unidirectionally-attending encoder. After an adaptation procedure to
partial sequences, the unidirectional model performs on-par with the original
model. We further show that our approach is also applicable to low-latency
speech translation. On How2 English-Portuguese speech translation, we reduce
latency to 0.7 second (-84% rel.) while incurring a loss of 2.4 BLEU points (5%
rel.) compared to the offline system
- …