A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
Non-autoregressive (NAR) generation, which is first proposed in neural
machine translation (NMT) to speed up inference, has attracted much attention
in both machine learning and natural language processing communities. While NAR
generation can significantly accelerate inference speed for machine
translation, the speedup comes at the cost of sacrificed translation accuracy
compared to its counterpart, auto-regressive (AR) generation. In recent years,
many new models and algorithms have been designed/proposed to bridge the
accuracy gap between NAR generation and AR generation. In this paper, we
conduct a systematic survey with comparisons and discussions of various
non-autoregressive translation (NAT) models from different aspects.
Specifically, we categorize the efforts of NAT into several groups, including
data manipulation, modeling methods, training criterion, decoding algorithms,
and the benefit from pre-trained models. Furthermore, we briefly review other
applications of NAR models beyond machine translation, such as dialogue
generation, text summarization, grammar error correction, semantic parsing,
speech synthesis, and automatic speech recognition. In addition, we also
discuss potential directions for future exploration, including releasing the
dependency of KD, dynamic length prediction, pre-training for NAR, and wider
applications, etc. We hope this survey can help researchers capture the latest
progress in NAR generation, inspire the design of advanced NAR models and
algorithms, and enable industry practitioners to choose appropriate solutions
for their applications. The web page of this survey is at
\url{https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications}. Comment: 25 pages, 11 figures, 4 tables.
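To make the survey's central trade-off concrete, the toy sketch below contrasts the two decoding regimes: AR decoding produces one token per step (T sequential model calls), while NAR decoding fills every target position in a single parallel pass under a conditional-independence assumption. The "model" here is a hypothetical stand-in, not any system from the survey.

```python
# Toy contrast between autoregressive (AR) and non-autoregressive (NAR)
# decoding. `toy_model` is a deterministic stand-in for a translation
# model (hypothetical, for illustration only).

def toy_model(src_tok, prev_tok=None):
    # Pretend "translation": uppercase the source token.
    return src_tok.upper()

def ar_decode(src):
    # AR: each target token is produced after the previous one,
    # so decoding takes len(src) sequential steps.
    out = []
    for tok in src:
        out.append(toy_model(tok, out[-1] if out else None))
    return out

def nar_decode(src):
    # NAR: all target positions are predicted independently,
    # so the whole sequence can be produced in one parallel pass.
    return [toy_model(tok) for tok in src]

src = ["hello", "world"]
assert ar_decode(src) == nar_decode(src) == ["HELLO", "WORLD"]
```

With a real model the NAR pass is faster but loses the inter-token dependencies the AR loop conditions on, which is exactly the accuracy gap the surveyed methods try to close.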
Non-autoregressive Machine Translation with Probabilistic Context-free Grammar
Non-autoregressive Transformer (NAT) significantly accelerates the inference
of neural machine translation. However, conventional NAT models suffer from
limited expression power and performance degradation compared to autoregressive
(AT) models due to the assumption of conditional independence among target
tokens. To address these limitations, we propose a novel approach called
PCFG-NAT, which leverages a specially designed Probabilistic Context-Free
Grammar (PCFG) to enhance the ability of NAT models to capture complex
dependencies among output tokens. Experimental results on major machine
translation benchmarks demonstrate that PCFG-NAT further narrows the gap in
translation quality between NAT and AT models. Moreover, PCFG-NAT facilitates a
deeper understanding of the generated sentences, addressing the lack of
satisfactory explainability in neural machine translation. Code is publicly
available at https://github.com/ictnlp/PCFG-NAT. Comment: NeurIPS 2023.
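As background for the PCFG idea, the sketch below samples sentences from a tiny probabilistic context-free grammar: each nonterminal expands by a weighted production rule, so the grammar encodes structured dependencies among output tokens. The grammar itself is hypothetical, chosen only to illustrate the mechanism, not the grammar learned by PCFG-NAT.

```python
import random

# Toy PCFG (hypothetical): nonterminal -> list of (right-hand side, weight).
# Symbols absent from the grammar are treated as terminals.
GRAMMAR = {
    "S":  [(["NP", "VP"], 1.0)],
    "NP": [(["the cat"], 0.5), (["a dog"], 0.5)],
    "VP": [(["sleeps"], 0.7), (["runs"], 0.3)],
}

def sample(symbol, rng):
    # Recursively expand `symbol` by sampling a production rule.
    if symbol not in GRAMMAR:          # terminal symbol
        return [symbol]
    rules, weights = zip(*GRAMMAR[symbol])
    rhs = rng.choices(rules, weights=weights)[0]
    out = []
    for sym in rhs:
        out.extend(sample(sym, rng))
    return out

print(" ".join(sample("S", random.Random(0))))
```

Because every sentence is derived through the grammar's tree structure, sampling like this also yields a parse tree for free, which is the kind of explainability the abstract alludes to.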
Fast Interleaved Bidirectional Sequence Generation
Independence assumptions during sequence generation can speed up inference, but parallel generation of highly inter-dependent tokens comes at a cost in quality. Instead of assuming independence between neighbouring tokens (semi-autoregressive decoding, SA), we take inspiration from bidirectional sequence generation and introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously. We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder by simply interleaving the two directions and adapting the word positions and self-attention masks. Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer, and on five machine translation tasks and two document summarization tasks, achieves a decoding speedup of ~2x compared to autoregressive decoding with comparable quality. Notably, it outperforms left-to-right SA because the independence assumptions in IBDecoder are more felicitous. To achieve even higher speedups, we explore hybrid models where we either simultaneously predict multiple neighbouring tokens per direction, or perform multi-directional decoding by partitioning the target sequence. These methods achieve speedups of 4x–11x across different tasks at the cost of <1 BLEU or <0.5 ROUGE (on average).
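The interleaving trick above can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: it shows the interleaved position order (alternating left-to-right and right-to-left) and an ordinary causal mask over that reordered sequence, so each generation step can attend to everything both directions have already produced. The actual IBDecoder emits one token per direction per step with adapted positions and masks.

```python
import numpy as np

def interleave_order(n):
    # Target positions visited alternately from the left (0, 1, 2, ...)
    # and the right (n-1, n-2, ...), as in bidirectional interleaving.
    left, right = 0, n - 1
    order = []
    while left <= right:
        order.append(left)
        if left != right:
            order.append(right)
        left, right = left + 1, right - 1
    return order

def causal_mask(n):
    # Lower-triangular mask over the *interleaved* sequence: step t
    # attends to all tokens produced at steps <= t, so the two
    # directions condition on each other's past outputs.
    return np.tril(np.ones((n, n), dtype=bool))

print(interleave_order(5))  # [0, 4, 1, 3, 2]
```

Since the mask is still triangular in the interleaved order, the conversion from a unidirectional decoder amounts to reindexing positions rather than changing the architecture.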
ProNet: Progressive Neural Network for Multi-Horizon Time Series Forecasting
In this paper, we introduce ProNet, a novel deep learning approach designed
for multi-horizon time series forecasting that adaptively blends autoregressive
(AR) and non-autoregressive (NAR) strategies. Our method divides the
forecasting horizon into segments, predicts the most crucial steps in each
segment non-autoregressively, and predicts the remaining steps autoregressively. The
segmentation process relies on latent variables, which effectively capture the
significance of individual time steps through variational inference. In
comparison to AR models, ProNet showcases remarkable advantages, requiring
fewer AR iterations, resulting in faster prediction speed, and mitigating error
accumulation. On the other hand, when compared to NAR models, ProNet takes into
account the interdependency of predictions in the output space, leading to
improved forecasting accuracy. Our comprehensive evaluation on four
large datasets, together with an ablation study, demonstrates the effectiveness
of ProNet, highlighting its superior accuracy and prediction speed compared
to state-of-the-art AR and NAR forecasting models.
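A minimal sketch of the horizon-splitting idea follows. ProNet learns which step in each segment is "crucial" via latent variables and variational inference; here, purely for illustration, the horizon is split evenly and the first step of each segment is taken as the NAR-predicted anchor, with the rest filled autoregressively.

```python
def split_horizon(H, k):
    # Partition forecast steps 0..H-1 into k contiguous segments
    # (even split for illustration; ProNet learns the segmentation).
    base, rem = divmod(H, k)
    segs, start = [], 0
    for i in range(k):
        end = start + base + (1 if i < rem else 0)
        segs.append(list(range(start, end)))
        start = end
    return segs

segs = split_horizon(10, 3)
# One anchor step per segment, predicted together in a single parallel
# (NAR) pass; remaining steps are then filled in autoregressively.
nar_steps = [s[0] for s in segs]
ar_steps = [t for s in segs for t in s[1:]]
```

The speed benefit comes from the AR loop now running only within segments rather than across the full horizon.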
CTC-based Non-autoregressive Speech Translation
Combining end-to-end speech translation (ST) and non-autoregressive (NAR)
generation is promising in language and speech processing, given their
advantages of reduced error propagation and low latency. In this paper, we investigate the
potential of connectionist temporal classification (CTC) for non-autoregressive
speech translation (NAST). In particular, we develop a model consisting of two
encoders that are guided by CTC to predict the source and target texts,
respectively. Introducing CTC into NAST on both language sides has obvious
challenges: 1) conditionally independent generation somewhat breaks the
interdependency among tokens, and 2) the monotonic alignment assumption in
standard CTC does not hold in translation tasks. In response, we develop a
prediction-aware encoding approach and a cross-layer attention approach to
address these issues. We also use curriculum learning to improve convergence of
training. Experiments on the MuST-C ST benchmarks show that our NAST model
achieves an average BLEU score of 29.5 with a speed-up of 5.67x, which
is comparable to the autoregressive counterpart and even outperforms the
previous best result by 0.9 BLEU points. Comment: ACL 2023 Main Conference.
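For readers unfamiliar with CTC, the core of its monotonic-alignment machinery is the collapse function: a frame-level path is mapped to an output sequence by merging consecutive repeats and removing blanks. The sketch below implements that standard rule (a generic illustration of CTC, not the paper's model).

```python
def ctc_collapse(path, blank="-"):
    # Standard CTC collapse: merge consecutive repeated symbols,
    # then drop the blank symbol.
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out

assert ctc_collapse(list("hh-e-ll-lo")) == list("hello")
```

Note the role of the blank: "ll-l" collapses to "ll", which is how CTC emits genuinely repeated tokens. The many-to-one, strictly left-to-right nature of this mapping is the monotonicity assumption that, as the abstract notes, does not hold for reordering in translation.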
Deep generative modelling of the imaged human brain
Human-machine symbiosis is a very promising opportunity for the field of neurology, given that the interpretation of the imaged human brain is not a trivial feat
for either entity. However, before machine learning systems can be used in
real-world clinical situations, many issues with automated analysis must first be
solved. In this thesis I aim to address what I consider the three biggest hurdles
to the adoption of automated machine learning interpretative systems. For each
issue, I will first explain its importance to the reader, given the overarching
narratives of both neurology and machine learning, and then showcase my proposed solutions to these issues through the use of deep generative models of the
imaged human brain.
First, I start by addressing what is an uncontroversial and universal sign of intelligence: the ability to extrapolate knowledge to unseen cases. Human neuroradiologists have studied the anatomy of the healthy brain and can therefore,
with some success, identify most pathologies present on an imaged brain, even
without having ever been previously exposed to them. Current discriminative
machine learning systems require vast amounts of labelled data in order to accurately identify diseases. In this first part I provide a generative framework that
permits machine learning models to more efficiently leverage unlabelled data for
better diagnoses with few or no labels.
Secondly, I address a major ethical concern in medicine: equitable evaluation
of all patients, regardless of demographics or other identifying characteristics.
This is, unfortunately, something that even human practitioners fail at, making
the matter ever more pressing: unaddressed biases in data will become biases
in the models. To address this concern I suggest a framework through which
a generative model synthesises demographically counterfactual brain imaging
to successfully reduce the proliferation of demographic biases in discriminative
models.
Finally, I tackle the challenge of spatial anatomical inference, a task at the centre
of the field of lesion-deficit mapping, which given brain lesions and associated
cognitive deficits attempts to discover the true functional anatomy of the brain.
I provide a new Bayesian generative framework and implementation that allows
for greatly improved results on this challenge, hopefully, paving part of the road
towards a greater and more complete understanding of the human brain.