Fast Decoding in Sequence Models using Discrete Latent Variables
Autoregressive sequence models based on deep neural networks, such as RNNs,
WaveNet and the Transformer, attain state-of-the-art results on many tasks.
However, they are difficult to parallelize and are thus slow at processing long
sequences. RNNs lack parallelism both during training and decoding, while
architectures like WaveNet and Transformer are much more parallelizable during
training, yet still operate sequentially during decoding.
Inspired by [arxiv:1711.00937], we present a method to extend sequence models
using discrete latent variables that makes decoding much more parallelizable.
We first auto-encode the target sequence into a shorter sequence of discrete
latent variables, which at inference time is generated autoregressively, and
finally decode the output sequence from this shorter latent sequence in
parallel. To this end, we introduce a novel method for constructing a sequence
of discrete latent variables and compare it with previously introduced methods.
Finally, we evaluate our model end-to-end on the task of neural machine
translation, where it is an order of magnitude faster at decoding than
comparable autoregressive models. While lower in BLEU than purely
autoregressive models, our model achieves higher scores than previously
proposed non-autoregressive translation models. Comment: ICML 2018
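As a rough illustration of the two-stage decoding described above (a sketch under assumptions, not the authors' implementation; the model stand-ins, names, and the compression factor are hypothetical), the following Python fragment shows the control flow: an autoregressive loop over a latent sequence several times shorter than the target, followed by a single parallel pass from latents to output tokens.

import numpy as np

rng = np.random.default_rng(0)

def latent_prior_step(latents_so_far, latent_vocab=512):
    # Stand-in for the autoregressive prior over discrete latents:
    # returns a distribution over the next latent symbol.
    logits = rng.normal(size=latent_vocab)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def parallel_decoder(latents, target_len, vocab=32000):
    # Stand-in for the decoder that maps the short latent sequence to
    # every output token in a single, fully parallel pass.
    return rng.integers(0, vocab, size=target_len)

def fast_decode(target_len, compression=8):
    latent_len = max(1, target_len // compression)
    latents = []
    # The only sequential loop runs over the shorter latent sequence.
    for _ in range(latent_len):
        probs = latent_prior_step(latents)
        latents.append(int(rng.choice(len(probs), p=probs)))
    # All target tokens are then produced at once from the latents.
    return parallel_decoder(np.array(latents), target_len)

print(fast_decode(target_len=64))

The speedup comes from the sequential loop running for target_len / compression steps rather than target_len.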
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
We propose a conditional non-autoregressive neural sequence model based on
iterative refinement. The proposed model is designed based on the principles of
latent variable models and denoising autoencoders, and is generally applicable
to any sequence generation task. We extensively evaluate the proposed model on
machine translation (En-De and En-Ro) and image caption generation, and observe
that it significantly speeds up decoding while maintaining the generation
quality comparable to the autoregressive counterpart. Comment: Accepted to EMNLP 2018
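A minimal sketch of the iterative-refinement decoding loop described above, with a placeholder refine_step standing in for the learned conditional denoising model (the function names, the toy perturbation, and the stopping rule are assumptions):

import numpy as np

rng = np.random.default_rng(0)

def refine_step(source, draft, vocab=100):
    # Stand-in for the learned conditional denoising step: re-predicts
    # every target position in parallel given the source and the draft.
    flip = rng.random(draft.shape) < 0.2
    return np.where(flip, rng.integers(0, vocab, size=draft.shape), draft)

def iterative_refinement_decode(source, target_len, max_iters=10, vocab=100):
    draft = rng.integers(0, vocab, size=target_len)   # initial guess
    for _ in range(max_iters):
        new_draft = refine_step(source, draft)
        if np.array_equal(new_draft, draft):          # toy stopping rule
            break
        draft = new_draft
    return draft

print(iterative_refinement_decode(source=np.arange(20), target_len=16))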
Discrete Flows: Invertible Generative Models of Discrete Data
While normalizing flows have led to significant advances in modeling
high-dimensional continuous distributions, their applicability to discrete
distributions remains unknown. In this paper, we show that flows can in fact be
extended to discrete events, under a simple change-of-variables formula that
does not require log-determinant-Jacobian computations. Discrete flows have
numerous applications. We consider two flow architectures: discrete
autoregressive flows that enable bidirectionality, allowing, for example,
tokens in text to depend on both left-to-right and right-to-left contexts in an
exact language model; and discrete bipartite flows that enable efficient
non-autoregressive generation as in RealNVP. Empirically, we find that discrete
autoregressive flows outperform autoregressive baselines on synthetic discrete
distributions, an addition task, and Potts models; and bipartite flows can
obtain competitive performance with autoregressive baselines on character-level
language modeling on the Penn Treebank and text8.
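For reference, the simplification mentioned above can be written out: for an invertible map f, the continuous change-of-variables formula carries a Jacobian factor, while in the discrete case probability mass is simply relocated, so the factor disappears:

p_Y(y) = p_X\big(f^{-1}(y)\big)\,\left|\det \frac{\partial f^{-1}(y)}{\partial y}\right| \quad \text{(continuous case)}

p_Y(y) = p_X\big(f^{-1}(y)\big) \quad \text{(discrete case)}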
Non-Autoregressive Machine Translation with Auxiliary Regularization
As a new neural machine translation approach, Non-Autoregressive Machine
Translation (NAT) has recently attracted attention due to its high efficiency
in inference. However, the high efficiency has come at the cost of not
capturing the sequential dependency on the target side of translation, which
causes NAT to suffer from two kinds of translation errors: 1) repeated
translations (due to indistinguishable adjacent decoder hidden states), and 2)
incomplete translations (due to incomplete transfer of source side information
via the decoder hidden states).
In this paper, we propose to address these two problems by improving the
quality of decoder hidden representations via two auxiliary regularization
terms in the training process of an NAT model. First, to make the hidden states
more distinguishable, we regularize the similarity between consecutive hidden
states based on the corresponding target tokens. Second, to force the hidden
states to contain all the information in the source sentence, we leverage the
dual nature of translation tasks (e.g., English to German and German to
English) and minimize a backward reconstruction error to ensure that the hidden
states of the NAT decoder are able to recover the source side sentence.
Extensive experiments conducted on several benchmark datasets show that both
regularization strategies are effective and can alleviate the issues of
repeated translations and incomplete translations in NAT models. The accuracy
of NAT models is therefore significantly improved over state-of-the-art NAT
models, with even better inference efficiency. Comment: AAAI 2019
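As a toy illustration of the first regularization term (the exact form used in the paper may differ; the function and shapes below are assumptions for exposition), one can penalize consecutive decoder states for being too similar when their target tokens differ and too dissimilar when the token repeats:

import numpy as np

def similarity_regularizer(hidden, targets):
    # Toy version of a similarity regularizer on consecutive decoder states
    # (the exact form is an assumption): pull neighboring states together
    # when the target token repeats, push them apart when it differs.
    h = hidden / np.linalg.norm(hidden, axis=-1, keepdims=True)
    cos = np.sum(h[:-1] * h[1:], axis=-1)
    same = targets[:-1] == targets[1:]
    loss = np.where(same, 1.0 - cos, np.maximum(0.0, cos))
    return loss.mean()

hidden = np.random.randn(6, 8)          # 6 decoder positions, dimension 8
targets = np.array([3, 3, 7, 9, 9, 2])  # repeated tokens at positions 0-1, 3-4
print(similarity_regularizer(hidden, targets))

The backward-reconstruction term would analogously add a loss measuring how well the source sentence can be recovered from the decoder hidden states.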
GLSR-VAE: Geodesic Latent Space Regularization for Variational AutoEncoder Architectures
VAEs (Variational AutoEncoders) have proved to be powerful in the context of
density modeling and have been used in a variety of contexts for creative
purposes. In many settings, the data we model possesses continuous attributes
that we would like to take into account at generation time. We propose in this
paper GLSR-VAE, a Geodesic Latent Space Regularization for the Variational
AutoEncoder architecture and its generalizations, which allows fine control over
the embedding of the data into the latent space. When the VAE loss is augmented
with this regularization, changes in the learned latent space reflect changes
in the attributes of the data. This deeper understanding of the VAE latent
space structure offers the possibility to modulate the attributes of the
generated data in a continuous way. We demonstrate its efficiency on a
monophonic music generation task where we manage to generate variations of
discrete sequences in an intended and playful way. Comment: 11 pages
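A toy finite-difference sketch of the kind of latent-space regularization described above (the decoder, the attribute function, and the exact penalty are assumptions, not the paper's formulation): penalize deviation of the attribute's slope along a chosen latent dimension from a fixed positive value, so that moving along that dimension changes the attribute predictably.

import numpy as np

rng = np.random.default_rng(0)

def decode(z):
    # Stand-in decoder from latent code to data (a toy identity map).
    return z.copy()

def attribute(x):
    # Stand-in for a continuous attribute of the generated data,
    # e.g. the number of notes per bar in monophonic music.
    return float(x.sum())

def latent_slope_penalty(z, dim=0, eps=1e-2, target_slope=1.0):
    # Toy finite-difference penalty (an assumption, not the paper's exact
    # formulation): keep d attribute / d z[dim] close to a fixed positive
    # slope, so moving along that latent dimension changes the attribute
    # in a controlled, monotone way.
    step = np.zeros_like(z)
    step[dim] = eps
    slope = (attribute(decode(z + step)) - attribute(decode(z))) / eps
    return (slope - target_slope) ** 2

print(latent_slope_penalty(rng.normal(size=4)))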
Neural Joint Source-Channel Coding
For reliable transmission across a noisy communication channel, classical
results from information theory show that it is asymptotically optimal to
separate out the source and channel coding processes. However, this
decomposition can fall short in the finite bit-length regime, as it requires
non-trivial tuning of hand-crafted codes and assumes infinite computational
power for decoding. In this work, we propose to jointly learn the encoding and
decoding processes using a new discrete variational autoencoder model. By
adding noise into the latent codes to simulate the channel during training, we
learn to both compress and error-correct given a fixed bit-length and
computational budget. We obtain codes that are not only competitive against
several separation schemes, but also learn useful robust representations of the
data for downstream tasks such as classification. Finally, inference
amortization yields an extremely fast neural decoder, almost an order of
magnitude faster compared to standard decoding methods based on iterative
belief propagation.
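The key training-time idea, injecting channel noise into the discrete latent code, can be sketched as follows (a toy binary symmetric channel with stand-in encoder/decoder, not the paper's model):

import numpy as np

rng = np.random.default_rng(0)

def binary_symmetric_channel(bits, flip_prob=0.1):
    # Simulated channel used during training: flip each latent bit
    # independently with probability flip_prob.
    flips = rng.random(bits.shape) < flip_prob
    return np.bitwise_xor(bits, flips.astype(bits.dtype))

# Toy round trip with stand-in encoder/decoder (placeholders only).
encode = lambda x: (x > 0).astype(np.int64)           # data -> bit string
decode = lambda b: b.astype(np.float64) * 2.0 - 1.0   # bit string -> data

x = rng.normal(size=32)
bits = encode(x)
received = binary_symmetric_channel(bits)
x_hat = decode(received)
print("bit error rate:", float(np.mean(bits != received)))

Because the decoder is trained on corrupted latents, it learns to compress and error-correct jointly under the fixed bit budget.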
Probabilistic Video Generation using Holistic Attribute Control
Videos express highly structured spatio-temporal patterns of visual data. A
video can be thought of as being governed by two factors: (i) temporally
invariant (e.g., person identity), or slowly varying (e.g., activity),
attribute-induced appearance, encoding the persistent content of each frame,
and (ii) an inter-frame motion or scene dynamics (e.g., encoding evolution of
the person executing the action). Based on this intuition, we propose a
generative framework for video generation and future prediction. The proposed
framework generates a video (short clip) by decoding samples sequentially drawn
from a latent space distribution into full video frames. Variational
Autoencoders (VAEs) are used as a means of encoding/decoding frames into/from
the latent space and an RNN as a way to model the dynamics in the latent space. We
improve the consistency of video generation through temporally-conditional
sampling, and its quality by structuring the latent space with attribute
controls, ensuring that attributes can be both inferred and conditioned on
during learning/generation. As a result, given attributes and/or the first
frame, our model is able to generate diverse but highly consistent sets of
video sequences,
accounting for the inherent uncertainty in the prediction task. Experimental
results on Chair CAD, Weizmann Human Action, and MIT-Flickr datasets, along
with detailed comparison to the state-of-the-art, verify the effectiveness of
the framework.
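A schematic of the generation loop described above, with stand-ins for the VAE decoder and the latent-dynamics RNN (all components, names, and dimensions below are toy placeholders):

import numpy as np

rng = np.random.default_rng(0)

def rnn_step(h, z):
    # Stand-in for the latent-dynamics RNN cell (a toy update rule).
    return np.tanh(0.5 * h + 0.5 * z)

def decode_frame(h, attributes):
    # Stand-in for the VAE decoder: latent state + attributes -> frame.
    return np.outer(h, attributes)

def generate_video(attributes, num_frames=4, latent_dim=8):
    h = np.zeros(latent_dim)
    frames = []
    for _ in range(num_frames):
        z = rng.normal(size=latent_dim)   # sample from the latent prior
        h = rnn_step(h, z)                # temporally-conditional state
        frames.append(decode_frame(h, attributes))
    return np.stack(frames)

video = generate_video(attributes=np.array([1.0, 0.0, 0.5]))
print(video.shape)  # (frames, latent_dim, number of attributes)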
Compression with Flows via Local Bits-Back Coding
Likelihood-based generative models are the backbones of lossless compression
due to the guaranteed existence of codes with lengths close to negative log
likelihood. However, there is no guaranteed existence of computationally
efficient codes that achieve these lengths, and coding algorithms must be
hand-tailored to specific types of generative models to ensure computational
efficiency. Such coding algorithms are known for autoregressive models and
variational autoencoders, but not for general types of flow models. To fill in
this gap, we introduce local bits-back coding, a new compression technique for
flow models. We present efficient algorithms that instantiate our technique for
many popular types of flows, and we demonstrate that our algorithms closely
achieve theoretical codelengths for state-of-the-art flow models on
high-dimensional data. Comment: Published in NeurIPS 2019
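For context on the codelength claims (standard background, not the paper's algorithm): for data discretized to precision \delta per dimension, an ideal code built from a density model p uses roughly

L(x) \approx -\log_2 p(x) - d \log_2 \delta \quad \text{bits},

and for a flow the density itself is given by the change of variables p(x) = p_Z\big(f(x)\big)\,\left|\det \frac{\partial f(x)}{\partial x}\right|. Local bits-back coding is a way to realize codelengths close to this quantity with computationally efficient encoders and decoders.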
Recent Advances in Autoencoder-Based Representation Learning
Learning useful representations with little or no supervision is a key
challenge in artificial intelligence. We provide an in-depth review of recent
advances in representation learning with a focus on autoencoder-based models.
To organize these results we make use of meta-priors believed useful for
downstream tasks, such as disentanglement and hierarchical organization of
features. In particular, we uncover three main mechanisms to enforce such
properties, namely (i) regularizing the (approximate or aggregate) posterior
distribution, (ii) factorizing the encoding and decoding distribution, or (iii)
introducing a structured prior distribution. While there are some promising
results, implicit or explicit supervision remains a key enabler and all current
methods use strong inductive biases and modeling assumptions. Finally, we
provide an analysis of autoencoder-based representation learning through the
lens of rate-distortion theory and identify a clear tradeoff between the amount
of prior knowledge available about the downstream tasks, and how useful the
representation is for this task. Comment: Presented at the third workshop on
Bayesian Deep Learning (NeurIPS 2018).
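The rate-distortion analysis referred to above typically starts from the standard split of the negative ELBO into a distortion and a rate term (shown here as background):

-\mathrm{ELBO}(x) \;=\; \underbrace{-\,\mathbb{E}_{q(z\mid x)}\big[\log p(x\mid z)\big]}_{\text{distortion } D} \;+\; \underbrace{D_{\mathrm{KL}}\big(q(z\mid x)\,\|\,p(z)\big)}_{\text{rate } R}

Different regularizers and structured priors then trade off how many bits the representation spends (R) against how faithfully the data can be reconstructed (D).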
Discrete Structural Planning for Neural Machine Translation
Structural planning is important for producing long sentences, but it is
missing from current language generation models. In this work, we add a
planning phase in neural machine translation to control the coarse structure of
output sentences. The model first generates some planner codes, then predicts
real output words conditioned on them. The codes are learned to capture the
coarse structure of the target sentence. In order to obtain the codes, we
design an end-to-end neural network with a discretization bottleneck, which
predicts the simplified part-of-speech tags of target sentences. Experiments
show that translation performance is generally improved by planning ahead.
We also find that translations with different structures can be obtained by
manipulating the planner codes.
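The two-phase decoding described above can be sketched as follows (both functions are placeholders; the code vocabulary and sequence lengths are hypothetical):

import numpy as np

rng = np.random.default_rng(0)

def generate_planner_codes(source, num_codes=5, code_vocab=64):
    # Stand-in for the planning phase: a short sequence of discrete codes
    # meant to capture coarse structure (e.g. simplified POS tags).
    return rng.integers(0, code_vocab, size=num_codes)

def generate_words(source, codes, target_len=12, vocab=32000):
    # Stand-in for the word decoder, conditioned on source and planner codes.
    return rng.integers(0, vocab, size=target_len)

source = np.arange(10)
codes = generate_planner_codes(source)
words = generate_words(source, codes)
print("planner codes:", codes)
print("output tokens:", words)

Editing the codes before the second phase is what allows translations with different coarse structures to be produced from the same source.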