Latent Normalizing Flows for Discrete Sequences
Normalizing flows are a powerful class of generative models for continuous
random variables, showing both strong model flexibility and the potential for
non-autoregressive generation. These benefits are also desired when modeling
discrete random variables such as text, but directly applying normalizing flows
to discrete sequences poses significant additional challenges. We propose a
VAE-based generative model which jointly learns a normalizing flow-based
distribution in the latent space and a stochastic mapping to an observed
discrete space. In this setting, we find that it is crucial for the flow-based
distribution to be highly multimodal. To capture this property, we propose
several normalizing flow architectures to maximize model flexibility.
Experiments consider common discrete sequence tasks of character-level language
modeling and polyphonic music generation. Our results indicate that an
autoregressive flow-based model can match the performance of a comparable
autoregressive baseline, and a non-autoregressive flow-based model can improve
generation speed with a penalty to performance.
Discrete Flows: Invertible Generative Models of Discrete Data
While normalizing flows have led to significant advances in modeling
high-dimensional continuous distributions, their applicability to discrete
distributions remains unknown. In this paper, we show that flows can in fact be
extended to discrete events---and under a simple change-of-variables formula
not requiring log-determinant-Jacobian computations. Discrete flows have
numerous applications. We consider two flow architectures: discrete
autoregressive flows that enable bidirectionality, allowing, for example,
tokens in text to depend on both left-to-right and right-to-left contexts in an
exact language model; and discrete bipartite flows that enable efficient
non-autoregressive generation as in RealNVP. Empirically, we find that discrete
autoregressive flows outperform autoregressive baselines on synthetic discrete
distributions, an addition task, and Potts models; and bipartite flows can
obtain competitive performance with autoregressive baselines on character-level
language modeling for Penn Tree Bank and text8.
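As a rough sketch of the bipartite variant described above, a discrete coupling layer can shift half of the symbols modulo the vocabulary size by an amount predicted from the other half; the map is exactly invertible with no log-determinant-Jacobian term. The vocabulary size `K` and the toy `shift_fn` below are hypothetical stand-ins for a learned network, not the paper's architecture.

```python
import numpy as np

K = 8  # hypothetical vocabulary size


def coupling_forward(x_a, x_b, shift_fn):
    # Bipartite discrete flow step: x_a passes through unchanged;
    # x_b is shifted modulo K by an amount predicted from x_a.
    return x_a, (x_b + shift_fn(x_a)) % K


def coupling_inverse(y_a, y_b, shift_fn):
    # Exact inverse: subtract the same predicted shift modulo K.
    return y_a, (y_b - shift_fn(y_a)) % K


# Toy stand-in for a learned shift network.
shift_fn = lambda x: (3 * x + 1) % K

x_a = np.array([0, 1, 5, 7])
x_b = np.array([2, 4, 6, 3])
y_a, y_b = coupling_forward(x_a, x_b, shift_fn)
_, x_b_rec = coupling_inverse(y_a, y_b, shift_fn)
assert np.array_equal(x_b_rec, x_b)  # exact round-trip, no Jacobian needed
```

Because the shift depends only on the untouched half, inversion never has to undo a function of the transformed symbols, which is what makes the change of variables trivial for discrete data.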
Integer Discrete Flows and Lossless Compression
Lossless compression methods shorten the expected representation size of data
without loss of information, using a statistical model. Flow-based models are
attractive in this setting because they admit exact likelihood optimization,
which is equivalent to minimizing the expected number of bits per message.
However, conventional flows assume continuous data, which may lead to
reconstruction errors when quantized for compression. For that reason, we
introduce a flow-based generative model for ordinal discrete data called
Integer Discrete Flow (IDF): a bijective integer map that can learn rich
transformations on high-dimensional data. As building blocks for IDFs, we
introduce a flexible transformation layer called integer discrete coupling. Our
experiments show that IDFs are competitive with other flow-based generative
models. Furthermore, we demonstrate that IDF based compression achieves
state-of-the-art lossless compression rates on CIFAR10, ImageNet32, and
ImageNet64. To the best of our knowledge, this is the first lossless
compression method that uses invertible neural networks.
Comment: Accepted as a conference paper at Neural Information Processing Systems (NeurIPS) 2019.
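The integer discrete coupling layer mentioned above can be sketched in a few lines: one half of the variables passes through, and the other half is translated by a rounded prediction, which keeps the map a bijection on the integers. The toy `t_net` is a hypothetical stand-in for the learned translation network (the real IDF also needs a straight-through gradient for the rounding, which is omitted here).

```python
import numpy as np


def idf_forward(x_a, x_b, t_net):
    # Integer discrete coupling: x_a passes through; x_b is translated
    # by a rounded prediction, so the map is a bijection on the integers.
    return x_a, x_b + np.round(t_net(x_a)).astype(np.int64)


def idf_inverse(y_a, y_b, t_net):
    # Exact inverse: subtract the identical rounded translation.
    return y_a, y_b - np.round(t_net(y_a)).astype(np.int64)


t_net = lambda x: 0.5 * x - 1.2  # toy stand-in for a learned network

x_a = np.array([4, -2, 7])
x_b = np.array([1, 0, -5])
y_a, y_b = idf_forward(x_a, x_b, t_net)
assert np.array_equal(idf_inverse(y_a, y_b, t_net)[1], x_b)
```

Since inputs and outputs are integers, no quantization step is needed before entropy coding, which is exactly why such a layer avoids the reconstruction errors of continuous flows.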
ODE²VAE: Deep generative second order ODEs with Bayesian neural networks
We present Ordinary Differential Equation Variational Auto-Encoder
(ODE²VAE), a latent second order ODE model for high-dimensional sequential
data. Leveraging the advances in deep generative models, ODE²VAE can
simultaneously learn the embedding of high dimensional trajectories and infer
arbitrarily complex continuous-time latent dynamics. Our model explicitly
decomposes the latent space into momentum and position components and solves a
second order ODE system, which is in contrast to recurrent neural network (RNN)
based time series models and recently proposed black-box ODE techniques. In
order to account for uncertainty, we propose probabilistic latent ODE dynamics
parameterized by deep Bayesian neural networks. We demonstrate our approach on
motion capture, image rotation and bouncing balls datasets. We achieve
state-of-the-art performance in long term motion prediction and imputation
tasks.
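The position/momentum decomposition above can be illustrated with a minimal rollout: the latent state splits into a position and a momentum component, and a (learned) acceleration field drives both. This is only a sketch with plain Euler integration and a hand-picked toy acceleration; the paper uses black-box ODE solvers and Bayesian neural networks for the dynamics.

```python
import numpy as np


def second_order_rollout(s0, v0, accel, dt, steps):
    # Latent second-order ODE: state = (position s, momentum v).
    # The acceleration field would be a learned network; plain Euler
    # integration is used here purely for illustration.
    s, v = np.array(s0, float), np.array(v0, float)
    traj = [s.copy()]
    for _ in range(steps):
        v = v + dt * accel(s, v)  # momentum update
        s = s + dt * v            # position update
        traj.append(s.copy())
    return np.stack(traj)


# Toy acceleration field: a damped harmonic oscillator in latent space.
accel = lambda s, v: -s - 0.1 * v
traj = second_order_rollout([1.0], [0.0], accel, dt=0.05, steps=200)
assert traj.shape == (201, 1) and np.all(np.isfinite(traj))
```

Decoding each position `s` back to observation space is what turns such a latent trajectory into, e.g., a predicted motion-capture sequence.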
MaCow: Masked Convolutional Generative Flow
Flow-based generative models, conceptually attractive due to tractability of
both the exact log-likelihood computation and latent-variable inference, and
efficiency of both training and sampling, have led to a number of impressive
empirical successes and spawned many advanced variants and theoretical
investigations. Despite their computational efficiency, the density estimation
performance of flow-based generative models significantly falls behind those of
state-of-the-art autoregressive models. In this work, we introduce masked
convolutional generative flow (MaCow), a simple yet effective architecture of
generative flow using masked convolution. By restricting the local connectivity
in a small kernel, MaCow enjoys the properties of fast and stable training, and
efficient sampling, while achieving significant improvements over Glow for
density estimation on standard image benchmarks, considerably narrowing the gap
to autoregressive models.
Comment: In Proceedings of the Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019).
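The restricted local connectivity can be pictured as a mask applied to a small convolution kernel, so that the transformation parameters for a pixel never depend on the pixel itself. The mask below is a PixelCNN-style sketch, not MaCow's exact layout; the paper combines several masked convolutions with different orientations.

```python
import numpy as np


def causal_kernel_mask(kh, kw):
    # Mask for a small convolution kernel: the output at each pixel may
    # depend only on pixels above it, or to its left in the same row --
    # never on the pixel itself, which keeps the flow invertible.
    mask = np.ones((kh, kw))
    ch, cw = kh // 2, kw // 2
    mask[ch, cw:] = 0.0     # centre pixel and everything to its right
    mask[ch + 1:, :] = 0.0  # all rows below the centre
    return mask


m = causal_kernel_mask(3, 3)
# In a 3x3 kernel, only the top row and the left neighbour survive.
assert m.sum() == 4
```

Multiplying the kernel weights by such a mask before each convolution is what makes inversion possible row by row while keeping the receptive field local.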
Categorical Normalizing Flows via Continuous Transformations
Despite their popularity, to date, the application of normalizing flows to
categorical data remains limited. The current practice of using dequantization to
map discrete data to a continuous space is inapplicable as categorical data has
no intrinsic order. Instead, categorical data have complex and latent relations
that must be inferred, like the synonymy between words. In this paper, we
investigate \emph{Categorical Normalizing Flows}, that is normalizing flows for
categorical data. By casting the encoding of categorical data in continuous
space as a variational inference problem, we jointly optimize the continuous
representation and the model likelihood. Using a factorized decoder, we
introduce an inductive bias to model any interactions in the normalizing flow.
As a consequence, we not only simplify the optimization compared to having a
joint decoder, but also make it possible to scale up to a large number of
categories that is currently impossible with discrete normalizing flows. Based
on Categorical Normalizing Flows, we propose GraphCNF, a permutation-invariant
generative model on graphs. GraphCNF implements a three-step approach, modeling
the nodes, edges, and adjacency matrix stepwise to increase efficiency. On
molecule generation, GraphCNF outperforms both the one-shot and autoregressive
flow-based state of the art.
Comment: Submitted to the International Conference on Learning Representations (ICLR), 2021.
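The idea of a variational continuous encoding with a factorized decoder can be sketched as follows: each category gets a point (here a fixed toy mean; in the paper, learned) in a continuous space, encoding adds noise around that point, and decoding classifies each position independently, so all interactions must be captured by the flow over the continuous points. Everything below is a hypothetical toy, not the paper's parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed toy means, one per category (these would be learned jointly
# with the flow); 5 categories embedded in 2-D continuous space.
mu = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
               [-1.0, 0.0], [0.0, -1.0]])
sigma = 0.05  # small encoding noise


def encode(x):
    # Variational encoding: sample a continuous point near each
    # category's mean; a normalizing flow then models these points.
    return mu[x] + sigma * rng.normal(size=(len(x), mu.shape[1]))


def factorized_decode(z):
    # Factorized decoder: each position is decoded independently by
    # nearest category mean, pushing interaction modeling into the flow.
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    return d2.argmin(-1)


x = np.array([0, 3, 1, 4, 2])
assert np.array_equal(factorized_decode(encode(x)), x)
```

Because decoding is per-position, adding more categories only adds more means, which is one way to see why this scales where discrete flows struggle.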
A Tutorial on Deep Latent Variable Models of Natural Language
There has been much recent, exciting work on combining the complementary
strengths of latent variable models and deep learning. Latent variable modeling
makes it easy to explicitly specify model constraints through conditional
independence properties, while deep learning makes it possible to parameterize
these conditional likelihoods with powerful function approximators. While these
"deep latent variable" models provide a rich, flexible framework for modeling
many real-world phenomena, difficulties exist: deep parameterizations of
conditional likelihoods usually make posterior inference intractable, and
latent variable objectives often complicate backpropagation by introducing
points of non-differentiability. This tutorial explores these issues in depth
through the lens of variational inference.
Comment: EMNLP 2018 Tutorial.
Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows
Time series forecasting is often fundamental to scientific and engineering
problems and enables decision making. With ever-increasing data set sizes, a
trivial solution to scale up predictions is to assume independence between
interacting time series. However, modeling statistical dependencies can improve
accuracy and enable analysis of interaction effects. Deep learning methods are
well suited for this problem, but multivariate models often assume a simple
parametric distribution and do not scale to high dimensions. In this work we
model the multivariate temporal dynamics of time series via an autoregressive
deep learning model, where the data distribution is represented by a
conditioned normalizing flow. This combination retains the power of
autoregressive models, such as good performance in extrapolation into the
future, with the flexibility of flows as a general purpose high-dimensional
distribution model, while remaining computationally tractable. We show that it
improves over the state of the art on standard metrics for many real-world data
sets with several thousand interacting time series.
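One layer of such a conditioned flow can be sketched as an affine transformation whose scale and shift are functions of an RNN state summarising the past of all series; the layer stays exactly invertible because both parameters depend only on the conditioning state, not on the variable being transformed. Shapes and weights below are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)


def conditional_affine_forward(z, h, W_s, W_t):
    # One conditional affine flow layer: scale and shift are functions
    # of the RNN state h that summarises the history of all series.
    s = np.tanh(h @ W_s)                 # bounded log-scale for stability
    t = h @ W_t                          # shift
    return z * np.exp(s) + t, s.sum(-1)  # sample and log|det Jacobian|


def conditional_affine_inverse(x, h, W_s, W_t):
    # Exact inverse: the same s and t are recoverable from h alone.
    s = np.tanh(h @ W_s)
    return (x - h @ W_t) * np.exp(-s)


# Toy shapes: a batch of 5 RNN states of width 8 conditioning a 4-D target.
h = rng.normal(size=(5, 8))
z = rng.normal(size=(5, 4))
W_s = 0.1 * rng.normal(size=(8, 4))
W_t = 0.1 * rng.normal(size=(8, 4))
x, log_det = conditional_affine_forward(z, h, W_s, W_t)
assert np.allclose(conditional_affine_inverse(x, h, W_s, W_t), z)
```

Stacking several such layers (with permutations between them) gives the flexible high-dimensional emission distribution that replaces a simple parametric one.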
Probabilistic Video Generation using Holistic Attribute Control
Videos express highly structured spatio-temporal patterns of visual data. A
video can be thought of as being governed by two factors: (i) temporally
invariant (e.g., person identity), or slowly varying (e.g., activity),
attribute-induced appearance, encoding the persistent content of each frame,
and (ii) an inter-frame motion or scene dynamics (e.g., encoding evolution of
the person executing the action). Based on this intuition, we propose a
generative framework for video generation and future prediction. The proposed
framework generates a video (short clip) by decoding samples sequentially drawn
from a latent space distribution into full video frames. Variational
Autoencoders (VAEs) are used as a means of encoding/decoding frames into/from
the latent space, and an RNN as a way to model the dynamics in the latent space. We
improve the video generation consistency through temporally-conditional
sampling and quality by structuring the latent space with attribute controls;
ensuring that attributes can be both inferred and conditioned on during
learning/generation. As a result, given attributes and/or the first frame, our
model is able to generate diverse but highly consistent sets of video sequences,
accounting for the inherent uncertainty in the prediction task. Experimental
results on Chair CAD, Weizmann Human Action, and MIT-Flickr datasets, along
with detailed comparison to the state of the art, verify the effectiveness of the
framework.
Towards Recurrent Autoregressive Flow Models
Stochastic processes generated by non-stationary distributions are difficult
to represent with conventional models such as Gaussian processes. This work
presents Recurrent Autoregressive Flows as a method toward general stochastic
process modeling with normalizing flows. The proposed method defines a
conditional distribution for each variable in a sequential process by
conditioning the parameters of a normalizing flow with recurrent neural
connections. Complex conditional relationships are learned through the
recurrent network parameters. In this work, we present an initial design for a
recurrent flow cell and a method to train the model to match observed empirical
distributions. We demonstrate the effectiveness of this class of models through
a series of experiments in which models are trained on three complex stochastic
processes. We highlight the shortcomings of our current formulation and suggest
some potential solutions.
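The described conditioning of flow parameters on recurrent connections can be sketched as a sequential likelihood: a hidden state carries the history, and at each step it parameterises an affine flow mapping the observation to a standard-normal base variable. The recurrence and weights below are hypothetical simplifications of the paper's recurrent flow cell.

```python
import numpy as np


def raf_log_prob(xs, Wh, Ws, Wt):
    # Recurrent autoregressive flow sketch: the hidden state h summarises
    # the history; at each step it parameterises an affine flow whose
    # inverse maps the observation to a standard-normal base variable,
    # accumulating a change-of-variables log-likelihood.
    h = np.zeros(Wh.shape[0])
    log_prob = 0.0
    for x in xs:
        s = np.tanh(h @ Ws)                           # conditional log-scale
        t = float(h @ Wt)                             # conditional shift
        z = (x - t) * np.exp(-s)                      # inverse flow
        log_prob += -0.5 * (z**2 + np.log(2 * np.pi)) - s
        h = np.tanh(h @ Wh + x)                       # recurrent update
    return log_prob


rng = np.random.default_rng(0)
Wh = 0.5 * rng.normal(size=(4, 4))
Ws = 0.5 * rng.normal(size=4)
Wt = 0.5 * rng.normal(size=4)
lp = raf_log_prob(rng.normal(size=10), Wh, Ws, Wt)
assert np.isfinite(lp)
```

Training would maximise this log-likelihood over observed sequences, letting the recurrent parameters absorb the complex conditional structure the abstract describes.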