Improving Variational Inference with Inverse Autoregressive Flow
The framework of normalizing flows provides a general strategy for flexible
variational inference of posteriors over latent variables. We propose a new
type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast
to earlier published flows, scales well to high-dimensional latent spaces. The
proposed flow consists of a chain of invertible transformations, where each
transformation is based on an autoregressive neural network. In experiments, we
show that IAF significantly improves upon diagonal Gaussian approximate
posteriors. In addition, we demonstrate that a novel type of variational
autoencoder, coupled with IAF, is competitive with neural autoregressive models
in terms of attained log-likelihood on natural images, while allowing
significantly faster synthesis.
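A minimal NumPy sketch of the core IAF update (my simplification, not the authors' code): an autoregressive network, here a single strictly-masked linear map, produces a shift and scale per dimension, so the Jacobian is triangular and the log-determinant reduces to a sum of log-scales.

```python
# One inverse autoregressive flow (IAF) step: z' = mu(z) + sigma(z) * z.
import numpy as np

rng = np.random.default_rng(0)
D = 4                                   # latent dimensionality

# Strictly lower-triangular weights make output i depend only on z[<i],
# which is the autoregressive property the flow relies on.
mask = np.tril(np.ones((D, D)), k=-1)
W_mu = rng.normal(size=(D, D)) * mask
W_s = rng.normal(size=(D, D)) * mask

def iaf_step(z):
    mu = z @ W_mu.T
    sigma = np.exp(0.5 * (z @ W_s.T))   # positive scales
    z_new = mu + sigma * z
    # Jacobian is triangular with diagonal sigma, so the log-det is cheap.
    log_det = np.log(sigma).sum(axis=-1)
    return z_new, log_det

z = rng.normal(size=(2, D))             # a batch of two latents
z1, ld = iaf_step(z)
print(z1.shape, ld)                     # (2, 4) and a per-sample log-det
```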
Generative Model with Dynamic Linear Flow
Flow-based generative models are a family of exact log-likelihood models with
tractable sampling and latent-variable inference, hence conceptually attractive
for modeling complex distributions. However, flow-based models fall short of
state-of-the-art autoregressive models in density estimation performance.
Autoregressive models, which also belong to the family of likelihood-based
methods, in turn suffer from limited parallelizability. In
this paper, we propose Dynamic Linear Flow (DLF), a new family of invertible
transformations with partially autoregressive structure. Our method benefits
from the efficient computation of flow-based methods and high density
estimation performance of autoregressive methods. We demonstrate that the
proposed DLF yields state-of-the-art performance on ImageNet 32x32 and 64x64 out
of all flow-based methods, and is competitive with the best autoregressive
model. Additionally, our model converges 10 times faster than Glow (Kingma and
Dhariwal, 2018). The code is available at https://github.com/naturomics/DLF.
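For context on the "exact log-likelihood with tractable sampling" claim, here is a generic sketch (an elementwise affine flow, not DLF's partially autoregressive coupling) of the two operations every flow model makes tractable:

```python
# Sampling pushes base noise forward; exact log-likelihood inverts the
# flow and adds the log-determinant of the Jacobian.
import numpy as np

rng = np.random.default_rng(0)
shift, log_scale = np.array([1.0, -1.0]), np.array([0.5, 0.0])

def sample(n):
    z = rng.normal(size=(n, 2))
    return shift + np.exp(log_scale) * z            # forward pass

def log_likelihood(x):
    z = (x - shift) * np.exp(-log_scale)            # inverse pass
    log_base = -0.5 * (z**2 + np.log(2 * np.pi)).sum(axis=-1)
    return log_base - log_scale.sum()               # change of variables

x = sample(3)
print(log_likelihood(x))                            # exact, per sample
```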
Neural Autoregressive Flows
Normalizing flows and autoregressive models have been successfully combined
to produce state-of-the-art results in density estimation, via Masked
Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based
speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows
(IAF). We unify and generalize these approaches, replacing the (conditionally)
affine univariate transformations of MAF/IAF with a more general class of
invertible univariate transformations expressed as monotonic neural networks.
We demonstrate that the proposed neural autoregressive flows (NAF) are
universal approximators for continuous probability distributions, and their
greater expressivity allows them to better capture multimodal target
distributions. Experimentally, NAF yields state-of-the-art performance on a
suite of density estimation tasks and outperforms IAF in variational
autoencoders trained on binarized MNIST.
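A hedged sketch of the kind of monotonic univariate transformer NAF substitutes for the affine one (a simplified deep sigmoidal flow; the paper additionally composes these and maps back to the real line): positive slopes and positive mixture weights guarantee strict monotonicity, hence invertibility.

```python
# t(x) = sum_k w_k * sigmoid(a_k * x + b_k) with a_k > 0, w_k > 0 is
# strictly increasing in x, so it is an invertible univariate transform.
import numpy as np

def softplus(u):
    return np.log1p(np.exp(u))

def monotonic_transform(x, a_raw, b, w_raw):
    a = softplus(a_raw)                       # positive slopes
    w = np.exp(w_raw) / np.exp(w_raw).sum()   # positive, normalized weights
    s = 1.0 / (1.0 + np.exp(-(a * x[..., None] + b)))
    return (w * s).sum(axis=-1)

rng = np.random.default_rng(0)
K = 8                                         # hidden units
a_raw, b, w_raw = rng.normal(size=(3, K))
xs = np.linspace(-3, 3, 5)
ys = monotonic_transform(xs, a_raw, b, w_raw)
assert np.all(np.diff(ys) > 0)                # strictly increasing
print(ys)
```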
Discrete Flows: Invertible Generative Models of Discrete Data
While normalizing flows have led to significant advances in modeling
high-dimensional continuous distributions, their applicability to discrete
distributions remains unknown. In this paper, we show that flows can in fact be
extended to discrete events, under a simple change-of-variables formula that
requires no log-determinant-Jacobian computations. Discrete flows have
numerous applications. We consider two flow architectures: discrete
autoregressive flows that enable bidirectionality, allowing, for example,
tokens in text to depend on both left-to-right and right-to-left contexts in an
exact language model; and discrete bipartite flows that enable efficient
non-autoregressive generation as in RealNVP. Empirically, we find that discrete
autoregressive flows outperform autoregressive baselines on synthetic discrete
distributions, an addition task, and Potts models; and bipartite flows can
obtain competitive performance with autoregressive baselines on character-level
language modeling on Penn Treebank and text8.
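The key idea can be shown in a few lines (my toy illustration): a bijection on a finite set merely permutes probability mass, so p(y) equals p of the inverse image of y with no Jacobian term; a modular shift is one such bijection.

```python
# A modular shift is a bijection on {0..K-1}: it has an exact inverse
# and permutes probability mass, so no log-det-Jacobian is needed.
import numpy as np

K = 5                                  # vocabulary size
x = np.array([0, 3, 4, 1])
shift = 2

y = (x + shift) % K                    # forward: bijection on {0..K-1}
x_back = (y - shift) % K               # exact inverse
assert np.array_equal(x, x_back)
print(y)
```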
Sum-of-Squares Polynomial Flow
The triangular map is a recent construct in probability theory that allows one to
transform any source probability density function to any target density
function. Based on triangular maps, we propose a general framework for
high-dimensional density estimation, by specifying one-dimensional
transformations (equivalently conditional densities) and appropriate
conditioner networks. This framework (a) reveals the commonalities and
differences of existing autoregressive and flow-based methods, (b) allows a
unified understanding of the limitations and representation power of these
recent approaches, and (c) motivates us to uncover a new Sum-of-Squares (SOS)
flow that is interpretable, universal, and easy to train. We perform several
synthetic experiments on various density geometries to demonstrate the benefits
(and shortcomings) of such transformations. SOS flows achieve competitive
results in simulations and on several real-world datasets.
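A small sketch of the SOS construction as I read it (the conditioner networks are omitted and the coefficients are fixed here): integrating a sum of squared polynomials gives a non-decreasing, hence invertible, one-dimensional transformation.

```python
# The derivative sum_k p_k(u)^2 is non-negative, so its antiderivative
# T is monotone and therefore an invertible univariate map.
import numpy as np
from numpy.polynomial import polynomial as P

def sos_transform(x, coeff_list, c=0.0):
    total = np.zeros(1)
    for coeffs in coeff_list:
        sq = P.polymul(coeffs, coeffs)          # p_k(u)^2
        total = P.polyadd(total, sq)
    T = P.polyint(total)                        # antiderivative, T(0) = 0
    return c + P.polyval(x, T)

rng = np.random.default_rng(0)
polys = [rng.normal(size=3) for _ in range(2)]  # two quadratic p_k
xs = np.linspace(-2, 2, 7)
ys = sos_transform(xs, polys)
assert np.all(np.diff(ys) > 0)                  # strictly increasing
print(ys)
```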
MaCow: Masked Convolutional Generative Flow
Flow-based generative models, conceptually attractive due to tractability of
both the exact log-likelihood computation and latent-variable inference, and
efficiency of both training and sampling, have led to a number of impressive
empirical successes and spawned many advanced variants and theoretical
investigations. Despite their computational efficiency, the density estimation
performance of flow-based generative models falls significantly behind that of
state-of-the-art autoregressive models. In this work, we introduce masked
convolutional generative flow (MaCow), a simple yet effective architecture of
generative flow using masked convolution. By restricting the local connectivity
in a small kernel, MaCow enjoys the properties of fast and stable training, and
efficient sampling, while achieving significant improvements over Glow for
density estimation on standard image benchmarks, considerably narrowing the gap
to autoregressive models.
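A hedged illustration of masked convolution (the mask layout below is PixelCNN-style and assumed purely for illustration; MaCow's exact masking scheme differs in detail): zeroing part of a small kernel restricts each output to a fixed causal neighborhood while keeping connectivity local.

```python
# Zeroing the center and everything "after" it in raster order makes
# each output depend only on already-generated pixels in a 3x3 window.
import numpy as np

k = 3
mask = np.ones((k, k))
mask[k // 2, k // 2:] = 0      # zero the center pixel and those after it
mask[k // 2 + 1:, :] = 0       # zero all rows below the center
print(mask)                    # [[1. 1. 1.] [1. 0. 0.] [0. 0. 0.]]

kernel = np.random.default_rng(0).normal(size=(k, k)) * mask
x = np.arange(25.0).reshape(5, 5)
patch = x[1:4, 1:4]            # window centered at pixel (2, 2)
print((kernel * patch).sum())  # depends only on causal neighbors
```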
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
The recently developed WaveNet architecture is the current state of the art
in realistic speech synthesis, consistently rated as more natural sounding for
many different languages than any previous system. However, because WaveNet
relies on sequential generation of one audio sample at a time, it is poorly
suited to today's massively parallel computers, and therefore hard to deploy in
a real-time production setting. This paper introduces Probability Density
Distillation, a new method for training a parallel feed-forward network from a
trained WaveNet with no significant difference in quality. The resulting system
is capable of generating high-fidelity speech samples more than 20 times
faster than real time, and is deployed online by Google Assistant, including
serving multiple English and Japanese voices.
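A toy sketch of probability density distillation (one-dimensional Gaussians stand in for the WaveNet teacher and the feed-forward student; this is my simplification): sample from the tractable student and descend a Monte Carlo estimate of KL(student || teacher).

```python
# The student can sample cheaply; the teacher can only score samples.
# Distillation minimizes E_student[log q_student - log p_teacher].
import numpy as np

rng = np.random.default_rng(0)

mu_student, teacher_mu = 0.0, 2.0      # student starts far from teacher
for step in range(200):
    eps = rng.normal(size=256)
    x = mu_student + eps               # reparameterized student samples
    # With unit variances the student's entropy is constant, so only the
    # cross-entropy term contributes to the gradient w.r.t. mu_student.
    grad = np.mean(x - teacher_mu)
    mu_student -= 0.1 * grad
print(round(mu_student, 2))            # ~2.0: student matched the teacher
```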
Convolutional Normalizing Flows
Bayesian posterior inference is prevalent in various machine learning
problems. Variational inference provides one way to approximate the posterior
distribution; however, its expressive power is limited, and so is the accuracy
of the resulting approximation. Recently, there has been a trend of using
neural networks to approximate the variational posterior distribution, owing to
the flexibility of neural network architectures. One way to construct a
flexible variational distribution is to warp a simple density into a complex
one via normalizing flows, where the resulting density can be evaluated
analytically. However, there is a trade-off between the flexibility of a
normalizing flow and the computational cost of an efficient transformation. In
this paper, we propose a simple yet effective architecture of normalizing
flows, ConvFlow, based on convolution over the dimensions of the random input
vector. Experiments on synthetic and real-world posterior inference problems
demonstrate the effectiveness and efficiency of the proposed method.
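A hedged sketch of the idea as described (my parameterization, not necessarily the paper's exact layer): a causal 1D convolution over the coordinates of z keeps the Jacobian triangular, so the log-determinant stays cheap; invertibility additionally requires keeping the diagonal Jacobian terms away from zero, which is glossed over here.

```python
# Left-padding makes the convolution causal over the vector's
# coordinates: output i depends only on z[i-k+1 .. i].
import numpy as np

def causal_conv_flow(z, w, u):
    D, k = z.shape[-1], len(w)
    zp = np.concatenate([np.zeros(k - 1), z])       # left-pad for causality
    conv = np.array([zp[i:i + k] @ w for i in range(D)])
    return z + u * np.tanh(conv)                    # triangular Jacobian

rng = np.random.default_rng(0)
z = rng.normal(size=6)
w = rng.normal(size=3)          # kernel of width 3
u = rng.normal(size=6) * 0.1    # small gates keep the map invertible here
print(causal_conv_flow(z, w, u))
```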
Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows
Time series forecasting is often fundamental to scientific and engineering
problems and enables decision making. With ever-increasing data set sizes, a
trivial solution to scale up predictions is to assume independence between
interacting time series. However, modeling statistical dependencies can improve
accuracy and enable analysis of interaction effects. Deep learning methods are
well suited for this problem, but multivariate models often assume a simple
parametric distribution and do not scale to high dimensions. In this work we
model the multivariate temporal dynamics of time series via an autoregressive
deep learning model, where the data distribution is represented by a
conditioned normalizing flow. This combination retains the power of
autoregressive models, such as good performance in extrapolation into the
future, with the flexibility of flows as a general purpose high-dimensional
distribution model, while remaining computationally tractable. We show that it
improves over the state of the art on standard metrics for many real-world
data sets with several thousand interacting time series.
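A minimal sketch of the general recipe (my assumption of the structure, not the paper's model): a recurrent state summarizes the past, and at each step a flow conditioned on that state, here a single affine layer, yields an exact log-likelihood for the next multivariate observation.

```python
# Autoregressive state h_t conditions the flow parameters at each step;
# the per-step log-likelihood follows from the change of variables.
import numpy as np

rng = np.random.default_rng(0)
D, H = 3, 8                                # series dim, state dim
W_h = rng.normal(size=(H, H)) * 0.3
W_x = rng.normal(size=(H, D)) * 0.3
W_mu, W_s = rng.normal(size=(2, D, H)) * 0.1

def step_loglik(x_t, h):
    mu, log_s = W_mu @ h, W_s @ h          # flow parameters from state
    z = (x_t - mu) * np.exp(-log_s)        # inverse affine flow
    log_base = -0.5 * (z**2 + np.log(2 * np.pi)).sum()
    h_next = np.tanh(W_h @ h + W_x @ x_t)  # update recurrent state
    return log_base - log_s.sum(), h_next

h, total = np.zeros(H), 0.0
for x_t in rng.normal(size=(5, D)):        # a toy series of 5 steps
    ll, h = step_loglik(x_t, h)
    total += ll
print(total)                               # exact joint log-likelihood
```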
Multi-Attribute Selectivity Estimation Using Deep Learning
Selectivity estimation - the problem of estimating the result size of queries
- is a fundamental problem in databases. Accurate estimation of query
selectivity involving multiple correlated attributes is especially challenging.
Poor cardinality estimates could result in the selection of bad plans by the
query optimizer. We investigate the feasibility of using deep learning based
approaches for both point and range queries and propose two complementary
approaches. Our first approach considers selectivity as an unsupervised deep
density estimation problem. We successfully introduce techniques from neural
density estimation for this purpose. The key idea is to decompose the joint
distribution into a set of tractable conditional probability distributions such
that they satisfy the autoregressive property. Our second approach formulates
selectivity estimation as a supervised deep learning problem that predicts the
selectivity of a given query. We also introduce and address a number of
practical challenges arising when adapting deep learning for relational data.
These include query/data featurization, incorporating query workload
information in a deep learning framework, and the dynamic scenario where both
data and workload queries could be updated. Our extensive experiments with a
special emphasis on queries with a large number of predicates and/or small
result sizes demonstrate that our proposed techniques provide fast and
accurate selectivity estimates with minimal space overhead.
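A toy sketch of the first approach's key idea: factor the joint attribute distribution autoregressively and estimate a point query's selectivity as a product of conditionals. Empirical counts stand in here for the learned neural conditionals used in the paper.

```python
# Chain rule: p(A0, A1) = p(A0) * p(A1 | A0). A point query's
# selectivity is this joint probability.
import numpy as np

rows = np.array([[0, 1], [0, 1], [1, 1], [1, 0]])   # toy 2-attribute table

def selectivity(point):
    a0, a1 = point
    p_a0 = np.mean(rows[:, 0] == a0)                # p(A0 = a0)
    sub = rows[rows[:, 0] == a0]
    p_a1 = np.mean(sub[:, 1] == a1)                 # p(A1 = a1 | A0 = a0)
    return p_a0 * p_a1

print(selectivity((0, 1)))   # 0.5: two of the four rows match
```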