4,850 research outputs found
GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution
Generative Adversarial Networks (GANs) have limitations when the goal is to generate sequences of discrete elements. The reason for this is that samples from a distribution on discrete objects such as the multinomial are not differentiable with respect to the distribution parameters. This problem can be avoided by using the Gumbel-softmax distribution, which is a continuous approximation to a multinomial distribution parameterized in terms of the softmax function. In this work, we evaluate the performance of GANs based on recurrent neural networks with Gumbel-softmax output distributions in the task of generating sequences of discrete elements.
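A minimal sketch of the Gumbel-softmax relaxation the abstract refers to (PyTorch is an assumed choice here; the tensor shapes, temperature `tau`, and the standalone function are illustrative, not the paper's recurrent GAN architecture):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0):
    # Draw standard Gumbel noise via the inverse-CDF trick (small eps avoids log(0)).
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    # Temperature-controlled softmax: a continuous, differentiable relaxation
    # of a one-hot draw from the multinomial defined by `logits`.
    return F.softmax((logits + gumbel) / tau, dim=-1)

# Toy usage: 4 sequence positions over a vocabulary of 10 discrete symbols.
logits = torch.randn(4, 10, requires_grad=True)
soft_tokens = gumbel_softmax_sample(logits, tau=0.5)
soft_tokens.sum().backward()  # gradients flow back to `logits`, unlike hard sampling
```

As `tau` approaches zero the samples approach one-hot vectors, which is what lets a recurrent generator feed approximately discrete tokens to the discriminator while remaining trainable by backpropagation.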
Stochastic expectation propagation
Expectation propagation (EP) is a deterministic approximation algorithm that
is often used to perform approximate Bayesian parameter learning. EP
approximates the full intractable posterior distribution through a set of local
approximations that are iteratively refined for each datapoint. EP can offer
analytic and computational advantages over other approximations, such as
Variational Inference (VI), and is the method of choice for a number of models.
The local nature of EP appears to make it an ideal candidate for performing
Bayesian learning on large models in large-scale dataset settings. However, EP
has a crucial limitation in this context: the number of approximating factors
needs to increase with the number of data-points, N, which often entails a
prohibitively large memory overhead. This paper presents an extension to EP,
called stochastic expectation propagation (SEP), that maintains a global
posterior approximation (like VI) but updates it in a local way (like EP).
Experiments on a number of canonical learning problems using synthetic and
real-world datasets indicate that SEP performs almost as well as full EP, but
reduces the memory consumption by a factor of N. SEP is therefore ideally
suited to performing approximate Bayesian learning in the large model, large
dataset setting.
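A toy sketch of the SEP idea on a conjugate model (inferring a Gaussian mean) using natural-parameter bookkeeping; the model, damping choice, and variable names are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

# Toy model (illustrative): x_n ~ N(theta, 1) with prior theta ~ N(0, 1).
# Full EP would store N site factors; SEP stores a single averaged site `site`
# and defines the global approximation q(theta) ∝ prior(theta) * site(theta)^N.
rng = np.random.default_rng(0)
N = 1000
x = rng.normal(2.0, 1.0, size=N)

prior = np.array([1.0, 0.0])   # natural parameters (precision, precision * mean)
site = np.array([0.0, 0.0])    # one shared factor instead of N of them

for _ in range(5):
    for xn in x:
        q = prior + N * site                    # global approximation (like VI)
        cavity = q - site                       # remove one copy of the site (like EP)
        tilted = cavity + np.array([1.0, xn])   # exact moment match in this conjugate toy
        new_site = tilted - cavity              # implied local contribution of x_n
        site += (new_site - site) / N           # damped, averaged SEP update

q = prior + N * site
print("posterior mean ~", q[1] / q[0], "  posterior precision ~", q[0])
```

The memory cost is a single site regardless of N, which is the factor-of-N saving the abstract refers to.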
Deep Gaussian processes for regression using approximate expectation propagation
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations
of Gaussian processes (GPs) and are formally equivalent to neural networks with
multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic
models and as such are arguably more flexible, have a greater capacity to
generalise, and provide better calibrated uncertainty estimates than
alternative deep models. This paper develops a new approximate Bayesian
learning scheme that enables DGPs to be applied to a range of medium to large
scale regression problems for the first time. The new method uses an
approximate Expectation Propagation procedure and a novel and efficient
extension of the probabilistic backpropagation algorithm for learning. We
evaluate the new method for non-linear regression on eleven real-world
datasets, showing that it always outperforms GP regression and is almost always
better than state-of-the-art deterministic and sampling-based approximate
inference methods for Bayesian neural networks. As a by-product, this work
provides a comprehensive analysis of six approximate Bayesian methods for
training neural networks.
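For intuition about the model class itself (not the paper's approximate EP inference scheme), here is a sketch of drawing a function from a two-layer deep GP prior; the RBF kernel, one-dimensional hidden layer, and jitter values are illustrative assumptions:

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel between two sets of scalar inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 100)
jitter = 1e-6 * np.eye(len(x))

# Layer 1: hidden values h = f1(x), with f1 ~ GP(0, k).
h = rng.multivariate_normal(np.zeros(len(x)), rbf(x, x) + jitter)

# Layer 2: outputs y = f2(h), a GP evaluated at the *outputs* of layer 1,
# which is what makes the composition a deep (hierarchical) GP.
y = rng.multivariate_normal(np.zeros(len(x)), rbf(h, h) + jitter)
```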
Predictive Complexity Priors
Specifying a Bayesian prior is notoriously difficult for complex models such as neural networks. Reasoning about parameters is made challenging by the high dimensionality and over-parameterization of the space. Priors that seem benign and uninformative can have unintuitive and detrimental effects on a model's predictions. For this reason, we propose predictive complexity priors: functional priors defined by comparing the model's predictions to those of a reference model. Although originally defined on the model outputs, we transfer the prior to the model parameters via a change of variables. The traditional Bayesian workflow can then proceed as usual. We apply our predictive complexity prior to high-dimensional regression, reasoning over neural network depth, and sharing of statistical strength for few-shot learning.
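The change-of-variables step mentioned above follows, in its simplest one-dimensional form, the standard identity below; the scalar complexity functional d(θ) is only a placeholder here, and the paper's actual construction for high-dimensional, non-invertible parameter-to-output maps is not reproduced:

```latex
% Generic scalar change of variables: a prior p_d placed on the functional
% value d = d(\theta) induces a prior on the parameter \theta.
p_\theta(\theta) \;=\; p_d\bigl(d(\theta)\bigr)\,
  \left|\frac{\mathrm{d}\,d(\theta)}{\mathrm{d}\theta}\right|
```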
Variational implicit processes
We introduce implicit processes (IPs), stochastic processes that place
implicitly defined multivariate distributions over any finite collection of
random variables. IPs are therefore highly flexible implicit priors over
functions, with examples including data simulators, Bayesian neural networks
and non-linear transformations of stochastic processes. A novel and efficient
approximate inference algorithm for IPs, namely the variational implicit
processes (VIPs), is derived using generalised wake-sleep updates. This method
returns simple update equations and allows scalable hyper-parameter learning
with stochastic optimization. Experiments show that VIPs return better
uncertainty estimates and lower errors than existing inference methods for
challenging models such as Bayesian neural networks and Gaussian processes.
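A small illustration of what "implicit process" means in practice, using the Bayesian-neural-network example from the abstract; the one-hidden-layer architecture, tanh nonlinearity, and weight scales are assumptions made for this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 50)[:, None]   # a finite collection of inputs

def sample_function(x, hidden=20):
    # One draw of f(x): sample the weights, then evaluate the network.
    # The induced distribution over f(x) is defined only implicitly --
    # easy to sample from, but with no closed-form density.
    W1 = rng.normal(0.0, 1.0, size=(x.shape[1], hidden))
    b1 = rng.normal(0.0, 1.0, size=hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    return np.tanh(x @ W1 + b1) @ W2

prior_draws = np.stack([sample_function(x) for _ in range(10)])  # 10 function samples
```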
Minimal random code learning: Getting bits back from compressed model parameters
While deep neural networks are a highly successful model class, their large memory footprint puts considerable strain on energy consumption, communication bandwidth, and storage requirements. Consequently, model size reduction has become a central goal in deep learning. A typical approach is to train a set of deterministic weights while applying techniques such as pruning and quantization, so that the empirical weight distribution becomes amenable to Shannon-style coding schemes. However, as shown in this paper, relaxing weight determinism and using a full variational distribution over weights allows for more efficient coding schemes and consequently higher compression rates. In particular, following the classical bits-back argument, we encode the network weights using a random sample, requiring only a number of bits corresponding to the Kullback-Leibler divergence between the sampled variational distribution and the encoding distribution. By imposing a constraint on the Kullback-Leibler divergence, we are able to explicitly control the compression rate while optimizing the expected loss on the training set. The employed encoding scheme can be shown to be close to the optimal information-theoretic lower bound with respect to the employed variational family. Our method sets a new state of the art in neural network compression, as it strictly dominates previous approaches in a Pareto sense: on the LeNet-5/MNIST and VGG-16/CIFAR-10 benchmarks, our approach yields the best test performance for a fixed memory budget and, vice versa, achieves the highest compression rates for a fixed test performance.
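As a rough numerical illustration of the codelength claim (a sketch under assumed diagonal-Gaussian variational and encoding distributions; the dimensions and parameter values are made up, not results from the paper):

```python
import numpy as np

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    # KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), elementwise, in nats.
    return (np.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5)

# Toy "weight posterior" for 1000 weights and a zero-mean encoding distribution.
mu_q, sigma_q = np.full(1000, 0.05), np.full(1000, 0.10)
kl_nats = gaussian_kl(mu_q, sigma_q, 0.0, 0.20).sum()
print("nominal codelength ~", kl_nats / np.log(2), "bits")  # what the coder pays
```

Constraining this KL during training is what allows the compression rate to be set explicitly while the expected training loss is optimised.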
Compressing images by encoding their latent representations with relative entropy coding
Variational Autoencoders (VAEs) have seen widespread use in learned image
compression. They are used to learn expressive latent representations on which
downstream compression methods can operate with high efficiency. Recently
proposed 'bits-back' methods can indirectly encode the latent representation of
images with codelength close to the relative entropy between the latent
posterior and the prior. However, due to the underlying algorithm, these
methods can only be used for lossless compression, and they only achieve their
nominal efficiency when compressing multiple images simultaneously; they are
inefficient for compressing single images. As an alternative, we propose a
novel method, Relative Entropy Coding (REC), that can directly encode the
latent representation with codelength close to the relative entropy for single
images, supported by our empirical results obtained on the CIFAR-10, ImageNet32
and Kodak datasets. Moreover, unlike previous bits-back methods, REC is
immediately applicable to lossy compression, where it is competitive with the
state-of-the-art on the Kodak dataset.
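One simple member of this family of coding schemes, sketched for a one-dimensional latent (importance sampling over candidates drawn with a seed shared between encoder and decoder); this is an assumed illustration, not necessarily the algorithm used in the paper:

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # the seed is shared with the decoder

mu_q, sigma_q = 1.5, 0.5                # toy latent posterior q = N(1.5, 0.5^2)
kl = np.log(1.0 / sigma_q) + (sigma_q ** 2 + mu_q ** 2) / 2 - 0.5  # KL(q || N(0,1)), nats
K = int(np.ceil(np.exp(kl))) + 1        # roughly exp(KL) shared candidates

candidates = rng.normal(0.0, 1.0, size=K)   # draws from the prior p = N(0, 1)
log_w = -(candidates - mu_q) ** 2 / (2 * sigma_q ** 2) + candidates ** 2 / 2  # log q/p + const
probs = np.exp(log_w - log_w.max()); probs /= probs.sum()
index = rng.choice(K, p=probs)          # encoder sends only this index: ~log2(K) bits

decoded = candidates[index]             # decoder regenerates the candidates and looks it up
```

The transmitted index costs roughly KL(q || p) bits for a single latent, without relying on other images to "get the bits back" from.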
Depth uncertainty in neural networks
Existing methods for estimating uncertainty in deep learning tend to require multiple forward passes, making them unsuitable for applications where computational resources are limited. To address this, we perform probabilistic reasoning over the depth of neural networks. Different depths correspond to subnetworks which share weights and whose predictions are combined via marginalisation, yielding model uncertainty. By exploiting the sequential structure of feed-forward networks, we are able to both evaluate our training objective and make predictions with a single forward pass. We validate our approach on real-world regression and image classification tasks. Our approach provides uncertainty calibration, robustness to dataset shift, and accuracies competitive with more computationally expensive baselines.
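A minimal sketch of the single-pass depth marginalisation described above (the MLP architecture, shared output head, and softmax depth weights are illustrative assumptions, not the paper's exact parameterisation):

```python
import torch
import torch.nn as nn

class DepthMarginalisedMLP(nn.Module):
    def __init__(self, dim_in, dim_hidden, dim_out, max_depth=5):
        super().__init__()
        self.stem = nn.Linear(dim_in, dim_hidden)
        self.blocks = nn.ModuleList(
            [nn.Linear(dim_hidden, dim_hidden) for _ in range(max_depth)])
        self.head = nn.Linear(dim_hidden, dim_out)            # shared across depths
        self.depth_logits = nn.Parameter(torch.zeros(max_depth))

    def forward(self, x):
        h = torch.relu(self.stem(x))
        per_depth = []
        for block in self.blocks:                             # one sequential pass
            h = torch.relu(block(h))
            per_depth.append(self.head(h))                    # prediction at this depth
        preds = torch.stack(per_depth)                        # (depth, batch, out)
        q_depth = torch.softmax(self.depth_logits, dim=0)     # distribution over depth
        return (q_depth[:, None, None] * preds).sum(dim=0)    # marginalised prediction

model = DepthMarginalisedMLP(dim_in=10, dim_hidden=64, dim_out=3)
y = model(torch.randn(8, 10))   # predictions from every depth, combined in one pass
```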
Association among quantitative, chromosomal and enzymatic traits in a natural population of Drosophila melanogaster