4,857 research outputs found

GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution

Author: Hernández-Lobato JM
Kusner MJ
Publication venue: arXiv.org
Publication date: 12/11/2016

Generative Adversarial Networks (GANs) have limitations when the goal is to generate sequences of discrete elements. The reason for this is that samples from a distribution on discrete objects such as the multinomial are not differentiable with respect to the distribution parameters. This problem can be avoided by using the Gumbel-softmax distribution, which is a continuous approximation to a multinomial distribution parameterized in terms of the softmax function. In this work, we evaluate the performance of GANs based on recurrent neural networks with Gumbel-softmax output distributions in the task of generating sequences of discrete elements.
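The relaxation described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `gumbel_softmax_sample`, the `temperature` argument, and the use of plain Python lists are assumptions made for illustration. The idea shown is that Gumbel noise is added to the logits and the result is passed through a softmax, so the sample is a smooth function of the logits rather than a hard one-hot draw.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from unnormalised log-probabilities.

    Hypothetical helper: names and signature are illustrative only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)).
        # rng.random() lies in [0, 1); clamp away from 0 so the logs stay finite.
        u = max(rng.random(), 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    # Lower temperatures push the sample towards a one-hot vector.
    print(gumbel_softmax_sample([0.1, 0.5, 2.0], temperature=0.5, rng=rng))
```

As the temperature approaches zero the output approaches a one-hot vector; larger temperatures give smoother, more uniform samples.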

arXiv.org e-Print Archive

UCL Discovery

Stochastic expectation propagation

Author: Hernández-Lobato JM
Li Y
Turner RE
Publication venue: Advances in Neural Information Processing Systems
Publication date: 01/01/2015

Expectation propagation (EP) is a deterministic approximation algorithm that is often used to perform approximate Bayesian parameter learning. EP approximates the full intractable posterior distribution through a set of local approximations that are iteratively refined for each datapoint. EP can offer analytic and computational advantages over other approximations, such as Variational Inference (VI), and is the method of choice for a number of models. The local nature of EP appears to make it an ideal candidate for performing Bayesian learning on large models in large-scale dataset settings. However, EP has a crucial limitation in this context: the number of approximating factors needs to increase with the number of data-points, N, which often entails a prohibitively large memory overhead. This paper presents an extension to EP, called stochastic expectation propagation (SEP), that maintains a global posterior approximation (like VI) but updates it in a local way (like EP). Experiments on a number of canonical learning problems using synthetic and real-world datasets indicate that SEP performs almost as well as full EP, but reduces the memory consumption by a factor of N. SEP is therefore ideally suited to performing approximate Bayesian learning in the large model, large dataset setting.

arXiv.org e-Print Archive

Apollo (Cambridge)

Analysis of spatial spectral features of dynamic contrast-enhanced brain magnetic resonance images for studying small vessel disease

Author: AK Heye
C Happ
D Mattia
F Fazekas
F Khalifa
G Potter
J Staals
JM Wardlaw
JO Smith
MDC Valdés Hernández
MDC Valdés-Hernández
MDCV Hernández
PS Naidu
S Muñoz Maniega
Publication venue: Springer Science and Business Media LLC
Publication date: 24/01/2020

Crossref

Edinburgh Research Explorer

Variational implicit processes

Author: Hernández-Lobato JM
Li Y
Ma C
Publication venue: 36th International Conference on Machine Learning, ICML 2019
Publication date: 01/01/2019

We introduce implicit processes (IPs), stochastic processes that place implicitly defined multivariate distributions over any finite collection of random variables. IPs are therefore highly flexible implicit priors over functions, with examples including data simulators, Bayesian neural networks and non-linear transformations of stochastic processes. A novel and efficient approximate inference algorithm for IPs, namely the variational implicit processes (VIPs), is derived using generalised wake-sleep updates. This method returns simple update equations and allows scalable hyper-parameter learning with stochastic optimization. Experiments show that VIPs return better uncertainty estimates and lower errors than existing inference methods for challenging models such as Bayesian neural networks and Gaussian processes.

arXiv.org e-Print Archive

Apollo (Cambridge)

Minimal random code learning: Getting bits back from compressed model parameters

Author: Havasi M
Hernández-Lobato JM
Peharz R
Publication venue: 7th International Conference on Learning Representations, ICLR 2019
Publication date: 30/09/2018

While deep neural networks are a highly successful model class, their large memory footprint puts considerable strain on energy consumption, communication bandwidth, and storage requirements. Consequently, model size reduction has become a primary goal in deep learning. A typical approach is to train a set of deterministic weights, while applying certain techniques such as pruning and quantization, so that the empirical weight distribution becomes amenable to Shannon-style coding schemes. However, as shown in this paper, relaxing weight determinism and using a full variational distribution over weights allows for more efficient coding schemes and consequently higher compression rates. In particular, following the classical bits-back argument, we encode the network weights using a random sample, requiring only a number of bits corresponding to the Kullback-Leibler divergence between the sampled variational distribution and the encoding distribution. By imposing a constraint on the Kullback-Leibler divergence, we are able to explicitly control the compression rate, while optimizing the expected loss on the training set. The employed encoding scheme can be shown to be close to the optimal information-theoretical lower bound, with respect to the employed variational family. Our method sets a new state of the art in neural network compression, as it strictly dominates previous approaches in a Pareto sense: on the benchmarks LeNet-5/MNIST and VGG-16/CIFAR-10, our approach yields the best test performance for a fixed memory budget, and vice versa, it achieves the highest compression rates for a fixed test performance.
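To make the "number of bits corresponding to the Kullback-Leibler divergence" concrete, here is a rough sketch that computes that quantity for a toy case. It assumes both the variational and the encoding distribution are factorised Gaussians over the weights, which the abstract does not state; the function name `kl_codelength_bits` and its arguments are hypothetical. It computes only the KL term and does not implement the coding scheme itself.

```python
import math


def kl_codelength_bits(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) in bits for factorised Gaussians q (variational) and p (encoding).

    Illustrative assumption: one independent Gaussian per weight.
    """
    if not (len(mu_q) == len(sigma_q) == len(mu_p) == len(sigma_p)):
        raise ValueError("all parameter lists must have the same length")
    nats = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        # Closed-form KL between two univariate Gaussians, in nats.
        nats += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    # Convert nats to bits.
    return nats / math.log(2.0)


if __name__ == "__main__":
    # Two weights: the first posterior matches the encoding distribution
    # (contributing zero bits), the second is narrower and shifted.
    print(kl_codelength_bits([0.0, 1.5], [1.0, 0.1], [0.0, 0.0], [1.0, 1.0]))
```

Under this assumption, constraining the total returned by `kl_codelength_bits` during training is one way to picture how the compression rate could be controlled explicitly.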

arXiv.org e-Print Archive

Pure OAI Repository

Apollo (Cambridge)

Predictive Complexity Priors

Author: Gordon J
Hernández-Lobato JM
Nalisnick E
Publication venue: Proceedings of Machine Learning Research
Publication date: 01/01/2021

Specifying a Bayesian prior is notoriously difficult for complex models such as neural networks. Reasoning about parameters is made challenging by the high-dimensionality and over-parameterization of the space. Priors that seem benign and uninformative can have unintuitive and detrimental effects on a model's predictions. For this reason, we propose predictive complexity priors: a functional prior that is defined by comparing the model's predictions to those of a reference model. Although originally defined on the model outputs, we transfer the prior to the model parameters via a change of variables. The traditional Bayesian workflow can then proceed as usual. We apply our predictive complexity prior to high-dimensional regression, reasoning over neural network depth, and sharing of statistical strength for few-shot learning

Apollo (Cambridge)

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Black-Box α-divergence minimization

Author: Bui TD
Hernández-Lobato D
Hernández-Lobato JM
Li Y
Rowland M
Turner RE
Publication venue: Proceedings of the 33rd International Conference on Machine Learning
Publication date: 25/05/2016

Black-box alpha (BB-α) is a new approximate inference method based on the minimization of α-divergences. BB-α scales to large datasets because it can be implemented using stochastic gradient descent. BB-α can be applied to complex probabilistic models with little effort since it only requires as input the likelihood function and its gradients. These gradients can be easily obtained using automatic differentiation. By changing the divergence parameter α, the method is able to interpolate between variational Bayes (VB) (α → 0) and an algorithm similar to expectation propagation (EP) (α = 1). Experiments on probit regression and neural network regression and classification problems show that BB-α with non-standard settings of α, such as α = 0.5, usually produces better predictions than with α → 0 (VB) or α = 1 (EP).

JMHL acknowledges support from the Rafael del Pino Foundation. YL thanks the Schlumberger Foundation Faculty for the Future fellowship for supporting her PhD study. MR acknowledges support from UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/L016516/1 for the University of Cambridge Centre for Doctoral Training, the Cambridge Centre for Analysis. TDB thanks Google for funding his European Doctoral Fellowship. DHL acknowledges support from Plan National I+D+i, Grant TIN2013-42351-P and TIN2015-70308-REDT, and from Comunidad de Madrid, Grant S2013/ICE-2845 CASI-CAM-CM. RET thanks EPSRC grant #EP/L000776/1 and #EP/M026957/1.

Apollo (Cambridge)

Compressing images by encoding their latent representations with relative entropy coding

Author: Flamich G
Havasi M
Hernández-Lobato JM
Publication venue: Advances in Neural Information Processing Systems
Publication date: 01/01/2020

Variational Autoencoders (VAEs) have seen widespread use in learned image compression. They are used to learn expressive latent representations on which downstream compression methods can operate with high efficiency. Recently proposed 'bits-back' methods can indirectly encode the latent representation of images with codelength close to the relative entropy between the latent posterior and the prior. However, due to the underlying algorithm, these methods can only be used for lossless compression, and they only achieve their nominal efficiency when compressing multiple images simultaneously; they are inefficient for compressing single images. As an alternative, we propose a novel method, Relative Entropy Coding (REC), that can directly encode the latent representation with codelength close to the relative entropy for single images, supported by our empirical results obtained on the Cifar10, ImageNet32 and Kodak datasets. Moreover, unlike previous bits-back methods, REC is immediately applicable to lossy compression, where it is competitive with the state-of-the-art on the Kodak dataset

Apollo (Cambridge)

Depth uncertainty in neural networks

Author: Allingham JU
Antorán J
Hernández-Lobato JM
Publication venue: Advances in Neural Information Processing Systems
Publication date: 01/01/2020

Existing methods for estimating uncertainty in deep learning tend to require multiple forward passes, making them unsuitable for applications where computational resources are limited. To solve this, we perform probabilistic reasoning over the depth of neural networks. Different depths correspond to subnetworks which share weights and whose predictions are combined via marginalisation, yielding model uncertainty. By exploiting the sequential structure of feed-forward networks, we are able to both evaluate our training objective and make predictions with a single forward pass. We validate our approach on real-world regression and image classification tasks. Our approach provides uncertainty calibration, robustness to dataset shift, and accuracies competitive with more computationally expensive baselines
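The single-pass marginalisation over depth mentioned in this abstract can be pictured with a toy sketch. Everything here is an assumption for illustration and not the authors' architecture: each block is a scalar `(scale, shift)` pair applied elementwise through `tanh`, one shared linear `head` maps every intermediate activation to class logits, and `depth_probs` is a given categorical distribution over depths. The point shown is that the intermediate activations of one forward pass are reused, so the prediction for every depth comes at no extra pass.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict_marginal_over_depth(x, blocks, head, depth_probs):
    """Combine per-depth class probabilities, weighted by depth_probs.

    Hypothetical toy model: blocks are (scale, shift) pairs, head is a
    list of weight rows (one row per class) shared across all depths.
    """
    if len(blocks) != len(depth_probs):
        raise ValueError("need exactly one probability per depth")
    hidden = list(x)
    marginal = [0.0] * len(head)
    for (scale, shift), weight in zip(blocks, depth_probs):
        # One step deeper: reuse the previous activation (single forward pass).
        hidden = [math.tanh(scale * h + shift) for h in hidden]
        # Shared output head applied to the activation at this depth.
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in head]
        class_probs = softmax(logits)
        # Accumulate the depth-weighted prediction.
        marginal = [m + weight * p for m, p in zip(marginal, class_probs)]
    return marginal


if __name__ == "__main__":
    x = [0.5, -1.0]
    blocks = [(1.0, 0.0), (0.5, 0.1), (2.0, -0.3)]
    head = [[1.0, -1.0], [0.2, 0.4]]
    print(predict_marginal_over_depth(x, blocks, head, [0.2, 0.3, 0.5]))
```

Disagreement between the per-depth predictions is what would surface as model uncertainty in the combined output; how `depth_probs` is learned is outside the scope of this sketch.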

Apollo (Cambridge)