Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
There is a previously identified equivalence between wide fully connected
neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables,
for instance, test set predictions that would have resulted from a fully
Bayesian, infinitely wide trained FCN to be computed without ever instantiating
the FCN, but by instead evaluating the corresponding GP. In this work, we
derive an analogous equivalence for multi-layer convolutional neural networks
(CNNs) both with and without pooling layers, and achieve state-of-the-art
results on CIFAR-10 for GPs without trainable kernels. We also introduce a Monte
Carlo method to estimate the GP corresponding to a given neural network
architecture, even in cases where the analytic form has too many terms to be
computationally feasible.
Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs
with and without weight sharing are identical. As a consequence, translation
equivariance, beneficial in finite channel CNNs trained with stochastic
gradient descent (SGD), is guaranteed to play no role in the Bayesian treatment
of the infinite channel limit - a qualitative difference between the two
regimes that is not present in the FCN case. We confirm experimentally that,
while in some scenarios the performance of SGD-trained finite CNNs approaches
that of the corresponding GPs as the channel count increases, with careful
tuning SGD-trained CNNs can significantly outperform their corresponding GPs,
suggesting advantages from SGD training compared to fully Bayesian parameter
estimation.
Comment: Published as a conference paper at ICLR 2019
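The paper's Monte Carlo kernel estimator can be sketched directly: sample many networks from the parameter prior, evaluate them on the inputs, and average the outer products of the outputs. A minimal NumPy sketch using a one-hidden-layer ReLU network as a toy stand-in for a deep CNN (widths, variance scalings, and function names are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def mc_nngp_kernel(X, width=512, n_samples=200, sigma_w=1.0, sigma_b=0.1):
    """Monte Carlo estimate of the GP kernel induced by a random
    one-hidden-layer ReLU network (toy stand-in for a deep CNN)."""
    n, d = X.shape
    K = np.zeros((n, n))
    for _ in range(n_samples):
        # Draw parameters from the prior used in the infinite-width limit.
        W1 = np.random.randn(d, width) * sigma_w / np.sqrt(d)
        b1 = np.random.randn(width) * sigma_b
        W2 = np.random.randn(width, 1) * sigma_w / np.sqrt(width)
        f = np.maximum(X @ W1 + b1, 0.0) @ W2  # outputs, shape (n, 1)
        K += f @ f.T                           # accumulate outer products
    return K / n_samples                       # empirical E[f(x) f(x')]

X = np.random.randn(8, 32)  # 8 toy inputs of dimension 32
K = mc_nngp_kernel(X)       # approaches the analytic kernel as width
                            # and n_samples grow
```

The same recipe applies to any architecture whose analytic kernel has too many terms to evaluate: swap in the architecture of interest and keep averaging.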
A Bayesian Perspective on the Deep Image Prior
The deep image prior was recently introduced as a prior for natural images.
It represents images as the output of a convolutional network with random
inputs. For "inference", gradient descent is performed to adjust network
parameters to make the output match observations. This approach yields good
performance on a range of image reconstruction tasks. We show that the deep
image prior is asymptotically equivalent to a stationary Gaussian process prior
in the limit as the number of channels in each layer of the network goes to
infinity, and derive the corresponding kernel. This informs a Bayesian approach
to inference. We show that by conducting posterior inference using stochastic
gradient Langevin dynamics we avoid the need for early stopping, which is a
drawback of the current approach, and improve results for denoising and
inpainting tasks.
We illustrate these intuitions on a number of 1D and 2D signal reconstruction
tasks.
Comment: CVPR 2019
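The inference change is small in code: replace the plain gradient step on the network parameters with a stochastic gradient Langevin dynamics (SGLD) update, which adds Gaussian noise of scale sqrt(2*lr) at each step so that post-burn-in iterates behave as approximate posterior samples. A hedged PyTorch-style sketch (the learning rate and the commented usage lines are illustrative assumptions):

```python
import torch

def sgld_step(params, loss, lr=1e-4):
    """One SGLD update: a gradient step plus sqrt(2*lr) Gaussian noise.
    Averaging network outputs over post-burn-in iterates replaces the
    early-stopped point estimate of the original deep image prior."""
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            noise = torch.randn_like(p) * (2.0 * lr) ** 0.5
            p.add_(-lr * g + noise)

# Usage sketch (hypothetical names): net is a randomly initialized conv
# net with a fixed input z, y_obs is the corrupted image.
# loss = ((net(z) - y_obs) ** 2).sum()
# sgld_step(list(net.parameters()), loss)
```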
Deep convolutional Gaussian processes
We propose deep convolutional Gaussian processes, a deep Gaussian process
architecture with convolutional structure. The model is a principled Bayesian
framework for detecting hierarchical combinations of local features for image
classification. We demonstrate greatly improved image classification
performance compared to current Gaussian process approaches on the MNIST and
CIFAR-10 datasets. In particular, we improve CIFAR-10 accuracy by over 10
percentage points.
Bayesian Image Classification with Deep Convolutional Gaussian Processes
In decision-making systems, it is important to have classifiers with
calibrated uncertainties and an optimisation objective that can be used for
automated model selection and training. Gaussian processes (GPs) provide
uncertainty estimates and a marginal likelihood objective, but their weak
inductive biases lead to inferior accuracy. This has limited their
applicability in certain tasks (e.g. image classification). We propose a
translation-insensitive convolutional kernel, which relaxes the translation
invariance constraint imposed by previous convolutional GPs. We show how we can
use the marginal likelihood to learn the degree of insensitivity. We also
reformulate GP image-to-image convolutional mappings as multi-output GPs,
leading to deep convolutional GPs. We show experimentally that our new kernel
improves performance in both single-layer and deep models. We also demonstrate
that our fully Bayesian approach improves on dropout-based Bayesian deep
learning methods in terms of uncertainty and marginal likelihood estimates.
Comment: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020, PMLR: Volume 108
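A plausible reading of the kernel construction (a sketch, not the paper's exact parametrization): patch similarity is modulated by a second kernel over patch locations, whose lengthscale, learned via the marginal likelihood, interpolates between strict translation invariance and full location sensitivity:

```latex
% x^{[p]} is the p-th patch of image x and \ell_p its location; the
% lengthscale l of k_loc sets the degree of insensitivity
% (l -> infinity recovers a translation-invariant convolutional kernel).
k(x, x') = \sum_{p}\sum_{p'}
    k_{\mathrm{patch}}\big(x^{[p]}, x'^{[p']}\big)\,
    k_{\mathrm{loc}}\big(\ell_p, \ell_{p'}\big),
\qquad
k_{\mathrm{loc}}(\ell_p, \ell_{p'})
    = \exp\!\Big(-\frac{\lVert \ell_p - \ell_{p'} \rVert^2}{2 l^2}\Big).
```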
Quantifying Uncertainty in Discrete-Continuous and Skewed Data with Bayesian Deep Learning
Deep Learning (DL) methods have been transforming computer vision with
innovative adaptations to other domains including climate change. For DL to
pervade Science and Engineering (S&E) applications where risk management is a
core component, well-characterized uncertainty estimates must accompany
predictions. However, S&E observations and model-simulations often follow
heavily skewed distributions and are not well modeled with DL approaches, since
they usually optimize a Gaussian, or Euclidean, likelihood loss. Recent
developments in Bayesian Deep Learning (BDL), which attempts to capture both
aleatoric uncertainty (from noisy observations) and epistemic uncertainty
(from unknown model parameters), provide us with a foundation. Here we present a
discrete-continuous BDL model with Gaussian and lognormal likelihoods for
uncertainty quantification (UQ). We demonstrate the approach by developing UQ
estimates on `DeepSD', a super-resolution based DL model for Statistical
Downscaling (SD) in climate applied to precipitation, which follows an
extremely skewed distribution. We find that the discrete-continuous models
outperform a basic Gaussian distribution in terms of predictive accuracy and
uncertainty calibration. Furthermore, we find that the lognormal distribution,
which can handle skewed distributions, produces quality uncertainty estimates
at the extremes. Such results may be important across S&E, as well as other
domains such as finance and economics, where extremes are often of significant
interest. Furthermore, to our knowledge, this is the first UQ model in SD where
both aleatoric and epistemic uncertainties are characterized.
Comment: 10 pages
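The discrete-continuous likelihood can be written as a Bernoulli gate for zero precipitation combined with a lognormal density for positive amounts; aleatoric uncertainty then follows from the likelihood itself, while epistemic uncertainty comes from the Bayesian treatment of the weights. A minimal NumPy sketch of the negative log-likelihood (function and variable names are illustrative assumptions):

```python
import numpy as np

def disc_cont_nll(y, p_rain, mu, sigma):
    """Negative log-likelihood of a discrete-continuous model: a
    Bernoulli gate for dry pixels (y == 0) and a lognormal density,
    parameterized by per-pixel network outputs, for wet pixels."""
    eps = 1e-8
    dry = y <= 0.0
    # Discrete part: log P(dry) or log P(wet).
    nll = -np.where(dry, np.log(1.0 - p_rain + eps), np.log(p_rain + eps))
    # Continuous part: lognormal log-density, counted for wet pixels only.
    logy = np.log(np.where(dry, 1.0, y))  # dummy value 1.0 where dry
    log_pdf = (-logy - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)
               - 0.5 * ((logy - mu) / sigma) ** 2)
    nll -= np.where(dry, 0.0, log_pdf)
    return nll.mean()
```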
Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference
Convolutional neural networks (CNNs) work well on large datasets. But
labelled data is hard to collect, and in some applications larger amounts of
data are not available. The problem then is how to use CNNs with small data --
as CNNs overfit quickly. We present an efficient Bayesian CNN, offering better
robustness to over-fitting on small data than traditional approaches. We
achieve this by placing a probability distribution over the CNN's kernels. We approximate
our model's intractable posterior with Bernoulli variational distributions,
requiring no additional model parameters.
On the theoretical side, we cast dropout network training as approximate
inference in Bayesian neural networks. This allows us to implement our model
using existing tools in deep learning with no increase in time complexity,
while highlighting a negative result in the field. We show a considerable
improvement in classification accuracy compared to standard techniques and
improve on published state-of-the-art results for CIFAR-10.
Comment: 12 pages, 3 figures, ICLR format, updated with reviewer comments
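At test time, the Bernoulli variational posterior is sampled simply by keeping dropout active and averaging several stochastic forward passes; the spread across passes gives the model uncertainty. A PyTorch-style sketch (the sample count is an illustrative choice, and the net is assumed to get its stochasticity from dropout layers only):

```python
import torch

def mc_dropout_predict(net, x, n_samples=50):
    """Approximate Bayesian prediction by sampling the Bernoulli
    variational posterior: keep dropout ON at test time and average
    stochastic forward passes."""
    net.train()  # keeps dropout stochastic (assumes no batch-norm)
    with torch.no_grad():
        preds = torch.stack([net(x) for _ in range(n_samples)])
    return preds.mean(0), preds.var(0)  # predictive mean and variance
```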
Scalable Training of Inference Networks for Gaussian-Process Models
Inference in Gaussian process (GP) models is computationally challenging for
large data, and often difficult to approximate with a small number of inducing
points. We explore an alternative approximation that employs stochastic
inference networks for flexible inference. Unfortunately, for such networks it
is difficult for minibatch training to learn meaningful correlations over
function outputs on a large dataset. We propose an algorithm that enables
such training by tracking a stochastic, functional mirror-descent algorithm. At
each iteration, this only requires considering a finite number of input
locations, resulting in a scalable and easy-to-implement algorithm. Empirical
results show comparable and, sometimes, superior performance to existing sparse
variational GP methods.
Comment: ICML 2019. Updated results added in the camera-ready version
Infinitely deep neural networks as diffusion processes
When the parameters are initialized independently and identically distributed,
neural networks exhibit undesirable properties that emerge as the
number of layers increases, e.g. a vanishing dependency on the input and a
concentration on restrictive families of functions including constant
functions. We consider parameter distributions that shrink as the number of
layers increases in order to recover well-behaved stochastic processes in the
limit of infinite depth. This allows us to establish a link between infinitely deep
residual networks and solutions to stochastic differential equations, i.e.
diffusion processes. We show that these limiting processes do not suffer from
the aforementioned issues and investigate their properties.
Comment: 16 pages, 9 figures
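Concretely, the construction reads a residual network with layer-dependent shrinking parameter scales as an Euler scheme for a stochastic differential equation; with L layers and step size 1/L, the hidden state converges to a diffusion as L grows. A schematic form (the notation is illustrative, not the paper's exact parametrization):

```latex
% Residual update with shrinking scales ...
h_{l+1} = h_l + \frac{1}{L}\,\mu_\theta(h_l)
              + \frac{1}{\sqrt{L}}\,\sigma_\theta(h_l)\,\varepsilon_l,
\qquad \varepsilon_l \sim \mathcal{N}(0, I),
% ... which, as L -> infinity, tracks the diffusion
\mathrm{d}X_t = \mu_\theta(X_t)\,\mathrm{d}t
              + \sigma_\theta(X_t)\,\mathrm{d}B_t .
```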
Physics-Constrained Deep Learning for High-dimensional Surrogate Modeling and Uncertainty Quantification without Labeled Data
Surrogate modeling and uncertainty quantification tasks for PDE systems are
most often considered as supervised learning problems where input and output
data pairs are used for training. The construction of such emulators is by
definition a small data problem, which poses challenges to deep learning
approaches that have been developed to operate in the big data regime. Even in
cases where such models have been shown to have good predictive capability in
high dimensions, they fail to address constraints in the data implied by the
PDE model. This paper provides a methodology that incorporates the governing
equations of the physical model in the loss/likelihood functions. The resulting
physics-constrained, deep learning models are trained without any labeled data
(i.e. employing only input data) and provide predictive responses comparable
to those of data-driven models while obeying the constraints of the problem at hand.
This work employs a convolutional encoder-decoder neural network approach as
well as a conditional flow-based generative model for the solution of PDEs,
surrogate model construction, and uncertainty quantification tasks. The
methodology is posed as a minimization problem of the reverse Kullback-Leibler
(KL) divergence between the model predictive density and the reference
conditional density, where the latter is defined as the Boltzmann-Gibbs
distribution at a given inverse temperature with the underlying potential
relating to the PDE system of interest. The generalization capability of these
models to out-of-distribution input is considered. Quantification and
interpretation of the predictive uncertainty is provided for a number of
problems.
Comment: 51 pages, 18 figures, submitted to Journal of Computational Physics
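Writing the objective out makes clear why no labeled data are needed: the reverse KL from the model predictive density to the Boltzmann-Gibbs reference decomposes into an expected potential (the PDE residual terms) plus a negative entropy, with the log-partition function constant in the model parameters. Schematically, with inverse temperature \beta and PDE-derived potential V:

```latex
\min_\theta \,
\mathrm{KL}\!\left( p_\theta(u \mid x) \,\middle\|\,
    \frac{e^{-\beta V(u,\, x)}}{Z_\beta(x)} \right)
= \beta\, \mathbb{E}_{p_\theta}\!\left[ V(u, x) \right]
  - \mathbb{H}\!\left[ p_\theta(u \mid x) \right]
  + \log Z_\beta(x).
```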
Machine learning in acoustics: theory and applications
Acoustic data provide scientific and engineering insights in fields ranging
from biology and communications to ocean and Earth science. We survey the
recent advances and transformative potential of machine learning (ML),
including deep learning, in the field of acoustics. ML is a broad family of
techniques, which are often based in statistics, for automatically detecting
and utilizing patterns in data. Relative to conventional acoustics and signal
processing, ML is data-driven. Given sufficient training data, ML can discover
complex relationships between features and desired labels or actions, or
between features themselves. With large volumes of training data, ML can
discover models describing complex acoustic phenomena such as human speech and
reverberation. ML in acoustics is rapidly developing with compelling results
and significant future promise. We first introduce ML, then highlight ML
developments in four acoustics research areas: source localization in speech
processing, source localization in ocean acoustics, bioacoustics, and
environmental sounds in everyday scenes.
Comment: Published with free access in the Journal of the Acoustical Society of America, 27 Nov. 2019