Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
There is a previously identified equivalence between wide fully connected
neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables,
for instance, test set predictions that would have resulted from a fully
Bayesian, infinitely wide trained FCN to be computed without ever instantiating
the FCN, but by instead evaluating the corresponding GP. In this work, we
derive an analogous equivalence for multi-layer convolutional neural networks
(CNNs) both with and without pooling layers, and achieve state-of-the-art
results on CIFAR-10 for GPs without trainable kernels. We also introduce a Monte
Carlo method to estimate the GP corresponding to a given neural network
architecture, even in cases where the analytic form has too many terms to be
computationally feasible.
Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs
with and without weight sharing are identical. As a consequence, translation
equivariance, beneficial in finite channel CNNs trained with stochastic
gradient descent (SGD), is guaranteed to play no role in the Bayesian treatment
of the infinite channel limit - a qualitative difference between the two
regimes that is not present in the FCN case. We confirm experimentally that,
while in some scenarios the performance of SGD-trained finite CNNs approaches
that of the corresponding GPs as the channel count increases, with careful
tuning SGD-trained CNNs can significantly outperform their corresponding GPs,
suggesting advantages from SGD training compared to fully Bayesian parameter
estimation.
Comment: Published as a conference paper at ICLR 2019
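The paper's Monte Carlo kernel estimator can be sketched directly: sample many networks from the parameter prior, evaluate them on the inputs, and average the outer products of the outputs. A minimal NumPy sketch using a one-hidden-layer ReLU network as a toy stand-in for a deep CNN (widths, variance scalings, and function names are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def mc_nngp_kernel(X, width=512, n_samples=200, sigma_w=1.0, sigma_b=0.1):
    """Monte Carlo estimate of the GP kernel induced by a random
    one-hidden-layer ReLU network (toy stand-in for a deep CNN)."""
    n, d = X.shape
    K = np.zeros((n, n))
    for _ in range(n_samples):
        # Draw parameters from the prior used in the infinite-width limit.
        W1 = np.random.randn(d, width) * sigma_w / np.sqrt(d)
        b1 = np.random.randn(width) * sigma_b
        W2 = np.random.randn(width, 1) * sigma_w / np.sqrt(width)
        f = np.maximum(X @ W1 + b1, 0.0) @ W2  # outputs, shape (n, 1)
        K += f @ f.T                           # accumulate outer products
    return K / n_samples                       # empirical E[f(x) f(x')]

X = np.random.randn(8, 32)  # 8 toy inputs of dimension 32
K = mc_nngp_kernel(X)       # approaches the analytic kernel as width
                            # and n_samples grow
```

The same recipe applies to any architecture whose analytic kernel has too many terms to evaluate: swap in the architecture of interest and keep averaging.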
A Bayesian Perspective on the Deep Image Prior
The deep image prior was recently introduced as a prior for natural images.
It represents images as the output of a convolutional network with random
inputs. For "inference", gradient descent is performed to adjust network
parameters to make the output match observations. This approach yields good
performance on a range of image reconstruction tasks. We show that the deep
image prior is asymptotically equivalent to a stationary Gaussian process prior
in the limit as the number of channels in each layer of the network goes to
infinity, and derive the corresponding kernel. This informs a Bayesian approach
to inference. We show that by conducting posterior inference using stochastic
gradient Langevin dynamics we avoid the need for early stopping, which is a
drawback of the current approach, and improve results for denoising and
inpainting tasks.
We illustrate these intuitions on a number of 1D and 2D signal reconstruction
tasks.
Comment: CVPR 2019
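The inference change is small in code: replace the plain gradient step on the network parameters with a stochastic gradient Langevin dynamics (SGLD) update, which adds Gaussian noise of scale sqrt(2*lr) at each step so that post-burn-in iterates behave as approximate posterior samples. A hedged PyTorch-style sketch (the learning rate and the commented usage lines are illustrative assumptions):

```python
import torch

def sgld_step(params, loss, lr=1e-4):
    """One SGLD update: a gradient step plus sqrt(2*lr) Gaussian noise.
    Averaging network outputs over post-burn-in iterates replaces the
    early-stopped point estimate of the original deep image prior."""
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            noise = torch.randn_like(p) * (2.0 * lr) ** 0.5
            p.add_(-lr * g + noise)

# Usage sketch (hypothetical names): net is a randomly initialized conv
# net with a fixed input z, y_obs is the corrupted image.
# loss = ((net(z) - y_obs) ** 2).sum()
# sgld_step(list(net.parameters()), loss)
```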
Deep convolutional Gaussian processes
We propose deep convolutional Gaussian processes, a deep Gaussian process
architecture with convolutional structure. The model is a principled Bayesian
framework for detecting hierarchical combinations of local features for image
classification. We demonstrate greatly improved image classification
performance compared to current Gaussian process approaches on the MNIST and
CIFAR-10 datasets. In particular, we improve CIFAR-10 accuracy by over 10
percentage points.
Bayesian Image Classification with Deep Convolutional Gaussian Processes
In decision-making systems, it is important to have classifiers with
calibrated uncertainties and an optimisation objective that can be used for
automated model selection and training. Gaussian processes (GPs) provide
uncertainty estimates and a marginal likelihood objective, but their weak
inductive biases lead to inferior accuracy. This has limited their
applicability in certain tasks (e.g. image classification). We propose a
translation-insensitive convolutional kernel, which relaxes the translation
invariance constraint imposed by previous convolutional GPs. We show how we can
use the marginal likelihood to learn the degree of insensitivity. We also
reformulate GP image-to-image convolutional mappings as multi-output GPs,
leading to deep convolutional GPs. We show experimentally that our new kernel
improves performance in both single-layer and deep models. We also demonstrate
that our fully Bayesian approach improves on dropout-based Bayesian deep
learning methods in terms of uncertainty and marginal likelihood estimates.
Comment: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020, PMLR: Volume 108
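A plausible reading of the kernel construction (a sketch, not the paper's exact parametrization): patch similarity is modulated by a second kernel over patch locations, whose lengthscale, learned via the marginal likelihood, interpolates between strict translation invariance and full location sensitivity:

```latex
% x^{[p]} is the p-th patch of image x and \ell_p its location; the
% lengthscale l of k_loc sets the degree of insensitivity
% (l -> infinity recovers a translation-invariant convolutional kernel).
k(x, x') = \sum_{p}\sum_{p'}
    k_{\mathrm{patch}}\big(x^{[p]}, x'^{[p']}\big)\,
    k_{\mathrm{loc}}\big(\ell_p, \ell_{p'}\big),
\qquad
k_{\mathrm{loc}}(\ell_p, \ell_{p'})
    = \exp\!\Big(-\frac{\lVert \ell_p - \ell_{p'} \rVert^2}{2 l^2}\Big).
```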
Quantifying Uncertainty in Discrete-Continuous and Skewed Data with Bayesian Deep Learning
Deep Learning (DL) methods have been transforming computer vision with
innovative adaptations to other domains including climate change. For DL to
pervade Science and Engineering (S&E) applications where risk management is a
core component, well-characterized uncertainty estimates must accompany
predictions. However, S&E observations and model-simulations often follow
heavily skewed distributions and are not well modeled with DL approaches, since
they usually optimize a Gaussian, or Euclidean, likelihood loss. Recent
developments in Bayesian Deep Learning (BDL), which attempts to capture both
aleatoric uncertainty (from noisy observations) and epistemic uncertainty
(from unknown model parameters), provide us with a foundation. Here we present a
discrete-continuous BDL model with Gaussian and lognormal likelihoods for
uncertainty quantification (UQ). We demonstrate the approach by developing UQ
estimates on `DeepSD', a super-resolution based DL model for Statistical
Downscaling (SD) in climate applied to precipitation, which follows an
extremely skewed distribution. We find that the discrete-continuous models
outperform a basic Gaussian distribution in terms of predictive accuracy and
uncertainty calibration. Furthermore, we find that the lognormal distribution,
which can handle skewed distributions, produces quality uncertainty estimates
at the extremes. Such results may be important across S&E, as well as other
domains such as finance and economics, where extremes are often of significant
interest. Furthermore, to our knowledge, this is the first UQ model in SD where
both aleatoric and epistemic uncertainties are characterized.
Comment: 10 pages
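The discrete-continuous likelihood can be written as a Bernoulli gate for zero precipitation combined with a lognormal density for positive amounts; aleatoric uncertainty then follows from the likelihood itself, while epistemic uncertainty comes from the Bayesian treatment of the weights. A minimal NumPy sketch of the negative log-likelihood (function and variable names are illustrative assumptions):

```python
import numpy as np

def disc_cont_nll(y, p_rain, mu, sigma):
    """Negative log-likelihood of a discrete-continuous model: a
    Bernoulli gate for dry pixels (y == 0) and a lognormal density,
    parameterized by per-pixel network outputs, for wet pixels."""
    eps = 1e-8
    dry = y <= 0.0
    # Discrete part: log P(dry) or log P(wet).
    nll = -np.where(dry, np.log(1.0 - p_rain + eps), np.log(p_rain + eps))
    # Continuous part: lognormal log-density, counted for wet pixels only.
    logy = np.log(np.where(dry, 1.0, y))  # dummy value 1.0 where dry
    log_pdf = (-logy - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)
               - 0.5 * ((logy - mu) / sigma) ** 2)
    nll -= np.where(dry, 0.0, log_pdf)
    return nll.mean()
```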
Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference
Convolutional neural networks (CNNs) work well on large datasets. But
labelled data is hard to collect, and in some applications larger amounts of
data are not available. The problem then is how to use CNNs with small data --
as CNNs overfit quickly. We present an efficient Bayesian CNN, offering better
robustness to over-fitting on small data than traditional approaches. We
achieve this by placing a probability distribution over the CNN's kernels. We approximate
our model's intractable posterior with Bernoulli variational distributions,
requiring no additional model parameters.
On the theoretical side, we cast dropout network training as approximate
inference in Bayesian neural networks. This allows us to implement our model
using existing tools in deep learning with no increase in time complexity,
while highlighting a negative result in the field. We show a considerable
improvement in classification accuracy compared to standard techniques and
improve on published state-of-the-art results for CIFAR-10.
Comment: 12 pages, 3 figures, ICLR format, updated with reviewer comments
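At test time, the Bernoulli variational posterior is sampled simply by keeping dropout active and averaging several stochastic forward passes; the spread across passes gives the model uncertainty. A PyTorch-style sketch (the sample count is an illustrative choice, and the net is assumed to get its stochasticity from dropout layers only):

```python
import torch

def mc_dropout_predict(net, x, n_samples=50):
    """Approximate Bayesian prediction by sampling the Bernoulli
    variational posterior: keep dropout ON at test time and average
    stochastic forward passes."""
    net.train()  # keeps dropout stochastic (assumes no batch-norm)
    with torch.no_grad():
        preds = torch.stack([net(x) for _ in range(n_samples)])
    return preds.mean(0), preds.var(0)  # predictive mean and variance
```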
Scalable Training of Inference Networks for Gaussian-Process Models
Inference in Gaussian process (GP) models is computationally challenging for
large data, and often difficult to approximate with a small number of inducing
points. We explore an alternative approximation that employs stochastic
inference networks for flexible inference. Unfortunately, for such networks it
is difficult for minibatch training to learn meaningful correlations over
function outputs on a large dataset. We propose an algorithm that enables
such training by tracking a stochastic, functional mirror-descent algorithm. At
each iteration, this only requires considering a finite number of input
locations, resulting in a scalable and easy-to-implement algorithm. Empirical
results show comparable and, sometimes, superior performance to existing sparse
variational GP methods.
Comment: ICML 2019. Updated results added in the camera-ready version
Infinitely deep neural networks as diffusion processes
When the parameters are initialized independently and identically distributed,
neural networks exhibit undesirable properties that emerge as the
number of layers increases, e.g. a vanishing dependency on the input and a
concentration on restrictive families of functions including constant
functions. We consider parameter distributions that shrink as the number of
layers increases in order to recover well-behaved stochastic processes in the
limit of infinite depth. This allows us to establish a link between infinitely deep
residual networks and solutions to stochastic differential equations, i.e.
diffusion processes. We show that these limiting processes do not suffer from
the aforementioned issues and investigate their properties.
Comment: 16 pages, 9 figures
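Concretely, the construction reads a residual network with layer-dependent shrinking parameter scales as an Euler scheme for a stochastic differential equation; with L layers and step size 1/L, the hidden state converges to a diffusion as L grows. A schematic form (the notation is illustrative, not the paper's exact parametrization):

```latex
% Residual update with shrinking scales ...
h_{l+1} = h_l + \frac{1}{L}\,\mu_\theta(h_l)
              + \frac{1}{\sqrt{L}}\,\sigma_\theta(h_l)\,\varepsilon_l,
\qquad \varepsilon_l \sim \mathcal{N}(0, I),
% ... which, as L -> infinity, tracks the diffusion
\mathrm{d}X_t = \mu_\theta(X_t)\,\mathrm{d}t
              + \sigma_\theta(X_t)\,\mathrm{d}B_t .
```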
Physics-Constrained Deep Learning for High-dimensional Surrogate Modeling and Uncertainty Quantification without Labeled Data
Surrogate modeling and uncertainty quantification tasks for PDE systems are
most often considered as supervised learning problems where input and output
data pairs are used for training. The construction of such emulators is by
definition a small data problem, which poses challenges to deep learning
approaches that have been developed to operate in the big data regime. Even in
cases where such models have been shown to have good predictive capability in
high dimensions, they fail to address constraints in the data implied by the
PDE model. This paper provides a methodology that incorporates the governing
equations of the physical model in the loss/likelihood functions. The resulting
physics-constrained, deep learning models are trained without any labeled data
(i.e. employing only input data) and provide predictive responses comparable
to those of data-driven models while obeying the constraints of the problem at hand.
This work employs a convolutional encoder-decoder neural network approach as
well as a conditional flow-based generative model for the solution of PDEs,
surrogate model construction, and uncertainty quantification tasks. The
methodology is posed as a minimization problem of the reverse Kullback-Leibler
(KL) divergence between the model predictive density and the reference
conditional density, where the latter is defined as the Boltzmann-Gibbs
distribution at a given inverse temperature with the underlying potential
relating to the PDE system of interest. The generalization capability of these
models to out-of-distribution input is considered. Quantification and
interpretation of the predictive uncertainty is provided for a number of
problems.
Comment: 51 pages, 18 figures, submitted to Journal of Computational Physics
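Writing the objective out makes clear why no labeled data are needed: the reverse KL from the model predictive density to the Boltzmann-Gibbs reference decomposes into an expected potential (the PDE residual terms) plus a negative entropy, with the log-partition function constant in the model parameters. Schematically, with inverse temperature \beta and PDE-derived potential V:

```latex
\min_\theta \,
\mathrm{KL}\!\left( p_\theta(u \mid x) \,\middle\|\,
    \frac{e^{-\beta V(u,\, x)}}{Z_\beta(x)} \right)
= \beta\, \mathbb{E}_{p_\theta}\!\left[ V(u, x) \right]
  - \mathbb{H}\!\left[ p_\theta(u \mid x) \right]
  + \log Z_\beta(x).
```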
Machine learning in acoustics: theory and applications
Acoustic data provide scientific and engineering insights in fields ranging
from biology and communications to ocean and Earth science. We survey the
recent advances and transformative potential of machine learning (ML),
including deep learning, in the field of acoustics. ML is a broad family of
techniques, which are often based in statistics, for automatically detecting
and utilizing patterns in data. Relative to conventional acoustics and signal
processing, ML is data-driven. Given sufficient training data, ML can discover
complex relationships between features and desired labels or actions, or
between features themselves. With large volumes of training data, ML can
discover models describing complex acoustic phenomena such as human speech and
reverberation. ML in acoustics is rapidly developing with compelling results
and significant future promise. We first introduce ML, then highlight ML
developments in four acoustics research areas: source localization in speech
processing, source localization in ocean acoustics, bioacoustics, and
environmental sounds in everyday scenes.
Comment: Published with free access in the Journal of the Acoustical Society of America, 27 Nov. 2019