Scalable Data Augmentation for Deep Learning
Scalable Data Augmentation (SDA) provides a framework for training deep
learning models using auxiliary hidden layers. Scalable MCMC is available for
network training and inference. SDA provides a number of computational
advantages over traditional algorithms, such as avoiding backtracking and
local modes, and it supports optimization with stochastic gradient descent
(SGD) in TensorFlow. Standard deep neural networks with logit, ReLU and SVM
activation functions are straightforward to implement. To illustrate our
architectures and methodology, we use Pólya-Gamma logit data augmentation for a
number of standard datasets. Finally, we conclude with directions for future
research.
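The Pólya-Gamma logit augmentation the abstract mentions yields closed-form Gibbs updates for Bayesian logistic regression. The sketch below is a generic numpy illustration of that scheme (the Polson, Scott and Windle construction), not the paper's TensorFlow implementation; the Pólya-Gamma draw truncates the infinite sum-of-gammas representation, so it is approximate, and the N(0, I) prior is an assumption of this sketch.

```python
import numpy as np

def sample_pg(b, c, rng, K=100):
    """Approximate draw from the Polya-Gamma PG(b, c) distribution via
    its sum-of-gammas representation, truncated at K terms."""
    k = np.arange(1, K + 1)
    g = rng.gamma(shape=b, scale=1.0, size=K)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2))) / (2.0 * np.pi ** 2)

def pg_logit_gibbs(X, y, n_iter=150, rng=None):
    """Gibbs sampler for Bayesian logistic regression with Polya-Gamma
    data augmentation and a N(0, I) prior on the coefficients."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    beta = np.zeros(p)
    kappa = y - 0.5                       # y must be coded in {0, 1}
    B0_inv = np.eye(p)                    # prior precision
    draws = []
    for _ in range(n_iter):
        # 1) sample the auxiliary variables omega_i | beta ~ PG(1, x_i' beta)
        psi = X @ beta
        omega = np.array([sample_pg(1.0, c, rng) for c in psi])
        # 2) sample beta | omega from its Gaussian full conditional
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + B0_inv)
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)
        draws.append(beta)
    return np.array(draws)
```

Because step 2 is an exact Gaussian draw, the sampler needs no tuning, which is part of the appeal of this augmentation.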
Hands-on Bayesian Neural Networks -- a Tutorial for Deep Learning Users
Modern deep learning methods constitute incredibly powerful tools to tackle a
myriad of challenging problems. However, since deep learning methods operate as
black boxes, the uncertainty associated with their predictions is often
challenging to quantify. Bayesian statistics offer a formalism to understand
and quantify the uncertainty associated with deep neural network predictions.
This tutorial provides an overview of the relevant literature and a complete
toolset to design, implement, train, use and evaluate Bayesian Neural Networks,
i.e. Stochastic Artificial Neural Networks trained using Bayesian methods.
Comment: 35 pages, 15 figures
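As a taste of the toolset such a tutorial covers, here is a minimal numpy sketch of one popular approximate-inference method for Bayesian neural networks, Monte Carlo dropout: keep dropout active at prediction time and treat repeated stochastic forward passes as samples from an approximate posterior predictive. The two-layer ReLU network and its weights are illustrative assumptions, not taken from the tutorial.

```python
import numpy as np

def mc_dropout_predict(x, W1, W2, p_drop=0.5, n_passes=100, rng=None):
    """Monte Carlo dropout (Gal & Ghahramani, 2016): run n_passes
    stochastic forward passes with dropout enabled and summarize them
    by a predictive mean and a spread that proxies epistemic uncertainty."""
    rng = rng or np.random.default_rng(0)
    preds = []
    for _ in range(n_passes):
        h = np.maximum(x @ W1, 0.0)                         # ReLU hidden layer
        mask = rng.binomial(1, 1.0 - p_drop, size=h.shape)  # random dropout mask
        preds.append((h * mask / (1.0 - p_drop)) @ W2)      # inverted-dropout scaling
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)
```

In a trained network the same masks-at-test-time trick applies unchanged; only the forward pass differs.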
MCMC to address model misspecification in Deep Learning classification of Radio Galaxies
The radio astronomy community is adopting deep learning techniques to deal
with the huge data volumes expected from the next-generation of radio
observatories. Bayesian neural networks (BNNs) provide a principled way to
model uncertainty in the predictions made by deep learning models and will play
an important role in extracting well-calibrated uncertainty estimates from the
outputs of these models. However, most commonly used approximate Bayesian
inference techniques such as variational inference and MCMC-based algorithms
experience a "cold posterior effect (CPE)", according to which the posterior
must be down-weighted in order to get good predictive performance. The CPE has
been linked to several factors such as data augmentation or dataset curation
leading to a misspecified likelihood and prior misspecification. In this work
we use MCMC sampling to show that a Gaussian parametric family is a poor
variational approximation to the true posterior and gives rise to the CPE
previously observed in morphological classification of radio galaxies using
variational-inference-based BNNs.
Comment: Accepted in the Machine Learning and the Physical Sciences Workshop at
NeurIPS 2023; 6 pages, 1 figure, 1 table
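The cold posterior effect concerns tempered posteriors p(theta | D) proportional to p(D | theta)^(1/T) p(theta) with T < 1. The numpy sketch below uses a toy Gaussian-mean model, not the radio-galaxy BNNs of the paper, to show how the temperature reshapes the target an MCMC sampler sees: cooling up-weights the likelihood and concentrates the posterior.

```python
import numpy as np

def tempered_mh(data, T=1.0, n_iter=4000, step=0.5, prior_var=10.0, rng=None):
    """Random-walk Metropolis targeting the tempered posterior
    p_T(theta | D) prop. to p(D | theta)^(1/T) * p(theta) for the mean
    of N(theta, 1) data with a N(0, prior_var) prior. T < 1 is the
    "cold posterior" regime discussed in the abstract."""
    rng = rng or np.random.default_rng(0)

    def log_target(th):
        ll = -0.5 * np.sum((data - th) ** 2)  # N(theta, 1) log-likelihood
        lp = -0.5 * th ** 2 / prior_var       # Gaussian prior
        return ll / T + lp

    theta, cur = 0.0, log_target(0.0)
    out = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        new = log_target(prop)
        if np.log(rng.uniform()) < new - cur:
            theta, cur = prop, new
        out.append(theta)
    return np.array(out)
```

Running the same chain at T = 1 and T = 0.1 makes the concentration visible directly in the sample spread.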
Subsampling MCMC - An introduction for the survey statistician
The rapid development of computing power and efficient Markov Chain Monte
Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics,
making it a highly practical inference method in applied work. However, MCMC
algorithms tend to be computationally demanding, and are particularly slow for
large datasets. Data subsampling has recently been suggested as a way to make
MCMC methods scalable on massively large data, utilizing efficient sampling
schemes and estimators from the survey sampling literature. These developments
tend to be unknown by many survey statisticians who traditionally work with
non-Bayesian methods, and rarely use MCMC. Our article explains the idea of
data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a
so called pseudo-marginal MCMC approach to speeding up MCMC through data
subsampling. The review is written for a survey statistician without previous
knowledge of MCMC methods since our aim is to motivate survey sampling experts
to contribute to the growing Subsampling MCMC literature.
Comment: Accepted for publication in Sankhya A. A previously uploaded version
contained a bug in generating the figures and references.
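The core idea the review explains can be sketched in a few lines: replace the full-data log-likelihood inside Metropolis-Hastings with a survey-sampling estimate from a simple random subsample, scaled by n/m. This didactic numpy sketch deliberately omits the control variates and bias correction that make the paper's Subsampling MCMC a valid pseudo-marginal method, so the toy chain below is noisy and slightly biased by construction.

```python
import numpy as np

def subsampled_loglik(theta, data, m, rng):
    """Survey-sampling estimate of the full-data log-likelihood for a
    N(theta, 1) model: a simple random sample of size m without
    replacement, with the subsample sum scaled by n/m."""
    n = len(data)
    idx = rng.choice(n, size=m, replace=False)
    terms = -0.5 * (data[idx] - theta) ** 2  # log-density terms, up to a constant
    return (n / m) * terms.sum()

def subsampling_mcmc(data, m=500, n_iter=4000, step=0.05, rng=None):
    """Metropolis-Hastings in which every iteration evaluates only a
    subsample estimate of the log-likelihood, so each step touches m
    observations instead of all n."""
    rng = rng or np.random.default_rng(0)
    theta = data[:100].mean()             # cheap pilot estimate as a start
    cur = subsampled_loglik(theta, data, m, rng)
    out = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        new = subsampled_loglik(prop, data, m, rng)
        if np.log(rng.uniform()) < new - cur:
            theta, cur = prop, new
        out.append(theta)
    return np.array(out)
```

The large variance of the naive n/m estimator is exactly why the reviewed method brings in efficient estimators from the survey sampling literature.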
BayesDLL: Bayesian Deep Learning Library
We release a new Bayesian neural network library for PyTorch for large-scale
deep networks. Our library implements mainstream approximate Bayesian inference
algorithms: variational inference, MC-dropout, stochastic-gradient MCMC, and
Laplace approximation. The main differences from other existing Bayesian neural
network libraries are as follows: 1) Our library can deal with very large-scale
deep networks including Vision Transformers (ViTs). 2) We need virtually zero
code modifications for users (e.g., the backbone network definition codes do
not need to be modified at all). 3) Our library also allows the pre-trained
model weights to serve as a prior mean, which is very useful for performing
Bayesian inference with the large-scale foundation models like ViTs that are
hard to optimise from scratch with the downstream data alone. Our code is
publicly available at: \url{https://github.com/SamsungLabs/BayesDLL}\footnote{A
mirror repository is also available at:
\url{https://github.com/minyoungkim21/BayesDLL}.}
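BayesDLL's own API is not reproduced here. Instead, the numpy sketch below illustrates one of the algorithm families the library implements, stochastic-gradient MCMC, via the basic stochastic-gradient Langevin dynamics (SGLD) update of Welling and Teh on a toy Gaussian-mean model; the model, step size, and prior are assumptions of this sketch.

```python
import numpy as np

def sgld_gaussian_mean(data, n_steps=4000, batch=32, eps=1e-4,
                       prior_var=100.0, rng=None):
    """SGLD for the posterior over the mean of N(theta, 1) data: each
    step takes a minibatch estimate of the log-posterior gradient and
    injects N(0, eps) noise, turning SGD into an approximate sampler."""
    rng = rng or np.random.default_rng(0)
    n = len(data)
    theta = 0.0
    out = []
    for _ in range(n_steps):
        mb = rng.choice(n, size=batch, replace=False)
        # unbiased minibatch estimate of the log-posterior gradient
        grad = (n / batch) * np.sum(data[mb] - theta) - theta / prior_var
        theta += 0.5 * eps * grad + np.sqrt(eps) * rng.normal()
        out.append(theta)
    return np.array(out)
```

For a deep network the scalar theta becomes the weight vector and the gradient comes from backpropagation, but the update rule is the same, which is why SG-MCMC scales to ViT-sized models.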
Stochastic partial differential equation based modelling of large space-time data sets
Increasingly large data sets of processes in space and time call for
statistical models and methods that can cope with such data. We show that the
solution of a stochastic advection-diffusion partial differential equation
provides a flexible model class for spatio-temporal processes that is
computationally feasible even for large data sets. The Gaussian process defined
through the stochastic partial differential equation has in general a
nonseparable covariance structure. Furthermore, its parameters can be
physically interpreted as explicitly modeling phenomena such as transport and
diffusion that occur in many natural processes in diverse fields ranging from
environmental sciences to ecology. In order to obtain computationally efficient
statistical algorithms we use spectral methods to solve the stochastic partial
differential equation. This has the advantage that approximation errors do not
accumulate over time, and that in the spectral space the computational cost
grows linearly with the dimension, the total computational costs of Bayesian or
frequentist inference being dominated by the fast Fourier transform. The
proposed model is applied to postprocessing of precipitation forecasts from a
numerical weather prediction model for northern Switzerland. In contrast to the
raw forecasts from the numerical model, the postprocessed forecasts are
calibrated and quantify prediction uncertainty. Moreover, they outperform the
raw forecasts, in the sense that they have a lower mean absolute error.
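The spectral trick the abstract describes can be illustrated on a toy 1D stochastic heat equation rather than the paper's advection-diffusion model: in Fourier space each mode evolves as an independent Ornstein-Uhlenbeck process, so the time update is exact per mode (no accumulating discretization error) and the cost is dominated by the FFT. The grid size, damping parameter, and dropping of the imaginary part are simplifications of this sketch.

```python
import numpy as np

def spectral_spde_step(z_hat, lam, dt, rng, tau=1.0):
    """Advance the Fourier coefficients of a stochastic diffusion SPDE
    by one time step. Each mode is an independent OU process, so the
    update is exact in time and costs O(N) on top of the FFTs."""
    a = np.exp(-lam * dt)                          # exact OU decay per mode
    s = tau * np.sqrt((1.0 - a ** 2) / (2.0 * lam))  # matching noise scale
    noise = rng.normal(size=z_hat.shape) + 1j * rng.normal(size=z_hat.shape)
    return a * z_hat + s * noise

# toy 1D stochastic heat equation on a periodic grid
rng = np.random.default_rng(0)
N, dt, kappa = 128, 0.1, 1.0
k = np.fft.fftfreq(N) * N      # integer wavenumbers for a [0, 2*pi) domain
lam = kappa ** 2 + k ** 2      # per-mode damping: range term plus diffusion
z_hat = np.zeros(N, dtype=complex)
for _ in range(200):
    z_hat = spectral_spde_step(z_hat, lam, dt, rng)
u = np.fft.ifft(z_hat).real    # physical-space field (sketch: real part only)
```

Because every mode updates independently, the cost per step grows linearly in the number of modes, matching the scaling claim in the abstract.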
Quasar Black Hole Mass Estimates in the Era of Time Domain Astronomy
We investigate the dependence of the normalization of the high-frequency part
of the X-ray and optical power spectral densities (PSD) on black hole mass for
a sample of 39 active galactic nuclei (AGN) with black hole masses estimated
from reverberation mapping or dynamical modeling. We obtained new Swift
observations of PG 1426+015, which has the largest estimated black hole mass of
the AGN in our sample. We develop a novel statistical method to estimate the
PSD from a lightcurve of photon counts with arbitrary sampling, eliminating the
need to bin a lightcurve to achieve Gaussian statistics, and we use this
technique to estimate the X-ray variability parameters for the faint AGN in our
sample. We find that the normalization of the high-frequency X-ray PSD is
inversely proportional to black hole mass. We discuss how to use this scaling
relationship to obtain black hole mass estimates from the short time-scale
X-ray variability amplitude with precision ~ 0.38 dex. The amplitude of optical
variability on time scales of days is also anti-correlated with black hole
mass, but with larger scatter. Instead, the optical variability amplitude
exhibits the strongest anti-correlation with luminosity. We conclude with a
discussion of the implications of our results for estimating black hole mass
from the amplitude of AGN variability.
Comment: 19 pages, 10 figures, emulateapj format, submitted to Ap
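The paper's estimator works directly on photon counts with arbitrary sampling; for intuition only, here is the standard FFT periodogram for an evenly sampled light curve, whose high-frequency normalization is the quantity the scaling relation uses. The rms-squared normalization convention is an assumption of this sketch, and the calibration constant that converts the fitted normalization into a black hole mass comes from the paper, so it is not reproduced here.

```python
import numpy as np

def periodogram(flux, dt):
    """FFT periodogram of an evenly sampled light curve, normalized so
    that summing PSD * df recovers the variance of the light curve
    (rms-squared normalization)."""
    n = len(flux)
    x = flux - flux.mean()
    freqs = np.fft.rfftfreq(n, d=dt)[1:]              # drop the zero frequency
    power = (2.0 * dt / n) * np.abs(np.fft.rfft(x))[1:] ** 2
    return freqs, power
```

Fitting the high-frequency portion of this estimate (for X-ray light curves, typically a power law) yields the normalization whose inverse scales with black hole mass in the abstract's relation.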