Scalable Data Augmentation for Deep Learning
Scalable Data Augmentation (SDA) provides a framework for training deep
learning models using auxiliary hidden layers. Scalable MCMC is available for
network training and inference. SDA provides a number of computational
advantages over traditional algorithms, such as avoiding backtracking and
local modes, and it supports optimization with stochastic gradient descent
(SGD) in TensorFlow. Standard deep neural networks with logit, ReLU and SVM
activation functions are straightforward to implement. To illustrate our
architectures and methodology, we use Pólya-Gamma logit data augmentation for a
number of standard datasets. Finally, we conclude with directions for future
research.
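The Pólya-Gamma logit augmentation the abstract mentions yields closed-form Gibbs updates for Bayesian logistic regression. The sketch below is a generic numpy illustration of that scheme (the Polson, Scott and Windle construction), not the paper's TensorFlow implementation; the Pólya-Gamma draw truncates the infinite sum-of-gammas representation, so it is approximate, and the N(0, I) prior is an assumption of this sketch.

```python
import numpy as np

def sample_pg(b, c, rng, K=100):
    """Approximate draw from the Polya-Gamma PG(b, c) distribution via
    its sum-of-gammas representation, truncated at K terms."""
    k = np.arange(1, K + 1)
    g = rng.gamma(shape=b, scale=1.0, size=K)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2))) / (2.0 * np.pi ** 2)

def pg_logit_gibbs(X, y, n_iter=150, rng=None):
    """Gibbs sampler for Bayesian logistic regression with Polya-Gamma
    data augmentation and a N(0, I) prior on the coefficients."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    beta = np.zeros(p)
    kappa = y - 0.5                       # y must be coded in {0, 1}
    B0_inv = np.eye(p)                    # prior precision
    draws = []
    for _ in range(n_iter):
        # 1) sample the auxiliary variables omega_i | beta ~ PG(1, x_i' beta)
        psi = X @ beta
        omega = np.array([sample_pg(1.0, c, rng) for c in psi])
        # 2) sample beta | omega from its Gaussian full conditional
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + B0_inv)
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)
        draws.append(beta)
    return np.array(draws)
```

Because step 2 is an exact Gaussian draw, the sampler needs no tuning, which is part of the appeal of this augmentation.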
Hands-on Bayesian Neural Networks -- a Tutorial for Deep Learning Users
Modern deep learning methods constitute incredibly powerful tools to tackle a
myriad of challenging problems. However, since deep learning methods operate as
black boxes, the uncertainty associated with their predictions is often
challenging to quantify. Bayesian statistics offer a formalism to understand
and quantify the uncertainty associated with deep neural network predictions.
This tutorial provides an overview of the relevant literature and a complete
toolset to design, implement, train, use and evaluate Bayesian Neural Networks,
i.e. Stochastic Artificial Neural Networks trained using Bayesian methods.
Comment: 35 pages, 15 figures
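As a taste of the toolset such a tutorial covers, here is a minimal numpy sketch of one popular approximate-inference method for Bayesian neural networks, Monte Carlo dropout: keep dropout active at prediction time and treat repeated stochastic forward passes as samples from an approximate posterior predictive. The two-layer ReLU network and its weights are illustrative assumptions, not taken from the tutorial.

```python
import numpy as np

def mc_dropout_predict(x, W1, W2, p_drop=0.5, n_passes=100, rng=None):
    """Monte Carlo dropout (Gal & Ghahramani, 2016): run n_passes
    stochastic forward passes with dropout enabled and summarize them
    by a predictive mean and a spread that proxies epistemic uncertainty."""
    rng = rng or np.random.default_rng(0)
    preds = []
    for _ in range(n_passes):
        h = np.maximum(x @ W1, 0.0)                         # ReLU hidden layer
        mask = rng.binomial(1, 1.0 - p_drop, size=h.shape)  # random dropout mask
        preds.append((h * mask / (1.0 - p_drop)) @ W2)      # inverted-dropout scaling
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)
```

In a trained network the same masks-at-test-time trick applies unchanged; only the forward pass differs.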
MCMC to address model misspecification in Deep Learning classification of Radio Galaxies
The radio astronomy community is adopting deep learning techniques to deal
with the huge data volumes expected from the next-generation of radio
observatories. Bayesian neural networks (BNNs) provide a principled way to
model uncertainty in the predictions made by deep learning models and will play
an important role in extracting well-calibrated uncertainty estimates from the
outputs of these models. However, most commonly used approximate Bayesian
inference techniques such as variational inference and MCMC-based algorithms
experience a "cold posterior effect (CPE)", according to which the posterior
must be down-weighted in order to get good predictive performance. The CPE has
been linked to several factors such as data augmentation or dataset curation
leading to a misspecified likelihood and prior misspecification. In this work
we use MCMC sampling to show that a Gaussian parametric family is a poor
variational approximation to the true posterior and gives rise to the CPE
previously observed in morphological classification of radio galaxies using
variational-inference-based BNNs.
Comment: Accepted in the Machine Learning and the Physical Sciences Workshop at
NeurIPS 2023; 6 pages, 1 figure, 1 table
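The cold posterior effect concerns tempered posteriors p(theta | D) proportional to p(D | theta)^(1/T) p(theta) with T < 1. The numpy sketch below uses a toy Gaussian-mean model, not the radio-galaxy BNNs of the paper, to show how the temperature reshapes the target an MCMC sampler sees: cooling up-weights the likelihood and concentrates the posterior.

```python
import numpy as np

def tempered_mh(data, T=1.0, n_iter=4000, step=0.5, prior_var=10.0, rng=None):
    """Random-walk Metropolis targeting the tempered posterior
    p_T(theta | D) prop. to p(D | theta)^(1/T) * p(theta) for the mean
    of N(theta, 1) data with a N(0, prior_var) prior. T < 1 is the
    "cold posterior" regime discussed in the abstract."""
    rng = rng or np.random.default_rng(0)

    def log_target(th):
        ll = -0.5 * np.sum((data - th) ** 2)  # N(theta, 1) log-likelihood
        lp = -0.5 * th ** 2 / prior_var       # Gaussian prior
        return ll / T + lp

    theta, cur = 0.0, log_target(0.0)
    out = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        new = log_target(prop)
        if np.log(rng.uniform()) < new - cur:
            theta, cur = prop, new
        out.append(theta)
    return np.array(out)
```

Running the same chain at T = 1 and T = 0.1 makes the concentration visible directly in the sample spread.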
Subsampling MCMC - An introduction for the survey statistician
The rapid development of computing power and efficient Markov Chain Monte
Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics,
making it a highly practical inference method in applied work. However, MCMC
algorithms tend to be computationally demanding, and are particularly slow for
large datasets. Data subsampling has recently been suggested as a way to make
MCMC methods scalable on massively large data, utilizing efficient sampling
schemes and estimators from the survey sampling literature. These developments
tend to be unknown by many survey statisticians who traditionally work with
non-Bayesian methods, and rarely use MCMC. Our article explains the idea of
data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a
so called pseudo-marginal MCMC approach to speeding up MCMC through data
subsampling. The review is written for a survey statistician without previous
knowledge of MCMC methods since our aim is to motivate survey sampling experts
to contribute to the growing Subsampling MCMC literature.
Comment: Accepted for publication in Sankhya A. A previously uploaded version
contained a bug in generating the figures and references.
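The core idea the review explains can be sketched in a few lines: replace the full-data log-likelihood inside Metropolis-Hastings with a survey-sampling estimate from a simple random subsample, scaled by n/m. This didactic numpy sketch deliberately omits the control variates and bias correction that make the paper's Subsampling MCMC a valid pseudo-marginal method, so the toy chain below is noisy and slightly biased by construction.

```python
import numpy as np

def subsampled_loglik(theta, data, m, rng):
    """Survey-sampling estimate of the full-data log-likelihood for a
    N(theta, 1) model: a simple random sample of size m without
    replacement, with the subsample sum scaled by n/m."""
    n = len(data)
    idx = rng.choice(n, size=m, replace=False)
    terms = -0.5 * (data[idx] - theta) ** 2  # log-density terms, up to a constant
    return (n / m) * terms.sum()

def subsampling_mcmc(data, m=500, n_iter=4000, step=0.05, rng=None):
    """Metropolis-Hastings in which every iteration evaluates only a
    subsample estimate of the log-likelihood, so each step touches m
    observations instead of all n."""
    rng = rng or np.random.default_rng(0)
    theta = data[:100].mean()             # cheap pilot estimate as a start
    cur = subsampled_loglik(theta, data, m, rng)
    out = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        new = subsampled_loglik(prop, data, m, rng)
        if np.log(rng.uniform()) < new - cur:
            theta, cur = prop, new
        out.append(theta)
    return np.array(out)
```

The large variance of the naive n/m estimator is exactly why the reviewed method brings in efficient estimators from the survey sampling literature.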
BayesDLL: Bayesian Deep Learning Library
We release a new Bayesian neural network library for PyTorch for large-scale
deep networks. Our library implements mainstream approximate Bayesian inference
algorithms: variational inference, MC-dropout, stochastic-gradient MCMC, and
Laplace approximation. The main differences from other existing Bayesian neural
network libraries are as follows: 1) Our library can deal with very large-scale
deep networks including Vision Transformers (ViTs). 2) We need virtually zero
code modifications for users (e.g., the backbone network definition codes do
not need to be modified at all). 3) Our library also allows the pre-trained
model weights to serve as a prior mean, which is very useful for performing
Bayesian inference with the large-scale foundation models like ViTs that are
hard to optimise from scratch with the downstream data alone. Our code is
publicly available at: \url{https://github.com/SamsungLabs/BayesDLL}\footnote{A
mirror repository is also available at:
\url{https://github.com/minyoungkim21/BayesDLL}.}
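BayesDLL's own API is not reproduced here. Instead, the numpy sketch below illustrates one of the algorithm families the library implements, stochastic-gradient MCMC, via the basic stochastic-gradient Langevin dynamics (SGLD) update of Welling and Teh on a toy Gaussian-mean model; the model, step size, and prior are assumptions of this sketch.

```python
import numpy as np

def sgld_gaussian_mean(data, n_steps=4000, batch=32, eps=1e-4,
                       prior_var=100.0, rng=None):
    """SGLD for the posterior over the mean of N(theta, 1) data: each
    step takes a minibatch estimate of the log-posterior gradient and
    injects N(0, eps) noise, turning SGD into an approximate sampler."""
    rng = rng or np.random.default_rng(0)
    n = len(data)
    theta = 0.0
    out = []
    for _ in range(n_steps):
        mb = rng.choice(n, size=batch, replace=False)
        # unbiased minibatch estimate of the log-posterior gradient
        grad = (n / batch) * np.sum(data[mb] - theta) - theta / prior_var
        theta += 0.5 * eps * grad + np.sqrt(eps) * rng.normal()
        out.append(theta)
    return np.array(out)
```

For a deep network the scalar theta becomes the weight vector and the gradient comes from backpropagation, but the update rule is the same, which is why SG-MCMC scales to ViT-sized models.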
Stochastic partial differential equation based modelling of large space-time data sets
Increasingly large data sets of processes in space and time call for
statistical models and methods that can cope with such data. We show that the
solution of a stochastic advection-diffusion partial differential equation
provides a flexible model class for spatio-temporal processes that is
computationally feasible even for large data sets. The Gaussian process defined
through the stochastic partial differential equation has in general a
nonseparable covariance structure. Furthermore, its parameters can be
physically interpreted as explicitly modeling phenomena such as transport and
diffusion that occur in many natural processes in diverse fields ranging from
environmental sciences to ecology. In order to obtain computationally efficient
statistical algorithms we use spectral methods to solve the stochastic partial
differential equation. This has the advantage that approximation errors do not
accumulate over time, and that in the spectral space the computational cost
grows linearly with the dimension, the total computational costs of Bayesian or
frequentist inference being dominated by the fast Fourier transform. The
proposed model is applied to postprocessing of precipitation forecasts from a
numerical weather prediction model for northern Switzerland. In contrast to the
raw forecasts from the numerical model, the postprocessed forecasts are
calibrated and quantify prediction uncertainty. Moreover, they outperform the
raw forecasts, in the sense that they have a lower mean absolute error.
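The spectral trick the abstract describes can be illustrated on a toy 1D stochastic heat equation rather than the paper's advection-diffusion model: in Fourier space each mode evolves as an independent Ornstein-Uhlenbeck process, so the time update is exact per mode (no accumulating discretization error) and the cost is dominated by the FFT. The grid size, damping parameter, and dropping of the imaginary part are simplifications of this sketch.

```python
import numpy as np

def spectral_spde_step(z_hat, lam, dt, rng, tau=1.0):
    """Advance the Fourier coefficients of a stochastic diffusion SPDE
    by one time step. Each mode is an independent OU process, so the
    update is exact in time and costs O(N) on top of the FFTs."""
    a = np.exp(-lam * dt)                          # exact OU decay per mode
    s = tau * np.sqrt((1.0 - a ** 2) / (2.0 * lam))  # matching noise scale
    noise = rng.normal(size=z_hat.shape) + 1j * rng.normal(size=z_hat.shape)
    return a * z_hat + s * noise

# toy 1D stochastic heat equation on a periodic grid
rng = np.random.default_rng(0)
N, dt, kappa = 128, 0.1, 1.0
k = np.fft.fftfreq(N) * N      # integer wavenumbers for a [0, 2*pi) domain
lam = kappa ** 2 + k ** 2      # per-mode damping: range term plus diffusion
z_hat = np.zeros(N, dtype=complex)
for _ in range(200):
    z_hat = spectral_spde_step(z_hat, lam, dt, rng)
u = np.fft.ifft(z_hat).real    # physical-space field (sketch: real part only)
```

Because every mode updates independently, the cost per step grows linearly in the number of modes, matching the scaling claim in the abstract.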
Quasar Black Hole Mass Estimates in the Era of Time Domain Astronomy
We investigate the dependence of the normalization of the high-frequency part
of the X-ray and optical power spectral densities (PSD) on black hole mass for
a sample of 39 active galactic nuclei (AGN) with black hole masses estimated
from reverberation mapping or dynamical modeling. We obtained new Swift
observations of PG 1426+015, which has the largest estimated black hole mass of
the AGN in our sample. We develop a novel statistical method to estimate the
PSD from a lightcurve of photon counts with arbitrary sampling, eliminating the
need to bin a lightcurve to achieve Gaussian statistics, and we use this
technique to estimate the X-ray variability parameters for the faint AGN in our
sample. We find that the normalization of the high-frequency X-ray PSD is
inversely proportional to black hole mass. We discuss how to use this scaling
relationship to obtain black hole mass estimates from the short time-scale
X-ray variability amplitude with precision ~ 0.38 dex. The amplitude of optical
variability on time scales of days is also anti-correlated with black hole
mass, but with larger scatter. Instead, the optical variability amplitude
exhibits the strongest anti-correlation with luminosity. We conclude with a
discussion of the implications of our results for estimating black hole mass
from the amplitude of AGN variability.
Comment: 19 pages, 10 figures, emulateapj format, submitted to Ap
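The paper's estimator works directly on photon counts with arbitrary sampling; for intuition only, here is the standard FFT periodogram for an evenly sampled light curve, whose high-frequency normalization is the quantity the scaling relation uses. The rms-squared normalization convention is an assumption of this sketch, and the calibration constant that converts the fitted normalization into a black hole mass comes from the paper, so it is not reproduced here.

```python
import numpy as np

def periodogram(flux, dt):
    """FFT periodogram of an evenly sampled light curve, normalized so
    that summing PSD * df recovers the variance of the light curve
    (rms-squared normalization)."""
    n = len(flux)
    x = flux - flux.mean()
    freqs = np.fft.rfftfreq(n, d=dt)[1:]              # drop the zero frequency
    power = (2.0 * dt / n) * np.abs(np.fft.rfft(x))[1:] ** 2
    return freqs, power
```

Fitting the high-frequency portion of this estimate (for X-ray light curves, typically a power law) yields the normalization whose inverse scales with black hole mass in the abstract's relation.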