A comprehensive study of spike and slab shrinkage priors for structurally sparse Bayesian neural networks
Network complexity and computational efficiency have become increasingly
significant aspects of deep learning. Sparse deep learning addresses these
challenges by recovering a sparse representation of the underlying target
function from heavily over-parameterized deep neural networks.
Specifically, deep neural architectures compressed via structured sparsity
(e.g. node sparsity) provide low-latency inference, higher data throughput, and
reduced energy consumption. In this paper, we explore two well-established
shrinkage techniques, Lasso and Horseshoe, for model compression in Bayesian
neural networks. To this end, we propose structurally sparse Bayesian neural
networks which systematically prune excessive nodes with (i) Spike-and-Slab
Group Lasso (SS-GL), and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors,
and develop computationally tractable variational inference, including a
continuous relaxation of the Bernoulli variables. We establish the contraction
rates of the variational posterior of our proposed models as a function of the
network topology, layer-wise node cardinalities, and bounds on the network
weights. We empirically demonstrate the competitive performance of our models
compared to the baseline models in prediction accuracy, model compression, and
inference latency.
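The continuous relaxation mentioned above is what makes node-level spike-and-slab selection trainable with gradients. Below is a minimal PyTorch-style sketch of the idea, assuming a relaxed-Bernoulli (Concrete) gate per output node; the class name, temperature, and initialization are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class RelaxedSpikeSlabLayer(nn.Module):
    # Fully connected layer whose output nodes are gated by relaxed-Bernoulli
    # inclusion variables, so whole nodes (not individual weights) can be pruned.
    def __init__(self, in_features, out_features, temperature=0.5):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Variational logits of the per-node inclusion probabilities (the spike part).
        self.inclusion_logits = nn.Parameter(torch.zeros(out_features))
        self.temperature = temperature

    def forward(self, x):
        # Reparameterised relaxed-Bernoulli sample z in (0, 1) per output node;
        # gradients flow back to the inclusion logits through rsample().
        z = torch.distributions.RelaxedBernoulli(
            temperature=self.temperature, logits=self.inclusion_logits
        ).rsample()
        # Gating the whole node yields structured (node-level) sparsity.
        return self.linear(x) * z

At test time, nodes whose posterior inclusion probability falls below a threshold can be dropped entirely, which is what translates the sparsity into the latency, throughput, and energy gains described above.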
Copula-like Variational Inference
This paper considers a new family of variational distributions motivated by
Sklar's theorem. This family is based on new copula-like densities on the
hypercube with non-uniform marginals which can be sampled efficiently, i.e.
with a complexity linear in the dimension of the state space. The proposed
variational densities can then be seen as arising from these copula-like
densities, used as base distributions on the hypercube, combined with Gaussian
quantile functions and sparse rotation matrices acting as normalizing flows;
the latter correspond to a rotation of the marginals. We provide some
empirical evidence that such a variational family can
also approximate non-Gaussian posteriors and can be beneficial compared to
Gaussian approximations. Our method performs largely comparably to
state-of-the-art variational approximations on standard regression and
classification benchmarks for Bayesian Neural Networks.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
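For intuition, the construction described in the abstract can be read as a three-step sampler: draw a point on the hypercube from the copula-like base density, push it through the standard Gaussian quantile function, then rotate. The Python sketch below substitutes independent Beta marginals for the copula-like base and a dense random orthogonal matrix for the sparse rotation, purely for illustration; it is not the paper's construction.

import torch

def sample_variational(dim, n_samples, alpha=2.0, beta=2.0):
    # Step 1: base sample u in (0, 1)^dim. Independent Betas are a stand-in
    # for the copula-like density with non-uniform marginals.
    u = torch.distributions.Beta(alpha, beta).sample((n_samples, dim))
    # Step 2: the standard Gaussian quantile function maps the hypercube to R^dim.
    z = torch.distributions.Normal(0.0, 1.0).icdf(u)
    # Step 3: rotate the marginals; a random orthogonal matrix stands in for
    # the sparse rotation matrices used in the paper.
    q, _ = torch.linalg.qr(torch.randn(dim, dim))
    return z @ q.T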
Bayesian Compression for Deep Learning
Compression and computational efficiency in deep learning have become problems
of great significance. In this work, we argue that the most principled
and effective way to attack this problem is by adopting a Bayesian point of
view, where through sparsity inducing priors we prune large parts of the
network. We introduce two novelties in this paper: 1) we use hierarchical
priors to prune nodes instead of individual weights, and 2) we use the
posterior uncertainties to determine the optimal fixed point precision to
encode the weights. Both factors significantly contribute to achieving the
state of the art in terms of compression rates, while still staying competitive
with methods designed to optimize for speed or energy efficiency.
Comment: Published as a conference paper at NIPS 2017
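The second idea, letting posterior uncertainty set the bit width, can be illustrated with a back-of-the-envelope rule: quantisation noise much smaller than the posterior standard deviation buys nothing, so the fixed-point step size should be on the order of that standard deviation. The Python heuristic below is an illustrative sketch under that assumption, not the paper's exact criterion.

import numpy as np

def bits_from_posterior(weight_means, weight_stds, safety=1.0):
    # Choose the number of fixed-point bits so that the quantisation step is
    # comparable to the smallest posterior standard deviation: finer precision
    # would be indistinguishable under the posterior. Illustrative heuristic only.
    dynamic_range = weight_means.max() - weight_means.min()
    step = safety * weight_stds.min()
    bits = int(np.ceil(np.log2(max(dynamic_range / step, 1.0))))
    return max(bits, 1)

# Example: weights spread over roughly 2.0 with posterior std 0.01 -> about 8 bits.
rng = np.random.default_rng(0)
means = rng.uniform(-1.0, 1.0, size=256)
stds = np.full(256, 0.01)
print(bits_from_posterior(means, stds))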