Bayesian Compression for Deep Learning
Compression and computational efficiency in deep learning have become a
problem of great significance. In this work, we argue that the most principled
and effective way to attack this problem is by adopting a Bayesian point of
view, where through sparsity inducing priors we prune large parts of the
network. We introduce two novelties in this paper: 1) we use hierarchical
priors to prune nodes instead of individual weights, and 2) we use the
posterior uncertainties to determine the optimal fixed point precision to
encode the weights. Both factors significantly contribute to achieving the
state of the art in terms of compression rates, while still staying competitive
with methods designed to optimize for speed or energy efficiency.
Comment: Published as a conference paper at NIPS 201
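The two ingredients above are group-level pruning and posterior-variance-driven quantization. A minimal sketch of both ideas in Python, assuming per-unit scale-posterior log-alphas and per-weight posterior moments are already available; the function names, the threshold of 3.0, and the bit-width rule are illustrative stand-ins rather than the paper's exact procedure:

    import numpy as np

    def prune_units(unit_log_alpha, threshold=3.0):
        # Keep a unit only while its (assumed) scale-posterior log-alpha stays below
        # the threshold; a large log-alpha says the whole unit is effectively noise.
        return unit_log_alpha < threshold            # boolean keep-mask over units

    def fixed_point_bits(post_mean, post_std, eps=1e-12):
        # Tie the quantization step to the smallest posterior standard deviation
        # (finer steps would only encode noise) and the range to the largest |mean|.
        step = max(post_std.min(), eps)
        span = 2.0 * np.abs(post_mean).max() + step
        return int(np.ceil(np.log2(span / step)))

    # toy usage on one layer
    rng = np.random.default_rng(0)
    keep = prune_units(rng.normal(0.0, 2.0, size=64))
    bits = fixed_point_bits(rng.normal(0.0, 0.3, size=(64, 32)),
                            rng.uniform(0.01, 0.1, size=(64, 32)))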
Dual-Space Analysis of the Sparse Linear Model
Sparse linear (or generalized linear) models combine a standard likelihood
function with a sparse prior on the unknown coefficients. These priors can
conveniently be expressed as a maximization over zero-mean Gaussians with
different variance hyperparameters. Standard MAP estimation (Type I) involves
maximizing over both the hyperparameters and coefficients, while an empirical
Bayesian alternative (Type II) first marginalizes the coefficients and then
maximizes over the hyperparameters, leading to a tractable posterior
approximation. The underlying cost functions can be related via a dual-space
framework from Wipf et al. (2011), which allows both the Type I and Type II
objectives to be expressed in either coefficient or hyperparameter space. This
perspective is useful because some analyses or extensions are more amenable to
development in one space than in the other. Herein we consider the estimation of a
trade-off parameter balancing sparsity and data fit. As this parameter is
effectively a variance, natural estimators arise from working in
hyperparameter (variance) space, carrying over ideas that are natural for Type II to
solve what is much less intuitive for Type I. In contrast, for analyses of
update rules and sparsity properties of local and global solutions, as well as
extensions to more general likelihood models, we can leverage coefficient-space
techniques developed for Type I and apply them to Type II. For example, this
allows us to prove that Type II-inspired techniques can succeed in
recovering sparse coefficients when unfavorable restricted isometry properties
(RIP) cause popular L1 reconstructions to fail. It also facilitates the
analysis of Type II when non-Gaussian likelihood models lead to intractable
integrations.
Comment: 9 pages, 2 figures, submission to NIPS 201
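For reference, with the usual Gaussian likelihood $y=\Phi x+\varepsilon$, $\varepsilon\sim\mathcal{N}(0,\lambda I)$, and a separable prior expressed variationally as $p(x_i)=\max_{\gamma_i\ge 0}\mathcal{N}(x_i;0,\gamma_i)\,\varphi(\gamma_i)$, the two cost functions referred to above take the standard forms (notation assumed here, not taken from the abstract):

\[
\text{Type I:}\quad \min_{x}\;\tfrac{1}{\lambda}\,\|y-\Phi x\|_2^2+\sum_i g(x_i),
\qquad g(x_i)=\min_{\gamma_i\ge 0}\Big[\tfrac{x_i^2}{\gamma_i}+f(\gamma_i)\Big],
\]
\[
\text{Type II:}\quad \min_{\gamma\ge 0}\; y^\top\Sigma_y^{-1}y+\log|\Sigma_y|+\sum_i f(\gamma_i),
\qquad \Sigma_y=\lambda I+\Phi\,\Gamma\,\Phi^\top,\;\;\Gamma=\operatorname{diag}(\gamma).
\]

The trade-off parameter discussed in the abstract plays the role of $\lambda$, i.e. a noise variance, which is why estimating it is natural in hyperparameter (variance) space.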
A general approach to simultaneous model fitting and variable elimination in response models for biological data with many more variables than observations
Background
With the advent of high throughput biotechnology data acquisition platforms such as microarrays, SNP chips and mass spectrometers, data sets with many more variables than observations are now routinely being collected. Finding relationships between response variables of interest and variables in such data sets is an important problem, akin to finding needles in a haystack. Whilst methods for a number of response types have been developed, a general approach has been lacking.

Results
The major contribution of this paper is to present a unified methodology which allows many common (statistical) response models to be fitted to such data sets. The class of models includes virtually any model with a linear predictor in it, for example (but not limited to) multiclass logistic regression (classification), generalised linear models (regression) and survival models. A fast algorithm for finding sparse, well-fitting models is presented. The ideas are illustrated on real data sets with numbers of variables ranging from thousands to millions. R code implementing the ideas is available for download.

Conclusion
The method described in this paper enables existing work on response models for the case of fewer variables than observations to be leveraged to the situation where there are many more variables than observations. It is a powerful approach to finding parsimonious models for such datasets. The method is capable of handling problems with millions of variables and a large variety of response types within the one framework. The method compares favourably to existing methods such as support vector machines and random forests, but has the advantage of not requiring separate variable selection steps. It also works for data types which these methods were not designed to handle. The method usually produces very sparse models which make biological interpretation simpler and more focused.
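The unified framework itself is not reproduced here, but the setting it targets is easy to sketch. Below, an L1-penalised logistic regression stands in for the paper's sparsity-prior algorithm, purely to illustrate the "linear predictor plus sparsity, many more variables than observations" setup on synthetic data (all names and constants are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n, p, k = 100, 5000, 10                    # far more variables than observations
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:k] = 2.0                             # only k variables truly matter
    y = (X @ beta + rng.standard_normal(n) > 0).astype(int)

    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    selected = np.flatnonzero(clf.coef_[0])    # variables surviving the shrinkage
    print(f"{selected.size} of {p} variables kept")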
Binary Linear Classification and Feature Selection via Generalized Approximate Message Passing
For the problem of binary linear classification and feature selection, we
propose algorithmic approaches to classifier design based on the generalized
approximate message passing (GAMP) algorithm, recently proposed in the context
of compressive sensing. We are particularly motivated by problems where the
number of features greatly exceeds the number of training examples, but where
only a few features suffice for accurate classification. We show that
sum-product GAMP can be used to (approximately) minimize the classification
error rate and max-sum GAMP can be used to minimize a wide variety of
regularized loss functions. Furthermore, we describe an
expectation-maximization (EM)-based scheme to learn the associated model
parameters online, as an alternative to cross-validation, and we show that
GAMP's state-evolution framework can be used to accurately predict the
misclassification rate. Finally, we present a detailed numerical study to
confirm the accuracy, speed, and flexibility afforded by our GAMP-based
approaches to binary linear classification and feature selection.
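In notation assumed here (weight vector $w$, feature vectors $x_n$, labels $y_n$), the two GAMP modes mentioned above target, respectively, approximate inference under the posterior

\[
p(w\mid y)\;\propto\;\prod_{n} p\big(y_n \mid x_n^\top w\big)\,\prod_{i} p(w_i)
\qquad\text{(sum-product GAMP)},
\]

whose marginals yield the (approximately) error-rate-minimizing classifier, and the regularized-loss minimization

\[
\widehat{w}\;=\;\arg\min_{w}\;\sum_{n}\ell\big(y_n,\,x_n^\top w\big)+\sum_{i} f(w_i)
\qquad\text{(max-sum GAMP)},
\]

for a loss $\ell$ such as the logistic or hinge loss and a separable regularizer $f$ such as the $\ell_1$ penalty.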
Using prototypes to improve convolutional networks interpretability
We propose a method that allows the interpretation of the data representation obtained by a CNN by introducing prototypes in the feature space, which are later classified into a certain category. This way we can see how the feature space is structured in relation to the categories and the related task.
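A minimal sketch of the prototypes-in-feature-space idea (PyTorch; the class name, dimensions, and the distance-to-logit mapping are assumptions, not the authors' implementation):

    import torch
    import torch.nn as nn

    class PrototypeHead(nn.Module):
        # Learnable prototypes live in the CNN feature space; each prototype is then
        # mapped to a category, so an input can be interpreted through the prototypes
        # it lands closest to.
        def __init__(self, feature_dim, num_prototypes, num_classes):
            super().__init__()
            self.prototypes = nn.Parameter(torch.randn(num_prototypes, feature_dim))
            self.to_class = nn.Linear(num_prototypes, num_classes)  # prototype -> category

        def forward(self, features):                       # features: (batch, feature_dim)
            d2 = torch.cdist(features, self.prototypes) ** 2   # squared distances to prototypes
            return self.to_class(-d2)                      # nearer prototypes weigh more

    # usage with pooled features from any CNN backbone
    head = PrototypeHead(feature_dim=512, num_prototypes=20, num_classes=10)
    logits = head(torch.randn(4, 512))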