Bayesian Compression for Deep Learning
Compression and computational efficiency in deep learning have become a
problem of great significance. In this work, we argue that the most principled
and effective way to attack this problem is by adopting a Bayesian point of
view, where through sparsity inducing priors we prune large parts of the
network. We introduce two novelties in this paper: 1) we use hierarchical
priors to prune nodes instead of individual weights, and 2) we use the
posterior uncertainties to determine the optimal fixed point precision to
encode the weights. Both factors significantly contribute to achieving the
state of the art in terms of compression rates, while still staying competitive
with methods designed to optimize for speed or energy efficiency.
Comment: Published as a conference paper at NIPS 2017.
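As a rough illustration of the two ideas (a toy sketch, not the authors' variational scheme; the `snr_threshold` heuristic and the bit-width rule below are assumptions), consider a factorized Gaussian posterior over the rows of a weight matrix: rows with low posterior signal-to-noise are pruned as whole units, and the fixed-point bit width for the survivors is read off from the posterior standard deviations:

```python
import numpy as np

def prune_and_quantize(mu, sigma, snr_threshold=1.0):
    """Toy version of Bayesian compression: prune whole rows (units)
    with low posterior signal-to-noise, then pick a fixed-point bit
    width from the surviving weights' posterior uncertainty.
    mu, sigma: posterior means / std devs, shape (units, inputs)."""
    # Unit-level signal-to-noise ratio, aggregated over each row.
    snr = np.linalg.norm(mu, axis=1) / np.linalg.norm(sigma, axis=1)
    keep = snr > snr_threshold                  # prune entire units

    # Heuristic bit width: resolve the posterior mean range down to
    # the smallest posterior std dev among the kept weights.
    w = mu[keep]
    step = sigma[keep].min()
    bits = max(1, int(np.ceil(np.log2(np.ptp(w) / step + 1))))

    quantized = np.round(w / step) * step       # fixed-point encoding
    return quantized, keep, bits

rng = np.random.default_rng(0)
mu = rng.normal(0.0, 1.0, (64, 128)) * (rng.random((64, 1)) > 0.5)
sigma = np.full_like(mu, 0.3)
q, keep, bits = prune_and_quantize(mu, sigma)
print(f"kept {keep.sum()}/64 units at {bits} bits")
```

Pruning at the row (unit) level removes entire neurons rather than scattered individual weights, which is what makes the resulting sparsity exploitable for actual speed or energy gains.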
Structured sparsity-inducing norms through submodular functions
Sparse methods for supervised learning aim at finding good linear predictors
from as few variables as possible, i.e., with small cardinality of their
supports. This combinatorial selection problem is often turned into a convex
optimization problem by replacing the cardinality function by its convex
envelope (tightest convex lower bound), in this case the L1-norm. In this
paper, we investigate more general set-functions than the cardinality, that may
incorporate prior knowledge or structural constraints which are common in many
applications: namely, we show that for nondecreasing submodular set-functions,
the corresponding convex envelope can be obtained from its Lovász extension, a
common tool in submodular analysis. This defines a family of polyhedral norms,
for which we provide generic algorithmic tools (subgradients and proximal
operators) and theoretical results (conditions for support recovery or
high-dimensional inference). By selecting specific submodular functions, we can
give a new interpretation to known norms, such as those based on
rank-statistics or grouped norms with potentially overlapping groups; we also
define new norms, in particular ones that can be used as non-factorial priors
for supervised learning.
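Concretely, for a nondecreasing submodular set-function F with F(∅) = 0, the polyhedral norm arising as this convex envelope is the Lovász extension of F evaluated at the vector of absolute values:

```latex
% Sort coordinates so that |w_{j_1}| \ge |w_{j_2}| \ge \dots \ge |w_{j_p}|.
\Omega(w) \;=\; \sum_{k=1}^{p} |w_{j_k}|
   \Bigl[ F(\{j_1,\dots,j_k\}) - F(\{j_1,\dots,j_{k-1}\}) \Bigr]
```

Taking F(A) = |A| makes every marginal gain equal to one, recovering the L1-norm as a special case.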
Graph-Sparse LDA: A Topic Model with Structured Sparsity
Originally designed to model text, topic modeling has become a powerful tool
for uncovering latent structure in domains including medicine, finance, and
vision. The goals for the model vary depending on the application: in some
cases, the discovered topics may be used for prediction or some other
downstream task. In other cases, the content of the topic itself may be of
intrinsic scientific interest.
Unfortunately, even using modern sparse techniques, the discovered topics are
often difficult to interpret due to the high dimensionality of the underlying
space. To improve topic interpretability, we introduce Graph-Sparse LDA, a
hierarchical topic model that leverages knowledge of relationships between
words (e.g., as encoded by an ontology). In our model, topics are summarized by
a few latent concept-words from the underlying graph that explain the observed
words. Graph-Sparse LDA recovers sparse, interpretable summaries on two
real-world biomedical datasets while matching state-of-the-art prediction
performance.
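The summarization idea can be sketched with a toy example (hypothetical code: the ontology, `summarize_topic`, and the greedy set-cover step are illustrative assumptions, not the paper's inference algorithm): a topic's high-probability words are collapsed onto a few shared ancestor concept-words in the graph.

```python
# Toy illustration only: collapse a topic's top words onto ancestor
# concepts in a small hand-made ontology (child -> parent edges).
ONTOLOGY = {
    "ibuprofen": "nsaid", "naproxen": "nsaid", "nsaid": "analgesic",
    "acetaminophen": "analgesic", "analgesic": "drug",
}

def ancestors(word):
    """Yield the word itself and all its ancestors up the ontology."""
    while word is not None:
        yield word
        word = ONTOLOGY.get(word)

def summarize_topic(top_words, max_concepts=2):
    """Pick a few ancestor concepts that cover all the top words."""
    # Record which top words each candidate concept covers.
    coverage = {}
    for w in top_words:
        for a in ancestors(w):
            coverage.setdefault(a, set()).add(w)
    # Greedy set cover over the candidate concepts.
    uncovered, summary = set(top_words), []
    while uncovered and len(summary) < max_concepts:
        best = max(coverage, key=lambda c: len(coverage[c] & uncovered))
        summary.append(best)
        uncovered -= coverage[best]
    return summary

print(summarize_topic(["ibuprofen", "naproxen", "acetaminophen"]))
# -> ['analgesic']: one concept-word explains all three observed words
```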
Sparsity-Promoting Bayesian Dynamic Linear Models
Sparsity-promoting priors have become increasingly popular over recent years
due to an increased number of regression and classification applications
involving a large number of predictors. In time series applications where
observations are collected over time, it is often unrealistic to assume that
the underlying sparsity pattern is fixed. We propose here an original class of
flexible Bayesian linear models for dynamic sparsity modelling. The proposed
class of models expands upon the existing Bayesian literature on sparse
regression using generalized multivariate hyperbolic distributions. The
properties of the models are explored through both analytic results and
simulation studies. We demonstrate the model on a financial application where
it is shown that it accurately represents the patterns seen in the analysis of
stock and derivative data, and is able to detect major events by filtering an
artificial portfolio of assets.
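A scalar caricature of dynamic sparsity (illustrative only: the paper works with generalized multivariate hyperbolic distributions, which this sketch reduces to a univariate normal variance mixture with generalized-inverse-Gaussian mixing, and all parameter values are assumptions) can be simulated as follows:

```python
import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(1)
T = 200

# Latent local scales from a generalized inverse Gaussian: mostly
# small draws shrink the coefficient toward zero; occasional large
# draws let it move, so the sparsity pattern changes over time.
lam = geninvgauss.rvs(p=-0.5, b=1.0, size=T, random_state=rng)

# Time-varying coefficient: mean-reverting toward zero, with
# innovation variance modulated by the latent scales.
beta = np.zeros(T)
for t in range(1, T):
    beta[t] = 0.95 * beta[t - 1] + rng.normal(0.0, np.sqrt(0.1 * lam[t]))

# Observations from the dynamic linear model y_t = x_t * beta_t + noise.
x = rng.normal(size=T)
y = x * beta + rng.normal(0.0, 0.5, size=T)
print("fraction of near-zero coefficients:", np.mean(np.abs(beta) < 0.05))
```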
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted L2-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
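As a concrete instance of the proximal methods covered, the prox of the L1-norm is elementwise soft-thresholding, which gives the classic ISTA iteration for the lasso (a minimal sketch under standard assumptions, not code from the paper):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for 0.5*||y - Xw||^2 + lam*||w||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
w_true = np.zeros(50); w_true[:5] = 2.0         # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=100)
w_hat = ista(X, y, lam=5.0)
print("nonzeros recovered:", np.flatnonzero(w_hat)[:10])
```

The step size 1/||X||^2 comes from the Lipschitz constant of the smooth least-squares term; working-set and homotopy variants accelerate such iterations by restricting work to an active set of coordinates.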