On Sparsity Inducing Regularization Methods for Machine Learning
In recent years there has been an explosion of interest in learning
methods based on sparsity regularization. In this paper, we discuss a general
class of such methods, in which the regularizer can be expressed as the
composition of a convex function with a linear function. This setting
includes several methods such as the group Lasso, the Fused Lasso, multi-task
learning and many more. We present a general approach for solving
regularization problems of this kind, under the assumption that the proximity
operator of the convex function is available. Furthermore, we comment on the
application of this approach to support vector machines, a technique pioneered
by Vladimir Vapnik.
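
To make the role of the proximity operator concrete, here is a minimal proximal-gradient (forward-backward) sketch for the simplest member of this class, plain l_1 regularization of a least-squares risk; the function names, step-size rule, and problem setup are illustrative choices, not details taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(X, y, lam, n_iter=500):
    """Minimize 0.5*||Xw - y||^2 + lam*||w||_1 by forward-backward splitting."""
    w = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2                  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                   # gradient of the smooth part
        w = soft_threshold(w - grad / L, lam / L)  # prox step on the regularizer
    return w
```

Swapping soft_threshold for the proximity operator of any other convex regularizer in the class (a group norm, a fused penalty, and so on) yields the corresponding solver, which is exactly the genericity the abstract emphasizes.
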
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted-l_2 penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
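
As a concrete instance of the proximal tools surveyed here, the proximity operator of a non-overlapping group-l_2 penalty has a closed form: block soft-thresholding. The sketch below uses an invented three-group layout purely for illustration.

```python
import numpy as np

def prox_group_l2(v, groups, t):
    """Prox of t * sum_g ||v_g||_2 for disjoint groups: block soft-thresholding.

    groups -- list of index arrays partitioning the coordinates of v.
    """
    w = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        # Shrink the whole block toward zero; kill it if its norm is below t.
        w[g] = 0.0 if norm <= t else (1.0 - t / norm) * v[g]
    return w

# Illustrative usage: three disjoint groups over six coordinates.
v = np.array([0.5, -1.2, 3.0, 0.1, -0.2, 2.5])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
print(prox_group_l2(v, groups, t=0.7))
```

Whole groups of coordinates are zeroed out together, which is what makes such norms induce structured, rather than merely coordinate-wise, sparsity.
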
Convex and Network Flow Optimization for Structured Sparsity
We consider a class of learning problems regularized by a structured
sparsity-inducing norm defined as the sum of l_2- or l_infinity-norms over
groups of variables. Whereas much effort has been put into developing fast
optimization techniques when the groups are disjoint or embedded in a
hierarchy, we address here the case of general overlapping groups. To this end,
we present two different strategies: On the one hand, we show that the proximal
operator associated with a sum of l_infinity-norms can be computed exactly in
polynomial time by solving a quadratic min-cost flow problem, allowing the use
of accelerated proximal gradient methods. On the other hand, we use proximal
splitting techniques, and address an equivalent formulation with
non-overlapping groups, but in higher dimension and with additional
constraints. We propose efficient and scalable algorithms exploiting these two
strategies, which are significantly faster than alternative approaches. We
illustrate these methods with several problems such as CUR matrix
factorization, multi-task learning of tree-structured dictionaries, background
subtraction in video sequences, image denoising with wavelets, and topographic
dictionary learning of natural image patches.
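
The paper's contribution is an exact min-cost-flow computation of the overlapping-group prox; the flow algorithm itself is too involved for a snippet, but the accelerated proximal gradient loop it plugs into is short. The sketch below is the standard FISTA iteration with the prox left as a callback; all names are generic placeholders.

```python
import numpy as np

def fista(grad_f, prox_g, L, x0, n_iter=300):
    """Accelerated proximal gradient (FISTA).

    grad_f -- gradient of the smooth loss
    prox_g -- prox of the (possibly structured) regularizer at step 1/L
    L      -- Lipschitz constant of grad_f
    """
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_next = prox_g(y - grad_f(y) / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum step
        x, t = x_next, t_next
    return x
```

With the quadratic min-cost flow computation plugged in as prox_g, this loop corresponds to the first strategy the abstract describes.
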
Learning the Structure for Structured Sparsity
Structured sparsity has recently emerged in statistics, machine learning and
signal processing as a promising paradigm for learning in high-dimensional
settings. All existing methods for learning under the assumption of structured
sparsity rely on prior knowledge of how to weight (or how to penalize)
individual subsets of variables during the subset selection process, which is
not available in general. Inferring group weights from data is a key open
research problem in structured sparsity. In this paper, we propose a Bayesian
approach to the problem of group weight learning. We model the group weights as
hyperparameters of heavy-tailed priors on groups of variables and derive an
approximate inference scheme to infer these hyperparameters. We empirically
show that we are able to recover the model hyperparameters when the data are
generated from the model, and we demonstrate the utility of learning weights in
synthetic and real denoising problems.
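
To pin down what is being learned, the quantity below is a weighted structured-sparsity penalty in which the per-group weights gamma_g are the hyperparameters the paper infers; earlier methods fix them by hand. This is only the penalty term of the objective, as an illustration, not the paper's Bayesian inference scheme.

```python
import numpy as np

def weighted_group_penalty(w, groups, gamma):
    """Weighted structured-sparsity penalty: sum_g gamma_g * ||w_g||_2.

    gamma -- per-group weights; fixed a priori in earlier methods,
             treated as hyperparameters to be inferred in this paper.
    """
    return sum(g_w * np.linalg.norm(w[idx]) for g_w, idx in zip(gamma, groups))
```
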
Shakeout: A New Approach to Regularized Deep Neural Network Training
Recent years have witnessed the success of deep neural networks in dealing
with a wealth of practical problems. Dropout has played an essential role in
many successful deep neural networks by inducing regularization during model
training. In this paper, we present a new regularized training approach:
Shakeout. Instead of randomly discarding units as Dropout does at the training
stage, Shakeout randomly chooses to enhance or reverse each unit's contribution
to the next layer. This minor modification of Dropout has a notable statistical
property: the regularizer induced by Shakeout adaptively combines L0, L1, and L2
regularization terms. Our classification experiments with representative
deep architectures on image datasets MNIST, CIFAR-10 and ImageNet show that
Shakeout deals with over-fitting effectively and outperforms Dropout. We
empirically demonstrate that Shakeout leads to sparser weights under both
unsupervised and supervised settings. Shakeout also leads to the grouping
effect of the input units in a layer. Since the weights reflect the importance
of connections, Shakeout is superior to Dropout in this respect, which is
valuable for deep model compression. Moreover, we demonstrate that Shakeout can
effectively reduce the instability of the training process of the deep
architecture.
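
A minimal sketch of one consistent reading of "enhance or reverse each unit's contribution": with probability tau a unit's outgoing weights are flipped to -c*sign(w), and otherwise rescaled so the perturbation is unbiased. The exact parameterization here is an assumption chosen so that E[W_tilde] = W, not the paper's verbatim formula; tau plays the role of a dropout rate and c controls the induced sparsity.

```python
import numpy as np

def shakeout_weights(W, tau, c, rng):
    """Shakeout-style stochastic weight perturbation (illustrative sketch).

    With probability tau a unit's outgoing weight is reversed to -c*sign(w);
    otherwise it is enhanced to (w + c*tau*sign(w)) / (1 - tau).
    The two branches balance so that E[W_tilde] = W.
    """
    s = np.sign(W)
    r = rng.random(W.shape[1]) >= tau      # keep/reverse draw, one per input unit
    enhanced = (W + c * tau * s) / (1.0 - tau)
    reversed_ = -c * s
    return np.where(r[None, :], enhanced, reversed_)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
print(shakeout_weights(W, tau=0.3, c=0.1, rng=rng))
# Setting c = 0 recovers standard (inverted) Dropout on the input units.
```
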
Sparse Support Vector Infinite Push
In this paper, we address the problem of embedded feature selection for
ranking at the top of the list. We pose this problem as a regularized
empirical risk minimization with an l_p-norm push loss function
(p -> infinity, the infinite push) and sparsity-inducing regularizers. We
address the difficulties of this challenging optimization problem with an
alternating direction method of multipliers (ADMM) algorithm built upon the
proximal operators of the loss function and the regularizer. Our main
technical contribution is thus to
provide a numerical scheme for computing the infinite push loss function
proximal operator. Experimental results on toy, DNA microarray and BCI problems
show how our novel algorithm compares favorably to competitors for ranking on
top while using fewer variables in the scoring function.
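
The ADMM scaffolding the abstract refers to is standard and short; the sketch below is the generic scaled-form splitting with both proximal operators passed as callbacks. The infinite-push prox itself (the paper's contribution) is not reproduced; prox_loss and prox_reg are placeholders evaluated at step 1/rho.

```python
import numpy as np

def admm(prox_loss, prox_reg, dim, n_iter=200):
    """Generic ADMM for min_w loss(w) + reg(z) subject to w = z.

    prox_loss, prox_reg -- proximal operators evaluated at step 1/rho.
    """
    w = np.zeros(dim)
    z = np.zeros(dim)
    u = np.zeros(dim)            # scaled dual variable
    for _ in range(n_iter):
        w = prox_loss(z - u)     # loss step (infinite-push prox in the paper)
        z = prox_reg(w + u)      # regularizer step (e.g. soft-thresholding)
        u = u + w - z            # dual ascent on the consensus constraint
    return z
```

Splitting the objective this way is what lets each prox be handled on its own, which is why the paper's numerical scheme for the infinite-push prox suffices to make the whole problem tractable.
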