Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted ℓ2-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
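As a concrete instance of the proximal methods surveyed in this paper, below is a minimal sketch of iterative soft-thresholding (ISTA) for the ℓ1-penalized least-squares problem; the step-size choice, iteration count, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    """ISTA for the lasso problem min_w 0.5*||X w - y||^2 + lam*||w||_1.

    Uses the fixed step 1/L, where L = ||X||_2^2 is the Lipschitz
    constant of the gradient of the smooth term."""
    L = np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                   # gradient of 0.5*||Xw - y||^2
        w = soft_threshold(w - grad / L, lam / L)  # forward-backward (proximal) step
    return w
```

The proximal operator of the ℓ1 norm has this closed soft-thresholding form, which is what makes this family of penalties computationally convenient.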
Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes
Approximate dynamic programming has been used successfully in a large variety
of domains, but it relies on a small set of provided approximation features to
calculate solutions reliably. Large and rich sets of features can cause
existing algorithms to overfit because of a limited number of samples. We
address this shortcoming using regularization in approximate linear
programming. Because the proposed method can automatically select the
appropriate richness of features, its performance does not degrade with an
increasing number of features. These results rely on new and stronger sampling
bounds for regularized approximate linear programs. We also propose a
computationally efficient homotopy method. The empirical evaluation of the
approach shows that the proposed method performs well on simple MDPs and
standard benchmark problems. Comment: Technical report corresponding to the ICML 2010 submission of the same
name.
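For orientation, a hedged sketch of what an ℓ1-regularized approximate linear program looks like; the symbols (state-relevance weights ρ, feature matrix Φ, discount γ, regularization budget ψ) follow standard ALP conventions and are assumptions rather than the report's exact notation.

```latex
\begin{aligned}
\min_{w}\quad & \rho^{\top} \Phi w \\
\text{s.t.}\quad & (\Phi w)(s) \;\ge\; r(s,a) + \gamma \textstyle\sum_{s'} P(s' \mid s,a)\,(\Phi w)(s')
  \quad \text{for sampled pairs } (s,a), \\
& \lVert w \rVert_{1} \;\le\; \psi .
\end{aligned}
```

Shrinking the budget ψ drives most feature weights to zero, which is how a method of this kind can select the appropriate richness of features automatically.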
Recursive Compressed Sensing
We introduce a recursive algorithm for performing compressed sensing on
streaming data. The approach consists of a) recursive encoding, where we sample
the input stream via overlapping windowing and make use of the previous
measurement in obtaining the next one, and b) recursive decoding, where the
signal estimate from the previous window is utilized in order to achieve faster
convergence in an iterative optimization scheme applied to decode the new one.
To remove estimation bias, a two-step estimation procedure is proposed
comprising support set detection and signal amplitude estimation. Estimation
accuracy is enhanced by a non-linear voting method and averaging estimates over
multiple windows. We analyze the computational complexity and estimation error,
and show that the normalized error variance asymptotically goes to zero for
sublinear sparsity. Our simulation results show a speed-up of an order of
magnitude over traditional CS, while obtaining significantly lower
reconstruction error under mild conditions on the signal magnitudes and the
noise level. Comment: Submitted to IEEE Transactions on Information Theory.
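A minimal sketch of the recursive-decoding idea, assuming the windowed signal is itself sparse and warm-starting an ISTA solve on each new window with the shifted estimate from the previous one; the window length, step size, and helper names are illustrative assumptions, and the two-step debiasing and voting procedures are omitted.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, lam, w0, n_iter=200):
    """ISTA warm-started at w0 for min_w 0.5*||A w - y||^2 + lam*||w||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    w = w0.copy()
    for _ in range(n_iter):
        w = soft_threshold(w - A.T @ (A @ w - y) / L, lam / L)
    return w

def recursive_cs(stream, A, lam, window=256, step=32):
    """Decode overlapping windows of a stream; each solve is warm-started
    with the previous window's estimate shifted by the step size."""
    w_prev = np.zeros(window)
    estimates = []
    for start in range(0, len(stream) - window + 1, step):
        y = A @ stream[start:start + window]   # stand-in for recursive encoding
        w0 = np.roll(w_prev, -step)            # shift the previous estimate
        w0[-step:] = 0.0                       # newly entered samples are unknown
        w_prev = ista(A, y, lam, w0)
        estimates.append((start, w_prev.copy()))
    return estimates
```

Because consecutive windows overlap heavily, the warm start is already close to the new solution, which is the source of the faster convergence claimed in the abstract.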
Learning and Estimation Applications of an Online Homotopy Algorithm for a Generalization of the LASSO
The LASSO is a widely used shrinkage and selection method for linear regression.
We propose a generalization of the LASSO in which the ℓ1 penalty is applied to a
linear transformation of the regression parameters, allowing prior information on
the structure of the problem to be incorporated and improving the interpretability
of the results. We also study time-varying systems with an ℓ1 penalty on the
variations of the state, leading to estimates that exhibit few "jumps". We propose
a homotopy algorithm that updates the solution as additional measurements become
available. The algorithm takes advantage of the sparsity of the solution for
computational efficiency and is promising for mining large datasets. The algorithm
is implemented on three experimental data sets representing applications to traffic
estimation from sparsely sampled probe vehicles, flow estimation in tidal channels,
and text analysis of online news. Least-squares regression with ℓ1-norm
regularization is known as the LASSO algorithm.
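In symbols, the generalization described above is often written as the generalized lasso; the notation below (design matrix X, observations y, structure matrix D) is a hedged transcription, since the abstract gives no explicit formula.

```latex
\hat{\beta} \;\in\; \arg\min_{\beta}\; \tfrac{1}{2}\,\lVert y - X\beta \rVert_{2}^{2}
  \;+\; \lambda\,\lVert D\beta \rVert_{1}
```

Taking D = I recovers the ordinary LASSO, while taking D to be a temporal difference operator penalizes variations of the state and yields the few-jumps estimates mentioned above.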
Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale
numerical optimization because of their cheap iteration costs, low memory
requirements, amenability to parallelization, and ability to exploit problem
structure. Three main algorithmic choices influence the performance of BCD
methods: the block partitioning strategy, the block selection rule, and the
block update rule. In this paper we explore all three of these building blocks
and propose variations for each that can lead to significantly faster BCD
methods. We (i) propose new greedy block-selection strategies that guarantee
more progress per iteration than the Gauss-Southwell rule; (ii) explore
practical issues like how to implement the new rules when using "variable"
blocks; (iii) explore the use of message-passing to compute matrix or Newton
updates efficiently on huge blocks for problems with a sparse dependency
between variables; and (iv) consider optimal active manifold identification,
which leads to bounds on the "active set complexity" of BCD methods and leads
to superlinear convergence for certain problems with sparse solutions (and in
some cases finite termination at an optimal solution). We support all of our
findings with numerical results for the classic machine learning problems of
least squares, logistic regression, multi-class logistic regression, label
propagation, and L1-regularization.
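To make the block-selection rules concrete, here is a minimal sketch of greedy (Gauss-Southwell) coordinate descent for least squares; single coordinates instead of true blocks, exact one-dimensional updates, and a dense gradient recomputation are simplifying assumptions for readability, not the paper's efficient implementation.

```python
import numpy as np

def gauss_southwell_cd(X, y, n_iter=1000):
    """Greedy coordinate descent for min_w 0.5*||X w - y||^2.

    The Gauss-Southwell rule updates the coordinate whose partial
    derivative is largest in magnitude instead of cycling through
    coordinates; assumes X has no all-zero columns."""
    n, d = X.shape
    col_sq = (X ** 2).sum(axis=0)         # per-coordinate curvature
    w = np.zeros(d)
    r = -y.astype(float)                  # residual X w - y
    for _ in range(n_iter):
        grad = X.T @ r                    # gradient of the smooth objective
        j = int(np.argmax(np.abs(grad)))  # Gauss-Southwell selection
        delta = -grad[j] / col_sq[j]      # exact minimization along coordinate j
        w[j] += delta
        r += delta * X[:, j]
    return w
```

The paper's contribution is precisely to improve on this baseline: greedy rules that guarantee more progress than Gauss-Southwell, variable blocks, and message-passing updates that exploit sparse dependency structure.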
The composite absolute penalties family for grouped and hierarchical variable selection
Extracting useful information from high-dimensional data is an important
focus of today's statistical research and practice. Penalized loss function
minimization has been shown to be effective for this task both theoretically
and empirically. With the virtues of both regularization and sparsity, the
ℓ1-penalized squared error minimization method Lasso has been popular in
regression models and beyond. In this paper, we combine different norms
including ℓ1 to form an intelligent penalty in order to add side information
to the fitting of a regression or classification model to obtain reasonable
estimates. Specifically, we introduce the Composite Absolute Penalties (CAP)
family, which allows given grouping and hierarchical relationships between the
predictors to be expressed. CAP penalties are built by defining groups and
combining the properties of norm penalties at the across-group and within-group
levels. Grouped selection occurs for nonoverlapping groups. Hierarchical
variable selection is reached by defining groups with particular overlapping
patterns. We propose using the BLASSO and cross-validation to compute CAP
estimates in general. For a subfamily of CAP estimates involving only the ℓ1
and ℓ∞ norms, we introduce the iCAP algorithm to trace the entire
regularization path for the grouped selection problem. Within this subfamily,
unbiased estimates of the degrees of freedom (df) are derived so that the
regularization parameter is selected without cross-validation. CAP is shown to
improve on the predictive performance of the LASSO in a series of simulated
experiments, including cases with p ≫ n and possibly mis-specified
groupings. When the complexity of a model is properly calculated, iCAP is seen
to be parsimonious in the experiments. Comment: Published at
http://dx.doi.org/10.1214/07-AOS584 in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
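For reference, a hedged sketch of the shape of a CAP penalty: with predictor groups G_1, …, G_K, a within-group norm index γ_k for each group, and an across-group index γ_0, the penalty composes the two levels roughly as below; the exponent placement is recalled from the published definition and should be read as a sketch rather than a verbatim copy.

```latex
T(\beta) \;=\; \sum_{k=1}^{K} \bigl\lVert \beta_{G_k} \bigr\rVert_{\gamma_k}^{\gamma_0}
```

In particular, the iCAP subfamily with γ_k = ∞ within groups and γ_0 = 1 across groups reduces to T(β) = Σ_k ‖β_{G_k}‖_∞, an ℓ1 norm across groups of ℓ∞ norms within groups.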