Exact block-wise optimization in group lasso and sparse group lasso for linear regression
The group lasso is a penalized regression method used in regression problems
where the covariates are partitioned into groups; it promotes sparsity at the
group level. Existing methods for finding the group lasso estimator either use
gradient projection methods to update the entire coefficient vector
simultaneously at each step, or update one group of coefficients at a time
using an inexact line search to approximate the optimal value for the group of
coefficients when all other groups' coefficients are fixed. We present a new
method of computation for the group lasso in the linear regression case, the
Single Line Search (SLS) algorithm, which operates by computing the exact
optimal value for each group (when all other coefficients are fixed) with one
univariate line search. We perform simulations demonstrating that the SLS
algorithm is often more efficient than existing computational methods. We also
extend the SLS algorithm to the sparse group lasso problem via the Signed
Single Line Search (SSLS) algorithm, and give theoretical results to support
both algorithms.
Comment: We have been made aware of the earlier work by Puig et al. (2009),
which derives the same result for the (non-sparse) group lasso setting. We
leave this manuscript available as a technical report, to serve as a
reference for the previously untreated sparse group lasso case, and for
timing comparisons of various methods in the group lasso setting. The
manuscript has been updated to include this reference.
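The core step described above, an exact group update obtained from one univariate search, can be sketched as follows. This is a minimal illustration under stated assumptions, not the SLS algorithm as published: it assumes the block objective 0.5*||r - X_g b||^2 + lam*||b||_2 with r the partial residual, a group matrix X_g of full column rank, and plain bisection on t = ||b|| as the line search. The helper names `solve` and `group_update` are hypothetical. The stationarity condition (X_g^T X_g + (lam/t) I) b = X_g^T r turns the p-dimensional block problem into finding the root of a decreasing function of the scalar t.

```python
import math


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def group_update(Xg, r, lam, tol=1e-12):
    """Exact minimizer of 0.5*||r - Xg b||^2 + lam*||b||_2 (assumes Xg has full column rank)."""
    n, p = len(Xg), len(Xg[0])
    G = [[sum(Xg[k][i] * Xg[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    z = [sum(Xg[k][i] * r[k] for k in range(n)) for i in range(p)]
    if math.sqrt(sum(v * v for v in z)) <= lam:
        return [0.0] * p  # KKT condition: the whole group is set to zero

    def b_of(t):
        # For a trial value t of ||b||, solve the ridge-type system (G + (lam/t) I) b = z.
        A = [[G[i][j] + (lam / t if i == j else 0.0) for j in range(p)] for i in range(p)]
        return solve(A, z)

    def psi(t):
        # ||b(t)|| / t - 1 is strictly decreasing in t; its unique root is the optimal norm.
        return math.sqrt(sum(v * v for v in b_of(t))) / t - 1.0

    lo, hi = 0.0, 1.0
    while psi(hi) > 0:  # bracket the root by doubling
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(1.0, hi):  # univariate bisection on t
        mid = 0.5 * (lo + hi)
        if psi(mid) > 0:
            lo = mid
        else:
            hi = mid
    return b_of(0.5 * (lo + hi))
```

For an orthonormal group the result reduces to block soft-thresholding, (1 - lam/||X_g^T r||) X_g^T r, which gives an easy sanity check; for example, `group_update([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [3.0, 4.0, 0.0], 1.0)` returns approximately `[2.4, 3.2]`.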
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present, from a
general perspective, optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted ℓ2-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
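As a concrete instance of the proximal methods listed above, the sketch below runs proximal gradient descent (ISTA) on the lasso, where the proximal operator of the ℓ1 norm is entrywise soft-thresholding. It is an illustrative sketch under assumptions, not code from the paper: the function names are hypothetical, the objective is taken as 0.5*||y - Xw||^2 + lam*||w||_1, and the caller must supply a step no larger than 1/L, with L the largest eigenvalue of X^T X.

```python
import math


def soft_threshold(v, thr):
    """Proximal operator of thr*||.||_1: shrink each entry toward zero by thr."""
    return [math.copysign(max(abs(x) - thr, 0.0), x) for x in v]


def ista(X, y, lam, step, iters=1000):
    """Proximal gradient (ISTA) for min_w 0.5*||y - X w||^2 + lam*||w||_1.

    `step` should not exceed 1/L, where L is the largest eigenvalue of X^T X.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        resid = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step on the smooth loss, then the prox of the l1 penalty.
        w = soft_threshold([w[j] - step * grad[j] for j in range(p)], step * lam)
    return w
```

A cheap, always-valid step is 1/trace(X^T X), since the trace upper-bounds the largest eigenvalue; a tighter estimate of L gives faster progress.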
Parallel Selective Algorithms for Big Data Optimization
We propose a decomposition framework for the parallel optimization of the sum
of a differentiable (possibly nonconvex) function and a (block) separable
nonsmooth, convex one. The latter term is usually employed to enforce structure
in the solution, typically sparsity. Our framework is very flexible and
includes both fully parallel Jacobi schemes and Gauss-Seidel (i.e.,
sequential) ones, as well as virtually all possibilities "in between" with only
a subset of variables updated at each iteration. Our theoretical convergence
results improve on existing ones, and numerical results on LASSO, logistic
regression, and some nonconvex quadratic problems show that the new method
consistently outperforms existing algorithms.
Comment: This work is an extended version of the conference paper that was
presented at IEEE ICASSP'14. The first and the second author contributed
equally to the paper. This revised version contains new numerical results on
nonconvex quadratic problems.
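The Jacobi-with-selection idea can be illustrated on the lasso with a hypothetical, simplified sketch. Assumptions not taken from the paper: scalar blocks, an exact best response for each coordinate with a proximal weight `tau`, a greedy rule that moves only coordinates whose proposed change is at least `rho` times the largest one, and a fixed step `gamma`. Every best response is computed from the same iterate, so that loop could run in parallel.

```python
def soft_scalar(x, thr):
    """Scalar soft-thresholding: sign(x) * max(|x| - thr, 0)."""
    return max(abs(x) - thr, 0.0) * (1.0 if x >= 0 else -1.0)


def selective_jacobi_lasso(X, y, lam, tau, gamma=0.9, rho=0.5, iters=2000):
    """Hypothetical sketch: Jacobi best responses with greedy selection for
    min_x 0.5*||y - X x||^2 + lam*||x||_1."""
    n, p = len(X), len(X[0])
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    x = [0.0] * p
    for _ in range(iters):
        r = [y[i] - sum(X[i][j] * x[j] for j in range(p)) for i in range(n)]
        # Best response of coordinate j: exact minimizer of the objective in x_j plus
        # (tau/2)*(x_j - x_j^k)^2, with all other coordinates frozen at the iterate x^k.
        xhat = [
            soft_scalar(sum(X[i][j] * r[i] for i in range(n)) + (col_sq[j] + tau) * x[j], lam)
            / (col_sq[j] + tau)
            for j in range(p)
        ]
        err = [abs(xhat[j] - x[j]) for j in range(p)]
        emax = max(err)
        if emax == 0.0:
            break  # x is a fixed point of the best-response map, hence optimal
        for j in range(p):
            if err[j] >= rho * emax:  # greedy selection of the coordinates to move
                x[j] += gamma * (xhat[j] - x[j])
    return x
```

Choosing `tau` at least trace(X^T X) makes every per-coordinate curvature exceed the largest eigenvalue of X^T X, a conservative, easy-to-check way to keep this particular sketch monotone; the conditions analyzed in the paper may be different.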
Flexible Parallel Algorithms for Big Data Optimization
We propose a decomposition framework for the parallel optimization of the sum
of a differentiable function and a (block) separable nonsmooth, convex one. The
latter term is typically used to enforce structure in the solution as, for
example, in Lasso problems. Our framework is very flexible and includes both
fully parallel Jacobi schemes and Gauss-Seidel (Southwell-type) ones, as well
as virtually all possibilities in between (e.g., gradient- or Newton-type
methods) with only a subset of variables updated at each iteration. Our
theoretical convergence results improve on existing ones, and numerical results
show that the new method compares favorably to existing algorithms.
Comment: submitted to IEEE ICASSP 201
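At the sequential end of the spectrum described above sits a Gauss-Southwell scheme: compute every coordinate's proposed update, but apply only the one with the largest change. A minimal sketch for the lasso with scalar blocks follows, assuming the objective 0.5*||y - Xw||^2 + lam*||w||_1 and nonzero columns; the function names are hypothetical and this is not the paper's implementation.

```python
def soft_scalar(x, thr):
    """Scalar soft-thresholding: sign(x) * max(|x| - thr, 0)."""
    return max(abs(x) - thr, 0.0) * (1.0 if x >= 0 else -1.0)


def gauss_southwell_lasso(X, y, lam, iters=5000, tol=1e-12):
    """Greedy (Gauss-Southwell) coordinate descent for min_w 0.5*||y - X w||^2 + lam*||w||_1."""
    n, p = len(X), len(X[0])
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    x = [0.0] * p
    r = list(y)  # residual y - X x at x = 0
    for _ in range(iters):
        # Exact coordinate minimizers, all evaluated at the current iterate.
        xhat = [
            soft_scalar(sum(X[i][j] * r[i] for i in range(n)) + col_sq[j] * x[j], lam) / col_sq[j]
            for j in range(p)
        ]
        j = max(range(p), key=lambda k: abs(xhat[k] - x[k]))  # Southwell choice
        delta = xhat[j] - x[j]
        if abs(delta) <= tol:
            break
        x[j] = xhat[j]
        for i in range(n):
            r[i] -= X[i][j] * delta  # keep the residual in sync with the single update
    return x
```

Only one coordinate changes per iteration, so the residual update costs O(n), while choosing that coordinate costs a full pass over all p candidates.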
Strong rules for nonconvex penalties and their implications for efficient algorithms in high-dimensional regression
We consider approaches for improving the efficiency of algorithms for fitting
nonconvex penalized regression models such as SCAD and MCP in high dimensions.
In particular, we develop rules for discarding variables during cyclic
coordinate descent. This dimension reduction leads to a substantial improvement
in the speed of these algorithms for high-dimensional problems. The rules we
propose here eliminate a substantial fraction of the variables from the
coordinate descent algorithm. Violations are quite rare, especially in the
locally convex region of the solution path, and furthermore, may be easily
detected and corrected by checking the Karush-Kuhn-Tucker conditions. We extend
these rules to generalized linear models, as well as to other nonconvex
penalties such as the ℓ2-stabilized Mnet penalty, group MCP, and group
SCAD. We explore three variants of the coordinate descent algorithm that
incorporate these rules and study the efficiency of these algorithms in fitting
models to both simulated data and real data from a genome-wide association
study.
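For intuition about the screen-then-verify pattern described above, the sketch below implements the sequential strong rule familiar from the lasso, together with the KKT check that catches its rare violations. This is an assumption: the rules derived in the paper for SCAD and MCP follow the same pattern, but their exact thresholds may differ. The sketch assumes standardized columns and the loss (1/(2n))||y - X beta||^2, so that an excluded variable j satisfies its KKT condition when |x_j^T r| / n <= lam. All function names are hypothetical.

```python
def inner_products(X, r):
    """Return x_j^T r / n for every column j."""
    n, p = len(X), len(X[0])
    return [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]


def strong_rule_keep(X, r_prev, lam_prev, lam_new):
    """Indices kept by the sequential strong rule when moving from lam_prev to lam_new.

    Variable j is discarded when |x_j^T r(lam_prev)| / n < 2*lam_new - lam_prev.
    """
    z = inner_products(X, r_prev)
    return [j for j, zj in enumerate(z) if abs(zj) >= 2.0 * lam_new - lam_prev]


def kkt_violations(X, r_new, lam_new, discarded):
    """Discarded indices whose KKT condition |x_j^T r| / n <= lam_new fails at the new fit."""
    z = inner_products(X, r_new)
    return [j for j in discarded if abs(z[j]) > lam_new]
```

In a path algorithm, coordinate descent runs only over the kept set; any index returned by `kkt_violations` is added back and the fit is repeated, so screening never changes the final solution.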
Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors
Penalized regression is an attractive framework for variable selection
problems. Often, variables possess a grouping structure, and the relevant
selection problem is that of selecting groups, not individual variables. The
group lasso has been proposed as a way of extending the ideas of the lasso to
the problem of group selection. Nonconvex penalties such as SCAD and MCP have
been proposed and shown to have several advantages over the lasso; these
penalties may also be extended to the group selection problem, giving rise to
group SCAD and group MCP methods. Here, we describe algorithms for fitting
these models stably and efficiently. In addition, we present simulation results
and real data examples comparing and contrasting the statistical properties of
these methods.
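A group descent update for group MCP can be sketched as follows, assuming for illustration that every group has been orthonormalized so that X_g^T X_g / n = I. Under that assumption each group update has the closed form of a radial firm-thresholding of z_g = X_g^T r / n + beta_g: zero when ||z_g|| <= lam, shrunk and rescaled by 1/(1 - 1/gamma) up to gamma*lam, and left unpenalized beyond it. The function names and the default gamma = 3 are hypothetical choices, not the authors' code.

```python
import math


def group_mcp_threshold(z, lam, gamma):
    """Minimizer of 0.5*||z - b||^2 + MCP(||b||; lam, gamma) for gamma > 1 (radial firm thresholding)."""
    nz = math.sqrt(sum(v * v for v in z))
    if nz <= lam:
        return [0.0] * len(z)
    if nz > gamma * lam:
        return list(z)  # beyond gamma*lam the MCP is flat, so no shrinkage
    scale = (nz - lam) / (1.0 - 1.0 / gamma) / nz
    return [scale * v for v in z]


def group_descent_mcp(X, y, groups, lam, gamma=3.0, iters=100):
    """Hypothetical group descent for linear regression with group MCP.

    Assumes each group is orthonormalized so that X_g^T X_g / n = I.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = list(y)  # residual y - X beta at beta = 0
    for _ in range(iters):
        for g in groups:
            # z_g = X_g^T r / n + beta_g is the unpenalized group solution given the other groups.
            z = [sum(X[i][j] * r[i] for i in range(n)) / n + beta[j] for j in g]
            new = group_mcp_threshold(z, lam, gamma)
            for j, b_new in zip(g, new):
                delta = b_new - beta[j]
                if delta != 0.0:
                    for i in range(n):
                        r[i] -= X[i][j] * delta  # keep the residual in sync
                    beta[j] = b_new
    return beta
```

Swapping `group_mcp_threshold` for block soft-thresholding, (1 - lam/||z_g||)_+ z_g, would give the corresponding group lasso update under the same orthonormality assumption.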
Hybrid Random/Deterministic Parallel Algorithms for Nonconvex Big Data Optimization
We propose a decomposition framework for the parallel optimization of the sum
of a differentiable (possibly nonconvex) function and a nonsmooth (possibly
nonseparable), convex one. The latter term is usually employed to enforce
structure in the solution, typically sparsity. The main contribution of this
work is a novel parallel, hybrid random/deterministic decomposition
scheme wherein, at each iteration, a subset of (block) variables is updated at
the same time by minimizing local convex approximations of the original
nonconvex function. To tackle huge-scale problems, the (block) variables
to be updated are chosen according to a mixed random and deterministic
procedure, which captures the advantages of both pure deterministic and random
update-based schemes. Almost sure convergence of the proposed scheme is
established. Numerical results show that on huge-scale problems the proposed
hybrid random/deterministic algorithm outperforms both random and deterministic
schemes.
Comment: The order of the authors is alphabetical.
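A hypothetical sketch of a mixed random and deterministic block selection follows: a random subset of blocks is drawn first, and a greedy filter then keeps those whose error proxy is within a factor `rho` of the largest one in the sample. The error proxy, `sample_size`, and `rho` are assumptions for illustration; the selection procedure analyzed in the paper is not reproduced here.

```python
import random


def hybrid_select(errors, sample_size, rho, rng):
    """Hypothetical hybrid block selection: random sampling, then greedy filtering.

    errors[i] is a nonnegative proxy for how far block i is from its best response.
    Returns (sample, selected), where selected is a subset of sample.
    """
    sample = rng.sample(range(len(errors)), sample_size)  # random step
    emax = max(errors[i] for i in sample)
    selected = [i for i in sample if errors[i] >= rho * emax]  # deterministic (greedy) step
    return sample, selected
```

The random step keeps the per-iteration cost independent of the total number of blocks, while the greedy step concentrates the update on the blocks that are furthest from optimality within the sample; for example, `hybrid_select(errors, 4, 0.5, random.Random(0))` always selects the worst block among the four sampled.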