2,583 research outputs found

    Exact block-wise optimization in group lasso and sparse group lasso for linear regression

    The group lasso is a penalized regression method, used in regression problems where the covariates are partitioned into groups to promote sparsity at the group level. Existing methods for finding the group lasso estimator either use gradient projection methods to update the entire coefficient vector simultaneously at each step, or update one group of coefficients at a time using an inexact line search to approximate the optimal value for the group of coefficients when all other groups' coefficients are fixed. We present a new method of computation for the group lasso in the linear regression case, the Single Line Search (SLS) algorithm, which operates by computing the exact optimal value for each group (when all other coefficients are fixed) with one univariate line search. We perform simulations demonstrating that the SLS algorithm is often more efficient than existing computational methods. We also extend the SLS algorithm to the sparse group lasso problem via the Signed Single Line Search (SSLS) algorithm, and give theoretical results to support both algorithms. Comment: We have been made aware of the earlier work by Puig et al. (2009) which derives the same result for the (non-sparse) group lasso setting. We leave this manuscript available as a technical report, to serve as a reference for the previously untreated sparse group lasso case, and for timing comparisons of various methods in the group lasso setting. The manuscript is updated to include this reference.
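
    As a point of reference for the block-wise update described in this abstract, the following is a minimal sketch of the one special case where the exact group update has a closed form: when the columns of a group are orthonormal. It is not the SLS algorithm itself. The names `group_soft_threshold` and `update_group`, and the objective scaling 0.5*||r - X_g b||^2 + lam*||b||_2, are assumptions made for illustration.

```python
import math


def group_soft_threshold(z, t):
    """Shrink the vector z toward zero by t in Euclidean norm.

    Returns the zero vector when ||z||_2 <= t, which is what switches
    an entire group off at once.
    """
    norm = math.sqrt(sum(x * x for x in z))
    if norm <= t:
        return [0.0] * len(z)
    scale = 1.0 - t / norm
    return [scale * x for x in z]


def update_group(Xg, r, lam):
    """Exact minimizer of 0.5*||r - X_g b||^2 + lam*||b||_2 over b.

    Xg  : list of columns of the group's design block (assumed orthonormal)
    r   : partial residual with all other groups held fixed
    lam : penalty level (hypothetical scaling, chosen for this sketch)
    """
    # With orthonormal columns, X_g^T r is the unpenalized solution.
    z = [sum(col[i] * r[i] for i in range(len(r))) for col in Xg]
    return group_soft_threshold(z, lam)
```

    When the group's columns are not orthonormal, this closed form no longer applies, and the group update has to be found numerically, which is the situation the abstract's line search addresses.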

    Optimization with Sparsity-Inducing Penalties

    Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
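
    To make the "proximal methods" named in this abstract concrete, here is a minimal sketch of proximal gradient descent (often called ISTA) for the lasso, written with plain Python lists. The function names, the objective scaling 0.5*||y - Xw||^2 + lam*||w||_1, and the fixed step size are assumptions for illustration and are not taken from the paper.

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def ista_lasso(X, y, lam, step, n_iter=500):
    """Proximal gradient for 0.5*||y - Xw||^2 + lam*||w||_1.

    X    : list of rows (n rows, p columns)
    step : fixed step size; assumed to be at most 1 / (largest eigenvalue
           of X^T X) so that the iteration converges
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residual Xw - y and gradient X^T (Xw - y) of the smooth part.
        r = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        g = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # Gradient step on the smooth part, then the prox of the penalty.
        w = soft_threshold([w[j] - step * g[j] for j in range(p)], step * lam)
    return w
```

    For example, with an identity design `X = [[1.0, 0.0], [0.0, 1.0]]`, `y = [3.0, 0.5]`, `lam = 1.0` and `step = 1.0`, the iteration settles on the soft-thresholded response: the first coefficient shrinks to 2 and the second is set to zero.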

    Parallel Selective Algorithms for Big Data Optimization

    We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a (block) separable nonsmooth, convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (i.e., sequential) ones, as well as virtually all possibilities "in between" with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results on LASSO, logistic regression, and some nonconvex quadratic problems show that the new method consistently outperforms existing algorithms. Comment: This work is an extended version of the conference paper that has been presented at IEEE ICASSP'14. The first and the second author contributed equally to the paper. This revised version contains new numerical results on nonconvex quadratic problems.

    Flexible Parallel Algorithms for Big Data Optimization

    We propose a decomposition framework for the parallel optimization of the sum of a differentiable function and a (block) separable nonsmooth, convex one. The latter term is typically used to enforce structure in the solution as, for example, in Lasso problems. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (Southwell-type) ones, as well as virtually all possibilities in between (e.g., gradient- or Newton-type methods) with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results show that the new method compares favorably to existing algorithms. Comment: submitted to IEEE ICASSP 201

    Strong rules for nonconvex penalties and their implications for efficient algorithms in high-dimensional regression

    We consider approaches for improving the efficiency of algorithms for fitting nonconvex penalized regression models such as SCAD and MCP in high dimensions. In particular, we develop rules for discarding variables during cyclic coordinate descent. This dimension reduction leads to a substantial improvement in the speed of these algorithms for high-dimensional problems. The rules we propose here eliminate a substantial fraction of the variables from the coordinate descent algorithm. Violations are quite rare, especially in the locally convex region of the solution path, and furthermore, may be easily detected and corrected by checking the Karush-Kuhn-Tucker conditions. We extend these rules to generalized linear models, as well as to other nonconvex penalties such as the $\ell_2$-stabilized Mnet penalty, group MCP, and group SCAD. We explore three variants of the coordinate descent algorithm that incorporate these rules and study the efficiency of these algorithms in fitting models to both simulated data and real data from a genome-wide association study.

    Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors

    Penalized regression is an attractive framework for variable selection problems. Often, variables possess a grouping structure, and the relevant selection problem is that of selecting groups, not individual variables. The group lasso has been proposed as a way of extending the ideas of the lasso to the problem of group selection. Nonconvex penalties such as SCAD and MCP have been proposed and shown to have several advantages over the lasso; these penalties may also be extended to the group selection problem, giving rise to group SCAD and group MCP methods. Here, we describe algorithms for fitting these models stably and efficiently. In addition, we present simulation results and real data examples comparing and contrasting the statistical properties of these methods

    Hybrid Random/Deterministic Parallel Algorithms for Nonconvex Big Data Optimization

    We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a nonsmooth (possibly nonseparable), convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. The main contribution of this work is a novel parallel, hybrid random/deterministic decomposition scheme wherein, at each iteration, a subset of (block) variables is updated at the same time by minimizing local convex approximations of the original nonconvex function. To tackle huge-scale problems, the (block) variables to be updated are chosen according to a mixed random and deterministic procedure, which captures the advantages of both pure deterministic and random update-based schemes. Almost sure convergence of the proposed scheme is established. Numerical results show that on huge-scale problems the proposed hybrid random/deterministic algorithm outperforms both random and deterministic schemes. Comment: The order of the authors is alphabetical.