
    Pathwise coordinate optimization

    We consider "one-at-a-time" coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L_1-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the "fused lasso," however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.
    Comment: Published at http://dx.doi.org/10.1214/07-AOAS131 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
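    As a concrete illustration of the coordinate-wise update the abstract refers to, the sketch below runs cyclic coordinate descent for the lasso, updating each coefficient by soft-thresholding a partial-residual correlation. It is a minimal sketch, not the paper's implementation; the function names, the assumption of standardized predictors, and the fixed iteration count are illustrative choices.

    import numpy as np

    def soft_threshold(z, gamma):
        # Soft-thresholding operator: sign(z) * max(|z| - gamma, 0).
        return np.sign(z) * max(abs(z) - gamma, 0.0)

    def lasso_coordinate_descent(X, y, lam, n_iter=100):
        # Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1,
        # assuming the columns of X are standardized (mean 0, variance 1).
        n, p = X.shape
        beta = np.zeros(p)
        residual = y - X @ beta
        for _ in range(n_iter):
            for j in range(p):
                residual += X[:, j] * beta[j]          # partial residual without feature j
                rho = X[:, j] @ residual / n           # correlation with the partial residual
                beta[j] = soft_threshold(rho, lam)     # closed-form univariate lasso solution
                residual -= X[:, j] * beta[j]
        return beta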

    Best Subset Selection via a Modern Optimization Lens

    In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We present a MIO approach for solving the classical best subset selection problem of choosing k out of p features in linear regression given n observations. We develop a discrete extension of modern first order continuous optimization methods to find high quality feasible solutions that we use as warm starts to a MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with n in the 1000s and p in the 100s in minutes to provable optimality, and finds near optimal solutions for n in the 100s and p in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than Lasso and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power.
    Comment: This is a revised version (May, 2015) of the first submission in June 201
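    The "discrete extension of modern first order methods" can be pictured as projected gradient descent onto the set of k-sparse vectors, i.e. a hard-thresholding step that keeps the k largest coefficients. The sketch below shows only that warm-start stage, with assumed names and a simple 1/L step size; the provably optimal stage requires an actual MIO solver and is not shown.

    import numpy as np

    def hard_threshold(beta, k):
        # Keep the k entries largest in absolute value; zero out the rest.
        out = np.zeros_like(beta)
        keep = np.argsort(np.abs(beta))[-k:]
        out[keep] = beta[keep]
        return out

    def discrete_first_order(X, y, k, n_iter=200):
        # Projected gradient for (1/2) * ||y - X b||^2 subject to ||b||_0 <= k.
        L = np.linalg.norm(X, 2) ** 2      # upper bound on the Lipschitz constant of the gradient
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - y)
            beta = hard_threshold(beta - grad / L, k)
        return beta                        # used as a warm start for the MIO stage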

    CoCoA: A General Framework for Communication-Efficient Distributed Optimization

    The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets.
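    The round structure of a framework of this kind can be pictured as: broadcast the shared state, let every worker approximately solve a subproblem on its own partition of the data, then combine the local updates in a single communication step. The toy below partitions the features of an L1-regularized least-squares problem across workers and averages the block updates; it illustrates the communication pattern only, and is not the CoCoA subproblem or its adding/averaging analysis. All names are illustrative.

    import numpy as np

    def local_lasso_pass(X_blk, residual, beta_blk, lam, n):
        # One coordinate-descent pass over this worker's feature block, holding the
        # other workers' contributions fixed (a stand-in for a local subproblem solver).
        delta = np.zeros_like(beta_blk)
        r = residual.copy()
        for j in range(X_blk.shape[1]):
            r += X_blk[:, j] * beta_blk[j]
            denom = X_blk[:, j] @ X_blk[:, j] / n
            z = X_blk[:, j] @ r / n
            new_bj = np.sign(z) * max(abs(z) - lam, 0.0) / denom
            delta[j] = new_bj - beta_blk[j]
            r -= X_blk[:, j] * new_bj
        return delta

    def distributed_rounds(X, y, lam, n_workers=4, rounds=50):
        # Each round: compute the shared residual once, let every worker solve its
        # block independently, then combine the updates by conservative averaging.
        n, p = X.shape
        blocks = np.array_split(np.arange(p), n_workers)
        beta = np.zeros(p)
        for _ in range(rounds):
            residual = y - X @ beta
            deltas = [local_lasso_pass(X[:, idx], residual, beta[idx], lam, n) for idx in blocks]
            for idx, d in zip(blocks, deltas):
                beta[idx] += d / n_workers
        return beta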

    The Influence Function of Penalized Regression Estimators

    To perform regression analysis in high dimensions, lasso or ridge estimation is a common choice. However, it has been shown that these methods are not robust to outliers. Therefore, alternatives such as penalized M-estimation or the sparse least trimmed squares (LTS) estimator have been proposed. The robustness of these regression methods can be measured with the influence function, which quantifies the effect of infinitesimal perturbations in the data. Furthermore, it can be used to compute the asymptotic variance and the mean squared error. In this paper we compute the influence function, the asymptotic variance and the mean squared error for penalized M-estimators and the sparse LTS estimator. The asymptotic biasedness of the estimators makes the calculations nonstandard. We show that only M-estimators with a loss function with a bounded derivative are robust against regression outliers. In particular, the lasso has an unbounded influence function.
    Comment: appears in Statistics: A Journal of Theoretical and Applied Statistics, 201
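    For reference, the influence function of a statistical functional T at a distribution F is the Gateaux derivative in the direction of a point mass at z; this is the standard definition, not anything specific to this paper's calculations:

    \operatorname{IF}(z; T, F) = \lim_{\varepsilon \downarrow 0} \frac{T\bigl((1-\varepsilon)F + \varepsilon \Delta_z\bigr) - T(F)}{\varepsilon},

    where \Delta_z denotes the point mass at z. Under regularity conditions the asymptotic variance of the estimator is E_F[\operatorname{IF}(Z;T,F)\,\operatorname{IF}(Z;T,F)^\top], which is how the variance and mean squared error computations in the abstract connect to the influence function.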

    Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection

    A number of variable selection methods have been proposed involving nonconvex penalty functions. These methods, which include the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP), have been demonstrated to have attractive theoretical properties, but model fitting is not a straightforward task, and the resulting solutions may be unstable. Here, we demonstrate the potential of coordinate descent algorithms for fitting these models, establishing theoretical convergence properties and demonstrating that they are significantly faster than competing approaches. In addition, we demonstrate the utility of convexity diagnostics to determine regions of the parameter space in which the objective function is locally convex, even though the penalty is not. Our simulation study and data examples indicate that nonconvex penalties like MCP and SCAD are worthwhile alternatives to the lasso in many applications. In particular, our numerical results suggest that MCP is the preferred approach among the three methods.
    Comment: Published at http://dx.doi.org/10.1214/10-AOAS388 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
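    The closed-form univariate solution that makes coordinate descent attractive for MCP is the "firm thresholding" operator: rescaled soft-thresholding below gamma*lambda and no shrinkage above it. The sketch below applies it in a cyclic loop, assuming standardized predictors; the names and the default gamma are illustrative, and the SCAD update has an analogous piecewise closed form.

    import numpy as np

    def firm_threshold(z, lam, gamma):
        # Univariate MCP solution for a standardized predictor (gamma > 1).
        if abs(z) <= gamma * lam:
            return np.sign(z) * max(abs(z) - lam, 0.0) / (1.0 - 1.0 / gamma)
        return z  # beyond gamma*lam the MCP penalty is flat, so no shrinkage

    def mcp_coordinate_descent(X, y, lam, gamma=3.0, n_iter=100):
        # Cyclic coordinate descent for least squares with an MCP penalty,
        # assuming the columns of X are standardized (mean 0, variance 1).
        n, p = X.shape
        beta = np.zeros(p)
        r = y - X @ beta
        for _ in range(n_iter):
            for j in range(p):
                r += X[:, j] * beta[j]
                z = X[:, j] @ r / n
                beta[j] = firm_threshold(z, lam, gamma)
                r -= X[:, j] * beta[j]
        return beta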

    Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors

    Penalized regression is an attractive framework for variable selection problems. Often, variables possess a grouping structure, and the relevant selection problem is that of selecting groups, not individual variables. The group lasso has been proposed as a way of extending the ideas of the lasso to the problem of group selection. Nonconvex penalties such as SCAD and MCP have been proposed and shown to have several advantages over the lasso; these penalties may also be extended to the group selection problem, giving rise to group SCAD and group MCP methods. Here, we describe algorithms for fitting these models stably and efficiently. In addition, we present simulation results and real data examples comparing and contrasting the statistical properties of these methods.
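    Group descent replaces the scalar coordinate update with a groupwise one: the whole block of coefficients is shrunk along the direction of its least-squares update, with the amount of shrinkage determined by the block's norm. Below is a minimal sketch using MCP-style firm thresholding on the group norm, assuming each group's columns have been orthonormalized so that X_g^T X_g / n = I; names and defaults are illustrative.

    import numpy as np

    def firm_threshold(z, lam, gamma):
        # Scalar MCP ("firm") thresholding, here applied to a group norm.
        if abs(z) <= gamma * lam:
            return np.sign(z) * max(abs(z) - lam, 0.0) / (1.0 - 1.0 / gamma)
        return z

    def group_descent(X, y, groups, lam, gamma=3.0, n_iter=100):
        # Groupwise updates for least squares with a group MCP penalty; `groups`
        # is a list of index arrays, and each X[:, g] is assumed orthonormalized.
        n, p = X.shape
        beta = np.zeros(p)
        r = y - X @ beta
        for _ in range(n_iter):
            for g in groups:
                r += X[:, g] @ beta[g]
                z = X[:, g].T @ r / n
                norm_z = np.linalg.norm(z)
                scale = firm_threshold(norm_z, lam, gamma) / norm_z if norm_z > 0 else 0.0
                beta[g] = scale * z          # shrink the whole group together
                r -= X[:, g] @ beta[g]
        return beta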

    A General Family of Penalties for Combining Differing Types of Penalties in Generalized Structured Models

    Penalized estimation has become an established tool for regularization and model selection in regression models. A variety of penalties with specific features are available, and effective algorithms for specific penalties have been proposed. But not much is available to fit models that call for a combination of different penalties. When modeling rent data, which will be considered as an example, various types of predictors call for a combination of a Ridge, a grouped Lasso and a Lasso-type penalty within one model. Algorithms that can deal with such problems are in demand. We propose to approximate penalties that are (semi-)norms of scalar linear transformations of the coefficient vector in generalized structured models. The penalty is very general, such that the Lasso, the fused Lasso, the Ridge, the smoothly clipped absolute deviation penalty (SCAD), the elastic net and many more penalties are embedded. The approximation allows all these penalties to be combined within one model. The computation is based on conventional penalized iteratively re-weighted least squares (PIRLS) algorithms and is hence easy to implement. Moreover, new penalties can be incorporated quickly. The approach is also extended to penalties with vector-based arguments, that is, to penalties with norms of linear transformations of the coefficient vector. Some illustrative examples and the model for the Munich rent data show promising results.
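    PIRLS-type algorithms of this kind rest on replacing each penalty term with a quadratic surrogate at the current iterate, in the spirit of Fan and Li's local quadratic approximation; the paper's exact construction may differ in detail. With a penalty term p_\lambda(|a^\top\beta|) and current iterate \beta_0,

    p_\lambda(|a^\top\beta|) \approx p_\lambda(|a^\top\beta_0|) + \frac{p_\lambda'(|a^\top\beta_0|)}{2\,|a^\top\beta_0|}\Bigl\{(a^\top\beta)^2 - (a^\top\beta_0)^2\Bigr\},

    so each penalty term contributes a ridge-like matrix proportional to \frac{p_\lambda'(|a^\top\beta_0|)}{|a^\top\beta_0|}\, a a^\top to the penalized working least-squares system, and different penalty types enter only through different weights. In practice a small constant is added to |a^\top\beta_0| to guard against division by zero.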