    Piecewise linear regularized solution paths

    We consider the generic regularized optimization problem $\hat{\beta}(\lambda) = \arg\min_{\beta} L(y, X\beta) + \lambda J(\beta)$. Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407--499] have shown that for the LASSO--that is, if $L$ is squared error loss and $J(\beta) = \|\beta\|_1$ is the $\ell_1$ norm of $\beta$--the optimal coefficient path is piecewise linear, that is, $\partial \hat{\beta}(\lambda)/\partial \lambda$ is piecewise constant. We derive a general characterization of the properties of (loss $L$, penalty $J$) pairs which give piecewise linear coefficient paths. Such pairs allow for efficient generation of the full regularized coefficient paths. We investigate the nature of the efficient path-following algorithms which arise. We use our results to suggest robust versions of the LASSO for regression and classification, and to develop new, efficient algorithms for existing problems in the literature, including Mammen and van de Geer's locally adaptive regression splines.
    Comment: Published at http://dx.doi.org/10.1214/009053606000001370 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
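
    As a quick illustration of the piecewise-linear path property, the sketch below traces the exact LASSO coefficient path with scikit-learn's LARS-LASSO routine on simulated data; the data sizes and noise level are illustrative choices, not taken from the paper, and the paper's general (loss, penalty) framework is not reproduced here.

        # Minimal sketch: for squared-error loss with an l1 penalty (the LASSO),
        # beta_hat(lambda) is piecewise linear in lambda, so the exact path is
        # determined by the finitely many "knots" where the active set changes.
        # Uses scikit-learn's LARS-LASSO path; the data below are simulated.
        import numpy as np
        from sklearn.linear_model import lars_path

        rng = np.random.default_rng(0)
        n, p = 100, 10
        X = rng.standard_normal((n, p))
        beta_true = np.zeros(p)
        beta_true[:3] = [3.0, -2.0, 1.5]      # sparse ground truth
        y = X @ beta_true + 0.5 * rng.standard_normal(n)

        # alphas holds the knot values of the regularization parameter
        # (scikit-learn's scaling of lambda); coefs[:, k] is the coefficient
        # vector at knot k.  Between consecutive knots the path is linear,
        # so d(beta_hat)/d(lambda) is piecewise constant.
        alphas, active, coefs = lars_path(X, y, method="lasso")
        print("number of knots:", len(alphas))
        print("order in which variables enter the active set:", active)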

    Randomized Sketches of Convex Programs with Sharp Guarantees

    Random projection (RP) is a classical technique for reducing storage and computational costs. We analyze RP-based approximations of convex programs, in which the original optimization problem is approximated by the solution of a lower-dimensional problem. Such dimensionality reduction is essential in computation-limited settings, since the complexity of general convex programming can be quite high (e.g., cubic for quadratic programs, and substantially higher for semidefinite programs). In addition to computational savings, random projection is also useful for reducing memory usage, and has useful properties for privacy-sensitive optimization. We prove that the approximation ratio of this procedure can be bounded in terms of the geometry of the constraint set. For a broad class of random projections, including those based on various sub-Gaussian distributions as well as randomized Hadamard and Fourier transforms, the data matrix defining the cost function can be projected down to the statistical dimension of the tangent cone of the constraints at the original solution, which is often substantially smaller than the original dimension. We illustrate consequences of our theory for various cases, including unconstrained and $\ell_1$-constrained least squares, support vector machines, and low-rank matrix estimation, and discuss implications for privacy-sensitive optimization and some connections with de-noising and compressed sensing.
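
    A minimal sketch-and-solve example, assuming a plain Gaussian sketch and an unconstrained least-squares cost (one of the cases discussed above); the sketch dimension m below is an arbitrary illustrative choice rather than the statistical-dimension bound from the paper.

        # Sketch-and-solve for unconstrained least squares: project the data
        # matrix and response with a random sub-Gaussian map S, then solve the
        # much smaller m x d problem.  Problem sizes and m are illustrative only.
        import numpy as np

        rng = np.random.default_rng(1)
        n, d = 5000, 50
        X = rng.standard_normal((n, d))
        beta_true = rng.standard_normal(d)
        y = X @ beta_true + 0.1 * rng.standard_normal(n)

        m = 400                                        # sketch dimension, m << n
        S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch

        beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)            # original problem
        beta_sketch, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)  # projected problem

        rel_err = np.linalg.norm(beta_sketch - beta_full) / np.linalg.norm(beta_full)
        print(f"relative error of sketched solution: {rel_err:.3e}")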