
    Optimization with Sparsity-Inducing Penalties

    Sparse estimation methods aim to use or obtain parsimonious representations of data or models. They were first dedicated to linear variable selection, but numerous extensions have since emerged, such as structured sparsity and kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present, from a general perspective, optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted ℓ2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
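    As a minimal sketch of the proximal methods listed in the abstract above (our own illustration, not the paper's code), the following Python/NumPy snippet applies proximal gradient descent (ISTA) to the ℓ1-penalized least-squares problem min_x ½‖Ax − b‖² + λ‖x‖₁. The proximal operator of the ℓ1 norm is entrywise soft-thresholding. The function names and the fixed 1/L step size are illustrative choices.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista_lasso(A, b, lam, n_iter=500):
    """Proximal gradient (ISTA) for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    # Lipschitz constant of the smooth term's gradient: largest singular value squared.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                   # gradient of the least-squares term
        x = soft_threshold(x - grad / L, lam / L)  # gradient step, then l1 prox
    return x
```

    Entries whose gradient step lands within ±λ/L are set exactly to zero, which is how the non-smooth ℓ1 penalty produces sparse iterates.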

    Elastic net prefiltering for two class classification

    A two-stage linear-in-the-parameter model construction algorithm is proposed for noisy two-class classification problems. The first stage produces a prefiltered signal that serves as the desired output for the second stage, which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing the model's generalization capability: at the lower level, a new elastic-net model identification algorithm based on singular value decomposition is employed; at the upper level, the two regularization parameters are optimized with a particle-swarm-optimization algorithm by minimizing the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resulting prefiltered signal can be computed analytically without splitting the data set, and that the associated computational cost is minimal due to orthogonality. The second stage, sparse classifier construction, is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations on noisy data sets illustrate the competitiveness of this approach for noisy classification problems.
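    The analytic LOO computation in the abstract above relies on orthogonality. As a simplified sketch, assuming only the ℓ2 (ridge) part of the elastic net so that the fit is a linear smoother, the SVD X = USVᵀ gives the hat-matrix diagonal Hᵢᵢ directly, and each LOO residual is eᵢ/(1 − Hᵢᵢ) with no refitting. This is not the paper's elastic-net algorithm; the function names are hypothetical, and labels are assumed to be ±1.

```python
import numpy as np


def ridge_loo_predictions(X, y, lam):
    """LOO predictions of ridge regression (lam > 0), computed without refitting.

    With X = U S V^T, the hat matrix is H = U diag(s^2 / (s^2 + lam)) U^T,
    and the LOO residual of sample i is e_i / (1 - H_ii).
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    shrink = s**2 / (s**2 + lam)               # ridge filter factors
    y_fit = U @ (shrink * (U.T @ y))           # in-sample fit H y
    h = np.einsum("ij,j,ij->i", U, shrink, U)  # diagonal of H
    loo_resid = (y - y_fit) / (1.0 - h)        # PRESS (LOO) residuals
    return y - loo_resid


def loo_misclassification_rate(X, y, lam):
    """Hypothetical helper: fraction of +/-1 labels whose LOO prediction has the wrong sign."""
    return np.mean(np.sign(ridge_loo_predictions(X, y, lam)) != y)
```

    A single SVD serves every candidate value of lam, which is what makes a search over regularization parameters (such as the particle-swarm search described above) cheap in this setting.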

    An optimal subgradient algorithm for large-scale convex optimization in simple domains

    This paper shows that the optimal subgradient algorithm (OSGA) proposed in \cite{NeuO} can be used to solve structured large-scale convex constrained optimization problems. Only first-order information is required, and the optimal complexity bounds for both smooth and nonsmooth problems are attained. More specifically, we consider two classes of problems: (i) a convex objective over a simple closed convex domain, where the orthogonal projection onto this feasible domain is efficiently available; and (ii) a convex objective with a simple convex functional constraint. If OSGA is equipped with an appropriate prox-function, the OSGA subproblem can be solved either in closed form or by a simple iterative scheme, which is especially important for large-scale problems. We report numerical results for some applications to show the efficiency of the proposed scheme. A software package implementing OSGA for the above domains is available.
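    Case (i) in the abstract above assumes that the orthogonal projection onto the feasible domain is cheap. As an illustration (our own example, not taken from the paper or its software package), here are closed-form Euclidean projections onto two such simple domains, a box and a Euclidean ball, in Python/NumPy. OSGA itself is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def project_box(x, lo, hi):
    """Euclidean projection onto the box {z : lo <= z <= hi} (componentwise clip)."""
    return np.clip(x, lo, hi)


def project_ball(x, center, radius):
    """Euclidean projection onto the ball {z : ||z - center||_2 <= radius}."""
    d = x - center
    norm = np.linalg.norm(d)
    if norm <= radius:
        return x.copy()                     # already feasible
    return center + (radius / norm) * d     # rescale onto the boundary
```

    Both projections cost O(n) operations, which is what makes such domains "simple" in the large-scale setting.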

    Parameter Selection and Pre-Conditioning for a Graph Form Solver

    In a recent paper, Parikh and Boyd describe a method for solving a convex optimization problem in which each iteration involves evaluating a proximal operator and a projection onto a subspace. In this paper we address the critical practical issues of how to select the proximal parameter in each iteration and how to scale the original problem variables so as to achieve reliable practical performance. The resulting method has been implemented as an open-source software package called POGS (Proximal Graph Solver) that targets multi-core and GPU-based systems, and it has been tested on a wide variety of practical problems. Numerical results show that POGS can solve very large problems (with, say, more than a billion coefficients in the data) to modest accuracy in a few tens of seconds. As just one example, a radiation treatment planning problem with around 100 million coefficients in the data can be solved in a few seconds, compared to around one hour with an interior-point method.
    Comment: 28 pages, 1 figure, 1 open source implementation
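    In the graph form associated with Parikh and Boyd's method, as we understand it, the problem is to minimize f(y) + g(x) subject to y = Ax, and the subspace projected onto in each iteration is the graph {(x, y) : y = Ax}. A minimal dense sketch of that projection, assuming a small matrix that fits in memory (POGS's multi-core/GPU implementation is not reproduced, and the function name is hypothetical), solves the normal equations (I + AᵀA)x = c + Aᵀd.

```python
import numpy as np


def project_onto_graph(A, c, d):
    """Euclidean projection of (c, d) onto the graph {(x, y) : y = A x}.

    Minimizing ||x - c||^2 + ||A x - d||^2 over x and setting the gradient
    to zero gives (I + A^T A) x = c + A^T d; then y = A x.
    """
    n = A.shape[1]
    x = np.linalg.solve(np.eye(n) + A.T @ A, c + A.T @ d)
    return x, A @ x
```

    Because A is fixed across iterations, a practical solver would typically factor I + AᵀA (or an equivalent system) once and reuse the factorization rather than calling a dense solve every time.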