2,453 research outputs found
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted -penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view
Elastic net prefiltering for two class classification
A two-stage linear-in-the-parameter model construction algorithm is proposed aimed at noisy two-class classification problems. The purpose of the first stage is to produce a prefiltered signal that is used as the desired output for the second stage which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing a model’s generalization capability, in which a new elastic-net model identification algorithm using singular value decomposition is employed at the lower level, and then, two regularization parameters are optimized using a particle-swarm-optimization algorithm at the upper level by minimizing the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resultant prefiltered signal can be analytically computed without splitting the data set, and the associated computational cost is minimal due to orthogonality. The second stage of sparse classifier construction is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations of this approach for noisy data sets illustrate the competitiveness of this approach to classification of noisy data problems
An optimal subgradient algorithm for large-scale convex optimization in simple domains
This paper shows that the optimal subgradient algorithm, OSGA, proposed in
\cite{NeuO} can be used for solving structured large-scale convex constrained
optimization problems. Only first-order information is required, and the
optimal complexity bounds for both smooth and nonsmooth problems are attained.
More specifically, we consider two classes of problems: (i) a convex objective
with a simple closed convex domain, where the orthogonal projection on this
feasible domain is efficiently available; (ii) a convex objective with a simple
convex functional constraint. If we equip OSGA with an appropriate
prox-function, the OSGA subproblem can be solved either in a closed form or by
a simple iterative scheme, which is especially important for large-scale
problems. We report numerical results for some applications to show the
efficiency of the proposed scheme. A software package implementing OSGA for
above domains is available
Parameter Selection and Pre-Conditioning for a Graph Form Solver
In a recent paper, Parikh and Boyd describe a method for solving a convex
optimization problem, where each iteration involves evaluating a proximal
operator and projection onto a subspace. In this paper we address the critical
practical issues of how to select the proximal parameter in each iteration, and
how to scale the original problem variables, so as the achieve reliable
practical performance. The resulting method has been implemented as an
open-source software package called POGS (Proximal Graph Solver), that targets
multi-core and GPU-based systems, and has been tested on a wide variety of
practical problems. Numerical results show that POGS can solve very large
problems (with, say, more than a billion coefficients in the data), to modest
accuracy in a few tens of seconds. As just one example, a radiation treatment
planning problem with around 100 million coefficients in the data can be solved
in a few seconds, as compared to around one hour with an interior-point method.Comment: 28 pages, 1 figure, 1 open source implementatio
- …