3,000 research outputs found
DC Proximal Newton for Non-Convex Optimization Problems
We introduce a novel algorithm for solving learning problems where both the
loss function and the regularizer are non-convex but belong to the class of
difference of convex (DC) functions. Our contribution is a new general purpose
proximal Newton algorithm that is able to deal with such a situation. The
algorithm consists in obtaining a descent direction from an approximation of
the loss function and then in performing a line search to ensure sufficient
descent. A theoretical analysis is provided showing that the iterates of the
proposed algorithm {admit} as limit points stationary points of the DC
objective function. Numerical experiments show that our approach is more
efficient than current state of the art for a problem with a convex loss
functions and non-convex regularizer. We have also illustrated the benefit of
our algorithm in high-dimensional transductive learning problem where both loss
function and regularizers are non-convex
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted -penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view
A quasi-Newton proximal splitting method
A new result in convex analysis on the calculation of proximity operators in
certain scaled norms is derived. We describe efficient implementations of the
proximity calculation for a useful class of functions; the implementations
exploit the piece-wise linear nature of the dual problem. The second part of
the paper applies the previous result to acceleration of convex minimization
problems, and leads to an elegant quasi-Newton method. The optimization method
compares favorably against state-of-the-art alternatives. The algorithm has
extensive applications including signal processing, sparse recovery and machine
learning and classification
Smoothing Proximal Gradient Method for General Structured Sparse Learning
We study the problem of learning high dimensional regression models
regularized by a structured-sparsity-inducing penalty that encodes prior
structural information on either input or output sides. We consider two widely
adopted types of such penalties as our motivating examples: 1) overlapping
group lasso penalty, based on the l1/l2 mixed-norm penalty, and 2) graph-guided
fusion penalty. For both types of penalties, due to their non-separability,
developing an efficient optimization method has remained a challenging problem.
In this paper, we propose a general optimization approach, called smoothing
proximal gradient method, which can solve the structured sparse regression
problems with a smooth convex loss and a wide spectrum of
structured-sparsity-inducing penalties. Our approach is based on a general
smoothing technique of Nesterov. It achieves a convergence rate faster than the
standard first-order method, subgradient method, and is much more scalable than
the most widely used interior-point method. Numerical results are reported to
demonstrate the efficiency and scalability of the proposed method.Comment: arXiv admin note: substantial text overlap with arXiv:1005.471
- …