32,497 research outputs found
Convex optimization over intersection of simple sets: improved convergence rate guarantees via an exact penalty approach
We consider the problem of minimizing a convex function over the intersection
of finitely many simple sets which are easy to project onto. This is an
important problem arising in various domains such as machine learning. The main
difficulty lies in finding the projection of a point in the intersection of
many sets. Existing approaches yield an infeasible point with an
iteration-complexity of for nonsmooth problems with no
guarantees on the in-feasibility. By reformulating the problem through exact
penalty functions, we derive first-order algorithms which not only guarantees
that the distance to the intersection is small but also improve the complexity
to and for smooth functions. For
composite and smooth problems, this is achieved through a saddle-point
reformulation where the proximal operators required by the primal-dual
algorithms can be computed in closed form. We illustrate the benefits of our
approach on a graph transduction problem and on graph matching
Smoothing Proximal Gradient Method for General Structured Sparse Learning
We study the problem of learning high dimensional regression models
regularized by a structured-sparsity-inducing penalty that encodes prior
structural information on either input or output sides. We consider two widely
adopted types of such penalties as our motivating examples: 1) overlapping
group lasso penalty, based on the l1/l2 mixed-norm penalty, and 2) graph-guided
fusion penalty. For both types of penalties, due to their non-separability,
developing an efficient optimization method has remained a challenging problem.
In this paper, we propose a general optimization approach, called smoothing
proximal gradient method, which can solve the structured sparse regression
problems with a smooth convex loss and a wide spectrum of
structured-sparsity-inducing penalties. Our approach is based on a general
smoothing technique of Nesterov. It achieves a convergence rate faster than the
standard first-order method, subgradient method, and is much more scalable than
the most widely used interior-point method. Numerical results are reported to
demonstrate the efficiency and scalability of the proposed method.Comment: arXiv admin note: substantial text overlap with arXiv:1005.471
Semi-proximal Mirror-Prox for Nonsmooth Composite Minimization
We propose a new first-order optimisation algorithm to solve high-dimensional
non-smooth composite minimisation problems. Typical examples of such problems
have an objective that decomposes into a non-smooth empirical risk part and a
non-smooth regularisation penalty. The proposed algorithm, called Semi-Proximal
Mirror-Prox, leverages the Fenchel-type representation of one part of the
objective while handling the other part of the objective via linear
minimization over the domain. The algorithm stands in contrast with more
classical proximal gradient algorithms with smoothing, which require the
computation of proximal operators at each iteration and can therefore be
impractical for high-dimensional problems. We establish the theoretical
convergence rate of Semi-Proximal Mirror-Prox, which exhibits the optimal
complexity bounds, i.e. , for the number of calls to linear
minimization oracle. We present promising experimental results showing the
interest of the approach in comparison to competing methods
FAASTA: A fast solver for total-variation regularization of ill-conditioned problems with application to brain imaging
The total variation (TV) penalty, as many other analysis-sparsity problems,
does not lead to separable factors or a proximal operatorwith a closed-form
expression, such as soft thresholding for the penalty. As a result,
in a variational formulation of an inverse problem or statisticallearning
estimation, it leads to challenging non-smooth optimization problemsthat are
often solved with elaborate single-step first-order methods. When thedata-fit
term arises from empirical measurements, as in brain imaging, it isoften very
ill-conditioned and without simple structure. In this situation, in proximal
splitting methods, the computation cost of thegradient step can easily dominate
each iteration. Thus it is beneficialto minimize the number of gradient
steps.We present fAASTA, a variant of FISTA, that relies on an internal solver
forthe TV proximal operator, and refines its tolerance to balance
computationalcost of the gradient and the proximal steps. We give benchmarks
andillustrations on "brain decoding": recovering brain maps from
noisymeasurements to predict observed behavior. The algorithm as well as
theempirical study of convergence speed are valuable for any non-exact
proximaloperator, in particular analysis-sparsity problems
Templates for Convex Cone Problems with Applications to Sparse Signal Recovery
This paper develops a general framework for solving a variety of convex cone
problems that frequently arise in signal processing, machine learning,
statistics, and other fields. The approach works as follows: first, determine a
conic formulation of the problem; second, determine its dual; third, apply
smoothing; and fourth, solve using an optimal first-order method. A merit of
this approach is its flexibility: for example, all compressed sensing problems
can be solved via this approach. These include models with objective
functionals such as the total-variation norm, ||Wx||_1 where W is arbitrary, or
a combination thereof. In addition, the paper also introduces a number of
technical contributions such as a novel continuation scheme, a novel approach
for controlling the step size, and some new results showing that the smooth and
unsmoothed problems are sometimes formally equivalent. Combined with our
framework, these lead to novel, stable and computationally efficient
algorithms. For instance, our general implementation is competitive with
state-of-the-art methods for solving intensively studied problems such as the
LASSO. Further, numerical experiments show that one can solve the Dantzig
selector problem, for which no efficient large-scale solvers exist, in a few
hundred iterations. Finally, the paper is accompanied with a software release.
This software is not a single, monolithic solver; rather, it is a suite of
programs and routines designed to serve as building blocks for constructing
complete algorithms.Comment: The TFOCS software is available at http://tfocs.stanford.edu This
version has updated reference
- …