Error bounds, quadratic growth, and linear convergence of proximal methods
The proximal gradient algorithm for minimizing the sum of a smooth and a
nonsmooth convex function often converges linearly even without strong
convexity. One common reason is that a multiple of the step length at each
iteration may linearly bound the "error" -- the distance to the solution set.
We explain the observed linear convergence intuitively by proving the
equivalence of such an error bound to a natural quadratic growth condition. Our
approach generalizes to linear convergence analysis for proximal methods (of
Gauss-Newton type) for minimizing compositions of nonsmooth functions with
smooth mappings. We observe incidentally that short step-lengths in the
algorithm indicate near-stationarity, suggesting a reliable termination
criterion. Comment: 35 pages
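As a concrete sketch of the step-length termination idea described above, here is a minimal proximal gradient loop on a toy Lasso instance (the instance, names, and tolerances are ours for illustration, not the paper's experiments); a short step is used as the stopping test.

```python
import numpy as np

def prox_gradient(grad_f, prox_g, x0, step, tol=1e-8, max_iter=20000):
    """Proximal gradient method for min f(x) + g(x).

    Stops when the step length ||x_{k+1} - x_k|| is small: per the
    error-bound viewpoint, a short step certifies near-stationarity.
    """
    x = x0.copy()
    for _ in range(max_iter):
        x_new = prox_g(x - step * grad_f(x), step)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Toy Lasso instance: f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
lam = 0.1

grad_f = lambda x: A.T @ (A @ x - b)
# soft-thresholding = proximal map of t*lam*||.||_1
soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - lam * t, 0.0)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L with L = ||A||_2^2

x_star = prox_gradient(grad_f, soft, np.zeros(10), step)
```

On return, `x_star` is (approximately) a fixed point of the proximal gradient map, which is exactly the stationarity certificate the short step provides.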
The restricted strong convexity revisited: Analysis of equivalence to error bound and quadratic growth
Restricted strong convexity is an effective tool for deriving globally
linear convergence rates of descent methods in convex minimization. Recently,
the global error bound and quadratic growth properties have emerged as
competitors. In this paper, with the help of Ekeland's variational principle,
we show the equivalence between these three notions. To deal with convex
minimization over a closed convex set and structured convex optimization, we
propose a group of modified versions and a group of extended versions of these
three notions by using gradient mapping and proximal gradient mapping
separately, and prove that the equivalence for the modified and extended
versions still holds. Based on these equivalences, we establish new
asymptotically linear convergence results for the proximal gradient method.
Finally, we revisit the problem of minimizing the composition of an affine
mapping with a strongly convex differentiable function over a polyhedral set,
and obtain a strengthened property of the restricted strong convex type under
mild assumptions. Comment: 15 pages; accepted in Optimization Letters
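As a hypothetical reference point (not the paper's exact definitions, which may differ in detail), the three properties compared in this abstract are commonly stated, for a smooth convex function f with solution set X* and optimal value f*, as:

```latex
% quadratic growth (some \mu > 0):
f(x) - f^\star \;\ge\; \tfrac{\mu}{2}\,\mathrm{dist}(x, X^\star)^2
% error bound (some \kappa > 0):
\mathrm{dist}(x, X^\star) \;\le\; \kappa\,\|\nabla f(x)\|
% restricted strong convexity (some \nu > 0, \bar{x} = \mathrm{proj}_{X^\star}(x)):
\langle \nabla f(x),\; x - \bar{x} \rangle \;\ge\; \nu\,\mathrm{dist}(x, X^\star)^2
```

The modified and extended versions mentioned in the abstract replace the gradient above with the gradient mapping or proximal gradient mapping, respectively.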
On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming
Due to the possible lack of primal-dual-type error bounds, the superlinear
convergence for the Karush-Kuhn-Tucker (KKT) residuals of the sequence generated
by the augmented Lagrangian method (ALM) for solving convex composite conic
programming (CCCP) has long been an outstanding open question. In this paper,
we aim to resolve this issue by first conducting convergence rate analysis for
the ALM with Rockafellar's stopping criteria under only a mild quadratic growth
condition on the dual of CCCP. More importantly, by further assuming that the
Robinson constraint qualification holds, we establish the R-superlinear
convergence of the KKT residuals of the iterative sequence under
easy-to-implement stopping criteria for the augmented Lagrangian subproblems.
Equipped with this discovery, we gain insightful interpretations on the
impressive numerical performance of several recently developed semismooth
Newton-CG based ALM solvers for solving linear and convex quadratic
semidefinite programming.
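To make the ALM iteration and its KKT residual concrete, here is a toy sketch on an equality-constrained quadratic, with the subproblems solved exactly (the problem, names, and constants are ours for illustration; the paper's convex composite conic setting is far more general):

```python
import numpy as np

def alm_eq(c, A, b, rho=10.0, iters=50):
    """Augmented Lagrangian method for min 0.5*||x - c||^2 s.t. Ax = b.

    Each subproblem min_x L_rho(x, y) is solved exactly (here it is the
    linear system (I + rho*A'A) x = c - A'y + rho*A'b), then the
    multiplier is updated: y <- y + rho*(Ax - b).
    """
    m, n = A.shape
    y = np.zeros(m)
    H = np.eye(n) + rho * A.T @ A
    for _ in range(iters):
        x = np.linalg.solve(H, c - A.T @ y + rho * A.T @ b)
        y = y + rho * (A @ x - b)
    kkt = max(np.linalg.norm(A @ x - b),        # primal feasibility
              np.linalg.norm(x - c + A.T @ y))  # Lagrangian stationarity
    return x, y, kkt

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)
c = rng.standard_normal(6)
x, y, kkt = alm_eq(c, A, b)
```

With exact subproblem solves the stationarity residual vanishes identically after each multiplier update, so the KKT residual is driven by primal feasibility, which decays linearly here.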
Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs
We study a family of (potentially non-convex) constrained optimization
problems with convex composite structure. Through a novel analysis of
non-smooth geometry, we show that proximal-type algorithms applied to exact
penalty formulations of such problems exhibit local linear convergence under a
quadratic growth condition, which the compositional structure we consider
ensures. The main application of our results is to low-rank semidefinite
optimization with Burer-Monteiro factorizations. We precisely identify the
conditions for quadratic growth in the factorized problem via structures in the
semidefinite problem, which could be of independent interest for understanding
matrix factorization.
Randomized Smoothing SVRG for Large-scale Nonsmooth Convex Optimization
In this paper, we consider the problem of minimizing the average of a large
number of nonsmooth and convex functions. Such problems often arise in typical
machine learning problems as empirical risk minimization, but are
computationally very challenging. We develop and analyze a new algorithm that
achieves a robust linear convergence rate, and whose time and gradient
complexities are superior to those of state-of-the-art nonsmooth algorithms and
subgradient-based schemes. Moreover, our algorithm works without any extra error
bound conditions on the objective function as well as the common
strongly-convex condition. We show that our algorithm has wide applications in
optimization and machine learning problems, and demonstrate experimentally that
it performs well on a large-scale ranking problem. Comment: 10 pages, 12 figures. arXiv admin note: text overlap with
arXiv:1103.4296, arXiv:1403.4699 by other authors
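The paper's algorithm combines randomized smoothing with variance reduction; as a simplified, hypothetical illustration of the variance-reduction component alone, here is plain SVRG on a smooth finite sum (the instance, names, and step size are ours, not the paper's):

```python
import numpy as np

def svrg(grads, full_grad, x0, step, epochs=50, m=100, seed=0):
    """SVRG for min (1/n) sum_i f_i(x): each epoch snapshots x_tilde and
    its full gradient mu, then takes m inner steps along the
    variance-reduced direction grad_i(x) - grad_i(x_tilde) + mu.
    """
    rng = np.random.default_rng(seed)
    n = len(grads)
    x = x0.copy()
    for _ in range(epochs):
        x_tilde = x.copy()
        mu = full_grad(x_tilde)
        for _ in range(m):
            i = rng.integers(n)
            x = x - step * (grads[i](x) - grads[i](x_tilde) + mu)
    return x

# Smooth finite-sum test problem: f_i(x) = 0.5*(a_i'x - b_i)^2 with a
# consistent right-hand side, so the minimizer interpolates all terms.
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
grads = [lambda x, a=A[i], bi=b[i]: a * (a @ x - bi) for i in range(40)]
full_grad = lambda x: A.T @ (A @ x - b) / 40
x = svrg(grads, full_grad, np.zeros(5), step=0.01)
```

Because the snapshot gradient is recomputed only once per epoch, each inner step costs two component gradients instead of a full pass, which is the source of the gradient-complexity savings.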
Adaptive restart of accelerated gradient methods under local quadratic growth condition
By analyzing accelerated proximal gradient methods under a local quadratic
growth condition, we show that restarting these algorithms at any frequency
gives a globally linearly convergent algorithm. This result was previously
known only for long enough frequencies. Then, as the rate of convergence
depends on the match between the frequency and the quadratic error bound, we
design a scheme to automatically adapt the frequency of restart from the
observed decrease of the norm of the gradient mapping. Our algorithm has a
better theoretical bound than previously proposed methods for the adaptation to
the quadratic error bound of the objective. We illustrate the efficiency of the
algorithm on a Lasso problem and on a regularized logistic regression problem.
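A minimal sketch of fixed-frequency restarting, assuming a plain FISTA-style accelerated gradient loop on a smooth least-squares objective (the paper's adaptive frequency selection is not implemented here; instance and names are ours):

```python
import numpy as np

def fista_restart(grad, step, x0, restart_every=50, n_iters=500):
    """Accelerated gradient method restarted every `restart_every`
    iterations: momentum is reset (t <- 1, y <- x), which under a
    quadratic growth condition yields global linear convergence.
    """
    x, y, t = x0.copy(), x0.copy(), 1.0
    for k in range(n_iters):
        x_new = y - step * grad(y)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
        if (k + 1) % restart_every == 0:  # fixed-frequency restart
            y, t = x.copy(), 1.0
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
grad = lambda x: A.T @ (A @ x - b)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L
x = fista_restart(grad, step, np.zeros(10))
```

The adaptive scheme in the paper would instead monitor the norm of the gradient mapping and adjust `restart_every` on the fly.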
Proximal Quasi-Newton Methods for Regularized Convex Optimization with Linear and Accelerated Sublinear Convergence Rates
In [19], a general, inexact, efficient proximal quasi-Newton algorithm for
composite optimization problems has been proposed and a sublinear global
convergence rate has been established. In this paper, we analyze the
convergence properties of this method, both in the exact and inexact setting,
in the case when the objective function is strongly convex. We also investigate
a practical variant of this method by establishing a simple stopping criterion
for the subproblem optimization. Furthermore, we consider an accelerated
variant of the proximal quasi-Newton algorithm, based on FISTA [1]. A similar
accelerated method has been considered in [7], where the convergence rate
analysis relies on very strong, impractical assumptions. We present a modified
analysis while relaxing these assumptions and perform a practical comparison of
the accelerated proximal quasi-Newton algorithm and the regular one. Our
analysis and computational results show that acceleration may not bring any
benefit in the quasi-Newton setting.
The proximal point method revisited
In this short survey, I revisit the role of the proximal point method in
large scale optimization. I focus on three recent examples: a proximally guided
subgradient method for weakly convex stochastic approximation, the prox-linear
algorithm for minimizing compositions of convex functions and smooth maps, and
Catalyst generic acceleration for regularized Empirical Risk Minimization. Comment: 11 pages, submitted to SIAG/OPT Views and News
Convergence of the Forward-Backward Algorithm: Beyond the Worst Case with the Help of Geometry
We provide a comprehensive study of the convergence of the forward-backward
algorithm under suitable geometric conditions leading to fast rates. We present
several new results and collect in a unified view a variety of results
scattered in the literature, often providing simplified proofs. Novel
contributions include the analysis of infinite dimensional convex minimization
problems, allowing the case where minimizers might not exist. Further, we
analyze the relation between different geometric conditions, and discuss novel
connections with a priori conditions in linear inverse problems, including
source conditions, restricted isometry properties and partial smoothness.
Linear convergence of first order methods for non-strongly convex optimization
The standard assumption for proving linear convergence of first order methods
for smooth convex optimization is the strong convexity of the objective
function, an assumption which does not hold for many practical applications. In
this paper, we derive linear convergence rates of several first order methods
for solving smooth non-strongly convex constrained optimization problems, i.e.
involving an objective function with a Lipschitz continuous gradient that
satisfies some relaxed strong convexity condition. In particular, in the case
of smooth constrained convex optimization, we provide several relaxations of
the strong convexity conditions and prove that they are sufficient for getting
linear convergence for several first order methods such as projected gradient,
fast gradient and feasible descent methods. We also provide examples of
functional classes that satisfy our proposed relaxations of strong convexity
conditions. Finally, we show that the proposed relaxed strong convexity
conditions cover important applications, including solving linear systems,
linear programming, and dual formulations of linearly constrained convex
problems. Comment: 36 pages, 4 figures
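To illustrate linear convergence without strong convexity, here is a toy sketch of our own: gradient descent on a consistent but rank-deficient least-squares problem, where the objective is not strongly convex yet a Hoffman-type error bound still yields a linear rate.

```python
import numpy as np

# f(x) = 0.5*||Ax - b||^2 with rank-deficient A: not strongly convex,
# yet gradient descent converges linearly because an error bound
# dist(x, X*) <= kappa * ||grad f(x)|| holds for this problem class.
rng = np.random.default_rng(4)
M = rng.standard_normal((4, 8))
A = np.vstack([M, M])   # duplicated rows -> rank 4 < 8 variables
x_true = rng.standard_normal(8)
b = A @ x_true          # consistent system, so min f = 0

step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L
x = np.zeros(8)
residuals = []
for _ in range(5000):
    x = x - step * A.T @ (A @ x - b)
    residuals.append(np.linalg.norm(A @ x - b))
```

The residual norm decreases geometrically at a rate governed by the ratio of the extreme nonzero singular values of A, which is exactly the constant an error-bound analysis produces in place of a strong convexity modulus.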