12 research outputs found
New Analysis of Linear Convergence of Gradient-type Methods via Unifying Error Bound Conditions
This paper reveals that a residual measure operator plays a common and central
role in many error bound (EB) conditions and a variety of gradient-type
methods. On one hand, by linking this operator with other optimality measures,
we define a group of abstract EB conditions, and then analyze the interplay
between them; on the other hand, by using this operator as an ascent direction,
we propose an abstract gradient-type method, and then derive EB conditions that
are necessary and sufficient for its linear convergence. The former provides a
unified framework that not only allows us to find new connections between many
existing EB conditions, but also paves a way to construct new EB conditions.
The latter allows us to claim the weakest conditions guaranteeing linear
convergence for a number of fundamental algorithms, including the gradient
method, the proximal point algorithm, and the forward-backward splitting
algorithm. In addition, we show linear convergence for the proximal alternating
linearized minimization algorithm under a group of equivalent EB conditions,
which are strictly weaker than the traditional strong convexity condition.
Moreover, by defining a new EB condition, we show Q-linear convergence of
Nesterov's accelerated forward-backward algorithm without strong convexity.
Finally, we verify EB conditions for a class of dual objective functions. Comment: 40 pages; incorporating the referee's comments, the presentation has
been further improved.
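As a rough illustration of the abstract scheme described above, the Python sketch below instantiates the residual measure operator as the forward-backward residual of a composite problem min f(x) + g(x); the fixed step size, the toy l1-regularized least-squares instance, and all function names are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def fb_residual(x, grad_f, prox_g, t):
    # Forward-backward residual R_t(x) = (x - prox_{t g}(x - t grad f(x))) / t;
    # it vanishes exactly at minimizers of f + g, so ||R_t(x)|| acts as a residual measure.
    return (x - prox_g(x - t * grad_f(x), t)) / t

def abstract_gradient_method(x0, grad_f, prox_g, t=0.1, iters=500, tol=1e-10):
    # Generic "gradient-type" step x_{k+1} = x_k - t * R_t(x_k); with the
    # forward-backward residual this reduces to the forward-backward splitting method.
    x = x0.copy()
    for _ in range(iters):
        r = fb_residual(x, grad_f, prox_g, t)
        if np.linalg.norm(r) <= tol:
            break
        x = x - t * r
    return x

# Toy instance: f(x) = 0.5 ||Ax - b||^2, g(x) = lam * ||x||_1 (soft-thresholding prox).
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((20, 5)), rng.standard_normal(20), 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
x_sol = abstract_gradient_method(np.zeros(5), grad_f, prox_g, t=1.0 / np.linalg.norm(A, 2) ** 2)
```

With this choice of residual, the generic step coincides with the forward-backward splitting iteration, one of the algorithms covered by the paper's analysis.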
Inertial Proximal Incremental Aggregated Gradient Method
In this paper, we introduce an inertial version of the Proximal Incremental
Aggregated Gradient method (PIAG) for minimizing the sum of smooth convex
component functions and a possibly nonsmooth convex regularization function.
Theoretically, we show that the inertial Proximal Incremental Aggregated
Gradient (iPIAG) method enjoys global linear convergence under a quadratic
growth condition, which is strictly weaker than strong convexity, provided that
the stepsize is not larger than a constant. Moreover, we present two numerical
experiments which demonstrate that iPIAG outperforms the original PIAG. Comment: 17 pages, 3 figures.
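A minimal sketch of an inertial PIAG-style loop, assuming cyclic component selection, a table of stored (delayed) component gradients, and a fixed extrapolation weight beta; these implementation choices are illustrative and not taken from the paper.

```python
import numpy as np

def ipiag(x0, component_grads, prox_g, step, beta, epochs=100):
    # Sketch: the aggregated gradient uses stale per-component gradients that are
    # refreshed one at a time, and an inertial term beta * (x_k - x_{k-1}) is
    # added before the proximal step on the regularizer.
    n = len(component_grads)
    x_prev = x0.copy()
    x = x0.copy()
    table = [g(x0) for g in component_grads]      # stored (possibly stale) component gradients
    agg = np.sum(table, axis=0)                   # aggregated gradient
    for k in range(epochs * n):
        i = k % n                                 # cyclic component selection
        new = component_grads[i](x)               # refresh one component gradient
        agg += new - table[i]
        table[i] = new
        y = x + beta * (x - x_prev)               # inertial extrapolation
        x_prev, x = x, prox_g(y - step * agg, step)
    return x
```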
Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/\epsilon)$
In this paper, we develop a novel {\bf ho}moto{\bf p}y {\bf s}moothing (HOPS)
algorithm for solving a family of non-smooth problems that is composed of a
non-smooth term with an explicit max-structure and a smooth term or a simple
non-smooth term whose proximal mapping is easy to compute. The best known
iteration complexity for solving such non-smooth optimization problems is
$O(1/\epsilon)$ without any assumption of strong convexity. In this work,
we show that the proposed HOPS achieves a lower iteration complexity of
$\tilde O(1/\epsilon^{1-\theta})$\footnote{$\tilde O()$ suppresses a
logarithmic factor.}, with $\theta\in(0,1]$ capturing the local sharpness of the
objective function around the optimal solutions. To the best of our knowledge,
this is the lowest iteration complexity achieved so far for the considered
non-smooth optimization problems without strong convexity assumption. The HOPS
algorithm employs Nesterov's smoothing technique and Nesterov's accelerated
gradient method and runs in stages, gradually decreasing the smoothing
parameter in a stage-wise manner until it yields a sufficiently good
approximation of the original function. We show that HOPS enjoys linear
convergence for many well-known non-smooth problems (e.g., empirical risk
minimization with a piece-wise linear loss function and a norm
regularizer, finding a point in a polyhedron, cone programming, etc.).
Experimental results verify the effectiveness of HOPS in comparison with
Nesterov's smoothing algorithm and the primal-dual style of first-order
methods. Comment: This is a long version of the paper accepted by NIPS 2016.
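The stage-wise structure described above can be sketched as follows, assuming the user supplies the gradient of the mu-smoothed objective and that its gradient Lipschitz constant scales like 1/mu (the usual behavior of Nesterov's smoothing); the stage count, per-stage iteration budget, and shrink factor are illustrative assumptions.

```python
import numpy as np

def hops(x0, grad_smooth_obj, mu0=1.0, stages=8, iters_per_stage=200,
         shrink=0.5, L_of_mu=lambda mu: 1.0 / mu):
    # Sketch of a homotopy-smoothing loop: at each stage, minimize the
    # mu-smoothed objective with Nesterov's accelerated gradient method,
    # warm-start the next stage from the result, and shrink mu.
    x = x0.copy()
    mu = mu0
    for _ in range(stages):
        L = L_of_mu(mu)                       # Lipschitz constant of the smoothed gradient (assumed)
        y = x.copy()
        t = 1.0
        for _ in range(iters_per_stage):
            x_next = y - (1.0 / L) * grad_smooth_obj(y, mu)
            t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_next + ((t - 1.0) / t_next) * (x_next - x)   # Nesterov momentum
            x, t = x_next, t_next
        mu *= shrink                          # tighten the smoothing parameter
    return x
```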
Global Complexity Analysis of Inexact Successive Quadratic Approximation methods for Regularized Optimization under Mild Assumptions
Successive quadratic approximations (SQA) are numerically efficient for
minimizing the sum of a smooth function and a convex function. The iteration
complexity of inexact SQA methods has been analyzed recently. In this paper, we
present an algorithmic framework of inexact SQA methods with four types of line
searches, and analyze its global complexity under milder assumptions. First, we
show its well-definedness and some decreasing properties. Second, under the
quadratic growth condition and a uniform positive lower bound condition on
stepsizes, we show that the function value sequence and the iterate sequence
are linearly convergent. Moreover, we obtain an $o(1/k)$ complexity without the
quadratic growth condition, improving the existing $O(1/k)$ complexity results.
Lastly, we show that a local gradient-Lipschitz-continuity condition can
guarantee a uniform positive lower bound for the stepsizes.
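A hedged sketch of one such inexact SQA scheme, simplified so that the quadratic model uses H_k = h*I (making the subproblem a proximal step) and equipped with a single backtracking line search; the paper's framework is more general, covering inexact subproblem solutions and four types of line searches.

```python
import numpy as np

def inexact_sqa(x0, f, grad_f, psi, prox_psi, h=1.0, sigma=1e-4, beta=0.5, iters=200):
    # Sketch: quadratic model with H_k = h * I reduces the subproblem to a
    # proximal step; an Armijo-type backtracking line search on alpha enforces
    # sufficient decrease of F = f + psi relative to the model decrease.
    x = x0.copy()
    for _ in range(iters):
        g = grad_f(x)
        d = prox_psi(x - g / h, 1.0 / h) - x          # subproblem step (exact here, inexact in general)
        if np.linalg.norm(d) < 1e-12:
            break
        delta = g @ d + psi(x + d) - psi(x)           # model decrease (negative for d != 0)
        alpha = 1.0
        while (f(x + alpha * d) + psi(x + alpha * d)
               > f(x) + psi(x) + sigma * alpha * delta) and alpha > 1e-12:
            alpha *= beta                              # backtrack
        x = x + alpha * d
    return x
```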
Proximal-Like Incremental Aggregated Gradient Method with Linear Convergence under Bregman Distance Growth Conditions
We introduce a unified algorithmic framework, called proximal-like
incremental aggregated gradient (PLIAG) method, for minimizing the sum of a
convex function that consists of additive relatively smooth convex components
and a proper lower semi-continuous convex regularization function, over an
abstract feasible set whose geometry can be captured by using the domain of a
Legendre function. The PLIAG method includes many existing algorithms in the
literature as special cases such as the proximal gradient method, the Bregman
proximal gradient method (also called NoLips algorithm), the incremental
aggregated gradient method, the incremental aggregated proximal method, and the
proximal incremental aggregated gradient method. It also includes some novel
and interesting iteration schemes. First, we show that the PLIAG method is globally
sublinearly convergent without requiring a growth condition, which extends the
sublinear convergence result for the proximal gradient algorithm to incremental
aggregated type first order methods. Then by embedding a so-called Bregman
distance growth condition into a descent-type lemma to construct a special
Lyapunov function, we show that the PLIAG method is globally linearly
convergent in terms of both function values and Bregman distances to the
optimal solution set, provided that the step size is not greater than some
positive constant. These convergence results derived in this paper are all
established beyond the standard assumptions in the literature (i.e., without
requiring the strong convexity and the Lipschitz gradient continuity of the
smooth part of the objective). When specialized to many existing algorithms,
our results recover or supplement their convergence results under strictly
weaker conditions. Comment: 28 pages.
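As an illustrative special case of a Bregman-based incremental aggregated scheme, the sketch below uses the negative-entropy Legendre kernel over the probability simplex, for which the Bregman proximal step has a closed multiplicative form; the kernel choice, cyclic ordering, and step size are assumptions made for the sketch, not the PLIAG method's general setting.

```python
import numpy as np

def pliag_entropy(x0, component_grads, step=0.1, epochs=50):
    # Sketch specialized to the negative-entropy kernel over the simplex
    # (regularizer = simplex indicator): the Bregman proximal step becomes an
    # exponentiated-gradient update, applied to an aggregate of delayed
    # component gradients refreshed one at a time.
    n = len(component_grads)
    x = x0 / x0.sum()
    table = [g(x) for g in component_grads]       # stored (stale) component gradients
    agg = np.sum(table, axis=0)
    for k in range(epochs * n):
        i = k % n
        new = component_grads[i](x)               # refresh one component
        agg += new - table[i]
        table[i] = new
        w = -step * agg
        x = x * np.exp(w - w.max())               # Bregman (entropy) proximal step
        x /= x.sum()                              # renormalize onto the simplex
    return x
```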
Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions
Error bound conditions (EBC) are properties that characterize the growth of
an objective function when a point is moved away from the optimal set. They
have recently received increasing attention in the field of optimization for
developing optimization algorithms with fast convergence. However, the studies
of EBC in statistical learning are hitherto still limited. The main
contributions of this paper are two-fold. First, we develop fast and
intermediate rates of empirical risk minimization (ERM) under EBC for risk
minimization with Lipschitz continuous and with smooth convex random functions.
Second, we establish fast and intermediate rates of an efficient stochastic
approximation (SA) algorithm for risk minimization with Lipschitz continuous
random functions, which requires only one pass of samples and adapts to
EBC. For both approaches, the convergence rates span a full spectrum between
$O(1/\sqrt{n})$ and $O(1/n)$ depending on the power
constant in EBC, and could be even faster than $O(1/n)$ in special cases for
ERM. Moreover, these convergence rates are automatically adaptive without using
any knowledge of EBC. Overall, this work not only strengthens the understanding
of ERM for statistical learning but also brings new fast stochastic algorithms
for solving a broad range of statistical learning problems.
Faster Subgradient Methods for Functions with H\"olderian Growth
The purpose of this manuscript is to derive new convergence results for
several subgradient methods applied to minimizing nonsmooth convex functions
with H\"olderian growth. The growth condition is satisfied in many applications
and includes functions with quadratic growth and weakly sharp minima as special
cases. To this end there are three main contributions. First, for a constant
and sufficiently small stepsize, we show that the subgradient method achieves
linear convergence up to a certain region including the optimal set, with error
of the order of the stepsize. Second, if appropriate problem parameters are
known, we derive a decaying stepsize which obtains a much faster convergence
rate than is suggested by the classical $O(1/\sqrt{k})$ result for the
subgradient method. Third, we develop a novel "descending stairs" stepsize
which obtains this faster convergence rate and also obtains linear convergence
for the special case of weakly sharp functions. We also develop an adaptive
variant of the "descending stairs" stepsize which achieves the same convergence
rate without requiring an error bound constant which is difficult to estimate
in practice. Comment: 50 pages. First revised version (under submission to Math Programming).
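A rough sketch of a "descending stairs"-style schedule: the stepsize is held constant within a stage and shrunk geometrically between stages; the per-stage iteration count, shrink factor, and use of normalized subgradient steps are illustrative assumptions rather than the paper's tuned choices.

```python
import numpy as np

def descending_stairs_subgradient(x0, subgrad, stages=10, iters_per_stage=100,
                                  step0=1.0, shrink=0.5):
    # Sketch: constant stepsize within each stage, geometric decrease between
    # stages, warm-starting every stage from the last iterate of the previous one.
    x = x0.copy()
    for s in range(stages):
        step = step0 * shrink ** s
        for _ in range(iters_per_stage):
            g = subgrad(x)
            gnorm = np.linalg.norm(g)
            if gnorm == 0.0:
                return x                      # x is already a minimizer
            x = x - step * g / gnorm          # normalized subgradient step
    return x
```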
A Variational Approach on Level sets and Linear Convergence of Variable Bregman Proximal Gradient Method for Nonconvex Optimization Problems
We develop a new variational approach on level sets aiming towards
convergence rate analysis of a variable Bregman proximal gradient (VBPG) method
for a broad class of nonsmooth and nonconvex optimization problems. With this
new approach, we are able to extend the concepts of the Bregman proximal mapping
and the corresponding Bregman proximal envelope and Bregman proximal gap
function to the nonconvex setting. Properties of these mappings and functions are
examined. An aim of this work is to provide a solid foundation on which further
design and analysis of VBPG for more general nonconvex optimization problems
are possible. Another aim is to provide a unified theory on linear convergence
of VBPG with particular interest in proximal gradient methods. Central
to our analysis for achieving the above goals is an error bound in terms of
level sets and subdifferentials (level-set subdifferential error bound) along
with its links to other level-set error bounds. As a consequence, we have
established a number of positive results. These newly established results not
only enable us to show that any accumulation point of the sequence generated by VBPG
is at least a critical point of the limiting subdifferential or even a critical
point of the proximal subdifferential with a fixed Bregman function in each
iteration, but also provide a fresh perspective that allows us to explore
interconnections among many known sufficient conditions for linear convergence
of various first-order methods. Along the way, we are able to derive a number
of verifiable conditions for level-set error bounds to hold, obtain linear
convergence of VBPG, and derive necessary conditions and sufficient conditions
for linear convergence relative to a level set for nonsmooth and nonconvex
optimization problems.
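To make the VBPG iteration concrete, the sketch below specializes the variable Bregman kernels to diagonal quadratics and the nonsmooth term to lam*||x||_1, so each Bregman proximal subproblem has a closed-form weighted soft-thresholding solution; these specializations are assumptions made only for illustration.

```python
import numpy as np

def vbpg_l1(x0, grad_f, lam, diag_metric, step=1.0, iters=300):
    # Sketch of a variable Bregman proximal gradient step with kernels
    # h_k(x) = 0.5 * x^T diag(d_k) x and g = lam * ||x||_1: the subproblem
    # reduces to componentwise soft-thresholding with weighted thresholds.
    x = x0.copy()
    for k in range(iters):
        d = diag_metric(x, k)                       # per-iteration kernel weights (assumed >= d_min > 0)
        z = x - step * grad_f(x) / d                # scaled gradient step
        thresh = step * lam / d
        x = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)   # weighted soft-thresholding
    return x
```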
On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization
In decentralized optimization, it is common algorithmic practice to have
nodes interleave (local) gradient descent iterations with gossip (i.e.
averaging over the network) steps. Motivated by the training of large-scale
machine learning models, it is also increasingly common to require that
messages be {\em lossy compressed} versions of the local parameters. In this
paper, we show that, in such compressed decentralized optimization settings,
there are benefits to having {\em multiple} gossip steps between subsequent
gradient iterations, even when the cost of doing so is appropriately accounted
for, e.g., by means of reducing the precision of the compressed information. In
particular, we show that having $O(\log(1/\epsilon))$ gradient iterations
{with constant step size} - and $O(\log(1/\epsilon))$ gossip steps
between every pair of these iterations - enables convergence to within $\epsilon$
of the optimal value for smooth non-convex objectives satisfying the
Polyak-\L{}ojasiewicz condition. This result also holds for smooth strongly
convex objectives. To our knowledge, this is the first work that derives
convergence results for nonconvex optimization under arbitrary communication
compression.
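A minimal sketch of one outer iteration of the pattern described above (a local gradient step at every node, followed by several gossip rounds over compressed parameters); the crude uniform quantizer, the mixing matrix W, and the number of gossip rounds are illustrative stand-ins for the paper's compression and communication model.

```python
import numpy as np

def quantize(x, bits=8, scale=1.0):
    # Crude uniform quantizer standing in for a generic lossy compressor.
    levels = 2 ** bits
    return np.round(x / scale * levels) * scale / levels

def decentralized_step(X, grads, W, step=0.1, gossip_rounds=4, bits=8):
    # One outer iteration: local gradient step at each node, then several gossip
    # (compressed averaging) rounds. X has shape (num_nodes, dim); W is a
    # doubly-stochastic mixing matrix describing the network.
    X = X - step * grads(X)                      # local gradient step at every node
    for _ in range(gossip_rounds):
        X = W @ quantize(X, bits=bits)           # exchange compressed parameters, then average
    return X
```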
Robust Training in High Dimensions via Block Coordinate Geometric Median Descent
Geometric median (\textsc{Gm}) is a classical method in statistics for
achieving a robust estimation of the uncorrupted data; under gross corruption,
it achieves the optimal breakdown point of 0.5. However, its computational
complexity makes it infeasible for robustifying stochastic gradient descent
(SGD) for high-dimensional optimization problems. In this paper, we show that
by applying \textsc{Gm} to only a judiciously chosen block of coordinates at a
time and using a memory mechanism, one can retain the breakdown point of 0.5
for smooth non-convex problems, with non-asymptotic convergence rates
comparable to SGD with \textsc{Gm}.
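A rough sketch of a block-coordinate geometric-median aggregation step, using Weiszfeld iterations for the geometric median and a remembered aggregate on the coordinates outside the chosen block; the block-selection rule and step size here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def geometric_median(P, iters=50, eps=1e-8):
    # Weiszfeld iterations for the geometric median of the rows of P.
    z = P.mean(axis=0)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.linalg.norm(P - z, axis=1), eps)
        z = (w[:, None] * P).sum(axis=0) / w.sum()
    return z

def bgmd_step(x, worker_grads, memory, block_size, step=0.1):
    # Aggregate worker gradients with the geometric median on a chosen
    # coordinate block only, reusing the remembered aggregate elsewhere.
    G = np.stack(worker_grads)                                 # (num_workers, dim)
    block = np.argsort(-np.abs(G.mean(axis=0)))[:block_size]   # illustrative block choice
    agg = memory.copy()
    agg[block] = geometric_median(G[:, block])                 # robust aggregation on the block
    return x - step * agg, agg                                 # new iterate and updated memory
```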