A unified variance-reduced accelerated gradient method for convex optimization
We propose a novel randomized incremental gradient algorithm, namely,
VAriance-Reduced Accelerated Gradient (Varag), for finite-sum optimization.
Equipped with a unified step-size policy that adjusts itself to the value of
the condition number, Varag exhibits the unified optimal rates of convergence
for solving smooth convex finite-sum problems directly, regardless of whether
they are strongly convex. Moreover, Varag is the first accelerated randomized
incremental gradient method that benefits from the strong convexity of the
data-fidelity term to achieve the optimal linear convergence. It also
establishes an optimal linear rate of convergence for a wide class of
problems that satisfy only a certain error bound condition rather than strong
convexity. Varag can also be extended to solve stochastic finite-sum problems.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS
2019)
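Varag's full update (its accelerated sequences and unified step-size policy) is not reproduced here. As a minimal sketch of the variance-reduced ingredient, the block below uses an SVRG-style estimator v = grad f_i(x) - grad f_i(x_snap) + grad f(x_snap), which is a standard variance-reduction device and not Varag itself, on a small least-squares finite sum. The data, step size, epoch and inner-loop counts, and the names `svrg`, `grad_i`, and `full_grad` are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def grad_i(x: Vector, a: Vector, b: float) -> Vector:
    """Gradient of one component f_i(x) = 0.5 * (a . x - b)^2."""
    r = dot(a, x) - b
    return [r * aj for aj in a]


def full_grad(x: Vector, A: List[Vector], b: Vector) -> Vector:
    """Gradient of the finite sum f(x) = (1/n) * sum_i f_i(x)."""
    n = len(A)
    g = [0.0] * len(x)
    for a_i, b_i in zip(A, b):
        for j, gij in enumerate(grad_i(x, a_i, b_i)):
            g[j] += gij / n
    return g


def svrg(A: List[Vector], b: Vector, x0: Vector, step: float = 0.05,
         epochs: int = 30, inner_steps: int = 200, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(epochs):
        snapshot = list(x)
        mu = full_grad(snapshot, A, b)  # one full pass over the data per epoch
        for _ in range(inner_steps):
            i = rng.randrange(len(A))
            gx = grad_i(x, A[i], b[i])
            gs = grad_i(snapshot, A[i], b[i])
            # Unbiased estimate of grad f(x); its variance shrinks as x and the
            # snapshot approach the minimizer, so a constant step still converges.
            v = [gxj - gsj + muj for gxj, gsj, muj in zip(gx, gs, mu)]
            x = [xj - step * vj for xj, vj in zip(x, v)]
    return x


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [1.1, -2.1, -0.9, 3.1]  # inconsistent system: no x fits every row exactly
x = svrg(A, b, [0.0, 0.0])
# Here A^T A = 3 I, so the least-squares solution is A^T b / 3 = (1.1, -6.1/3).
```

Because the rows cannot all be fitted at once, individual component gradients do not vanish at the solution; the correction term built from the snapshot is what lets the iterates settle on (1.1, -6.1/3) instead of hovering around it.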
Incremental Stochastic Subgradient Algorithms for Convex Optimization
In this paper we study the effect of stochastic errors on two constrained
incremental subgradient algorithms. We view the incremental subgradient
algorithms as decentralized network optimization algorithms applied to
minimizing a sum of functions, where each component function is known only to a
particular agent of a distributed network. We first study the standard cyclic
incremental subgradient algorithm, in which the agents form a ring structure
and pass the iterate in a cycle. We consider the method with stochastic errors
in the subgradient evaluations and provide sufficient conditions on the
moments of the stochastic errors that guarantee almost sure convergence when a
diminishing step-size is used. We also obtain almost sure bounds on the
algorithm's performance when a constant step-size is used. We then consider
the Markov randomized incremental subgradient method, which is a
non-cyclic version of the incremental algorithm where the sequence of computing
agents is modeled as a time non-homogeneous Markov chain. Such a model is
appropriate for mobile networks, whose topology changes over time. We
establish convergence results and error bounds for the Markov randomized
method in the presence of stochastic errors for diminishing and constant
step-sizes, respectively.
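A minimal sketch of the two incremental schemes described above, assuming a scalar unconstrained problem F(x) = sum_i |x - a_i| whose i-th term is held by agent i, zero-mean Gaussian errors added to each subgradient, and a diminishing step 2/t. The agent sequence is either the cyclic ring order or a lazy random walk on the ring. That walk is time-homogeneous, a simplification of the time non-homogeneous chain in the abstract, and neither the constraint set nor the paper's exact moment conditions are modeled. All function names and parameters are hypothetical.

```python
import random
from typing import Iterator, Sequence


def sign(v: float) -> float:
    """A subgradient of |v|: -1, 0 or +1."""
    return float((v > 0) - (v < 0))


def cyclic_agents(n: int) -> Iterator[int]:
    """Ring order 0, 1, ..., n-1, 0, 1, ...: each agent hands the iterate to the next."""
    i = 0
    while True:
        yield i
        i = (i + 1) % n


def markov_ring_agents(n: int, rng: random.Random) -> Iterator[int]:
    """Lazy random walk on the ring: stay or move to a neighbour, each w.p. 1/3.
    Time-homogeneous simplification (assumption); its stationary law is uniform."""
    i = rng.randrange(n)
    while True:
        yield i
        i = (i + rng.choice((-1, 0, 1))) % n


def incremental_subgradient(points: Sequence[float], agents: Iterator[int],
                            num_iters: int, step0: float = 2.0,
                            noise_std: float = 0.5, seed: int = 0,
                            x0: float = 0.0) -> float:
    """Minimize F(x) = sum_i |x - points[i]|; agent i only knows its own term."""
    rng = random.Random(seed)
    x = x0
    for t in range(1, num_iters + 1):
        i = next(agents)
        # Noisy subgradient of agent i's term: exact subgradient + zero-mean error
        g = sign(x - points[i]) + rng.gauss(0.0, noise_std)
        # Diminishing step: the steps sum to infinity, their squares do not
        x -= (step0 / t) * g
    return x


points = [1.0, 2.0, 3.0, 4.0, 5.0]  # minimizer of F is the median, 3.0
x_cyc = incremental_subgradient(points, cyclic_agents(5), 20_000)
x_mkv = incremental_subgradient(points, markov_ring_agents(5, random.Random(1)), 20_000)
```

Both runs should end near the median 3.0. The Markov walk's transition matrix is doubly stochastic, so in the long run it visits every agent equally often and therefore targets the same unweighted sum as the cyclic order.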
On the linear convergence of the stochastic gradient method with constant step-size
The strong growth condition (SGC) is known to be a sufficient condition for
linear convergence of the stochastic gradient method using a constant step-size
(SGM-CS). In this paper, we provide a necessary condition for the linear
convergence of SGM-CS that is weaker than SGC. Moreover, when this necessary
condition is violated up to an additive perturbation, we show that both
the projected stochastic gradient method using a constant step-size (PSGM-CS)
and the proximal stochastic gradient method exhibit linear convergence to a
noise-dominated region, whose distance to the optimal solution is proportional
to
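One common form of the strong growth condition asks that E_i ||grad f_i(x)||^2 <= rho * ||grad f(x)||^2 for all x. It holds, for example, for a consistent least-squares problem, where every component is minimized at the same point x*. The sketch below runs SGM-CS with a hypothetical step size of 0.5 on such a problem, then on a right-hand side with a hypothetical additive perturbation. In the first case the iterates converge linearly to x*. In the second they keep fluctuating around the least-squares solution instead of converging to it. This illustrates the phenomenon only; it does not implement the paper's necessary condition or its PSGM-CS and proximal variants.

```python
import random
from typing import List

Vector = List[float]

# Hypothetical data; for these rows A^T A = 3 I
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]


def sgm_cs(b: Vector, step: float, num_iters: int, seed: int = 0) -> List[Vector]:
    """Constant-step SGM on f(x) = (1/n) sum_i 0.5 * (a_i . x - b_i)^2; returns every iterate."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    iterates = []
    for _ in range(num_iters):
        i = rng.randrange(len(A))                              # sample one component uniformly
        r = sum(aj * xj for aj, xj in zip(A[i], x)) - b[i]     # residual of component i
        x = [xj - step * r * aj for xj, aj in zip(x, A[i])]    # x <- x - step * grad f_i(x)
        iterates.append(x)
    return iterates


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


x_star = [1.0, -2.0]
b_exact = [1.0, -2.0, -1.0, 3.0]      # b = A x_star: every component is minimized at x_star
perturbation = [0.1, -0.1, 0.1, 0.1]  # hypothetical additive perturbation
b_noisy = [bi + pi for bi, pi in zip(b_exact, perturbation)]
# Since A^T A = 3 I, the least-squares solution of the perturbed system is A^T b / 3
x_ls = [(b_noisy[0] + b_noisy[2] + b_noisy[3]) / 3,
        (b_noisy[1] + b_noisy[2] - b_noisy[3]) / 3]

exact_run = sgm_cs(b_exact, step=0.5, num_iters=500)
noisy_run = sgm_cs(b_noisy, step=0.5, num_iters=5000)
final_err_exact = sq_dist(exact_run[-1], x_star) ** 0.5
tail_msd_noisy = sum(sq_dist(x, x_ls) for x in noisy_run[-1000:]) / 1000
```

In the exact case the error shrinks to round-off level. In the perturbed case the mean squared distance to the least-squares solution over the last 1000 iterates stays bounded away from zero, because the sampled component gradients no longer vanish there.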
Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice
We introduce a generic scheme for accelerating gradient-based optimization
methods in the sense of Nesterov. The approach, called Catalyst, builds upon
the inexact accelerated proximal point algorithm for minimizing a convex
objective function, and consists of approximately solving a sequence of
well-chosen auxiliary problems, leading to faster convergence. One of the keys
to achieving acceleration in theory and in practice is to solve these
sub-problems with appropriate accuracy by using the right stopping criterion
and the right warm-start strategy. We give practical guidelines for using Catalyst
and present a comprehensive analysis of its global complexity. We show that
Catalyst applies to a large class of algorithms, including gradient descent,
block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG,
MISO/Finito, and their proximal variants. For all of these methods, we
establish faster rates using the Catalyst acceleration, for strongly convex and
non-strongly convex objectives. We conclude with extensive experiments showing
that acceleration is useful in practice, especially for ill-conditioned
problems.
Comment: link to publisher website:
http://jmlr.org/papers/volume18/17-748/17-748.pd
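A minimal sketch of a Catalyst-style outer loop for a mu-strongly convex objective, wrapped around plain gradient descent. Each outer step approximately minimizes the auxiliary problem h_k(x) = f(x) + (kappa/2) ||x - y_{k-1}||^2, warm-started at the previous outer iterate, then extrapolates y_k = x_k + beta (x_k - x_{k-1}). The constant weight beta = (1 - sqrt(q)) / (1 + sqrt(q)), with q = mu / (mu + kappa), is the usual accelerated proximal-point choice in the strongly convex case and is assumed here. The value kappa = 1, the fixed ten inner steps (standing in for the paper's stopping criteria), the warm start, and the quadratic test problem are also assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]
Grad = Callable[[Vector], Vector]


def gradient_descent(grad: Grad, x0: Vector, step: float, num_steps: int) -> Vector:
    x = list(x0)
    for _ in range(num_steps):
        x = [xi - step * gi for xi, gi in zip(x, grad(x))]
    return x


def catalyst_gd(grad: Grad, x0: Vector, L: float, mu: float, kappa: float,
                outer_iters: int, inner_steps: int) -> Vector:
    """Catalyst-style outer loop (strongly convex case) around gradient descent."""
    sq = math.sqrt(mu / (mu + kappa))  # sqrt(q), with q = mu / (mu + kappa)
    beta = (1.0 - sq) / (1.0 + sq)     # constant extrapolation weight (assumed)
    x_prev, y = list(x0), list(x0)
    for _ in range(outer_iters):
        # Auxiliary problem h(z) = f(z) + (kappa/2) ||z - y||^2, (L + kappa)-smooth
        def grad_h(z: Vector, y: Vector = y) -> Vector:
            return [g + kappa * (zi - yi) for g, zi, yi in zip(grad(z), z, y)]
        # Inexact solve: fixed number of GD steps, warm-started at x_{k-1}
        x = gradient_descent(grad_h, x_prev, 1.0 / (L + kappa), inner_steps)
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev = x
    return x_prev


# Ill-conditioned quadratic f(x) = 0.5 * sum_j d_j (x_j - c_j)^2 with L = 1, mu = 1e-3
d, c = [1.0, 1e-3], [1.0, 1.0]


def grad_f(x: Vector) -> Vector:
    return [dj * (xj - cj) for dj, xj, cj in zip(d, x, c)]


x_cat = catalyst_gd(grad_f, [0.0, 0.0], L=1.0, mu=1e-3, kappa=1.0,
                    outer_iters=400, inner_steps=10)
x_gd = gradient_descent(grad_f, [0.0, 0.0], step=1.0, num_steps=400 * 10)  # same budget
err_cat, err_gd = math.dist(x_cat, c), math.dist(x_gd, c)
```

Both runs spend 4000 gradient evaluations. On this problem, with condition number 1000, the Catalyst iterate should end markedly closer to the minimizer c than plain gradient descent with step 1/L, in line with the ill-conditioned regime the abstract highlights.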