Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data
We analyze the classical stochastic projected gradient methods under a general
dependent data sampling scheme for constrained smooth nonconvex optimization.
We establish worst-case convergence rates and oracle complexity bounds for
achieving a near-stationary point, measured by the norm of the gradient of the
Moreau envelope and by the gradient mapping. While classical convergence
guarantees require i.i.d. data sampled from the target distribution, we require
only a mild mixing condition on the conditional distribution, which holds for a
wide class of Markov chain sampling algorithms. This improves the best known
complexity for constrained smooth nonconvex optimization with dependent data,
with a significantly simpler analysis. We illustrate the generality of our approach by
deriving convergence results with dependent data for stochastic proximal
gradient methods, the adaptive stochastic gradient algorithm AdaGrad, and the
stochastic gradient algorithm with heavy-ball momentum. As an application, we
obtain the first online nonnegative matrix factorization algorithms for
dependent data based on stochastic projected gradient methods with adaptive
step sizes and an optimal rate of convergence.

Comment: 32 pages, 1 figure, 1 table
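To make the setup concrete, here is a minimal, illustrative sketch (not the paper's algorithm or analysis) of projected stochastic gradient descent fed by samples from a slowly mixing Markov chain. The two-state chain, the scalar quadratic loss, the box constraint, and all step sizes are hypothetical choices for demonstration only.

```python
import numpy as np

def project_box(w, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]."""
    return np.clip(w, lo, hi)

def markov_chain_samples(n, p_stay=0.9, seed=0):
    """Two-state Markov chain emitting dependent scalar data.

    States emit -1.0 and +1.0; p_stay controls the mixing speed
    (closer to 1 = slower mixing = more dependent samples). The chain
    is symmetric, so its stationary mean is 0.
    """
    rng = np.random.default_rng(seed)
    state, out = 0, []
    for _ in range(n):
        if rng.random() > p_stay:
            state = 1 - state
        out.append(-1.0 if state == 0 else 1.0)
    return np.array(out)

def stochastic_projected_gradient(data, w0=2.0, step=0.02):
    """Projected SGD on f(w; x) = 0.5*(w - x)^2 with dependent samples.

    The stationary mean of the chain is 0, so the iterates should
    drift toward 0 despite the temporal dependence in the data.
    """
    w = w0
    for x in data:
        grad = w - x              # gradient of 0.5*(w - x)^2 in w
        w = project_box(w - step * grad)
    return w

w = stochastic_projected_gradient(markov_chain_samples(5000))
```

Even though consecutive samples are strongly correlated, the mild-mixing intuition is visible: the iterate settles near the minimizer of the expected loss under the stationary distribution.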
Smooth Primal-Dual Coordinate Descent Algorithms for Nonsmooth Convex Optimization
We propose a new randomized coordinate descent method for a convex
optimization template with broad applications. Our analysis relies on a novel
combination of four ideas applied to the primal-dual gap function: smoothing,
acceleration, homotopy, and coordinate descent with non-uniform sampling. As a
result, our method features the first convergence rate guarantees among
coordinate descent methods that are the best known under a variety of common
structure assumptions on the template. We provide numerical evidence to support
the theoretical results, with a comparison to state-of-the-art algorithms.

Comment: NIPS 2017
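As a point of reference for the template, the following is a generic randomized coordinate descent sketch with non-uniform, smoothness-proportional sampling on a plain smooth quadratic. It omits the smoothing, acceleration, and homotopy ingredients of the proposed method, and the test problem is invented for illustration.

```python
import numpy as np

def randomized_coordinate_descent(A, b, n_iters=2000, seed=0):
    """Minimize f(x) = 0.5 x^T A x - b^T x one coordinate at a time.

    Coordinates are drawn non-uniformly, proportional to their
    coordinate-wise smoothness constants L_i = A[i, i], one simple
    instance of the non-uniform sampling the abstract mentions.
    """
    rng = np.random.default_rng(seed)
    n = len(b)
    L = np.diag(A).copy()              # coordinate-wise smoothness
    probs = L / L.sum()                # non-uniform sampling distribution
    x = np.zeros(n)
    for _ in range(n_iters):
        i = rng.choice(n, p=probs)
        grad_i = A[i] @ x - b[i]       # i-th partial derivative
        x[i] -= grad_i / L[i]          # exact minimization along coordinate i
    return x

# Small well-conditioned test problem: the minimizer is A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = randomized_coordinate_descent(A, b)
```

Each update touches a single coordinate of the gradient, which is what makes per-iteration cost cheap on large structured problems.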
Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions
Machine learning applications relying on criteria such as adversarial
robustness, or involving multi-agent settings, have raised the need for solving
game-theoretic equilibrium problems. Of particular relevance to these
applications are methods targeting finite-sum structure, which generically
arises in empirical variants of learning problems in these contexts. Further,
methods with computable approximation errors are highly desirable, as they
provide verifiable exit criteria. Motivated by these applications, we study
finite-sum monotone inclusion problems, which model broad classes of
equilibrium problems. Our main contributions are variants of the classical
Halpern iteration that employ variance reduction to obtain improved complexity
guarantees in settings where the component operators in the finite sum are ``on
average'' either cocoercive or Lipschitz continuous and monotone. The resulting
oracle complexity of our methods, which provide guarantees for the last iterate
and for a (computable) operator norm residual, improves upon that of existing
methods. This constitutes the first
variance reduction-type result for general finite-sum monotone inclusions and
for more specific problems such as convex-concave optimization when operator
norm residual is the optimality measure. We further argue that, up to
poly-logarithmic factors, this complexity is unimprovable in the monotone
Lipschitz setting; i.e., the provided result is near-optimal.
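For intuition about the base scheme, here is a plain deterministic Halpern iteration (no variance reduction, no finite-sum structure) applied to the resolvent of a small monotone linear operator. The operator, the anchoring schedule, and the iteration count are illustrative assumptions, not the paper's method.

```python
import numpy as np

def halpern_resolvent(A, x0, n_iters=500):
    """Halpern iteration x_{k+1} = b_k * x_0 + (1 - b_k) * J(x_k),
    with anchoring weights b_k = 1 / (k + 2), applied to the resolvent
    J = (I + A)^{-1} of the linear operator x -> A x.

    A skew-symmetric A is monotone and its only zero is x = 0, so the
    iterates should approach the origin as the anchor weight decays.
    """
    n = len(x0)
    J = np.linalg.inv(np.eye(n) + A)   # resolvent of the linear operator
    x = x0.astype(float)
    for k in range(n_iters):
        beta = 1.0 / (k + 2)           # classical Halpern anchoring schedule
        x = beta * x0 + (1 - beta) * (J @ x)
    return x

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # monotone (skew-symmetric)
x = halpern_resolvent(A, np.array([1.0, 1.0]))
```

The anchoring toward x_0 is what yields last-iterate guarantees on the operator norm residual; the paper's contribution is combining this anchor with variance-reduced estimates of the finite-sum operator.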
Beyond the Golden Ratio for Variational Inequality Algorithms
We improve the understanding of the golden ratio algorithm, which
solves monotone variational inequalities (VI) and convex-concave min-max
problems via the distinctive feature of adapting the step sizes to the local
Lipschitz constants. Adaptive step sizes not only eliminate the need to pick
hyperparameters, but they also remove the necessity of global Lipschitz
continuity and can increase from one iteration to the next.
We first establish the equivalence of this algorithm with popular VI methods
such as reflected gradient, Popov or optimistic gradient descent-ascent in the
unconstrained case with constant step sizes. We then move on to the constrained
setting and introduce a new analysis that allows the use of larger step sizes,
completing the bridge between the golden ratio algorithm and the existing
algorithms in the literature. In doing so, we actually eliminate the link between
the golden ratio and the algorithm. Moreover, we improve
the adaptive version of the algorithm, first by removing the maximum step size
hyperparameter (an artifact of the analysis) to improve the complexity bound,
and second by adjusting it to nonmonotone problems with weak Minty solutions,
with superior empirical performance.
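A minimal sketch of the non-adaptive golden ratio algorithm with a constant step size may help fix notation. The bilinear test problem and the chosen step size are illustrative assumptions; the adaptive step-size machinery discussed above is omitted.

```python
import numpy as np

PHI = (1 + np.sqrt(5)) / 2  # the golden ratio

def graal(F, z0, step, n_iters=1000):
    """Golden Ratio Algorithm (GRAAL) with a fixed step size.

    z_bar is a running convex combination of past iterates with
    golden-ratio weights; the update reads
        z_bar_k = ((PHI - 1) * z_k + z_bar_{k-1}) / PHI
        z_{k+1} = z_bar_k - step * F(z_k)
    and the constant-step analysis needs step <= PHI / (2 * L) for an
    L-Lipschitz monotone operator F.
    """
    z, z_bar = z0.copy(), z0.copy()
    for _ in range(n_iters):
        z_bar = ((PHI - 1) * z + z_bar) / PHI
        z = z_bar - step * F(z)   # F is evaluated at the previous iterate
    return z

# Monotone VI from the bilinear min-max problem min_x max_y x*y:
# F(x, y) = (y, -x), Lipschitz constant L = 1, unique solution (0, 0).
F = lambda z: np.array([z[1], -z[0]])
z = graal(F, np.array([1.0, 1.0]), step=0.6)
```

On this bilinear problem, plain gradient descent-ascent spirals outward; the golden-ratio averaging in z_bar is what stabilizes the iteration.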
On the convergence of stochastic primal-dual hybrid gradient
In this paper, we analyze the recently proposed stochastic primal-dual hybrid
gradient (SPDHG) algorithm and provide new theoretical results. In particular,
we prove almost sure convergence of the iterates to a solution and linear
convergence with standard step sizes, independent of strong convexity
constants. Our assumption for linear convergence is metric subregularity, which
is satisfied for smooth and strongly convex problems in addition to many
nonsmooth and/or nonstrongly convex problems, such as linear programs, Lasso,
and support vector machines. In the general convex case, we prove optimal
sublinear rates for the ergodic sequence, without bounded domain assumptions.
We also provide numerical evidence showing that SPDHG with standard step sizes
exhibits favorable and robust practical performance against its specialized
strongly convex variant and other state-of-the-art algorithms, including
variance reduction methods and stochastic dual coordinate ascent.
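The following is a small illustrative sketch of serial-sampling SPDHG on a least-squares saddle-point reformulation. The problem instance, step sizes, and iteration count are assumptions for demonstration, not taken from the paper's experiments.

```python
import numpy as np

def spdhg_least_squares(A, b, tau, sigma, n_iters=30000, seed=0):
    """SPDHG sketch on min_x 0.5 * ||A x - b||^2, written as the
    saddle-point problem min_x max_y <A x, y> - (0.5 * ||y||^2 + <b, y>).

    One randomly chosen dual coordinate is updated per iteration and
    extrapolated with weight theta / p_i (theta = 1, uniform p_i = 1/m);
    the step sizes should satisfy tau * sigma * ||A_i||^2 <= p_i.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = 1.0 / m                          # uniform dual sampling probability
    x = np.zeros(n)
    y = np.zeros(m)
    y_bar = np.zeros(m)
    for _ in range(n_iters):
        x = x - tau * (A.T @ y_bar)      # primal step (f = 0, so prox = id)
        i = rng.integers(m)              # sample one dual index
        v = y[i] + sigma * (A[i] @ x)
        y_new = (v - sigma * b[i]) / (1.0 + sigma)     # prox of g_i^*
        y_bar = y.copy()
        y_bar[i] = y_new + (1.0 / p) * (y_new - y[i])  # extrapolate coord. i
        y[i] = y_new
    return x

# Consistent system: the least-squares solution is x* = (1, 2).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
x = spdhg_least_squares(A, b, tau=0.5, sigma=0.3)
```

Because each iteration touches only one row of A, the per-iteration cost is a fraction of a full primal-dual step, which is the source of SPDHG's practical appeal on problems with many dual blocks.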