22 research outputs found

    Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data

    We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an $\varepsilon$-near stationary point in terms of the norm of the gradient of the Moreau envelope and the gradient mapping. While classical convergence guarantees require i.i.d. data sampling from the target distribution, we only require a mild mixing condition of the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for constrained smooth nonconvex optimization with dependent data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a significantly simpler analysis. We illustrate the generality of our approach by deriving convergence results with dependent data for stochastic proximal gradient methods, the adaptive stochastic gradient algorithm AdaGrad, and the stochastic gradient algorithm with heavy-ball momentum. As an application, we obtain the first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods with adaptive step sizes and optimal rate of convergence. Comment: 32 pages, 1 figure, 1 table
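
As a rough illustration of the setting described in this abstract, the following sketch runs a projected stochastic gradient method where the samples come from a Markov chain rather than i.i.d. draws. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy objective (a convex quadratic, chosen only to keep the example small), the two-state chain, the nonnegative-orthant constraint, the $1/\sqrt{t+1}$ step size, and the function names.

```python
import math
import random


def project_nonneg(x):
    """Euclidean projection onto the nonnegative orthant."""
    return [max(0.0, xi) for xi in x]


def markov_projected_sgd(num_steps=5000, stay_prob=0.9, seed=0):
    """Projected SGD on f(x) = E[0.5 * ||x - xi||^2] subject to x >= 0.

    The samples xi are generated by a two-state Markov chain, so consecutive
    samples are dependent (hypothetical toy setup, not from the paper).
    """
    rng = random.Random(seed)
    states = [(-1.0, 2.0), (-3.0, 4.0)]  # possible values of the sample xi
    s = 0                                # current state of the chain
    x = [1.0, 1.0]                       # feasible starting point
    for t in range(num_steps):
        # Advance the chain: stay with probability stay_prob, else switch.
        if rng.random() > stay_prob:
            s = 1 - s
        xi = states[s]
        step = 1.0 / math.sqrt(t + 1)    # diminishing step size (assumed)
        grad = [x[i] - xi[i] for i in range(2)]  # stochastic gradient at x
        x = project_nonneg([x[i] - step * grad[i] for i in range(2)])
    return x


if __name__ == "__main__":
    print(markov_projected_sgd())
```

In this toy problem the stationary mean of the samples is $(-2, 3)$, so the constrained minimizer is $(0, 3)$; the first coordinate is pinned at the boundary by the projection while the second fluctuates around 3.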

    Smooth Primal-Dual Coordinate Descent Algorithms for Nonsmooth Convex Optimization

    We propose a new randomized coordinate descent method for a convex optimization template with broad applications. Our analysis relies on a novel combination of four ideas applied to the primal-dual gap function: smoothing, acceleration, homotopy, and coordinate descent with non-uniform sampling. As a result, our method features the first convergence rate guarantees among the coordinate descent methods that are the best known under a variety of common structure assumptions on the template. We provide numerical evidence to support the theoretical results with a comparison to state-of-the-art algorithms. Comment: NIPS 201

    Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions

    Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in which $n$ component operators in the finite sum are "on average" either cocoercive or Lipschitz continuous and monotone, with parameter $L$. The resulting oracle complexity of our methods, which provide guarantees for the last iterate and for a (computable) operator norm residual, is $\widetilde{\mathcal{O}}(n + \sqrt{n}L\varepsilon^{-1})$, which improves upon existing methods by a factor up to $\sqrt{n}$. This constitutes the first variance reduction-type result for general finite-sum monotone inclusions and for more specific problems such as convex-concave optimization when operator norm residual is the optimality measure. We further argue that, up to poly-logarithmic factors, this complexity is unimprovable in the monotone Lipschitz setting; i.e., the provided result is near-optimal.
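
For readers unfamiliar with the classical Halpern iteration that this abstract builds on, here is a minimal sketch of the plain, non-variance-reduced scheme $x_{k+1} = \lambda_k x_0 + (1-\lambda_k) T(x_k)$ with $\lambda_k = 1/(k+2)$ for a nonexpansive map $T$. It is not the paper's variance-reduced method. The test map (a 90-degree rotation, whose only fixed point is the origin) and all function names are assumptions chosen for illustration.

```python
import math


def rotate90(z):
    """A nonexpansive map on R^2 whose unique fixed point is the origin."""
    x, y = z
    return (-y, x)


def halpern(T, x0, num_iters):
    """Classical Halpern iteration anchored at x0 with weights 1/(k+2)."""
    x = x0
    for k in range(num_iters):
        lam = 1.0 / (k + 2)
        Tx = T(x)
        # Convex combination of the anchor x0 and the image T(x).
        x = tuple(lam * a + (1.0 - lam) * b for a, b in zip(x0, Tx))
    return x


def fixed_point_residual(T, x):
    """Computable residual ||x - T(x)||, usable as an exit criterion."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, T(x))))


if __name__ == "__main__":
    x = halpern(rotate90, (1.0, 0.0), 1000)
    print(fixed_point_residual(rotate90, x))
```

The residual computed by `fixed_point_residual` plays the role of the computable optimality measure mentioned in the abstract: it can be evaluated at the last iterate without knowing the solution.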

    Beyond the Golden Ratio for Variational Inequality Algorithms

    We improve the understanding of the golden ratio algorithm, which solves monotone variational inequalities (VI) and convex-concave min-max problems via the distinctive feature of adapting the step sizes to the local Lipschitz constants. Adaptive step sizes not only eliminate the need to pick hyperparameters, but they also remove the necessity of global Lipschitz continuity and can increase from one iteration to the next. We first establish the equivalence of this algorithm with popular VI methods such as reflected gradient, Popov or optimistic gradient descent-ascent in the unconstrained case with constant step sizes. We then move on to the constrained setting and introduce a new analysis that allows the use of larger step sizes, to complete the bridge between the golden ratio algorithm and the existing algorithms in the literature. Doing so, we actually eliminate the link between the golden ratio $\frac{1+\sqrt{5}}{2}$ and the algorithm. Moreover, we improve the adaptive version of the algorithm, first by removing the maximum step size hyperparameter (an artifact from the analysis) to improve the complexity bound, and second by adjusting it to nonmonotone problems with weak Minty solutions, with superior empirical performance.
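
To make the baseline concrete, the sketch below implements the original constant-step golden ratio algorithm in the unconstrained case, as commonly stated: $\bar{z}_k = ((\varphi-1) z_k + \bar{z}_{k-1})/\varphi$ followed by $z_{k+1} = \bar{z}_k - \lambda F(z_k)$, with $\varphi = \frac{1+\sqrt{5}}{2}$. This is not the improved or adaptive variant from the paper. The bilinear test problem $\min_x \max_y xy$, the step size 0.5, and the function names are assumptions for illustration.

```python
import math

PHI = (1.0 + math.sqrt(5.0)) / 2.0  # the golden ratio


def bilinear_operator(z):
    """VI operator F(x, y) = (y, -x) of min_x max_y x*y (Lipschitz constant 1)."""
    x, y = z
    return (y, -x)


def golden_ratio_algorithm(F, z0, step, num_iters):
    """Constant-step golden ratio algorithm without constraints (sketch)."""
    z = z0
    z_bar = z0  # averaged sequence, initialized at the starting point
    for _ in range(num_iters):
        # Averaging step with weights (PHI - 1)/PHI and 1/PHI.
        z_bar = tuple(((PHI - 1.0) * a + b) / PHI for a, b in zip(z, z_bar))
        Fz = F(z)
        # Operator step taken from the averaged point.
        z = tuple(b - step * g for b, g in zip(z_bar, Fz))
    return z


if __name__ == "__main__":
    print(golden_ratio_algorithm(bilinear_operator, (1.0, 1.0), 0.5, 500))
```

On this bilinear problem plain gradient descent-ascent spirals outward, whereas the averaging step lets the iterates contract toward the saddle point at the origin.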

    On the convergence of stochastic primal-dual hybrid gradient

    In this paper, we analyze the recently proposed stochastic primal-dual hybrid gradient (SPDHG) algorithm and provide new theoretical results. In particular, we prove almost sure convergence of the iterates to a solution and linear convergence with standard step sizes, independent of strong convexity constants. Our assumption for linear convergence is metric subregularity, which is satisfied for smooth and strongly convex problems in addition to many nonsmooth and/or nonstrongly convex problems, such as linear programs, Lasso, and support vector machines. In the general convex case, we prove optimal sublinear rates for the ergodic sequence, without bounded domain assumptions. We also provide numerical evidence showing that SPDHG with standard step sizes shows favorable and robust practical performance against its specialized strongly convex variant SPDHG-$\mu$ and other state-of-the-art algorithms including variance reduction methods and stochastic dual coordinate ascent.