On the linear convergence of the stochastic gradient method with constant step-size
The strong growth condition (SGC) is known to be a sufficient condition for
linear convergence of the stochastic gradient method using a constant step-size
(SGM-CS). In this paper, we provide a necessary condition for the linear
convergence of SGM-CS that is weaker than the SGC. Moreover, when this
necessary condition is violated up to an additive perturbation, we show that
both the projected stochastic gradient method using a constant step-size
(PSGM-CS) and the proximal stochastic gradient method exhibit linear
convergence to a noise-dominated region, whose distance to the optimal
solution is proportional to the size of the perturbation.
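As a minimal illustration of the setting (not the paper's analysis), the sketch below runs a constant step-size stochastic gradient method on a toy interpolation problem, where every term shares the same minimizer and the strong growth condition holds, so the iterates converge linearly. The function name `sgm_cs` and the quadratic terms are illustrative assumptions.

```python
import numpy as np

def sgm_cs(grads, x0, step, iters, rng):
    """Stochastic gradient method with a constant step-size (SGM-CS):
    sample one term uniformly and step along its gradient."""
    x = x0.copy()
    for _ in range(iters):
        i = rng.integers(len(grads))
        x = x - step * grads[i](x)
    return x

# Toy interpolation problem: f_i(x) = 0.5 * a_i * ||x||^2 with a_i > 0,
# so every term is minimized at x* = 0 and the SGC holds trivially.
rng = np.random.default_rng(0)
a = [1.0, 2.0, 3.0]
grads = [lambda x, ai=ai: ai * x for ai in a]
x = sgm_cs(grads, np.ones(2), step=0.1, iters=200, rng=rng)
print(np.linalg.norm(x))  # decays linearly toward the shared minimizer
```

With `step=0.1`, each update contracts the iterate by a factor of at most 0.9, so the distance to the minimizer shrinks geometrically, which is the linear convergence the abstract refers to.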
Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization
We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient)
algorithm for constrained smooth finite-sum minimization with a generalized
linear prediction/structure. This class of problems includes empirical risk
minimization with sparse, low-rank, or other structured constraints. The
proposed method is simple to implement, does not require step-size tuning, and
has a constant per-iteration cost that is independent of the dataset size.
Furthermore, as a byproduct of the method we obtain a stochastic estimator of
the Frank-Wolfe gap that can be used as a stopping criterion. Depending on the
setting, the proposed method matches or improves on the best computational
guarantees for Stochastic Frank-Wolfe algorithms. Benchmarks on several
datasets highlight different regimes in which the proposed method exhibits a
faster empirical convergence than related methods. Finally, we provide an
implementation of all considered methods in an open-source package.

Comment: To appear in the Proceedings of the 37th International Conference on
Machine Learning, 2020. Main text: 9 pages, 1 figure. Fixes previously found
errors.
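To make the conditional-gradient step and the role of the Frank-Wolfe gap concrete, here is a minimal deterministic Frank-Wolfe sketch over an l1 ball (a simplification, not the paper's stochastic algorithm); the helper name `frank_wolfe_l1` and the toy objective are assumptions for illustration.

```python
import numpy as np

def frank_wolfe_l1(grad, x0, radius, max_iters, tol):
    """Deterministic Frank-Wolfe (conditional gradient) over an l1 ball.

    The linear minimization oracle over the l1 ball returns a signed vertex
    along the largest-magnitude gradient coordinate; the Frank-Wolfe gap
    <grad, x - s> upper-bounds the suboptimality and serves as a stopping
    criterion, mirroring the role of the stochastic gap estimator above.
    """
    x = x0.copy()
    gap = np.inf
    for k in range(max_iters):
        g = grad(x)
        i = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])     # vertex minimizing <s, g>
        gap = g @ (x - s)                  # Frank-Wolfe gap
        if gap <= tol:
            break
        x = x + 2.0 / (k + 2.0) * (s - x)  # classical step-size 2/(k+2)
    return x, gap

# f(x) = 0.5 ||x - b||^2 with b strictly inside the l1 ball, so the
# constrained minimizer is b itself.
b = np.array([0.3, -0.2])
x, gap = frank_wolfe_l1(lambda x: x - b, np.zeros(2), radius=1.0,
                        max_iters=500, tol=1e-8)
print(x, gap)
```

Note the step-size schedule 2/(k+2) requires no tuning, which is one of the practical points the abstract emphasizes for the stochastic variant as well.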
Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates
Decentralized Byzantine-resilient stochastic gradient algorithms efficiently
solve large-scale optimization problems under adverse conditions such as
malfunctioning agents, software bugs, and cyber attacks. This paper addresses
a class of generic composite optimization problems over multi-agent
cyber-physical systems (CPSs) in the presence of an unknown number of
Byzantine agents. Based on the proximal mapping method, two variance-reduced
(VR) techniques, and a norm-penalized approximation strategy, we propose a
decentralized Byzantine-resilient and proximal-gradient algorithmic framework,
dubbed Prox-DBRO-VR, which achieves an optimization and control goal using only
local computations and communications. To asymptotically reduce the variance
introduced by the noisy stochastic gradient evaluations, we incorporate two
localized variance-reduction techniques (SAGA and LSVRG) into Prox-DBRO-VR,
yielding Prox-DBRO-SAGA and Prox-DBRO-LSVRG. By analyzing the contraction
relationships among the gradient-learning error, the robust consensus
condition, and the optimality gap, we show that both Prox-DBRO-SAGA
and Prox-DBRO-LSVRG, with a well-designed constant (resp., decaying) step-size,
converge linearly (resp., sub-linearly) inside an error ball around the optimal
solution to the optimization problem under standard assumptions. The trade-offs
between the convergence accuracy and the number of Byzantine agents in both
linear and sub-linear cases are characterized. Simulations demonstrate the
effectiveness and practicality of the proposed algorithms by solving a sparse
machine-learning problem over multi-agent CPSs under various Byzantine
attacks.

Comment: 14 pages, 0 figures
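The proximal mapping at the core of composite methods like this one has a simple closed form for the l1 penalty. The sketch below shows that building block and a plain (centralized, non-Byzantine) proximal-gradient iteration, not the decentralized Prox-DBRO-VR framework itself; the names `soft_threshold` and `prox_grad_step` are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal mapping of t * ||.||_1 (soft-thresholding), the building
    block of proximal-gradient methods for smooth + l1 composite objectives."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_step(x, grad, step, lam):
    """One proximal-gradient step for minimizing f(x) + lam * ||x||_1."""
    return soft_threshold(x - step * grad(x), step * lam)

# Sparse least squares: f(x) = 0.5 ||x - b||^2 + lam ||x||_1 has the
# closed-form minimizer soft_threshold(b, lam); iterating recovers it.
b = np.array([1.0, 0.05, -0.8])
lam = 0.1
x = np.zeros_like(b)
for _ in range(100):
    x = prox_grad_step(x, lambda x: x - b, step=0.5, lam=lam)
print(x)  # approaches soft_threshold(b, 0.1) = [0.9, 0.0, -0.7]
```

The second coordinate is driven exactly to zero, which is the sparsity-inducing behavior that makes the proximal mapping attractive for the sparse machine-learning problems mentioned in the abstract.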
Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing the
sum of a finite number of smooth convex functions. Like stochastic gradient
(SG) methods, the SAG method's iteration cost is independent of the number of
terms in the sum. However, by incorporating a memory of previous gradient
values the SAG method achieves a faster convergence rate than black-box SG
methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in
general, and when the sum is strongly-convex the convergence rate is improved
from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for
p < 1. Further, in many cases the convergence rate of the new method
is also faster than black-box deterministic gradient methods, in terms of the
number of gradient evaluations. Numerical experiments indicate that the new
algorithm often dramatically outperforms existing SG and deterministic gradient
methods, and that the performance may be further improved through the use of
non-uniform sampling strategies.

Comment: Revision from January 2015 submission. Major changes: updated
literature review and discussion of subsequent work, an additional lemma
showing the validity of one of the formulas, a somewhat simplified
presentation of the Lyapunov bound, code needed for checking proofs included
rather than the polynomials generated by the code, and error regions added to
the numerical experiments.
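The memory-of-past-gradients idea can be sketched in a few lines. The following is a minimal SAG implementation on a toy strongly convex sum (the function name `sag` and the toy data are illustrative assumptions, and the step-size follows the 1/(16L) choice from the analysis):

```python
import numpy as np

def sag(grads, x0, step, iters, rng):
    """Stochastic average gradient (SAG) sketch: keep the most recent
    gradient seen for each term and step along the average of the stored
    gradients; the running sum is maintained in O(d) per iteration."""
    n = len(grads)
    x = x0.copy()
    memory = np.zeros((n, x0.size))   # last gradient evaluated for each f_i
    g_sum = memory.sum(axis=0)
    for _ in range(iters):
        i = rng.integers(n)
        g_new = grads[i](x)
        g_sum += g_new - memory[i]    # refresh one slot of the memory
        memory[i] = g_new
        x = x - (step / n) * g_sum
    return x

# Strongly convex toy sum: f_i(x) = 0.5 ||x - b_i||^2, minimizer = mean(b_i).
rng = np.random.default_rng(1)
bs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
grads = [lambda x, bi=bi: x - bi for bi in bs]
x = sag(grads, np.zeros(2), step=1.0 / 16.0, iters=5000, rng=rng)
print(x)  # approaches the mean of the b_i
```

Like plain SG, each iteration evaluates only one of the n gradients; the stored memory is what lifts the rate from sub-linear to linear in the strongly convex case.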
Hybrid Deterministic-Stochastic Methods for Data Fitting
Many structured data-fitting applications require the solution of an
optimization problem involving a sum over a potentially large number of
measurements. Incremental gradient algorithms offer inexpensive iterations by
sampling a subset of the terms in the sum. These methods can make great
progress initially, but often slow down as they approach a solution. In contrast,
full-gradient methods achieve steady convergence at the expense of evaluating
the full objective and gradient on each iteration. We explore hybrid methods
that exhibit the benefits of both approaches. Rate-of-convergence analysis
shows that by controlling the sample size in an incremental gradient algorithm,
it is possible to maintain the steady convergence rates of full-gradient
methods. We detail a practical quasi-Newton implementation based on this
approach. Numerical experiments illustrate its potential benefits.

Comment: 26 pages. Revised proofs of Theorems 2.6 and 3.1; results unchanged.
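The sample-size control idea can be sketched as follows: start cheap with a tiny batch and grow it geometrically so later iterations behave like full-gradient steps. This is an illustrative simplification (no quasi-Newton component), and the name `growing_batch_gd` and growth factor are assumptions.

```python
import numpy as np

def growing_batch_gd(grads, x0, step, rounds, growth, rng):
    """Hybrid deterministic-stochastic sketch: sample a subset of the terms,
    step along the subsampled gradient, and grow the batch geometrically so
    the method transitions from incremental to full-gradient behavior."""
    n = len(grads)
    x = x0.copy()
    batch = 1
    for _ in range(rounds):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        g = sum(grads[i](x) for i in idx) / len(idx)  # subsampled gradient
        x = x - step * g
        batch = int(np.ceil(batch * growth))          # grow the sample size
    return x

# Toy sum: f_i(x) = 0.5 ||x - b_i||^2, so the minimizer is the mean of b_i.
rng = np.random.default_rng(2)
bs = [np.array([float(i), 1.0]) for i in range(8)]
grads = [lambda x, bi=bi: x - bi for bi in bs]
x = growing_batch_gd(grads, np.zeros(2), step=0.5, rounds=60,
                     growth=1.3, rng=rng)
print(x)  # near the mean [3.5, 1.0] once the batch covers the full sum
```

Once the batch reaches the full dataset, the iteration is exactly full-gradient descent, so the steady convergence rate is preserved while the early iterations stay inexpensive, which is the trade-off the abstract describes.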
Semistochastic Quadratic Bound Methods
Partition functions arise in a variety of settings, including conditional
random fields, logistic regression, and latent Gaussian models. In this paper,
we consider semistochastic quadratic bound (SQB) methods for maximum likelihood
inference based on partition function optimization. Batch methods based on the
quadratic bound were recently proposed for this class of problems, and
performed favorably in comparison to state-of-the-art techniques.
Semistochastic methods fall in between batch algorithms, which use all the
data, and stochastic gradient type methods, which use small random selections
at each iteration. We build semistochastic quadratic bound-based methods, and
prove both global convergence (to a stationary point) under very weak
assumptions, and linear convergence rate under stronger assumptions on the
objective. To make the proposed methods faster and more stable, we consider
inexact subproblem minimization and batch-size selection schemes. The efficacy
of SQB methods is demonstrated via comparison with several state-of-the-art
techniques on commonly used datasets.

Comment: 11 pages, 1 figure
FROST -- Fast row-stochastic optimization with uncoordinated step-sizes
In this paper, we discuss distributed optimization over directed graphs,
where doubly-stochastic weights cannot be constructed. Most of the existing
algorithms overcome this issue by applying push-sum consensus, which utilizes
column-stochastic weights. The formulation of column-stochastic weights
requires each agent to know (at least) its out-degree, which may be
impractical in, e.g., broadcast-based communication protocols. In contrast, we
describe
FROST (Fast Row-stochastic-Optimization with uncoordinated STep-sizes), an
optimization algorithm applicable to directed graphs that does not require
knowledge of out-degrees; its implementation is straightforward, as
each agent locally assigns weights to the incoming information and locally
chooses a suitable step-size. We show that FROST converges linearly to the
optimal solution for smooth and strongly-convex functions given that the
largest step-size is positive and sufficiently small.

Comment: Submitted for journal publication, currently under review.
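To illustrate the two ingredients named in the title, here is a heavily simplified decentralized gradient sketch with a row-stochastic mixing matrix and per-agent (uncoordinated) step-sizes. It omits FROST's gradient tracking and eigenvector correction, so it only reaches approximate consensus near a weighted average rather than the exact optimum; the function name and the toy network are assumptions.

```python
import numpy as np

def row_stochastic_dgd(A, grads, x0, steps, iters):
    """Simplified decentralized gradient iteration: each agent mixes the
    incoming states with its own (row-stochastic) weights, then takes a
    local gradient step with its own step-size. Not the full FROST
    algorithm, only an illustration of row-stochastic mixing."""
    X = x0.copy()                       # one row per agent
    for _ in range(iters):
        G = np.stack([g(X[i]) for i, g in enumerate(grads)])
        X = A @ X - steps[:, None] * G  # mix, then local gradient step
    return X

# Three agents with f_i(x) = 0.5 (x - b_i)^2; each row of A sums to 1,
# i.e. each agent only weights its incoming information.
A = np.array([[0.6, 0.4, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.4, 0.6]])
grads = [lambda x, bi=bi: x - bi for bi in np.array([0.0, 1.0, 2.0])]
steps = np.array([0.05, 0.04, 0.06])    # uncoordinated, locally chosen
X = row_stochastic_dgd(A, grads, np.zeros((3, 1)), steps, iters=2000)
print(X.ravel())  # agents cluster near a weighted average of the b_i
```

Assigning weights only to incoming information is exactly what removes the out-degree requirement: no agent needs to know how many neighbors receive its broadcasts.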