77,552 research outputs found

    On the linear convergence of the stochastic gradient method with constant step-size

    Get PDF
    The strong growth condition (SGC) is known to be a sufficient condition for linear convergence of the stochastic gradient method using a constant step-size γ\gamma (SGM-CS). In this paper, we provide a necessary condition, for the linear convergence of SGM-CS, that is weaker than SGC. Moreover, when this necessary is violated up to a additive perturbation σ\sigma, we show that both the projected stochastic gradient method using a constant step-size (PSGM-CS) and the proximal stochastic gradient method exhibit linear convergence to a noise dominated region, whose distance to the optimal solution is proportional to γσ\gamma \sigma

    Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization

    Full text link
    We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient) algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure. This class of problems includes empirical risk minimization with sparse, low-rank, or other structured constraints. The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size. Furthermore, as a byproduct of the method we obtain a stochastic estimator of the Frank-Wolfe gap that can be used as a stopping criterion. Depending on the setting, the proposed method matches or improves on the best computational guarantees for Stochastic Frank-Wolfe algorithms. Benchmarks on several datasets highlight different regimes in which the proposed method exhibits a faster empirical convergence than related methods. Finally, we provide an implementation of all considered methods in an open-source package.Comment: To appear in the Proceedings of the 37th International Conference on Machine Learning, 2020. Main text: 9 pages, 1 figure. Fixes previously found erro

    Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates

    Full text link
    Decentralized Byzantine-resilient stochastic gradient algorithms resolve efficiently large-scale optimization problems in adverse conditions, such as malfunctioning agents, software bugs, and cyber attacks. This paper targets on handling a class of generic composite optimization problems over multi-agent cyberphysical systems (CPSs), with the existence of an unknown number of Byzantine agents. Based on the proximal mapping method, two variance-reduced (VR) techniques, and a norm-penalized approximation strategy, we propose a decentralized Byzantine-resilient and proximal-gradient algorithmic framework, dubbed Prox-DBRO-VR, which achieves an optimization and control goal using only local computations and communications. To reduce asymptotically the variance generated by evaluating the noisy stochastic gradients, we incorporate two localized variance-reduced techniques (SAGA and LSVRG) into Prox-DBRO-VR, to design Prox-DBRO-SAGA and Prox-DBRO-LSVRG. Via analyzing the contraction relationships among the gradient-learning error, robust consensus condition, and optimal gap, the theoretical result demonstrates that both Prox-DBRO-SAGA and Prox-DBRO-LSVRG, with a well-designed constant (resp., decaying) step-size, converge linearly (resp., sub-linearly) inside an error ball around the optimal solution to the optimization problem under standard assumptions. The trade-offs between the convergence accuracy and the number of Byzantine agents in both linear and sub-linear cases are characterized. In simulation, the effectiveness and practicability of the proposed algorithms are manifested via resolving a sparse machine-learning problem over multi-agent CPSs under various Byzantine attacks.Comment: 14 pages, 0 figure

    Minimizing Finite Sums with the Stochastic Average Gradient

    Get PDF
    We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in general, and when the sum is strongly-convex the convergence rate is improved from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for p \textless{} 1. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.Comment: Revision from January 2015 submission. Major changes: updated literature follow and discussion of subsequent work, additional Lemma showing the validity of one of the formulas, somewhat simplified presentation of Lyapunov bound, included code needed for checking proofs rather than the polynomials generated by the code, added error regions to the numerical experiment

    Hybrid Deterministic-Stochastic Methods for Data Fitting

    Full text link
    Many structured data-fitting applications require the solution of an optimization problem involving a sum over a potentially large number of measurements. Incremental gradient algorithms offer inexpensive iterations by sampling a subset of the terms in the sum. These methods can make great progress initially, but often slow as they approach a solution. In contrast, full-gradient methods achieve steady convergence at the expense of evaluating the full objective and gradient on each iteration. We explore hybrid methods that exhibit the benefits of both approaches. Rate-of-convergence analysis shows that by controlling the sample size in an incremental gradient algorithm, it is possible to maintain the steady convergence rates of full-gradient methods. We detail a practical quasi-Newton implementation based on this approach. Numerical experiments illustrate its potential benefits.Comment: 26 pages. Revised proofs of Theorems 2.6 and 3.1, results unchange

    Semistochastic Quadratic Bound Methods

    Full text link
    Partition functions arise in a variety of settings, including conditional random fields, logistic regression, and latent gaussian models. In this paper, we consider semistochastic quadratic bound (SQB) methods for maximum likelihood inference based on partition function optimization. Batch methods based on the quadratic bound were recently proposed for this class of problems, and performed favorably in comparison to state-of-the-art techniques. Semistochastic methods fall in between batch algorithms, which use all the data, and stochastic gradient type methods, which use small random selections at each iteration. We build semistochastic quadratic bound-based methods, and prove both global convergence (to a stationary point) under very weak assumptions, and linear convergence rate under stronger assumptions on the objective. To make the proposed methods faster and more stable, we consider inexact subproblem minimization and batch-size selection schemes. The efficacy of SQB methods is demonstrated via comparison with several state-of-the-art techniques on commonly used datasets.Comment: 11 pages, 1 figur

    FROST -- Fast row-stochastic optimization with uncoordinated step-sizes

    Full text link
    In this paper, we discuss distributed optimization over directed graphs, where doubly-stochastic weights cannot be constructed. Most of the existing algorithms overcome this issue by applying push-sum consensus, which utilizes column-stochastic weights. The formulation of column-stochastic weights requires each agent to know (at least) its out-degree, which may be impractical in e.g., broadcast-based communication protocols. In contrast, we describe FROST (Fast Row-stochastic-Optimization with uncoordinated STep-sizes), an optimization algorithm applicable to directed graphs that does not require the knowledge of out-degrees; the implementation of which is straightforward as each agent locally assigns weights to the incoming information and locally chooses a suitable step-size. We show that FROST converges linearly to the optimal solution for smooth and strongly-convex functions given that the largest step-size is positive and sufficiently small.Comment: Submitted for journal publication, currently under revie
    corecore