A Simple Practical Accelerated Method for Finite Sums
We describe a novel optimization method for finite sums (such as empirical
risk minimization problems) building on the recently introduced SAGA method.
Our method achieves an accelerated convergence rate on strongly convex smooth
problems. It has only one parameter (a step size) and is radically simpler
than other accelerated methods for finite sums. Additionally, it can be
applied when the terms are non-smooth, yielding a method applicable in many
areas where operator splitting methods would traditionally be used.
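The method builds on SAGA, which maintains a table of the last gradient seen for each term and uses it to cancel variance. As context, here is a minimal sketch of the plain SAGA baseline (not the paper's accelerated variant); `grad_i` is a hypothetical callback returning the gradient of the i-th term.

```python
import numpy as np

def saga(grad_i, n, x0, step, iters, seed=0):
    """Plain SAGA sketch: store the last gradient seen for each term
    and use it to build a variance-reduced stochastic gradient."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    table = np.array([grad_i(x0, i) for i in range(n)])  # per-term stored gradients
    avg = table.mean(axis=0)                             # running table average
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(x, i)
        x = x - step * (g - table[i] + avg)  # unbiased, variance-reduced step
        avg = avg + (g - table[i]) / n       # keep the average in sync
        table[i] = g
    return x

# Example: least squares with terms f_i(x) = 0.5 * (a_i @ x - b_i)^2
A = np.random.default_rng(1).standard_normal((50, 5))
b = A @ np.ones(5)
x = saga(lambda x, i: (A[i] @ x - b[i]) * A[i],
         n=50, x0=np.zeros(5), step=0.01, iters=5000)
```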
Linear Convergence of Cyclic SAGA
In this work, we present and analyze C-SAGA, a (deterministic) cyclic variant
of SAGA. C-SAGA is an incremental gradient method that minimizes a sum of
differentiable convex functions by cyclically accessing their gradients. Even
though the theory of stochastic algorithms is more mature than that of cyclic
counterparts in general, practitioners often prefer cyclic algorithms. We prove
C-SAGA converges linearly under the standard assumptions. Then, we compare the
rate of convergence with the full gradient method, (stochastic) SAGA, and
incremental aggregated gradient (IAG), both theoretically and experimentally.
Comment: Published in Optimization Letters.
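To illustrate the one structural difference from stochastic SAGA, a sketch of a cyclic variant in the spirit of C-SAGA (the paper's step-size conditions and exact analysis are not reproduced here): the stored-gradient update is unchanged, but indices are visited in a fixed order rather than sampled.

```python
import numpy as np

def cyclic_saga(grad_i, n, x0, step, epochs):
    """Cyclic SAGA-style sketch: same table-based variance reduction,
    but with deterministic cyclic access to the component gradients."""
    x = x0.copy()
    table = np.array([grad_i(x0, i) for i in range(n)])
    avg = table.mean(axis=0)
    for _ in range(epochs):
        for i in range(n):                   # deterministic cyclic order
            g = grad_i(x, i)
            x = x - step * (g - table[i] + avg)
            avg = avg + (g - table[i]) / n
            table[i] = g
    return x
```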
ASVRG: Accelerated Proximal SVRG
This paper proposes an accelerated proximal stochastic variance reduced
gradient (ASVRG) method, in which we design a simple and effective momentum
acceleration trick. Unlike most existing accelerated stochastic variance
reduction methods such as Katyusha, ASVRG has only one additional variable and
one momentum parameter. Thus, ASVRG is much simpler than those methods, and has
much lower per-iteration complexity. We prove that ASVRG achieves the best
known oracle complexities for both strongly convex and non-strongly convex
objectives. In addition, we extend ASVRG to mini-batch and non-smooth settings.
We also empirically verify our theoretical results and show that the
performance of ASVRG is comparable with, and sometimes even better than, that
of state-of-the-art stochastic methods.
Comment: 32 pages, 3 figures.
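As a rough illustration of the ingredient the abstract highlights, here is a generic SVRG inner loop combined with a single extra momentum variable and a single momentum parameter. This is a sketch of the general flavor only, not the paper's exact ASVRG update rule; `grad_i` and `full_grad` are hypothetical callbacks.

```python
import numpy as np

def svrg_momentum(grad_i, full_grad, n, x0, step, beta, epochs, m, seed=0):
    """SVRG with one momentum variable v and one momentum parameter beta
    (illustrative only; ASVRG's exact coupling differs)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    v = np.zeros_like(x0)                 # the single additional variable
    for _ in range(epochs):
        snap = x.copy()
        mu = full_grad(snap)              # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            g = grad_i(x, i) - grad_i(snap, i) + mu  # variance-reduced gradient
            v = beta * v - step * g       # momentum update
            x = x + v
    return x
```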
The proximal point method revisited
In this short survey, I revisit the role of the proximal point method in
large scale optimization. I focus on three recent examples: a proximally guided
subgradient method for weakly convex stochastic approximation, the prox-linear
algorithm for minimizing compositions of convex functions and smooth maps, and
Catalyst generic acceleration for regularized Empirical Risk Minimization.
Comment: 11 pages, submitted to SIAG/OPT Views and News.
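The proximal point method iterates x_{k+1} = argmin_x f(x) + ||x - x_k||^2 / (2*lam). A self-contained sketch for a strongly convex quadratic, where each prox step has a closed form:

```python
import numpy as np

# Proximal point iteration for f(x) = 0.5 * x @ A @ x - b @ x:
# argmin_x f(x) + ||x - x_k||^2 / (2*lam) solves to
# x_{k+1} = (A + I/lam)^{-1} (b + x_k/lam).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 0.1 * np.eye(5)     # positive definite => f strongly convex
b = rng.standard_normal(5)
lam = 10.0

x = np.zeros(5)
for _ in range(50):
    x = np.linalg.solve(A + np.eye(5) / lam, b + x / lam)  # one prox step

print(np.allclose(x, np.linalg.solve(A, b)))  # converges to the minimizer A^{-1} b
```

Each iteration contracts toward the minimizer by a factor of 1/(1 + lam*mu) for a mu-strongly convex objective, which is why larger lam gives faster (but more expensive) steps.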
On the Ineffectiveness of Variance Reduced Optimization for Deep Learning
The application of stochastic variance reduction to optimization has shown
remarkable recent theoretical and practical success. The applicability of these
techniques to the hard non-convex optimization problems encountered during
training of modern deep neural networks is an open problem. We show that naive
application of the SVRG technique and related approaches fails, and we explore why.
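For reference, the naive SVRG-style estimator in question builds its gradient estimate from a stored snapshot of the weights; a minimal sketch (names are illustrative):

```python
def svrg_gradient(grad_i, i, w, w_snap, mu_snap):
    """Naive SVRG estimator: unbiased, and low-variance when w stays close
    to the snapshot w_snap; mu_snap is the full gradient at w_snap.
    In deep learning, re-evaluating grad_i consistently at the snapshot is
    complicated by, e.g., data augmentation, dropout, and batch norm, and
    fast-moving iterates make the snapshot stale -- difficulties of the
    kind the paper investigates.
    """
    return grad_i(w, i) - grad_i(w_snap, i) + mu_snap
```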
Towards More Efficient Stochastic Decentralized Learning: Faster Convergence and Sparse Communication
Recently, the decentralized optimization problem has attracted growing
attention. Most existing methods are deterministic, with high per-iteration
cost, and have a convergence rate that depends quadratically on the problem's
condition number. Moreover, dense communication is necessary to ensure
convergence even when the dataset is sparse. In this paper, we generalize the
decentralized
optimization problem to a monotone operator root finding problem, and propose a
stochastic algorithm named DSBA that (i) converges geometrically, at a rate
that depends only linearly on the condition number, and (ii) can be implemented
using sparse communication only. Additionally, DSBA handles learning problems
like AUC-maximization which cannot be tackled efficiently in the decentralized
setting. Experiments on convex minimization and AUC-maximization validate the
efficiency of our method.
Comment: Accepted to ICML 2018.
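DSBA itself operates at the level of monotone operator root finding with sparse communication; as generic context only, here is the basic decentralized pattern it refines: alternate neighbor averaging through a gossip matrix W with local stochastic gradient steps. This is not DSBA's actual update; all names are illustrative.

```python
import numpy as np

def decentralized_sgd(grads, W, X0, step, iters, seed=0):
    """Generic decentralized stochastic gradient sketch (not DSBA):
    grads[k](x) returns a stochastic gradient of node k's local objective,
    W is a doubly stochastic gossip matrix matching the network topology,
    and row k of X holds node k's local iterate."""
    rng = np.random.default_rng(seed)
    X = X0.copy()
    for _ in range(iters):
        X = W @ X                               # communication: neighbor averaging
        for k in range(X.shape[0]):
            X[k] = X[k] - step * grads[k](X[k])  # local computation
    return X
```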
First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization
This paper studies empirical risk minimization (ERM) problems for large-scale
datasets and incorporates the idea of adaptive sample size methods to improve
the guaranteed convergence bounds for first-order stochastic and deterministic
methods. In contrast to traditional methods that attempt to solve the ERM
problem corresponding to the full dataset directly, adaptive sample size
schemes start with a small number of samples and solve the corresponding ERM
problem to its statistical accuracy. The sample size is then grown
geometrically -- e.g., scaling by a factor of two -- and the solution of the
previous ERM problem is used as a warm start for the new one. Theoretical analyses show
that the use of adaptive sample size methods reduces the overall computational
cost of achieving the statistical accuracy of the whole dataset for a broad
range of deterministic and stochastic first-order methods. The gains are
specific to the choice of method. When particularized to, e.g., accelerated
gradient descent and the stochastic variance reduced gradient (SVRG) method,
the computational cost advantage is a logarithmic factor in the number of
training samples. Numerical
experiments on various datasets confirm theoretical claims and showcase the
gains of using the proposed adaptive sample size scheme.
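A sketch of the control loop described above, with `solve_erm` a hypothetical subsolver that solves the ERM problem restricted to the first n samples to its statistical accuracy (the stopping criterion lives inside the subsolver):

```python
def adaptive_sample_size(solve_erm, n_total, n0=128):
    """Adaptive sample size scheme: solve a small ERM to its statistical
    accuracy, double the sample size, and warm-start the next ERM from
    the previous solution.

    solve_erm(n, x_init) returns an approximate solution of the ERM on
    the first n samples, initialized at x_init (None = cold start)."""
    n = min(n0, n_total)
    x = solve_erm(n, x_init=None)      # cold start on the smallest problem
    while n < n_total:
        n = min(2 * n, n_total)        # grow geometrically (factor of two)
        x = solve_erm(n, x_init=x)     # warm start from the previous solution
    return x
```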
Stochastic Nonconvex Optimization with Large Minibatches
We study stochastic optimization of nonconvex loss functions, which are
typical objectives for training neural networks. We propose stochastic
approximation algorithms which optimize a series of regularized, nonlinearized
losses on large minibatches of samples, using only first-order gradient
information. Our algorithms provably converge to an approximate critical point
of the expected objective with faster rates than minibatch stochastic gradient
descent, and facilitate better parallelization by allowing larger minibatches.
Comment: Accepted by ALT 2019.
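A sketch of the general template the abstract describes: at each outer step, approximately minimize the loss on a large minibatch plus a proximal regularizer anchored at the current iterate, using only a few first-order inner steps. The names and the inner solver are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def minibatch_prox(loss_grad, sample_batch, x0, lam,
                   inner_steps, inner_lr, outer_iters, seed=0):
    """Regularized large-minibatch template (illustrative sketch):
    loss_grad(x, batch) is the gradient of the minibatch loss,
    sample_batch(rng) draws a large minibatch of samples."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(outer_iters):
        batch = sample_batch(rng)              # large minibatch; inner work parallelizes
        anchor = x.copy()
        for _ in range(inner_steps):           # first-order inner solver
            g = loss_grad(x, batch) + lam * (x - anchor)  # regularized loss gradient
            x = x - inner_lr * g
    return x
```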
Curvature-Exploiting Acceleration of Elastic Net Computations
This paper introduces an efficient second-order method for solving the
elastic net problem. Its key innovation is a computationally efficient
technique for injecting curvature information in the optimization process which
admits a strong theoretical performance guarantee. In particular, we show
improved run time over popular first-order methods and quantify the speed-up in
terms of statistical measures of the data matrix. The improved time complexity
is the result of an extensive exploitation of the problem structure and a
careful combination of second-order information, variance reduction techniques,
and momentum acceleration. Besides the theoretical speed-up, experimental
results demonstrate substantial practical benefits from curvature information,
especially on ill-conditioned data sets.
Comment: 34 pages, 2 figures.
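For concreteness, the elastic net problem and the plain first-order baseline (proximal gradient / ISTA) that curvature-exploiting methods of this kind are designed to beat:

```python
import numpy as np

def elastic_net_ista(A, b, lam1, lam2, step, iters):
    """Proximal gradient (ISTA) baseline for the elastic net
        min_x 0.5*||A x - b||^2 + lam1*||x||_1 + (lam2/2)*||x||^2.
    step should be at most 1 / (lambda_max(A^T A) + lam2)."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b) + lam2 * x   # gradient of the smooth part
        z = x - step * g
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0)  # soft-threshold
    return x
```

The dependence of ISTA's iteration count on the condition number is what the statistical measures of the data matrix quantify in the paper's speed-up analysis.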
Boosting First-order Methods by Shifting Objective: New Schemes with Faster Worst Case Rates
We propose a new methodology for designing first-order methods for
unconstrained strongly convex problems: designing for a shifted objective
function. Several technical lemmas are provided as building blocks for the new
methods. Shifting the objective both tightens the analysis, leaving room for
faster rates, and simplifies it. Following this methodology, we derive several
new accelerated schemes for problems equipped with various first-order oracles,
all of which have faster worst-case convergence rates than their existing
counterparts. Experiments on machine learning tasks are conducted to evaluate
the new methods.
Comment: 27 pages, 7 figures.
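One standard instance of such a shift, given as an illustration consistent with the abstract rather than the paper's exact construction: for a mu-strongly convex f, carry out the analysis on the shifted function

```latex
h(x) = f(x) - \tfrac{\mu}{2}\,\lVert x \rVert^{2},
\qquad
\nabla^{2} h(x) = \nabla^{2} f(x) - \mu I \succeq 0,
```

which is convex precisely by the definition of strong convexity; bounds proved for h transfer back to f once the quadratic term is reinstated.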