Accelerating Stochastic Recursive and Semi-stochastic Gradient Methods with Adaptive Barzilai-Borwein Step Sizes
The mini-batch versions of the StochAstic Recursive grAdient algoritHm and the
Semi-Stochastic Gradient Descent method equipped with random Barzilai-Borwein
step sizes (abbreviated MB-SARAH-RBB and mS2GD-RBB) have risen to prominence
through their timely step-size sequences. Inspired by modern adaptors and
variance reduction techniques, we propose two new step-size rules in this
paper, referred to as RHBB and RHBB+, leading to the four algorithms
MB-SARAH-RHBB, MB-SARAH-RHBB+, mS2GD-RHBB and mS2GD-RHBB+, respectively.
RHBB+ is an enhanced
version that additionally incorporates the importance sampling technique. They
are aggressive in their updates, robust in performance, and self-adaptive
across iterations. We analyze the flexible convergence structures and the
corresponding complexity bounds in strongly convex cases. Comprehensive tuning
guidance is theoretically provided for reference in practical implementations.
Experiments show that the proposed methods consistently outperform the
original and various state-of-the-art methods on commonly used data sets. In
particular, tests of RHBB+ verify the efficacy of applying the importance
sampling technique at the step-size level. Further explorations demonstrate
the promising scalability of our iterative adaptors.
Comment: 44 pages, 33 figures
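The Barzilai-Borwein idea underlying all of these step-size rules can be
sketched as follows. This is the classical BB1 rule computed from successive
iterates and (mini-batch) gradients; the RBB/RHBB refinements described in the
abstract modify how the gradient differences are sampled and safeguarded, and
are not reproduced here. The function name and the crude clipping are
illustrative assumptions, not the paper's construction.

```python
import numpy as np

def bb_step_size(x_prev, x_curr, g_prev, g_curr, eps=1e-12):
    """Classical BB1 step size: ||s||^2 / <s, y> with s the iterate
    difference and y the gradient difference. The max(..., eps) guard
    is a naive safeguard against a non-positive curvature estimate;
    practical methods use more careful safeguards."""
    s = x_curr - x_prev          # iterate difference
    y = g_curr - g_prev          # (mini-batch) gradient difference
    return float(s @ s) / max(float(s @ y), eps)
```

On a quadratic with Hessian a*I the rule recovers the exact inverse curvature
1/a, which is the sense in which BB steps are "spectral".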
An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms
Stochastic variance reduced methods have shown strong performance in solving
finite-sum problems. However, these methods usually require the users to
manually tune the step-size, which is time-consuming or even infeasible for
some large-scale optimization tasks. To overcome this problem, we propose and
analyze several novel adaptive variants of the popular SAGA algorithm. In
particular, we design a variant of the Barzilai-Borwein step size that is
tailored to the incremental gradient method to ensure memory efficiency and fast
convergence. We establish its convergence guarantees under general settings
that allow non-Euclidean norms in the definition of smoothness and the
composite objectives, which cover a broad range of applications in machine
learning. We improve the analysis of SAGA to support non-Euclidean norms, which
fills the void of existing work. Numerical experiments on standard datasets
demonstrate a competitive performance of the proposed algorithm compared with
existing variance-reduced methods and their adaptive variants.
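For context, the basic SAGA loop that these adaptive variants build on can be
sketched as below. The fixed step size eta stands in for the adaptive BB-type
rule the abstract describes, the Euclidean setting replaces the non-Euclidean
analysis, and all names are illustrative assumptions.

```python
import numpy as np

def saga(grad_i, x0, n, eta, iters, rng=None):
    """Plain SAGA on a finite sum of n smooth components.
    grad_i(i, x) returns the gradient of component i at x.
    Stores one gradient per component and uses the unbiased
    variance-reduced direction g_new - table[i] + avg."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = x0.astype(float).copy()
    table = np.array([grad_i(i, x) for i in range(n)])  # stored gradients
    avg = table.mean(axis=0)
    for _ in range(iters):
        i = rng.integers(n)
        g_new = grad_i(i, x)
        x -= eta * (g_new - table[i] + avg)   # SAGA direction (old table entry)
        avg += (g_new - table[i]) / n         # maintain the running mean
        table[i] = g_new
    return x
```

On the toy finite sum f_i(x) = 0.5 * (x - b_i)^2 this converges to the mean of
the b_i, the exact minimizer, illustrating the variance-reduction property.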
Stochastic Steffensen method
Is it possible for a first-order method, i.e., only first derivatives
allowed, to be quadratically convergent? For univariate loss functions, the
answer is yes -- the Steffensen method avoids second derivatives and is still
quadratically convergent like Newton method. By incorporating an optimal step
size we can even push its convergence order beyond quadratic to . While such high convergence orders are a pointless overkill for
a deterministic algorithm, they become rewarding when the algorithm is
randomized for problems of massive sizes, as randomization invariably
compromises convergence speed. We will introduce two adaptive learning rates
inspired by the Steffensen method, intended for use in a stochastic
optimization setting and requiring no hyperparameter tuning aside from batch
size. Extensive experiments show that they compare favorably with several
existing first-order methods. When restricted to a quadratic objective, our
stochastic Steffensen methods reduce to the randomized Kaczmarz method -- note
that this is not true for SGD or SLBFGS -- and thus we may also view our
methods as a generalization of randomized Kaczmarz to arbitrary objectives.
Comment: 22 pages, 3 figures
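For reference, the deterministic Steffensen iteration that inspires these
learning rates can be sketched as follows: a root-finder that uses only values
of f, no derivatives, yet converges quadratically near a simple root. The
stochastic learning rates in the abstract are a different construction built
on the same divided-difference idea; this sketch is not the paper's method.

```python
def steffensen(f, x0, tol=1e-10, max_iter=50):
    """Classical Steffensen iteration for a root of f.
    The quantity (f(x + f(x)) - f(x)) / f(x) plays the role of f'(x),
    so the update mimics Newton's method without derivatives."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        denom = f(x + fx) - fx        # = f'(x) * f(x) + higher-order terms
        x = x - fx * fx / denom       # Steffensen update
    return x
```

Started near a simple root (e.g. of x^2 - 2), a handful of iterations already
reach machine-level accuracy; far from a root the iteration can diverge, which
is why step-size control matters in the optimization setting.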
Variable metric proximal stochastic variance reduced gradient methods for nonconvex nonsmooth optimization
We study the problem of minimizing the sum of two functions. The first
function is the average of a large number of nonconvex component functions,
and the second is a convex (possibly nonsmooth) function that admits a simple
proximal mapping. Using a diagonal Barzilai-Borwein stepsize to update the
metric, we propose a variable metric proximal stochastic variance reduced
gradient method in the mini-batch setting, named VM-SVRG. It is proved that
VM-SVRG converges sublinearly to a stationary point in expectation. We further
suggest a variant of VM-SVRG that achieves a linear convergence rate in
expectation for nonconvex problems satisfying the proximal Polyak-Lojasiewicz
inequality. The complexity of VM-SVRG is lower than that of the proximal
gradient method and the proximal stochastic gradient method, and matches that
of the proximal stochastic variance reduced gradient method. Numerical
experiments are conducted on standard data sets. Comparisons with other
advanced proximal stochastic gradient methods show the efficiency of the
proposed method.
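A minimal sketch of the underlying proximal SVRG loop may help fix ideas. It
omits the diagonal BB metric that distinguishes VM-SVRG and uses a scalar step
size eta instead; all names and parameters are illustrative assumptions.

```python
import numpy as np

def prox_svrg(grad_i, prox, x0, n, eta, epochs, m, rng=None):
    """Proximal SVRG: each epoch takes a full-gradient snapshot, then m
    inner steps with the variance-reduced direction
    grad_i(i, x) - grad_i(i, snapshot) + full_grad, followed by a
    proximal step prox(z, eta) for the nonsmooth term."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = x0.astype(float).copy()
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = np.mean([grad_i(i, snapshot) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(i, x) - grad_i(i, snapshot) + full_grad  # reduced-variance gradient
            x = prox(x - eta * v, eta)                          # proximal step
    return x
```

With the identity as the proximal mapping (no nonsmooth term), this reduces to
plain SVRG and recovers the exact minimizer of a smooth finite sum.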
SLiSeS: Subsampled Line Search Spectral Gradient Method for Finite Sums
The spectral gradient method is known to be a powerful low-cost tool for
solving large-scale optimization problems. In this paper, our goal is to
exploit its advantages in the stochastic optimization framework, especially in
the case of mini-batch subsampling that is often used in big data settings. To
allow the spectral coefficient to properly explore the underlying approximate
Hessian spectrum, we keep the same subsample for several iterations before
subsampling again. We analyze the required algorithmic features and the
conditions for almost sure convergence, and present initial numerical results
that show the advantages of the proposed method.
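The core idea of holding the subsample fixed across several spectral (BB)
iterations can be sketched as follows. The line search of SLiSeS is omitted,
the crude curvature safeguard is naive, and all function names are
illustrative assumptions rather than the paper's interface.

```python
import numpy as np

def spectral_subsampled(grad_on_batch, draw_batch, x0, eta0, cycles, reuse):
    """Draw one mini-batch, then take `reuse` BB (spectral) gradient
    steps on that same batch so the spectral coefficient can probe a
    fixed approximate Hessian spectrum, before resampling."""
    x = np.asarray(x0, dtype=float).copy()
    eta = eta0
    for _ in range(cycles):
        batch = draw_batch()                       # keep this subsample ...
        x_prev, g_prev = x.copy(), grad_on_batch(batch, x)
        x = x - eta * g_prev                       # first step with current eta
        for _ in range(reuse - 1):                 # ... for several iterations
            g = grad_on_batch(batch, x)
            s, y = x - x_prev, g - g_prev
            eta = float(s @ s) / max(float(s @ y), 1e-12)  # BB1 spectral coefficient
            x_prev, g_prev = x.copy(), g.copy()
            x = x - eta * g
    return x
```

On a full-batch quadratic the BB coefficient immediately captures the inverse
curvature, so the iteration lands on the minimizer within a single reuse block.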