Accelerating Stochastic Recursive and Semi-stochastic Gradient Methods with Adaptive Barzilai-Borwein Step Sizes
The mini-batch versions of the StochAstic Recursive grAdient algoritHm and the
Semi-Stochastic Gradient Descent method equipped with random Barzilai-Borwein
step sizes (abbreviated MB-SARAH-RBB and mS2GD-RBB) have risen to prominence
through their timely step-size sequences. Inspired by modern adaptors and
variance reduction techniques, we propose two new step-size rules in this
paper, referred to as RHBB and RHBB+, leading to the four algorithms
MB-SARAH-RHBB, MB-SARAH-RHBB+, mS2GD-RHBB and mS2GD-RHBB+, respectively.
RHBB+ is an enhanced
version that additionally incorporates the importance sampling technique. They
are aggressive in their updates, robust in performance, and self-adaptive
across iterations. We analyze the flexible convergence structures and the
corresponding complexity bounds in strongly convex cases. Comprehensive tuning
guidance is theoretically provided for reference in practical implementations.
Experiments show that the proposed methods consistently outperform the
original and various state-of-the-art methods on commonly used data sets. In
particular, tests of RHBB+ verify the efficacy of applying the importance
sampling technique at the step-size level. Further explorations demonstrate
the promising scalability of our iterative adaptors.
Comment: 44 pages, 33 figures
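The Barzilai-Borwein idea underlying all of these step-size rules can be
sketched as follows. This is the classical BB1 rule computed from successive
iterates and (mini-batch) gradients; the RBB/RHBB refinements described in the
abstract modify how the gradient differences are sampled and safeguarded, and
are not reproduced here. The function name and the crude clipping are
illustrative assumptions, not the paper's construction.

```python
import numpy as np

def bb_step_size(x_prev, x_curr, g_prev, g_curr, eps=1e-12):
    """Classical BB1 step size: ||s||^2 / <s, y> with s the iterate
    difference and y the gradient difference. The max(..., eps) guard
    is a naive safeguard against a non-positive curvature estimate;
    practical methods use more careful safeguards."""
    s = x_curr - x_prev          # iterate difference
    y = g_curr - g_prev          # (mini-batch) gradient difference
    return float(s @ s) / max(float(s @ y), eps)
```

On a quadratic with Hessian a*I the rule recovers the exact inverse curvature
1/a, which is the sense in which BB steps are "spectral".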
An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms
Stochastic variance reduced methods have shown strong performance in solving
finite-sum problems. However, these methods usually require the users to
manually tune the step-size, which is time-consuming or even infeasible for
some large-scale optimization tasks. To overcome this problem, we propose and
analyze several novel adaptive variants of the popular SAGA algorithm. In
particular, we design a variant of the Barzilai-Borwein step size that is
tailored to the incremental gradient method to ensure memory efficiency and fast
convergence. We establish its convergence guarantees under general settings
that allow non-Euclidean norms in the definition of smoothness and the
composite objectives, which cover a broad range of applications in machine
learning. We improve the analysis of SAGA to support non-Euclidean norms, which
fills the void of existing work. Numerical experiments on standard datasets
demonstrate a competitive performance of the proposed algorithm compared with
existing variance-reduced methods and their adaptive variants.
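For context, the basic SAGA loop that these adaptive variants build on can be
sketched as below. The fixed step size eta stands in for the adaptive BB-type
rule the abstract describes, the Euclidean setting replaces the non-Euclidean
analysis, and all names are illustrative assumptions.

```python
import numpy as np

def saga(grad_i, x0, n, eta, iters, rng=None):
    """Plain SAGA on a finite sum of n smooth components.
    grad_i(i, x) returns the gradient of component i at x.
    Stores one gradient per component and uses the unbiased
    variance-reduced direction g_new - table[i] + avg."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = x0.astype(float).copy()
    table = np.array([grad_i(i, x) for i in range(n)])  # stored gradients
    avg = table.mean(axis=0)
    for _ in range(iters):
        i = rng.integers(n)
        g_new = grad_i(i, x)
        x -= eta * (g_new - table[i] + avg)   # SAGA direction (old table entry)
        avg += (g_new - table[i]) / n         # maintain the running mean
        table[i] = g_new
    return x
```

On the toy finite sum f_i(x) = 0.5 * (x - b_i)^2 this converges to the mean of
the b_i, the exact minimizer, illustrating the variance-reduction property.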
Stochastic Steffensen method
Is it possible for a first-order method, i.e., only first derivatives
allowed, to be quadratically convergent? For univariate loss functions, the
answer is yes -- the Steffensen method avoids second derivatives and is still
quadratically convergent like Newton method. By incorporating an optimal step
size we can even push its convergence order beyond quadratic to . While such high convergence orders are a pointless overkill for
a deterministic algorithm, they become rewarding when the algorithm is
randomized for problems of massive sizes, as randomization invariably
compromises convergence speed. We will introduce two adaptive learning rates
inspired by the Steffensen method, intended for use in a stochastic
optimization setting and requiring no hyperparameter tuning aside from batch
size. Extensive experiments show that they compare favorably with several
existing first-order methods. When restricted to a quadratic objective, our
stochastic Steffensen methods reduce to the randomized Kaczmarz method -- note
that this is not true for SGD or SLBFGS -- and thus we may also view our
methods as a generalization of randomized Kaczmarz to arbitrary objectives.
Comment: 22 pages, 3 figures
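For reference, the deterministic Steffensen iteration that inspires these
learning rates can be sketched as follows: a root-finder that uses only values
of f, no derivatives, yet converges quadratically near a simple root. The
stochastic learning rates in the abstract are a different construction built
on the same divided-difference idea; this sketch is not the paper's method.

```python
def steffensen(f, x0, tol=1e-10, max_iter=50):
    """Classical Steffensen iteration for a root of f.
    The quantity (f(x + f(x)) - f(x)) / f(x) plays the role of f'(x),
    so the update mimics Newton's method without derivatives."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        denom = f(x + fx) - fx        # = f'(x) * f(x) + higher-order terms
        x = x - fx * fx / denom       # Steffensen update
    return x
```

Started near a simple root (e.g. of x^2 - 2), a handful of iterations already
reach machine-level accuracy; far from a root the iteration can diverge, which
is why step-size control matters in the optimization setting.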
Variable metric proximal stochastic variance reduced gradient methods for nonconvex nonsmooth optimization
We study the problem of minimizing the sum of two functions. The first
function is the average of a large number of nonconvex component functions,
and the second is a convex (possibly nonsmooth) function that admits a simple
proximal mapping. Using a diagonal Barzilai-Borwein stepsize to update the
metric, we propose a variable metric proximal stochastic variance reduced
gradient method in the mini-batch setting, named VM-SVRG. It is proved that
VM-SVRG converges sublinearly to a stationary point in expectation. We further
suggest a variant of VM-SVRG that achieves a linear convergence rate in
expectation for nonconvex problems satisfying the proximal Polyak-Lojasiewicz
inequality. The complexity of VM-SVRG is lower than that of the proximal
gradient method and the proximal stochastic gradient method, and matches that
of the proximal stochastic variance reduced gradient method. Numerical
experiments are conducted on standard data sets. Comparisons with other
advanced proximal stochastic gradient methods show the efficiency of the
proposed method.
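A minimal sketch of the underlying proximal SVRG loop may help fix ideas. It
omits the diagonal BB metric that distinguishes VM-SVRG and uses a scalar step
size eta instead; all names and parameters are illustrative assumptions.

```python
import numpy as np

def prox_svrg(grad_i, prox, x0, n, eta, epochs, m, rng=None):
    """Proximal SVRG: each epoch takes a full-gradient snapshot, then m
    inner steps with the variance-reduced direction
    grad_i(i, x) - grad_i(i, snapshot) + full_grad, followed by a
    proximal step prox(z, eta) for the nonsmooth term."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = x0.astype(float).copy()
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = np.mean([grad_i(i, snapshot) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(i, x) - grad_i(i, snapshot) + full_grad  # reduced-variance gradient
            x = prox(x - eta * v, eta)                          # proximal step
    return x
```

With the identity as the proximal mapping (no nonsmooth term), this reduces to
plain SVRG and recovers the exact minimizer of a smooth finite sum.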
SLiSeS: Subsampled Line Search Spectral Gradient Method for Finite Sums
The spectral gradient method is known to be a powerful low-cost tool for
solving large-scale optimization problems. In this paper, our goal is to
exploit its advantages in the stochastic optimization framework, especially in
the case of mini-batch subsampling that is often used in big data settings. To
allow the spectral coefficient to properly explore the underlying approximate
Hessian spectrum, we keep the same subsample for several iterations before
subsampling again. We analyze the required algorithmic features and the
conditions for almost sure convergence, and present initial numerical results
that show the advantages of the proposed method.
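The core idea of holding the subsample fixed across several spectral (BB)
iterations can be sketched as follows. The line search of SLiSeS is omitted,
the crude curvature safeguard is naive, and all function names are
illustrative assumptions rather than the paper's interface.

```python
import numpy as np

def spectral_subsampled(grad_on_batch, draw_batch, x0, eta0, cycles, reuse):
    """Draw one mini-batch, then take `reuse` BB (spectral) gradient
    steps on that same batch so the spectral coefficient can probe a
    fixed approximate Hessian spectrum, before resampling."""
    x = np.asarray(x0, dtype=float).copy()
    eta = eta0
    for _ in range(cycles):
        batch = draw_batch()                       # keep this subsample ...
        x_prev, g_prev = x.copy(), grad_on_batch(batch, x)
        x = x - eta * g_prev                       # first step with current eta
        for _ in range(reuse - 1):                 # ... for several iterations
            g = grad_on_batch(batch, x)
            s, y = x - x_prev, g - g_prev
            eta = float(s @ s) / max(float(s @ y), 1e-12)  # BB1 spectral coefficient
            x_prev, g_prev = x.copy(), g.copy()
            x = x - eta * g
    return x
```

On a full-batch quadratic the BB coefficient immediately captures the inverse
curvature, so the iteration lands on the minimizer within a single reuse block.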