7 research outputs found
Randomized Stochastic Variance-Reduced Methods for Multi-Task Stochastic Bilevel Optimization
In this paper, we consider non-convex stochastic bilevel optimization (SBO)
problems that have many applications in machine learning. Although numerous
studies have proposed stochastic algorithms for solving these problems, they
are limited in two respects: (i) their sample complexities are high and do not
match the state-of-the-art result for non-convex stochastic optimization; (ii)
their algorithms are tailored to problems with only one lower-level problem,
and when there are many lower-level problems, it can be prohibitively expensive
to process all of them at each iteration. To
address these limitations, this paper proposes fast randomized stochastic
algorithms for non-convex SBO problems. First, we present a stochastic method
for non-convex SBO with a single lower-level problem and establish its sample
complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point
under Lipschitz continuity conditions on the stochastic oracles, matching the
lower bound for stochastic smooth non-convex optimization. Second, we present a
randomized stochastic method for non-convex SBO with many lower-level problems
(multi-task SBO) that processes only a constant number of lower-level problems
at each iteration, and establish a sample complexity that can be better than
that of simply processing all lower-level problems at each iteration. Lastly,
we establish even faster convergence
results for gradient-dominant functions. To the best of our knowledge, this is
the first work considering multi-task SBO and developing state-of-the-art
sample complexity results.
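As a rough illustration of the multi-task setting, the toy NumPy sketch below touches only a constant number of lower-level problems per iteration. It is not the paper's algorithm: the lower-level problems are made-up quadratics, the hypergradient keeps only the direct partial derivative and drops the second-order correction, and all step sizes are arbitrary.

```python
import numpy as np

# Toy schematic of randomized multi-task bilevel updates (not the paper's method):
# each iteration samples a constant number of the m lower-level problems,
# refreshes only their variables, and takes a simplified upper-level step.

rng = np.random.default_rng(0)
d, m, tasks_per_iter = 5, 20, 2
A = rng.standard_normal((m, d))              # data defining toy quadratic lower-level problems

def upper_grad(x, Y):
    # d/dx of a toy upper objective f(x, Y) = 0.5*||x||^2 + mean_i <x, y_i>
    return x + Y.mean(axis=0)

def lower_grad(i, x, y):
    # d/dy of a toy lower objective g_i(x, y) = 0.5*||y - A_i * x||^2 (elementwise product)
    return y - A[i] * x

x = np.zeros(d)
Y = np.zeros((m, d))                         # running estimates of the m lower-level solutions
eta_x, eta_y = 0.05, 0.5

for t in range(200):
    S = rng.choice(m, size=tasks_per_iter, replace=False)
    for i in S:                              # update only the sampled lower-level variables
        Y[i] -= eta_y * lower_grad(i, x, Y[i])
    x -= eta_x * upper_grad(x, Y)            # upper-level step using current estimates
```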
On Biased Stochastic Gradient Estimation
We present a uniform analysis of biased stochastic gradient methods for
minimizing convex, strongly convex, and non-convex composite objectives, and
identify settings where bias is useful in stochastic gradient estimation. The
framework we present allows us to extend proximal support to biased algorithms,
including SAG and SARAH, for the first time in the convex setting. We also use
our framework to develop a new algorithm, Stochastic Average Recursive GradiEnt
(SARGE), that achieves the oracle complexity lower-bound for non-convex,
finite-sum objectives and requires strictly fewer calls to a stochastic
gradient oracle per iteration than SVRG and SARAH. We support our theoretical
results with numerical experiments that demonstrate the benefits of certain
biased gradient estimators.
Comment: journal version, 35 pages.
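For context, the following minimal NumPy sketch shows the classic SARAH recursion, a biased variance-reduced estimator of the kind the analysis covers, on a toy least-squares finite sum. It is not the SARGE estimator introduced in the paper, and the data and step size are invented for illustration.

```python
import numpy as np

# Minimal sketch of the (biased) SARAH recursive gradient estimator on a toy
# least-squares finite sum; illustrative only, not the paper's SARGE estimator.

rng = np.random.default_rng(0)
n, d = 100, 5
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

def grad_i(i, x):
    # gradient of the i-th component f_i(x) = 0.5 * (a_i . x - b_i)^2
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

x_prev = x = np.zeros(d)
v = full_grad(x)                     # full gradient at the start of an epoch
eta = 0.05

for t in range(n):
    i = rng.integers(n)
    v = grad_i(i, x) - grad_i(i, x_prev) + v   # v_t = grad_i(x_t) - grad_i(x_{t-1}) + v_{t-1}
    x_prev, x = x, x - eta * v
```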
Accelerating Variance-Reduced Stochastic Gradient Methods
Variance reduction is a crucial tool for improving the slow convergence of
stochastic gradient descent. Only a few variance-reduced methods, however, have
yet been shown to directly benefit from Nesterov's acceleration techniques to
match the convergence rates of accelerated gradient methods. Such approaches
rely on "negative momentum", a technique for further variance reduction that is
generally specific to the SVRG gradient estimator. In this work, we show that
negative momentum is unnecessary for acceleration and develop a universal
acceleration framework that allows all popular variance-reduced methods to
achieve accelerated convergence rates. The constants appearing in these rates,
including their dependence on the number of functions $n$, scale with the
mean-squared-error and bias of the gradient estimator. In a series of numerical
experiments, we demonstrate that versions of SAGA, SVRG, SARAH, and SARGE using
our framework significantly outperform non-accelerated versions and compare
favourably with algorithms using negative momentum.
Comment: 33 pages.
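As a purely schematic illustration of pairing acceleration with variance reduction, the sketch below wraps a Nesterov-style extrapolation step around an SVRG estimator on a toy least-squares problem. It is not the paper's framework, which couples the estimator and the extrapolation more carefully and also covers SAGA, SARAH, and SARGE; all constants are ad hoc.

```python
import numpy as np

# Schematic only: Nesterov-style extrapolation on top of an SVRG gradient
# estimator; not the accelerated framework developed in the paper.

rng = np.random.default_rng(0)
n, d = 100, 5
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

def grad_i(i, x):
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

x = y = np.zeros(d)
eta, beta = 0.02, 0.9

for epoch in range(10):
    snapshot, g_snap = y.copy(), full_grad(y)            # SVRG reference point
    for t in range(n):
        i = rng.integers(n)
        v = grad_i(i, y) - grad_i(i, snapshot) + g_snap  # unbiased SVRG estimator at y
        x_new = y - eta * v                              # gradient step
        y = x_new + beta * (x_new - x)                   # Nesterov-style extrapolation
        x = x_new
```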
ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization
We propose a new stochastic first-order algorithmic framework to solve
stochastic composite nonconvex optimization problems that covers both
finite-sum and expectation settings. Our algorithms rely on the SARAH estimator
introduced in (Nguyen et al., 2017) and consist of two steps: a proximal
gradient step and an averaging step, which makes them different from existing
nonconvex proximal-type algorithms. The algorithms require only an average
smoothness assumption on the nonconvex objective term, plus a bounded variance
assumption when applied to expectation problems. They work with both constant and
adaptive step-sizes, and allow both single-sample and mini-batch updates. In all
these cases, we prove that our algorithms can achieve the best-known complexity
bounds. One key ingredient of our methods is new constant and adaptive
step-sizes that help achieve the desired complexity bounds while improving
practical performance. Our constant step-size is much larger than those of
existing methods, including proximal SVRG schemes, in the single-sample case.
We also specialize the algorithm to the non-composite case, where it matches
existing state-of-the-art methods in terms of complexity bounds. Our update
also allows one to trade off between
step-sizes and mini-batch sizes to improve performance. We test the proposed
algorithms on two composite nonconvex problems and neural networks using
several well-known datasets.
Comment: 45 pages, 8 figures, and 2 tables.
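The proximal-gradient-plus-averaging structure can be sketched as follows on a toy l1-regularized least-squares problem. This is only a schematic of a ProxSARAH-type update; the step sizes eta and gamma are ad hoc rather than the paper's constant or adaptive choices.

```python
import numpy as np

# Sketch of a ProxSARAH-type update: SARAH estimator, proximal gradient step,
# then an averaging step, on a toy l1-regularized least-squares problem.

rng = np.random.default_rng(0)
n, d, lam = 100, 5, 0.01
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

def grad_i(i, x):
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

def prox_l1(z, t):
    # soft-thresholding: proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

x_prev = x = np.zeros(d)
v = full_grad(x)
eta, gamma = 0.5, 0.2

for t in range(200):
    i = rng.integers(n)
    v = grad_i(i, x) - grad_i(i, x_prev) + v           # SARAH recursive estimator
    x_hat = prox_l1(x - eta * v, eta * lam)            # proximal gradient step
    x_prev, x = x, (1.0 - gamma) * x + gamma * x_hat   # averaging step
```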
Faster Stochastic Quasi-Newton Methods
Stochastic optimization methods have become a class of popular optimization
tools in machine learning. In particular, stochastic gradient descent (SGD) has
been widely used for machine learning problems such as training neural networks
due to its low per-iteration computational complexity. Newton and quasi-Newton
methods, which leverage second-order information, can achieve better solutions
than first-order methods. Thus, stochastic quasi-Newton (SQN) methods have been
developed to obtain such solutions more efficiently than stochastic first-order
methods by utilizing approximate second-order information. However, the
existing SQN methods still do not reach the best
known stochastic first-order oracle (SFO) complexity. To fill this gap, we
propose a novel faster stochastic quasi-Newton method (SpiderSQN) based on the
variance-reduction technique of SPIDER. We prove that our SpiderSQN method
reaches the best known SFO complexity of $O(n + n^{1/2}\epsilon^{-2})$ in the
finite-sum setting to obtain an $\epsilon$-first-order stationary point.
To further improve its practical performance, we combine SpiderSQN with
different momentum schemes. Moreover, the proposed algorithms are generalized
to the online setting, and the corresponding SFO complexity of
$O(\epsilon^{-3})$ is established, which also matches the existing best
result. Extensive experiments on benchmark datasets demonstrate that our new
algorithms outperform state-of-the-art approaches for nonconvex optimization.
Comment: 11 pages, accepted for publication by TNNLS. arXiv admin note: text overlap with arXiv:1902.02715 by other authors.
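A minimal sketch of the general idea is given below: a SPIDER-style recursive gradient estimator with periodic full-gradient refreshes, scaled by a BFGS inverse-Hessian approximation with a curvature safeguard, on a toy least-squares problem. It is illustrative only and not the paper's SpiderSQN algorithm or its momentum variants.

```python
import numpy as np

# Illustrative sketch (not the paper's SpiderSQN): SPIDER-style estimator with
# periodic full-gradient refreshes plus a BFGS inverse-Hessian approximation.

rng = np.random.default_rng(0)
n, d, q = 100, 5, 20                    # q = refresh period for the full gradient
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

def grad_i(i, x):
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

x, H, eta = np.zeros(d), np.eye(d), 0.1
v = full_grad(x)

for t in range(200):
    if t % q == 0:
        v = full_grad(x)                # periodic refresh, as in SPIDER
    x_new = x - eta * H @ v             # quasi-Newton-scaled step
    i = rng.integers(n)
    v_new = grad_i(i, x_new) - grad_i(i, x) + v     # SPIDER recursion
    s, yv = x_new - x, v_new - v
    if s @ yv > 1e-8:                   # curvature check before the BFGS update
        rho, I = 1.0 / (s @ yv), np.eye(d)
        H = (I - rho * np.outer(s, yv)) @ H @ (I - rho * np.outer(yv, s)) + rho * np.outer(s, s)
    x, v = x_new, v_new
```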
A Hybrid Stochastic Optimization Framework for Stochastic Composite Nonconvex Optimization
We introduce a new approach to develop stochastic optimization algorithms for
a class of stochastic composite and possibly nonconvex optimization problems.
The main idea is to combine two stochastic estimators to create a new hybrid
one. We first introduce our hybrid estimator and then investigate its
fundamental properties to form a foundational theory for algorithmic
development. Next, we apply our theory to develop several variants of
stochastic gradient methods to solve both expectation and finite-sum composite
optimization problems. Our first algorithm can be viewed as a variant of
proximal stochastic gradient methods with a single loop, yet it achieves an
oracle complexity bound matching the best-known ones from state-of-the-art
double-loop algorithms in the literature, where the bound depends on the
variance $\sigma^2$ and a desired accuracy $\epsilon$. Then, we consider two different
variants of our method: adaptive step-size and restarting schemes, which have
theoretical guarantees similar to those of our first algorithm. We also study two
mini-batch variants of the proposed methods. In all cases, we achieve the
best-known complexity bounds under standard assumptions. We test our methods on
several numerical examples with real datasets and compare them with
state-of-the-art methods. Our numerical experiments show that the new methods
are comparable to, and in many cases outperform, their competitors.
Comment: 49 pages, 2 tables, 9 figures.
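In the spirit of the hybrid estimator described above, a convex combination of a SARAH-type recursive term and an independent unbiased SGD term, here is a minimal NumPy sketch on a toy finite-sum problem. The weight beta and the step size are illustrative rather than the paper's choices, and the composite (proximal) part is omitted.

```python
import numpy as np

# Sketch of a hybrid gradient estimator: convex combination of a SARAH-type
# recursive term and an independent unbiased SGD term (parameters are ad hoc).

rng = np.random.default_rng(0)
n, d = 100, 5
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

def grad_i(i, x):
    return (A[i] @ x - b[i]) * A[i]

x_prev = x = np.zeros(d)
v = grad_i(rng.integers(n), x)          # crude initial estimate
eta, beta = 0.1, 0.9

for t in range(500):
    i, j = rng.integers(n), rng.integers(n)            # two independent samples
    sarah_term = v + grad_i(i, x) - grad_i(i, x_prev)  # biased recursive part
    sgd_term = grad_i(j, x)                            # unbiased part
    v = beta * sarah_term + (1.0 - beta) * sgd_term    # hybrid estimator
    x_prev, x = x, x - eta * v                         # plain gradient step
```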
Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization
Two new stochastic variance-reduced algorithms named SARAH and SPIDER have
been recently proposed, and SPIDER has been shown to achieve a near-optimal
gradient oracle complexity for nonconvex optimization. However, the theoretical
advantage of SPIDER does not lead to substantial improvement of practical
performance over SVRG. To address this issue, momentum techniques are a natural
candidate for improving the practical performance of SPIDER. However, existing momentum
schemes used in variance-reduced algorithms are designed specifically for
convex optimization, and are not applicable to nonconvex scenarios. In this
paper, we develop novel momentum schemes with flexible coefficient settings to
accelerate SPIDER for nonconvex and nonsmooth composite optimization, and show
that the resulting algorithms achieve the near-optimal gradient oracle
complexity for achieving a generalized first-order stationary condition.
Furthermore, we generalize our algorithm to online nonconvex and nonsmooth
optimization, and establish an oracle complexity result that matches the
state-of-the-art. Our extensive experiments demonstrate the superior
performance of our proposed algorithm over other stochastic variance-reduced
algorithms.
Comment: We are merging the results of this paper with another paper at arXiv:1810.10690. Therefore, we want to withdraw this paper.
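To make the combination concrete, the sketch below pairs a SPIDER-type estimator with a simple heavy-ball momentum term and a proximal step for an l1 regularizer on a toy problem. The fixed momentum coefficient is ad hoc and does not reflect the flexible coefficient schemes developed in the paper.

```python
import numpy as np

# Schematic only: SPIDER-type variance-reduced estimator + heavy-ball momentum
# + proximal step for an l1 regularizer; coefficients are ad hoc.

rng = np.random.default_rng(0)
n, d, q, lam = 100, 5, 20, 0.01
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

def grad_i(i, x):
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

def prox_l1(z, t):
    # soft-thresholding: proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

x, mom = np.zeros(d), np.zeros(d)
v = full_grad(x)
eta, beta = 0.1, 0.5

for t in range(200):
    if t % q == 0:
        v = full_grad(x)                      # periodic full-gradient refresh (SPIDER)
    mom = beta * mom + v                      # heavy-ball momentum on the estimator
    x_new = prox_l1(x - eta * mom, eta * lam) # proximal step
    i = rng.integers(n)
    v = grad_i(i, x_new) - grad_i(i, x) + v   # SPIDER recursion
    x = x_new
```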