Optimal Black-Box Reductions Between Optimization Objectives
The diverse world of machine learning applications has given rise to a
plethora of algorithms and optimization methods, finely tuned to the specific
regression or classification task at hand. We reduce the complexity of
algorithm design for machine learning by reductions: we develop reductions that
take a method developed for one setting and apply it to the entire spectrum of
smoothness and strong-convexity in applications.
Furthermore, unlike existing results, our new reductions are OPTIMAL and more
PRACTICAL. We show how these new reductions give rise to new and faster running
times on training linear classifiers for various families of loss functions,
and conclude with experiments showing their successes also in practice.
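As an illustration of the flavor of such reductions, here is a minimal sketch of a generic regularization-based reduction: a solver that needs strong convexity is applied to a merely convex objective by adding a quadratic term whose weight is annealed over a few rounds. The names `solve_sc` and `grad_f` are assumptions made for the sketch; this is not the paper's optimal construction.

```python
def reduce_to_strongly_convex(solve_sc, grad_f, x0, sigma0=1.0, rounds=10, halving=2.0):
    """Generic regularization-based reduction (illustrative sketch only).

    `solve_sc(grad, x_init, sigma)` is assumed to minimize a sigma-strongly
    convex objective given its gradient oracle. We call it on the regularized
    objective f(x) + sigma/2 * ||x - center||^2, then shrink sigma and restart
    from the previous solution.
    """
    x, sigma = x0.copy(), sigma0
    for _ in range(rounds):
        center = x.copy()
        # gradient of the regularized surrogate at z
        reg_grad = lambda z, c=center, s=sigma: grad_f(z) + s * (z - c)
        x = solve_sc(reg_grad, x, sigma)  # inner solver sees a strongly convex problem
        sigma /= halving                  # anneal the added strong convexity
    return x
```

The annealing schedule and stopping rule are exactly where an optimal reduction differs from this naive loop; the sketch only conveys the outer structure.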
Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may originally require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural networks.
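The backbone of such a scheme can be sketched as a proximal-point outer loop: each outer step hands the convex surrogate f(x) + kappa/2 * ||x - y||^2 to any gradient-based convex solver. The names `inner_solver` and `grad_f`, and the fixed `kappa`, are assumptions for illustration; the actual method also uses extrapolation and adapts to the unknown weak-convexity constant.

```python
def proximal_point_outer_loop(inner_solver, grad_f, x0, kappa=1.0, outer_iters=20):
    """Illustrative proximal-point wrapper, not the paper's exact algorithm.

    Each outer iteration asks a convex solver to (approximately) minimize
    f(x) + kappa/2 * ||x - y||^2, which is convex whenever kappa dominates
    the weak-convexity constant of f.
    """
    x = x0.copy()
    for _ in range(outer_iters):
        y = x.copy()  # prox center (acceleration/extrapolation omitted)
        surrogate_grad = lambda z, c=y, k=kappa: grad_f(z) + k * (z - c)
        x = inner_solver(surrogate_grad, x)  # e.g., SVRG or SAGA on the surrogate
    return x
```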
Randomized Smoothing SVRG for Large-scale Nonsmooth Convex Optimization
In this paper, we consider the problem of minimizing the average of a large
number of nonsmooth and convex functions. Such problems often arise in
machine learning as empirical risk minimization, but are computationally very
challenging. We develop and analyze a new algorithm that achieves a robust
linear convergence rate, and both its time complexity and gradient complexity
are superior to those of state-of-the-art nonsmooth algorithms and
subgradient-based schemes. Moreover, our algorithm works without any extra
error bound condition on the objective function or the common strong-convexity
condition. We show that our algorithm has wide applications in
optimization and machine learning problems, and demonstrate experimentally that
it performs well on a large-scale ranking problem.
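To make the randomized-smoothing ingredient concrete, the sketch below shows a standard Gaussian-smoothing gradient estimator for a nonsmooth component f_i; an estimator of this kind can then be plugged into an SVRG-style update. It is an illustrative sketch under assumed names (`f_i`, `mu`), not the paper's exact estimator or algorithm.

```python
import numpy as np

def smoothed_grad(f_i, x, mu, num_samples=4, rng=None):
    """Monte-Carlo gradient of the smoothed function f_mu(x) = E[f_i(x + mu*u)],
    u ~ N(0, I); a standard zeroth-order smoothing estimator, shown for illustration."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        g += (f_i(x + mu * u) - f_i(x)) / mu * u  # finite difference along a random direction
    return g / num_samples
```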
Variance Reduction for Faster Non-Convex Optimization
We consider the fundamental problem in non-convex optimization of efficiently
reaching a stationary point. In contrast to the convex case, in the long
history of this basic problem, the only known theoretical results on
first-order non-convex optimization remain full gradient descent, which
converges in $O(1/\epsilon)$ iterations for smooth objectives, and stochastic
gradient descent, which converges in $O(1/\epsilon^2)$ iterations for
objectives that are sums of smooth functions.
We provide the first improvement in this line of research. Our result is
based on the variance reduction trick recently introduced to convex
optimization, as well as a brand new analysis of variance reduction that is
suitable for non-convex optimization. For objectives that are sums of smooth
functions, our first-order minibatch stochastic method converges with an
$O(1/\epsilon)$ rate and is faster than full gradient descent by a factor of
$\Omega(n^{1/3})$.
We demonstrate the effectiveness of our methods on empirical risk
minimization with non-convex loss functions and on training neural nets.
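For reference, the variance-reduction trick mentioned above is the SVRG-style estimator grad_i(x) - grad_i(x_tilde) + full_grad(x_tilde), which stays unbiased while its variance shrinks as x approaches the snapshot x_tilde. A minimal single-sample epoch, assuming a list `grads` of per-example gradient oracles, is sketched below; the paper's minibatch variant and parameter choices are omitted.

```python
import numpy as np

def svrg_epoch(grads, x_tilde, step, inner_iters, rng=None):
    """One SVRG-style epoch with the variance-reduced gradient estimator;
    an illustrative sketch, not the paper's exact minibatch method."""
    rng = rng or np.random.default_rng()
    n = len(grads)
    full_grad = sum(g(x_tilde) for g in grads) / n  # snapshot gradient, once per epoch
    x = x_tilde.copy()
    for _ in range(inner_iters):
        i = rng.integers(n)
        v = grads[i](x) - grads[i](x_tilde) + full_grad  # unbiased, variance-reduced
        x = x - step * v
    return x
```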
Limitations on Variance-Reduction and Acceleration Schemes for Finite Sum Optimization
We study the conditions under which one is able to efficiently apply
variance-reduction and acceleration schemes on finite sum optimization
problems. First, we show that, perhaps surprisingly, the finite-sum structure
by itself is not sufficient for obtaining a complexity bound of
$\tilde{O}((n+L/\mu)\ln(1/\epsilon))$ for $L$-smooth and $\mu$-strongly
convex individual functions; one must also know which individual function is
being referred to by the oracle at each iteration. Next, we show that for a
broad class of first-order and coordinate-descent finite sum algorithms
(including, e.g., SDCA, SVRG, SAG), it is not possible to get an `accelerated'
complexity bound of $\tilde{O}((n+\sqrt{nL/\mu})\ln(1/\epsilon))$, unless
the strong-convexity parameter is given explicitly. Lastly, we show that when
this class of algorithms is used for minimizing $L$-smooth and convex finite
sums, the optimal complexity bound is $\tilde{O}(n+L/\epsilon)$, assuming
that (on average) the same update rule is used in every iteration, and
$\tilde{O}(n+\sqrt{nL/\epsilon})$ otherwise.
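Collected in one place, the bounds discussed above read as follows ($n$ summands, $L$-smooth, $\mu$-strongly convex, target accuracy $\epsilon$):

```latex
\begin{align*}
&\tilde{O}\big((n + L/\mu)\ln(1/\epsilon)\big)
  && \text{unattainable unless the queried index is known,}\\
&\tilde{O}\big((n + \sqrt{nL/\mu})\ln(1/\epsilon)\big)
  && \text{requires the strong-convexity parameter explicitly,}\\
&\tilde{O}\big(n + L/\epsilon\big)\ \text{vs.}\ \tilde{O}\big(n + \sqrt{nL/\epsilon}\big)
  && \text{smooth convex case: fixed vs.\ varying update rules.}
\end{align*}
```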
On the Adaptivity of Stochastic Gradient-Based Optimization
Stochastic-gradient-based optimization has been a core enabling methodology
in applications to large-scale problems in machine learning and related areas.
Despite the progress, the gap between theory and practice remains significant,
with theoreticians pursuing mathematical optimality at a cost of obtaining
specialized procedures in different regimes (e.g., modulus of strong convexity,
magnitude of target accuracy, signal-to-noise ratio), and with practitioners
not readily able to know which regime is appropriate to their problem, and
seeking broadly applicable algorithms that are reasonably close to optimality.
To bridge these perspectives it is necessary to study algorithms that are
adaptive to different regimes. We present the stochastically controlled
stochastic gradient (SCSG) method for composite convex finite-sum optimization
problems and show that SCSG is adaptive to both strong convexity and target
accuracy. The adaptivity is achieved by batch variance reduction with adaptive
batch sizes and a novel technique, which we refer to as geometrization,
that sets the length of each epoch as a geometric random variable. The
algorithm achieves strictly better theoretical complexity than other existing
adaptive algorithms, while the tuning parameters of the algorithm only depend
on the smoothness parameter of the objective.
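The geometrization device is simple to state in code: each epoch draws its inner-loop length from a geometric distribution whose mean is tied to the batch size used for the snapshot gradient. The sketch below is illustrative only (assumed oracle list `grads`, fixed step size); it is not the authors' reference implementation.

```python
import numpy as np

def scsg_style_epoch(grads, x, step, batch_size, rng=None):
    """One SCSG-style epoch: subsampled snapshot gradient plus a geometrically
    distributed inner-loop length ("geometrization"); illustrative sketch only."""
    rng = rng or np.random.default_rng()
    n = len(grads)
    batch = rng.choice(n, size=min(batch_size, n), replace=False)
    batch_grad = sum(grads[i](x) for i in batch) / len(batch)  # snapshot on a subsample
    x_tilde, y = x.copy(), x.copy()
    length = rng.geometric(p=1.0 / (batch_size + 1))  # epoch length ~ Geometric, mean ~ batch_size
    for _ in range(length):
        i = rng.integers(n)
        v = grads[i](y) - grads[i](x_tilde) + batch_grad
        y = y - step * v
    return y
```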
Improved Optimization of Finite Sums with Minibatch Stochastic Variance Reduced Proximal Iterations
We present novel minibatch stochastic optimization methods for empirical risk
minimization problems; the methods efficiently leverage variance-reduced
first-order and sub-sampled higher-order information to accelerate the
convergence speed. For quadratic objectives, we prove improved iteration
complexity over state-of-the-art under reasonable assumptions. We also provide
empirical evidence of the advantages of our method compared to existing
approaches in the literature.
On the Ineffectiveness of Variance Reduced Optimization for Deep Learning
The application of stochastic variance reduction to optimization has shown
remarkable recent theoretical and practical success. The applicability of these
techniques to the hard non-convex optimization problems encountered during
training of modern deep neural networks is an open problem. We show that naive
application of the SVRG technique and related approaches fails, and we explore why.
Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning
Recently, research on accelerated stochastic gradient descent methods (e.g.,
SVRG) has made exciting progress (e.g., linear convergence for strongly convex
problems). However, the best-known methods (e.g., Katyusha) require at least
two auxiliary variables and two momentum parameters. In this paper, we propose
a fast stochastic variance reduction gradient (FSVRG) method, in which we
design a novel update rule with Nesterov's momentum and incorporate the
technique of growing epoch size. FSVRG has only one auxiliary variable and one
momentum weight, and thus it is much simpler and has much lower per-iteration
complexity. We prove that FSVRG achieves linear convergence for strongly convex
problems and the optimal $O(1/T^2)$ convergence rate for non-strongly
convex problems, where $T$ is the number of outer iterations. We also extend
FSVRG to directly solve the problems with non-smooth component functions, such
as SVM. Finally, we empirically study the performance of FSVRG for solving
various machine learning problems such as logistic regression, ridge
regression, Lasso and SVM. Our results show that FSVRG outperforms the
state-of-the-art stochastic methods, including Katyusha.
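For intuition, the sketch below shows an SVRG inner loop with a single momentum weight; it is a plausible stand-in written for illustration and is not the exact FSVRG update rule or its growing-epoch schedule.

```python
import numpy as np

def momentum_svrg_epoch(grads, x_tilde, step, beta, inner_iters, rng=None):
    """SVRG inner loop with a Nesterov-style momentum weight beta; an
    illustrative stand-in, not the exact FSVRG rule from the paper."""
    rng = rng or np.random.default_rng()
    n = len(grads)
    full_grad = sum(g(x_tilde) for g in grads) / n   # snapshot gradient
    x_prev, x = x_tilde.copy(), x_tilde.copy()
    for _ in range(inner_iters):
        y = x + beta * (x - x_prev)                      # momentum extrapolation
        i = rng.integers(n)
        v = grads[i](y) - grads[i](x_tilde) + full_grad  # variance-reduced gradient at y
        x_prev, x = x, y - step * v
    return x
```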
A unified variance-reduced accelerated gradient method for convex optimization
We propose a novel randomized incremental gradient algorithm, namely,
VAriance-Reduced Accelerated Gradient (Varag), for finite-sum optimization.
Equipped with a unified step-size policy that adjusts itself to the value of
the condition number, Varag exhibits the unified optimal rates of convergence
for solving smooth convex finite-sum problems directly regardless of their
strong convexity. Moreover, Varag is the first accelerated randomized
incremental gradient method that benefits from the strong convexity of the
data-fidelity term to achieve the optimal linear convergence. It also
establishes an optimal linear rate of convergence for solving a wide class of
problems only satisfying a certain error bound condition rather than strong
convexity. Varag can also be extended to solve stochastic finite-sum problems.