8,531 research outputs found
The importance of better models in stochastic optimization
Standard stochastic optimization methods are brittle, sensitive to stepsize
choices and other algorithmic parameters, and they exhibit instability outside
of well-behaved families of objectives. To address these challenges, we
investigate models for stochastic minimization and learning problems that
exhibit better robustness to problem families and algorithmic parameters. With
appropriately accurate models---which we call the aProx family---stochastic
methods can be made stable, provably convergent and asymptotically optimal;
even modeling that the objective is nonnegative is sufficient for this
stability. We extend these results beyond convexity to weakly convex
objectives, which include compositions of convex losses with smooth functions
common in modern machine learning applications. We highlight the importance of
robustness and accurate modeling with a careful experimental evaluation of
convergence time and algorithm sensitivity
Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may originally require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural networks
- …