Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may originally require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural networks.
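The outer loop of such a scheme can be sketched as an inexact accelerated proximal-point method: each step approximately minimizes a quadratically regularized surrogate, then extrapolates the anchor point. The function names, the fixed momentum schedule, and the use of plain gradient descent as the inner solver are illustrative simplifications, not the paper's exact algorithm.

```python
import numpy as np

def catalyst_sketch(grad_f, x0, kappa=1.0, outer_iters=50, inner_iters=100, lr=0.1):
    """Illustrative Catalyst-style outer loop: at each outer step,
    approximately minimize the surrogate f(x) + kappa/2 * ||x - y||^2
    (here with plain gradient descent as the inner solver), then apply
    a Nesterov-style extrapolation to the anchor point y."""
    x, x_prev, y = x0.copy(), x0.copy(), x0.copy()
    for k in range(outer_iters):
        # Inner loop: gradient descent on the kappa-strongly-convex surrogate.
        z = x.copy()
        for _ in range(inner_iters):
            z -= lr * (grad_f(z) + kappa * (z - y))
        x_prev, x = x, z
        # Nesterov-style extrapolation (simple k/(k+3) momentum for illustration).
        beta = k / (k + 3)
        y = x + beta * (x - x_prev)
    return x

# Usage: minimize the convex quadratic f(x) = ||x||^2 / 2, whose gradient is x.
x_star = catalyst_sketch(lambda x: x, np.ones(3))
```

On a convex quadratic the iterates converge to the minimizer at the origin; the point of the wrapper is that the same outer loop can be run around incremental inner solvers such as SVRG or SAGA.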
Beyond Convexity: Stochastic Quasi-Convex Optimization
Stochastic convex optimization is a basic and well-studied primitive in
machine learning. It is well known that convex and Lipschitz functions can be
minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized
Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent that
updates according to the direction of the gradient rather than the gradient
itself. In this paper we analyze a stochastic version of NGD and prove its
convergence to a global minimum for a wider class of functions: we require the
functions to be quasi-convex and locally-Lipschitz. Quasi-convexity broadens
the concept of unimodality to multiple dimensions and allows for certain types of
saddle points, which are a known hurdle for first-order optimization methods
such as gradient descent. Locally-Lipschitz functions are only required to be
Lipschitz in a small region around the optimum. This assumption circumvents
gradient explosion, which is another known hurdle for gradient descent
variants. Interestingly, unlike the vanilla SGD algorithm, the stochastic
normalized gradient descent algorithm provably requires a minimal minibatch
size.
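The core update can be sketched as follows: step along the minibatch gradient's direction while discarding its magnitude. The function names, the noisy-gradient interface, and the hyperparameters are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def sngd(grad_batch, x0, lr=0.1, minibatch=64, iters=500, rng=None):
    """Stochastic Normalized Gradient Descent sketch: move in the
    *direction* of a minibatch gradient, ignoring its magnitude.
    grad_batch(x, b, rng) should return a stochastic gradient averaged
    over a minibatch of size b (this interface is hypothetical)."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = x0.copy()
    for _ in range(iters):
        g = grad_batch(x, minibatch, rng)
        norm = np.linalg.norm(g)
        if norm > 0:
            x -= lr * g / norm  # normalized update: direction only
    return x

# Usage on a noisy quadratic: the true gradient is x; averaging over the
# minibatch shrinks the noise by a factor of sqrt(b).
def noisy_grad(x, b, rng):
    return x + rng.normal(scale=1.0 / np.sqrt(b), size=x.shape)

x_out = sngd(noisy_grad, np.full(2, 5.0))
```

Because the step length never shrinks with the gradient, the noise must be controlled at the source, which is why a sufficiently large minibatch matters for this method in a way it does not for vanilla SGD.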
A Continuous-Time Analysis of Distributed Stochastic Gradient
We analyze the effect of synchronization on distributed stochastic gradient
algorithms. By exploiting an analogy with dynamical models of biological quorum
sensing -- where synchronization between agents is induced through
communication with a common signal -- we quantify how synchronization can
significantly reduce the magnitude of the noise felt by the individual
distributed agents and by their spatial mean. This noise reduction is in turn
associated with a reduction in the smoothing of the loss function imposed by
the stochastic gradient approximation. Through simulations on model non-convex
objectives, we demonstrate that coupling can stabilize higher noise levels and
improve convergence. We provide a convergence analysis for strongly convex
functions by deriving a bound on the expected deviation of the spatial mean of
the agents from the global minimizer for an algorithm based on quorum sensing,
the same algorithm with momentum, and the Elastic Averaging SGD (EASGD)
algorithm. We discuss extensions to new algorithms which allow each agent to
broadcast its current measure of success and shape the collective computation
accordingly. We supplement our theoretical analysis with numerical experiments
on convolutional neural networks trained on the CIFAR-10 dataset, where we note
a surprising regularizing property of EASGD even when applied to the
non-distributed case. This observation suggests alternative second-order
in-time algorithms for non-distributed optimization that are competitive with
momentum methods.

Comment: Final version, accepted for publication in Neural Computation.
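The elastic coupling discussed above can be sketched in the style of EASGD: each agent takes a noisy gradient step plus an elastic pull toward a shared center variable, which is symmetrically pulled toward the agents. The noise model, the sequential center update, and all hyperparameters are illustrative assumptions, not the paper's exact analysis.

```python
import numpy as np

def easgd_sketch(grad, x0, n_agents=4, rho=0.1, lr=0.05, iters=300, rng=None):
    """EASGD-style sketch: agents follow noisy gradients while an
    elastic force couples each agent to a shared center variable."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Agents start at perturbed copies of the initial point.
    xs = [x0 + rng.normal(size=x0.shape) for _ in range(n_agents)]
    center = x0.copy()
    alpha = lr * rho  # elastic coupling strength
    for _ in range(iters):
        for i in range(n_agents):
            # Noisy gradient plays the role of the stochastic minibatch gradient.
            g = grad(xs[i]) + rng.normal(scale=0.5, size=x0.shape)
            elastic = alpha * (xs[i] - center)
            xs[i] = xs[i] - lr * g - elastic   # pulled toward the center
            center = center + elastic          # center pulled toward the agent
    return center

# Usage: a convex quadratic with gradient x; the coupled ensemble's center
# settles near the global minimizer despite per-agent noise.
center = easgd_sketch(lambda x: x, np.full(3, 4.0))
```

The symmetric pull is what reduces the noise felt by the spatial mean: individual agents fluctuate, but the center averages over them and drifts far less.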