On the Global Convergence of Continuous-Time Stochastic Heavy-Ball Method for Nonconvex Optimization
We study the convergence behavior of the stochastic heavy-ball method with a
small stepsize. Under a change of time scale, we approximate the discrete
method by a stochastic differential equation that models small random
perturbations of a coupled system of nonlinear oscillators. We rigorously show
that the perturbed system converges to a local minimum in logarithmic time.
This indicates that, for the diffusion process that approximates the stochastic
heavy-ball method, it takes (up to a logarithmic factor) only time linear in
the square root of the inverse stepsize to escape from all saddle points. This
result may suggest fast convergence of its discrete-time counterpart. Our
theoretical results are validated by numerical experiments.
Comment: accepted at IEEE International Conference on Big Data in 201
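The abstract studies a continuous-time approximation. The sketch below shows the discrete stochastic heavy-ball iteration it approximates, under illustrative assumptions that do not come from the paper. The potential is the toy function f(x, y) = x^4/4 - x^2/2 + y^2/2, which has a strict saddle at (0, 0) and minima at (±1, 0). The gradient noise is additive Gaussian. The values of eta, beta and sigma are hypothetical choices. Started exactly at the saddle, the noiseless iteration never moves. The noisy iteration escapes and settles near one of the minima.

```python
import math
import random


def grad(x, y):
    # Gradient of the illustrative potential f(x, y) = x**4/4 - x**2/2 + y**2/2,
    # which has a strict saddle at (0, 0) and minima at (+1, 0) and (-1, 0).
    return x**3 - x, y


def stochastic_heavy_ball(x0, y0, eta=0.01, beta=0.9, sigma=0.1, steps=2000, seed=0):
    """Run x_{k+1} = x_k - eta * g_k + beta * (x_k - x_{k-1}),
    where g_k is the exact gradient plus Gaussian noise of scale sigma."""
    rng = random.Random(seed)
    x, y = x0, y0
    x_prev, y_prev = x0, y0  # zero initial momentum
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Stochastic gradient: exact gradient corrupted by isotropic Gaussian noise.
        gx += sigma * rng.gauss(0.0, 1.0)
        gy += sigma * rng.gauss(0.0, 1.0)
        # Gradient step plus momentum term beta * (current - previous iterate).
        x_new = x - eta * gx + beta * (x - x_prev)
        y_new = y - eta * gy + beta * (y - y_prev)
        x_prev, y_prev = x, y
        x, y = x_new, y_new
    return x, y


# Start exactly at the saddle point, where the exact gradient vanishes.
x, y = stochastic_heavy_ball(0.0, 0.0)
print(f"final iterate: ({x:.3f}, {y:.3f})")
```

Setting sigma=0.0 in the same call leaves the iterate stuck at (0, 0). In this toy setting, the injected noise is what drives the escape from the saddle.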
On the fast convergence of random perturbations of the gradient flow
We consider in this work small random perturbations (of multiplicative noise
type) of the gradient flow. We prove that under mild conditions, when the
potential function is a Morse function with an additional strong saddle condition,
the perturbed gradient flow converges to a neighborhood of the local minimizers
in $O(\ln(\varepsilon^{-1}))$ time on average, where $\varepsilon > 0$ is the
scale of the random perturbation. Under a change of time scale, this indicates
that, for the diffusion process that approximates the stochastic gradient
method, it takes (up to a logarithmic factor) only time linear in the inverse
stepsize to escape from all saddle points. This can be regarded as a
manifestation of the fast convergence of the discrete-time stochastic gradient
method, which is used heavily in modern statistical machine learning.
Comment: Revise and Resubmit at Asymptotic Analysi
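The logarithmic scaling in $\varepsilon$ can be checked numerically on a toy problem. This sketch simplifies the paper's setting in three ways:

- it uses a one-dimensional double-well potential F(x) = x^4/4 - x^2/2 (a hypothetical choice);
- it uses additive rather than multiplicative noise;
- it discretizes the flow with the Euler-Maruyama scheme and an illustrative time step dt.

The process starts at the unstable critical point x = 0. The mean time to reach a 0.1-neighborhood of a minimizer should then grow roughly like ln(1/eps). Shrinking eps by a factor of 100 should therefore add roughly ln 100 ≈ 4.6 time units.

```python
import math
import random
import statistics


def drift(x):
    # Negative gradient of the illustrative double-well potential
    # F(x) = x**4/4 - x**2/2: unstable critical point at 0, minimizers at +/-1.
    return -(x**3 - x)


def hitting_time(eps, rng, dt=0.01, tol=0.1, max_steps=100_000):
    """Euler-Maruyama for dX = -F'(X) dt + eps dW started at X_0 = 0;
    returns the first time X is within tol of a minimizer (+1 or -1)."""
    x = 0.0
    for k in range(1, max_steps + 1):
        x += drift(x) * dt + eps * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if abs(abs(x) - 1.0) < tol:
            return k * dt
    return math.inf  # did not reach a minimizer within the step budget


def mean_hitting_time(eps, trials=200, seed=0):
    # Average hitting time over independent sample paths.
    rng = random.Random(seed)
    return statistics.fmean(hitting_time(eps, rng) for _ in range(trials))


for eps in (1e-2, 1e-4):
    t = mean_hitting_time(eps)
    print(f"eps={eps:g}: mean time {t:.2f}, ln(1/eps) = {math.log(1 / eps):.2f}")
```

Near x = 0 the dynamics are approximately dX = X dt + eps dW. The noise is amplified by a factor of about e^t, so the process leaves the unstable point after a time of order ln(1/eps). The remaining deterministic travel to the minimizer adds only a constant.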
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
In this paper, we adopt diffusion approximation tools to study
the dynamics of Oja's iteration, an online stochastic gradient descent
method for principal component analysis. Oja's iteration maintains a
running estimate of the true principal component from streaming data and has
low time and space complexity. We show that Oja's iteration for
the top eigenvector generates a continuous-state, discrete-time Markov chain
over the unit sphere. We characterize Oja's iteration in three phases using
diffusion approximation and weak convergence tools. Our three-phase analysis
further provides a finite-sample error bound for the running estimate, which
matches the minimax information lower bound for principal component analysis
under the additional assumption of bounded samples.
Comment: Appeared in NIPS 201
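A minimal sketch of Oja's iteration for the top eigenvector follows. Each streaming sample x updates the estimate w to w + eta * x * (x^T w), and the result is renormalized back onto the unit sphere. Only the current estimate w is stored, so memory per step is linear in the dimension. The constant stepsize eta = 0.005 is an illustrative assumption, not the paper's setup. So is the synthetic Gaussian stream with covariance diag(4, 1, 0.25), whose true top eigenvector is (1, 0, 0).

```python
import math
import random


def oja_top_eigenvector(samples, dim, eta=0.005, seed=0):
    """Oja's iteration: w <- normalize(w + eta * x * (x . w)) for each streaming sample x."""
    rng = random.Random(seed)
    # Random unit-norm initial estimate on the sphere.
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    for x in samples:
        proj = sum(xi * wi for xi, wi in zip(x, w))  # x^T w
        w = [wi + eta * xi * proj for xi, wi in zip(x, w)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]  # project back onto the unit sphere
    return w


def stream(n, seed=1):
    # Hypothetical data: zero-mean Gaussian with covariance diag(4, 1, 0.25),
    # so the true top eigenvector is e_1 = (1, 0, 0).
    rng = random.Random(seed)
    for _ in range(n):
        yield (2.0 * rng.gauss(0, 1), rng.gauss(0, 1), 0.5 * rng.gauss(0, 1))


w = oja_top_eigenvector(stream(5000), dim=3)
print([round(c, 3) for c in w])  # close to (+/-1, 0, 0); the sign is arbitrary
```

Renormalizing after every update keeps each iterate on the unit sphere. This matches the abstract's view of the iteration as a Markov chain over the sphere.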