On the convergence of mirror descent beyond stochastic convex programming
In this paper, we examine the convergence of mirror descent in a class of
stochastic optimization problems that are not necessarily convex (or even
quasi-convex), and which we call variationally coherent. Since the standard
technique of "ergodic averaging" offers no tangible benefits beyond convex
programming, we focus directly on the algorithm's last generated sample (its
"last iterate"), and we show that it converges with probability 1 if the
underlying problem is coherent. We further consider a localized version of
variational coherence which ensures local convergence of stochastic mirror
descent (SMD) with high probability. These results contribute to the landscape
of non-convex stochastic optimization by showing that (quasi-)convexity is not
essential for convergence to a global minimum: rather, variational coherence, a
much weaker requirement, suffices. Finally, building on the above, we reveal an
interesting insight regarding the convergence speed of SMD: in problems with
sharp minima (such as generic linear programs or concave minimization
problems), SMD reaches a minimum point in a finite number of steps (a.s.), even
in the presence of persistent gradient noise. This result is to be contrasted
with existing black-box convergence rate estimates that are only asymptotic.
Comment: 30 pages, 5 figure
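As a rough illustration of the finite-step claim above, the following Python sketch runs a lazy (dual-averaging) form of stochastic mirror descent with Euclidean projection on a small linear program over the probability simplex. The cost vector, the step-size exponent 0.6, the Gaussian noise model, and the names project_simplex and lazy_smd are all assumptions made for this example; none of them is taken from the paper.

```python
import random


def project_simplex(y):
    """Euclidean projection of the vector y onto the probability simplex."""
    u = sorted(y, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            # Keep the threshold from the last index that passes the test.
            theta = t
    return [max(yi - theta, 0.0) for yi in y]


def lazy_smd(cost, steps=5000, noise=0.5, seed=0):
    """Minimize <cost, x> over the simplex from noisy gradient samples.

    Illustrative only: the gradient of the linear objective is `cost`,
    and each sample adds zero-mean Gaussian noise to it (an assumption).
    """
    rng = random.Random(seed)
    y = [0.0] * len(cost)          # dual (score) variable
    x = project_simplex(y)         # primal iterate
    for n in range(1, steps + 1):
        gamma = n ** -0.6          # assumed step size: sum diverges, sum of squares converges
        g = [c + rng.gauss(0.0, noise) for c in cost]
        y = [yi - gamma * gi for yi, gi in zip(y, g)]
        x = project_simplex(y)     # last iterate, no averaging
    return x


if __name__ == "__main__":
    # The unique minimizer is the vertex with the smallest cost (index 0).
    print(lazy_smd([0.3, 1.0, 0.7]))
```

In this toy problem the minimum is sharp, so once the accumulated drift in the dual variable outweighs the accumulated noise, the projection returns the optimal vertex itself rather than a nearby point, which is the kind of behavior the abstract describes.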
On the Regularizing Property of Stochastic Gradient Descent
Stochastic gradient descent is one of the most successful approaches for
solving large-scale problems, especially in machine learning and statistics. At
each iteration, it employs an unbiased estimator of the full gradient computed
from one single randomly selected data point. Hence, it scales well with
problem size, is very attractive for truly massive datasets, and holds
significant potential for solving large-scale inverse problems. In the recent
machine learning literature, it has been empirically observed that, when
equipped with early stopping, it has a regularizing property. In this work, we
rigorously establish its regularizing property (under an a priori early stopping
rule), and also prove convergence rates under the canonical sourcewise
condition, for minimizing the quadratic functional for linear inverse problems.
This is achieved by combining tools from classical regularization theory and
stochastic analysis. Further, we analyze the preasymptotic weak and strong
convergence behavior of the algorithm. The theoretical findings offer insight
into the performance of the algorithm, and are complemented with illustrative
numerical experiments.
Comment: 22 pages, better presentatio
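The iteration described in this abstract can be sketched in a few lines of Python: each step samples one row of a linear system, forms the gradient of that row's squared residual, and the loop ends at an iteration count fixed in advance. The polynomially decaying step size c0 * k^(-alpha), the tiny 3-by-2 system, and the names sgd_early_stop and error are assumptions chosen for illustration, not details confirmed by the abstract.

```python
import math
import random


def sgd_early_stop(A, y, stop_index, c0=0.5, alpha=0.5, seed=0):
    """Run SGD on the least-squares functional for A x = y.

    `stop_index` is the stopping iteration, chosen before the run
    (an a priori rule). The step-size schedule is an assumption.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for k in range(1, stop_index + 1):
        i = rng.randrange(len(A))              # one randomly selected data point
        eta = c0 * k ** (-alpha)               # assumed decaying step size
        residual = sum(a * xj for a, xj in zip(A[i], x)) - y[i]
        # Gradient of 0.5 * (a_i . x - y_i)^2 is residual * a_i.
        x = [xj - eta * residual * a for xj, a in zip(x, A[i])]
    return x


def error(x, x_true):
    """Euclidean distance between an iterate and the reference solution."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_true)))


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    x_true = [1.0, 2.0]
    y_exact = [sum(a * b for a, b in zip(row, x_true)) for row in A]
    x_k = sgd_early_stop(A, y_exact, stop_index=200)
    print(error([0.0, 0.0], x_true), error(x_k, x_true))
```

With exact data, as in this usage example, running longer does no harm. With noisy data the stopping index acts as the regularization parameter, so in practice it would be tied to the noise level rather than fixed at an arbitrary value such as 200.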