Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Stochastic gradient descent (SGD) is a simple and popular method for solving
stochastic optimization problems that arise in machine learning. For strongly
convex problems, its convergence rate was known to be O(\log(T)/T), obtained by
running SGD for T iterations and returning the average of the iterates. However,
recent results showed that, using a different algorithm, one can attain an
optimal O(1/T) rate.
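As a point of reference, here is a minimal sketch of the classical scheme described above: SGD with step size 1/(\lambda t) and uniform averaging of the iterates. The regularized hinge-loss objective, the data model, and all constants below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sgd_uniform_average(grad_oracle, w0, T, lam):
    """Run T steps of SGD with step size 1/(lam * t) and return the
    uniform average of all iterates (the classical scheme with
    O(log(T)/T) guarantees for strongly convex, non-smooth problems)."""
    w = w0.copy()
    avg = np.zeros_like(w0)
    for t in range(1, T + 1):
        g = grad_oracle(w)          # stochastic (sub)gradient at w
        w = w - g / (lam * t)       # standard strongly convex step size
        avg += (w - avg) / t        # running mean of w_1, ..., w_t
    return avg

# Illustrative problem (an assumption): F(w) = 0.5 * lam * ||w||^2
# + E[max(0, 1 - y <x, w>)], a lam-strongly convex regularized hinge loss
# with random (x, y) pairs drawn fresh at each query.
rng = np.random.default_rng(0)
lam = 0.1

def hinge_grad(w):
    x = rng.normal(size=w.shape)
    y = 1.0 if x.sum() > 0 else -1.0
    sub = -y * x if y * x.dot(w) < 1 else np.zeros_like(w)
    return lam * w + sub

w_bar = sgd_uniform_average(hinge_grad, np.zeros(5), T=10_000, lam=lam)
```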
This might lead one to believe that standard SGD is suboptimal, and perhaps
should even be replaced as the method of choice. In this paper, we investigate
the optimality of SGD in a stochastic setting. We show that for smooth
problems, the algorithm attains the optimal O(1/T) rate. However, for
non-smooth problems, the convergence rate with averaging might really be
\Omega(\log(T)/T), and this is not just an artifact of the analysis. On the
flip side, we show that a simple modification of the averaging step suffices to
recover the O(1/T) rate, and no other change of the algorithm is necessary. We
also present experimental results that support our findings, and point out
open problems.

Comment: Updated version, which fixes a bug in the proof of Lemma 1 and
modifies the step-size choice. As a result, constants are changed throughout
the paper.
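The abstract does not spell out the modified averaging step. One natural candidate, sketched below purely as an assumption, is suffix averaging: average only the last \alpha-fraction of the iterates instead of all T of them, leaving the rest of the algorithm unchanged.

```python
import numpy as np

def sgd_suffix_average(grad_oracle, w0, T, lam, alpha=0.5):
    """SGD with step size 1/(lam * t), returning the average of the last
    ceil(alpha * T) iterates rather than of all T. Whether this matches
    the paper's exact modification is an assumption; the abstract only
    says the averaging step is changed."""
    w = w0.copy()
    start = T - int(np.ceil(alpha * T)) + 1   # first iterate in the suffix
    avg = np.zeros_like(w0)
    count = 0
    for t in range(1, T + 1):
        g = grad_oracle(w)
        w = w - g / (lam * t)
        if t >= start:
            count += 1
            avg += (w - avg) / count          # running mean over the suffix
    return avg
```

It can be driven by the same kind of stochastic gradient oracle as the sketch above, e.g. sgd_suffix_average(hinge_grad, np.zeros(5), T=10_000, lam=0.1).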