Stochastic convex optimization is a basic and well studied primitive in
machine learning. It is well known that convex and Lipschitz functions can be
minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized
Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent that
updates according to the direction of the gradients rather than the gradients
themselves. In this paper we analyze a stochastic version of NGD and prove its
convergence to a global minimum for a wider class of functions: we require the
functions to be quasi-convex and locally-Lipschitz. Quasi-convexity broadens
the concept of unimodality to multidimensions and allows for certain types of
saddle points, which are a known hurdle for first-order optimization methods
such as gradient descent. Locally-Lipschitz functions are only required to be
Lipschitz in a small region around the optimum. This assumption circumvents
gradient explosion, which is another known hurdle for gradient descent
variants. Interestingly, unlike the vanilla SGD algorithm, the stochastic
normalized gradient descent algorithm provably requires a minimal minibatch
size.
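To make the update rule concrete, the following is a minimal sketch of stochastic normalized gradient descent, assuming an illustrative `grad_oracle(x, batch_size)` that returns an averaged stochastic gradient over a minibatch; the names `lr`, `minibatch_size`, and `n_iters` are hypothetical parameters, not notation from the paper.

```python
import numpy as np

def sngd(grad_oracle, x0, lr=0.1, minibatch_size=32, n_iters=1000):
    """Sketch of stochastic normalized gradient descent (SNGD).

    grad_oracle(x, batch_size) is assumed to return the average of
    `batch_size` stochastic gradients of the objective at x.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(n_iters):
        g = grad_oracle(x, minibatch_size)   # minibatch gradient estimate
        norm = np.linalg.norm(g)
        if norm > 0:
            # Step along the direction of the gradient only; its magnitude
            # is discarded, which is what distinguishes NGD from GD.
            x = x - lr * g / norm
        iterates.append(x.copy())
    return x, iterates
```

In this sketch the minibatch average is what controls the noise in the gradient direction; the abstract's point is that, unlike vanilla SGD, the analysis of this normalized variant provably needs the minibatch size to be large enough.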