Search CORE

2,355 research outputs found

On the convergence of mirror descent beyond stochastic convex programming

Author: Zhou Zhengyuan
Mertikopoulos Panayotis
Bambos Nicholas
Boyd Stephen
Glynn Peter
Publication venue
Publication date: 01/01/1962
Field of study

In this paper, we examine the convergence of mirror descent in a class of stochastic optimization problems that are not necessarily convex (or even quasi-convex), and which we call variationally coherent. Since the standard technique of "ergodic averaging" offers no tangible benefits beyond convex programming, we focus directly on the algorithm's last generated sample (its "last iterate"), and we show that it converges with probabiility

1

if the underlying problem is coherent. We further consider a localized version of variational coherence which ensures local convergence of stochastic mirror descent (SMD) with high probability. These results contribute to the landscape of non-convex stochastic optimization by showing that (quasi-)convexity is not essential for convergence to a global minimum: rather, variational coherence, a much weaker requirement, suffices. Finally, building on the above, we reveal an interesting insight regarding the convergence speed of SMD: in problems with sharp minima (such as generic linear programs or concave minimization problems), SMD reaches a minimum point in a finite number of steps (a.s.), even in the presence of persistent gradient noise. This result is to be contrasted with existing black-box convergence rate estimates that are only asymptotic.Comment: 30 pages, 5 figure

arXiv.org e-Print Archive

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

Communication Efficient Distributed Optimization using an Approximate Newton-type Method

Author: Shamir Ohad
Srebro Nathan
Zhang Tong
Publication venue
Publication date: 13/05/2014
Field of study

We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. For quadratic objectives, the method enjoys a linear rate of convergence which provably \emph{improves} with the data size, requiring an essentially constant number of iterations under reasonable assumptions. We provide theoretical and empirical evidence of the advantages of our method compared to other approaches, such as one-shot parameter averaging and ADMM

arXiv.org e-Print Archive

CiteSeerX