Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions

Authors: Panageas Ioannis; Piliouras Georgios
Publication date: 07/06/2016

Given a non-convex twice differentiable cost function f, we prove that the set of initial conditions so that gradient descent converges to saddle points where \nabla^2 f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [Lee, Simchowitz, Jordan, Recht, COLT2016]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size. Comment: 2 figures.
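The measure-zero claim in this abstract can be illustrated with a toy experiment. The sketch below is not taken from the paper: the cost function f(x, y) = x^2/2 + y^4/4 - y^2/2, the sampling box, the step size 0.1, and all function names are assumptions chosen for illustration. This f has a saddle at (0, 0), where the Hessian has eigenvalue -1, and minimizers at (0, 1) and (0, -1). Randomly drawn starting points end up at a minimizer, while a start on the line y = 0 (a set of measure zero) stays on that line and reaches the saddle.

```python
import numpy as np

def grad(p):
    # Gradient of the illustrative cost f(x, y) = x^2/2 + y^4/4 - y^2/2.
    x, y = p
    return np.array([x, y**3 - y])

def gradient_descent(p0, step=0.1, iters=2000):
    # Plain gradient descent with a fixed step size (an assumed value,
    # small relative to the curvature of f on the sampled box).
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        p = p - step * grad(p)
    return p

# Random initial conditions drawn uniformly from the box [-1.5, 1.5]^2.
rng = np.random.default_rng(0)
starts = rng.uniform(-1.5, 1.5, size=(100, 2))
limits = np.array([gradient_descent(s) for s in starts])

# A start on the line y = 0 never leaves it and converges to the saddle.
saddle_limit = gradient_descent([0.7, 0.0])
```

Checking `np.abs(limits[:, 1])` shows whether every random run finished near y = 1 or y = -1, and `saddle_limit` shows where the hand-picked start ended up.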

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Theory of Deep Learning IIb: Optimization Properties of SGD

Authors: Golowich Noah; Liao Qianli; Miranda Brando; Poggio Tomaso; Rakhlin Alexander; Zhang Chiyuan
Publication date: 27/12/2017

In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability -- like the classical Langevin equation -- on large volume, "flat" minima, selecting flat minimizers which are with very high probability also global minimizers.

arXiv.org e-Print Archive

DSpace@MIT

Regularized Jacobi iteration for decentralized convex optimization with separable constraints

Authors: Deori Luca; Margellos Kostas; Prandini Maria
Publication date: 04/04/2017

We consider multi-agent, convex optimization programs subject to separable constraints, where the constraint function of each agent involves only its local decision vector, while the decision vectors of all agents are coupled via a common objective function. We focus on a regularized variant of the so-called Jacobi algorithm for decentralized computation in such problems. We first consider the case where the objective function is quadratic, and provide a fixed-point theoretic analysis showing that the algorithm converges to a minimizer of the centralized problem. Moreover, we quantify the potential benefits of such an iterative scheme by comparing it against a scaled projected gradient algorithm. We then consider the general case and show that all limit points of the proposed iteration are optimal solutions of the centralized problem. The efficacy of the proposed algorithm is illustrated by applying it to the problem of optimal charging of electric vehicles, where, as opposed to earlier approaches, we show convergence to an optimal charging scheme for a finite, possibly large, number of vehicles.
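A minimal sketch of a regularized Jacobi iteration for the quadratic case might look as follows. It is an illustration under stated assumptions, not the authors' implementation: each agent controls one scalar, the shared objective is 0.5 x^T Q x + q^T x, the separable constraints are box bounds, and the proximal penalty is taken as c (z - x_i)^2 with c = 1 picked by hand. The function name, the data, and the value of c are hypothetical, and the abstract does not state the condition on the regularization weight.

```python
import numpy as np

def regularized_jacobi(Q, q, lower, upper, c=1.0, iters=200):
    # Each agent i holds one scalar x_i. In parallel, every agent minimizes
    #   0.5*Q_ii*z^2 + z*(sum_{j != i} Q_ij*x_j + q_i) + c*(z - x_i)^2
    # over its own box [lower, upper], using the other agents' previous values.
    x = np.clip(np.zeros(len(q)), lower, upper)
    d = np.diag(Q)
    for _ in range(iters):
        # Coupling term seen by each agent, built from the previous iterate.
        r = Q @ x - d * x + q
        # Unconstrained minimizer of each agent's scalar subproblem.
        z = (2.0 * c * x - r) / (d + 2.0 * c)
        # For a scalar convex subproblem, clipping gives the constrained minimizer.
        x = np.clip(z, lower, upper)
    return x

# Illustrative two-agent problem (assumed data, not from the paper).
Q = np.array([[2.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, -1.0])
x_free = regularized_jacobi(Q, q, lower=0.0, upper=1.0)
x_tight = regularized_jacobi(Q, q, lower=0.0, upper=0.2)
```

With the wide box, the iterates can be compared against the solution of Q x = -q; with the tighter upper bound, the result can be compared against the KKT conditions of the box-constrained problem.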

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Oxford University Research Archive