Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
Given a non-convex twice differentiable cost function f, we prove that the
set of initial conditions so that gradient descent converges to saddle points
where \nabla^2 f has at least one strictly negative eigenvalue has (Lebesgue)
measure zero, even for cost functions f with non-isolated critical points,
answering an open question in [Lee, Simchowitz, Jordan, Recht, COLT2016].
Moreover, this result extends to forward-invariant convex subspaces, allowing
for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce
an upper bound on the allowable step-size.
Comment: 2 figures
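The measure-zero claim in this abstract can be illustrated with a small numerical sketch. The cost function below, f(x, y) = x^2/2 - y^2/2 + y^4/4, is a hypothetical toy chosen for this listing and does not come from the paper: it has a strict saddle at the origin (Hessian eigenvalues 1 and -1) and minimizers at (0, 1) and (0, -1). The step size 0.05, the sampling box, and the iteration count are likewise assumptions. This toy has isolated critical points, so it shows only the basic phenomenon, not the non-isolated case the paper addresses.

```python
import random

def grad(x, y):
    # Gradient of the toy cost f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4.
    return x, -y + y**3

def gradient_descent(x, y, step=0.05, iters=2000):
    # Plain gradient descent with a fixed step size (assumed small enough here).
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

random.seed(0)
# Random initial conditions drawn uniformly from the box [-1.5, 1.5]^2.
finals = [
    gradient_descent(random.uniform(-1.5, 1.5), random.uniform(-1.5, 1.5))
    for _ in range(200)
]
# Count the runs that end at one of the two minimizers (0, 1) or (0, -1).
to_minimizer = sum(
    abs(x) < 1e-6 and abs(abs(y) - 1.0) < 1e-6 for x, y in finals
)

# An initial condition on the line y = 0 (a set of measure zero) stays on it
# and converges to the saddle at the origin instead.
on_stable_manifold = gradient_descent(1.0, 0.0)
```

In this sketch only starting points with y exactly 0 are drawn into the saddle, which is why random initial conditions end at a minimizer.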
Theory of Deep Learning IIb: Optimization Properties of SGD
In Theory IIb we characterize with a mix of theory and experiments the
optimization of deep convolutional networks by Stochastic Gradient Descent. The
main new result in this paper is theoretical and experimental evidence for the
following conjecture about SGD: SGD concentrates in probability -- like the
classical Langevin equation -- on large volume, "flat" minima, selecting flat
minimizers which are, with very high probability, also global minimizers.
Regularized Jacobi iteration for decentralized convex optimization with separable constraints
We consider multi-agent, convex optimization programs subject to separable
constraints, where the constraint function of each agent involves only its
local decision vector, while the decision vectors of all agents are coupled via
a common objective function. We focus on a regularized variant of the so-called
Jacobi algorithm for decentralized computation in such problems. We first
consider the case where the objective function is quadratic, and provide a
fixed-point theoretic analysis showing that the algorithm converges to a
minimizer of the centralized problem. Moreover, we quantify the potential
benefits of such an iterative scheme by comparing it against a scaled projected
gradient algorithm. We then consider the general case and show that all limit
points of the proposed iteration are optimal solutions of the centralized
problem. The efficacy of the proposed algorithm is illustrated by applying it
to the problem of optimal charging of electric vehicles, where, as opposed to
earlier approaches, we show convergence to an optimal charging scheme for a
finite, possibly large, number of vehicles.
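A minimal sketch of a regularized Jacobi iteration for the quadratic case may help make the scheme concrete. Everything specific here is an assumption made for illustration, not taken from the paper: each agent holds a single scalar decision confined to an interval [lo, hi], the common objective is 0.5*x'Qx + q'x with Q symmetric, the proximal term is c*(z - x_i)^2, and the values of Q, q, and c are arbitrary. At every round, all agents update in parallel by minimizing the objective over their own variable, with the other agents' previous values held fixed.

```python
def regularized_jacobi(Q, q, lo, hi, c=1.0, iters=200):
    # Each agent i owns the scalar x[i], constrained to [lo, hi].
    n = len(q)
    x = [lo] * n
    for _ in range(iters):
        new_x = []
        for i in range(n):
            # Linear coefficient seen by agent i when the others are frozen.
            b = q[i] + sum(Q[i][j] * x[j] for j in range(n) if j != i)
            # Unconstrained minimizer of
            #   0.5*Q[i][i]*z**2 + b*z + c*(z - x[i])**2
            z = (2.0 * c * x[i] - b) / (Q[i][i] + 2.0 * c)
            # In one dimension, clipping gives the constrained minimizer.
            new_x.append(min(hi, max(lo, z)))
        x = new_x  # Jacobi-style: all agents update simultaneously.
    return x

def kkt_residual(Q, q, x, lo, hi):
    # Fixed-point residual of a projected gradient step; it is zero exactly
    # at a minimizer of the box-constrained quadratic program.
    n = len(q)
    g = [q[i] + sum(Q[i][j] * x[j] for j in range(n)) for i in range(n)]
    return max(abs(x[i] - min(hi, max(lo, x[i] - g[i]))) for i in range(n))

# Hypothetical instance: four agents coupled through their aggregate.
Q = [[2.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
q = [-1.0, -2.0, -3.0, -4.0]
x_star = regularized_jacobi(Q, q, 0.0, 1.0)
residual = kkt_residual(Q, q, x_star, 0.0, 1.0)
```

In this scalar special case the update is algebraically the same as a projected gradient step with step size 1/(Q[i][i] + 2c), so a larger c means smaller, more cautious moves. With c = 0 this particular instance would not be guaranteed to settle, because simultaneous unregularized updates can overshoot through the coupling.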