A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization
We study the set of continuous functions that admit no spurious local optima
(i.e. local minima that are not global minima), which we term \textit{global
functions}. They satisfy various powerful properties for analyzing nonconvex
and nonsmooth optimization problems. For instance, they satisfy a theorem akin
to the fundamental uniform limit theorem of analysis for continuous
functions. Global functions are also endowed with useful properties regarding
the composition of functions and change of variables. Using these new results,
we show that a class of nonconvex and nonsmooth optimization problems arising
in tensor decomposition applications consists of global functions. This is the
first result concerning nonconvex methods for nonsmooth objective functions. Our
result provides a theoretical guarantee for the widely-used $\ell_1$ norm to
avoid outliers in nonconvex optimization.
Comment: 22 pages, 13 figures
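A quick numerical illustration of the definition (my construction, not from the paper): scan a 1-D function on a grid and flag local minima whose value exceeds the global minimum. A global function such as $|x|$ yields none, while a tilted double well yields one spurious minimum.

    # Illustration only: numerically flag spurious local minima of a 1-D
    # continuous function by comparing grid-local minima with the global
    # minimum over the same grid.
    import numpy as np

    def spurious_local_minima(f, lo=-3.0, hi=3.0, n=10001, tol=1e-6):
        x = np.linspace(lo, hi, n)
        y = f(x)
        global_min = y.min()
        # Interior grid points strictly lower than both neighbors.
        is_loc = (y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])
        locs, vals = x[1:-1][is_loc], y[1:-1][is_loc]
        # A local minimum is "spurious" if its value exceeds the global one.
        return [(xl, yl) for xl, yl in zip(locs, vals) if yl > global_min + tol]

    print(spurious_local_minima(np.abs))                            # [] -> global function
    print(spurious_local_minima(lambda x: (x**2 - 1)**2 + 0.3 * x)) # one spurious minimum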
How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?
When the linear measurements of an instance of low-rank matrix recovery
satisfy a restricted isometry property (RIP)---i.e. they are approximately
norm-preserving---the problem is known to contain no spurious local minima, so
exact recovery is guaranteed. In this paper, we show that moderate RIP is not
enough to eliminate spurious local minima, so existing results can only hold
for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that
every $x$ is the spurious local minimum of a rank-1 instance of matrix recovery
that satisfies RIP. One specific counterexample has RIP constant $\delta = 1/2$,
but causes randomly initialized stochastic gradient descent (SGD) to fail 12%
of the time. SGD is frequently able to avoid and escape spurious local minima,
but this empirical result shows that it can occasionally be defeated by their
existence. Hence, while exact recovery guarantees will likely require a proof
of no spurious local minima, arguments based solely on norm preservation will
only be applicable to a narrow set of nearly-isotropic instances.
Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018)
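For intuition, here is a toy sketch (not the paper's counterexample; the dimensions, seed, and Gaussian measurement model are mine) of gradient descent on the rank-1 recovery objective $f(x) = \frac{1}{m}\sum_k (\langle A_k, xx^\top\rangle - b_k)^2$; with enough random measurements the landscape is typically benign and the planted factor is recovered up to sign.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 5, 50
    z = rng.normal(size=n); z /= np.linalg.norm(z)   # planted unit factor
    A = rng.normal(size=(m, n, n))
    A = (A + A.transpose(0, 2, 1)) / 2               # symmetric measurement matrices
    b = np.einsum('kij,i,j->k', A, z, z)             # noiseless b_k = <A_k, zz^T>

    def grad(x):                                     # gradient of the mean of
        r = np.einsum('kij,i,j->k', A, x, x) - b     # the squared residuals
        return (4.0 / m) * np.einsum('k,kij,j->i', r, A, x)

    x = rng.normal(size=n); x /= np.linalg.norm(x)   # random initialization
    for _ in range(5000):
        x -= 0.02 * grad(x)

    # Error up to the inherent sign ambiguity x -> -x; for this generic
    # instance it should be near zero.
    print(min(np.linalg.norm(x - z), np.linalg.norm(x + z)))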
Global Optimality in Distributed Low-rank Matrix Factorization
We study the convergence of a variant of distributed gradient descent (DGD)
on a distributed low-rank matrix approximation problem wherein some
optimization variables are used for consensus (as in classical DGD) and some
optimization variables appear only locally at a single node in the network. We
term the resulting algorithm DGD+LOCAL. Using algorithmic connections to
gradient descent and geometric connections to the well-behaved landscape of the
centralized low-rank matrix approximation problem, we identify sufficient
conditions where DGD+LOCAL is guaranteed to converge with exact consensus to a
global minimizer of the original centralized problem. For the distributed
low-rank matrix approximation problem, these guarantees are stronger---in terms
of consensus and optimality---than what appears in the literature for classical
DGD and more general problems.
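A minimal sketch of a DGD+LOCAL-style iteration as described above, on a toy instance of my own making: each node holds a column block $Y_i = U^* V_i^{*\top}$ of a low-rank matrix, the factor $U$ is a consensus variable, $V_i$ stays local, and a complete-graph mixing matrix is assumed.

    import numpy as np

    rng = np.random.default_rng(2)
    N, rows, cols, r = 4, 8, 5, 2                    # nodes, block size, rank
    U_star = rng.normal(size=(rows, r))              # shared planted factor
    Y = [U_star @ rng.normal(size=(r, cols)) for _ in range(N)]  # node i's data

    W = np.full((N, N), 1.0 / N)                     # doubly stochastic mixing
    U = rng.normal(size=(N, rows, r))                # consensus copies of U
    V = [rng.normal(size=(cols, r)) for _ in range(N)]  # purely local factors
    alpha = 0.01
    for _ in range(3000):
        R = [U[i] @ V[i].T - Y[i] for i in range(N)]     # local residuals
        gU = np.stack([R[i] @ V[i] for i in range(N)])   # grad of f_i w.r.t. U
        gV = [R[i].T @ U[i] for i in range(N)]           # grad of f_i w.r.t. V_i
        U = np.einsum('ij,jkl->ikl', W, U) - alpha * gU  # mix with neighbors, then step
        V = [V[i] - alpha * gV[i] for i in range(N)]     # local-only step

    # Both quantities should approach zero: exact consensus and a global fit.
    print(max(np.linalg.norm(U[i] - U[0]) for i in range(N)))
    print(max(np.linalg.norm(U[i] @ V[i].T - Y[i]) for i in range(N)))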
Efficiently testing local optimality and escaping saddles for ReLU networks
We provide a theoretical algorithm for checking local optimality and escaping
saddles at nondifferentiable points of empirical risks of two-layer ReLU
networks. Our algorithm receives any parameter value and returns: local
minimum, second-order stationary point, or a strict descent direction. The
presence of $M$ data points on the nondifferentiability of the ReLU divides the
parameter space into at most $2^M$ regions, which makes analysis difficult. By
exploiting polyhedral geometry, we reduce the total computation down to one
convex quadratic program (QP) for each hidden node, $O(M)$ (in)equality tests,
and one (or a few) nonconvex QP. For the last QP, we show that our specific
problem can be solved efficiently, in spite of nonconvexity. In the benign
case, we solve one equality constrained QP, and we prove that projected
gradient descent solves it exponentially fast. In the bad case, we have to
solve a few more inequality constrained QPs, but we prove that the time
complexity is exponential only in the number of inequality constraints. Our
experiments show that either benign case or bad case with very few inequality
constraints occurs, implying that our algorithm is efficient in most cases.
Comment: 23 pages, appeared at ICLR 2019
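The paper's QP-based test is far more refined, but the underlying object is easy to compute in a toy case (construction and data are mine): at a nondifferentiable point of a one-node ReLU risk, one-sided directional derivatives are exact, and sampling directions can already certify "not a local minimum".

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(6, 2))              # 6 data points in R^2
    y = np.array([0., 1., 0., 1., 0., 1.])
    w = np.zeros(2)                          # every point sits on the ReLU kink here

    def dir_deriv(w, d):
        """One-sided derivative of mean_i (relu(x_i.w) - y_i)^2 along d."""
        z = X @ w
        act = (z > 0) | ((z == 0) & (X @ d > 0))   # active side selected by d
        r = np.maximum(z, 0) - y
        return np.mean(2 * r * act * (X @ d))

    # Sample unit directions and keep the best one-sided slope; a negative
    # value certifies a strict descent direction at the kink.
    dirs = rng.normal(size=(1000, 2))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    slopes = np.array([dir_deriv(w, d) for d in dirs])
    print(slopes.min())   # < 0 here: w = 0 is not a local minimum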
Deterministic control of randomly-terminated processes
We consider both discrete and continuous "uncertain horizon" deterministic
control processes, for which the termination time is a random variable. We
examine the dynamic programming equations for the value function of such
processes, explore their connections to infinite-horizon and optimal-stopping
problems, and derive sufficient conditions for the applicability of
non-iterative (label-setting) methods. In the continuous case, the resulting
PDE has a free boundary, on which all characteristic curves originate. The
causal properties of "uncertain horizon" problems can be exploited to design
efficient numerical algorithms: we derive causal semi-Lagrangian and Eulerian
discretizations for the isotropic randomly-terminated problems, and use them to
build a modified version of the Fast Marching Method. We illustrate our
approach using numerical examples from optimal idle-time processing and
expected response-time minimization.
Comment: 35 pages; 8 figures. Accepted for publication in "Interfaces and Free Boundaries"
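A discrete toy version of the uncertain-horizon setup (my construction): if the process terminates with probability $p$ after each move, paying a terminal cost $q$ at the new node, the value function satisfies $u(s) = \min_{s'} [\, c(s,s') + p\,q(s') + (1-p)\,u(s')\,]$, and the $(1-p)$ factor makes value iteration a contraction.

    import numpy as np

    p = 0.2                                   # termination probability per step
    # Edge costs c[s][s'] on a tiny 4-node graph; q is the terminal cost.
    c = {0: {1: 1.0, 2: 4.0}, 1: {2: 1.0, 3: 5.0}, 2: {3: 1.0}, 3: {0: 2.0}}
    q = np.array([3.0, 2.0, 1.0, 0.0])

    u = np.zeros(4)
    for _ in range(200):                      # value iteration to the fixed point
        u = np.array([min(cost + p * q[t] + (1 - p) * u[t]
                          for t, cost in c[s].items())
                      for s in range(4)])
    print(np.round(u, 4))                     # expected total cost from each node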
Convergence to Second-Order Stationarity for Constrained Non-Convex Optimization
We consider the problem of finding an approximate second-order stationary
point of a constrained non-convex optimization problem. We first show that,
unlike the gradient descent method for unconstrained optimization, the vanilla
projected gradient descent algorithm may converge to a strict saddle point even
when there is only a single linear constraint. We then provide a hardness
result by showing that checking $(\epsilon, \gamma)$-second order
stationarity is NP-hard even in the presence of linear constraints. Despite our
hardness result, we identify instances of the problem for which checking second
order stationarity can be done efficiently. For such instances, we propose a
dynamic second order Frank--Wolfe algorithm which converges to
$(\epsilon, \gamma)$-second order stationary points in polynomially many
iterations. The
proposed algorithm can be used in general constrained non-convex optimization
as long as the constrained quadratic sub-problem can be solved efficiently.
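A rough sketch of such a second-order step (not the paper's algorithm; the box constraint, toy saddle, and use of a local QP solver are my choices), illustrating both the idea and the caveat that the quadratic sub-problem is itself nonconvex.

    import numpy as np
    from scipy.optimize import minimize

    def second_order_step(x, g, H, lo, hi, rng):
        # Locally solve the constrained quadratic model
        #   min { g.d + 0.5 d.H.d : lo <= x + d <= hi }.
        # L-BFGS-B only finds a local solution of this nonconvex sub-problem.
        model = lambda d: g @ d + 0.5 * d @ H @ d
        jac = lambda d: g + H @ d
        d0 = 1e-3 * rng.normal(size=x.size)      # nudge off the stationary point
        res = minimize(model, d0, jac=jac, bounds=list(zip(lo - x, hi - x)))
        return x + res.x

    # f(x) = x0^2 - x1^2 has a strict saddle at the origin: the gradient is
    # zero there, but the model sees the negative curvature and escapes.
    g = np.zeros(2)                              # gradient of f at x = 0
    H = np.array([[2.0, 0.0], [0.0, -2.0]])      # Hessian of f at x = 0
    x = np.zeros(2)
    print(second_order_step(x, g, H, -np.ones(2), np.ones(2),
                            np.random.default_rng(4)))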
Certifiably Globally Optimal Extrinsic Calibration from Per-Sensor Egomotion
We present a certifiably globally optimal algorithm for determining the
extrinsic calibration between two sensors that are capable of producing
independent egomotion estimates. This problem has been previously solved using
a variety of techniques, including local optimization approaches that have no
formal global optimality guarantees. We use a quadratic objective function to
formulate calibration as a quadratically constrained quadratic program (QCQP).
By leveraging recent advances in the optimization of QCQPs, we are able to use
existing semidefinite program (SDP) solvers to obtain a certified global
optimum via the Lagrangian dual problem. Our problem formulation can be
globally optimized by existing general-purpose solvers in less than a second,
regardless of the number of measurements available and the noise level. This
enables a variety of robotic platforms to rapidly and robustly compute and
certify a globally optimal set of calibration parameters without a prior
estimate or operator intervention. We compare the performance of our approach
with a local solver on extensive simulations and multiple real datasets.
Finally, we present necessary observability conditions that connect our
approach to recent theoretical results and analytically support the empirical
performance of our system.
Comment: 8 pages, 8 figures
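The machinery the abstract leans on is the standard Shor relaxation of a QCQP. A generic example (not the calibration formulation; assumes the cvxpy package) where the relaxation happens to be tight:

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(5)
    n = 4
    Q = rng.normal(size=(n, n)); Q = Q @ Q.T    # toy objective matrix
    # QCQP: min x'Qx  s.t. ||x||^2 = 1.  Shor relaxation: replace xx' by a
    # PSD matrix X and drop the (nonconvex) rank-one requirement.
    X = cp.Variable((n, n), PSD=True)
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), [cp.trace(X) == 1])
    prob.solve()

    # Here the optimal X is rank one, so the relaxation is tight and the
    # globally optimal x is read off the top eigenpair; a rank-one solution
    # is exactly what certifies global optimality.
    w, V = np.linalg.eigh(X.value)
    x_hat = np.sqrt(max(w[-1], 0)) * V[:, -1]
    print(prob.value, x_hat @ Q @ x_hat)        # the two values agree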
On the loss landscape of a class of deep neural networks with no bad local valleys
We identify a class of over-parameterized deep neural networks with standard
activation functions and cross-entropy loss which provably have no bad local
valley, in the sense that from any point in parameter space there exists a
continuous path on which the cross-entropy loss is non-increasing and gets
arbitrarily close to zero. This implies that these networks have no sub-optimal
strict local minima.
Comment: Accepted at ICLR 2019
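One way to state the property formally (my paraphrase of the abstract, not necessarily the paper's exact definition), with $L$ the cross-entropy loss over parameters $\theta \in \mathbb{R}^p$:

    \[
      \forall\,\theta_0 \in \mathbb{R}^p \;\;\exists\,
      \theta : [0,1) \to \mathbb{R}^p \ \text{continuous:}\quad
      \theta(0) = \theta_0,\qquad
      t \mapsto L(\theta(t)) \ \text{is non-increasing},\qquad
      \inf_{t \in [0,1)} L(\theta(t)) = 0 .
    \]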
Optimality Conditions for Nonlinear Semidefinite Programming via Squared Slack Variables
In this work, we derive second-order optimality conditions for nonlinear
semidefinite programming (NSDP) problems by reformulating them as ordinary
nonlinear programming problems using squared slack variables. We first consider
the correspondence between Karush-Kuhn-Tucker points and regularity conditions
for the general NSDP and its reformulation via slack variables. Then, we obtain
a pair of "no-gap" second-order optimality conditions that are essentially
equivalent to the ones already considered in the literature. We conclude with
the analysis of some computational prospects of the squared slack variables
approach for NSDP.
Comment: 20 pages, 3 figures
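Written out for a single constraint, the squared slack variable device is the standard one: positive semidefiniteness is encoded by an exact matrix factorization, at the price of a new variable $Y$ and a nonconvex equality constraint,

    \[
      \min_{x}\; f(x) \ \ \text{s.t.}\ \ G(x) \succeq 0
      \qquad\Longleftrightarrow\qquad
      \min_{x,\,Y}\; f(x) \ \ \text{s.t.}\ \ G(x) = Y Y^{\top},
    \]

since a symmetric matrix is positive semidefinite exactly when it admits a factorization $YY^{\top}$.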
Constrained optimization through fixed point techniques
We introduce an alternative approach for constrained mathematical programming
problems. It rests on two main aspects: an efficient way to compute optimal
solutions for unconstrained problems, and multipliers regarded as variables for
a certain map. Contrary to typical dual strategies, optimal vectors of
multipliers are sought as fixed points of that map. Two distinctive features
are worth highlighting: the simplicity and flexibility of its implementation,
and its convergence properties.
Comment: 14 pages, 2 figures
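On a problem simple enough to solve in closed form, the fixed-point viewpoint looks as follows (an Uzawa/dual-ascent style map, chosen only for illustration; the paper's map may differ).

    # For min 0.5||x - c||^2 s.t. a.x = b, the inner problem has the closed
    # form x(lam) = c - lam*a, and the multiplier map
    #   T(lam) = lam + rho*(a.x(lam) - b)
    # has the optimal multiplier as its unique fixed point.
    import numpy as np

    c = np.array([2.0, 1.0]); a = np.array([1.0, 1.0]); b = 1.0
    rho = 0.5                              # converges: |1 - rho*||a||^2| < 1

    lam = 0.0
    for _ in range(100):
        x = c - lam * a                    # unconstrained argmin of the Lagrangian
        lam = lam + rho * (a @ x - b)      # fixed-point update on the multiplier

    print(x, a @ x - b, lam)               # feasible x and the optimal multiplier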