A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization
We study the set of continuous functions that admit no spurious local optima
(i.e. local minima that are not global minima), which we term \textit{global
functions}. They satisfy various powerful properties for analyzing nonconvex
and nonsmooth optimization problems. For instance, they satisfy a theorem akin
to the fundamental uniform limit theorem of analysis for continuous
functions. Global functions are also endowed with useful properties regarding
the composition of functions and change of variables. Using these new results,
we show that a class of nonconvex and nonsmooth optimization problems arising
in tensor decomposition applications consists of global functions. This is the
first result concerning nonconvex methods for nonsmooth objective functions. Our
result provides a theoretical guarantee for the widely-used $\ell_1$ norm to
avoid outliers in nonconvex optimization.
Comment: 22 pages, 13 figures
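A quick numerical illustration of the definition (my construction, not from the paper): scan a 1-D function on a grid and flag local minima whose value exceeds the global minimum. A global function such as $|x|$ yields none, while a tilted double well yields one spurious minimum.

    # Illustration only: numerically flag spurious local minima of a 1-D
    # continuous function by comparing grid-local minima with the global
    # minimum over the same grid.
    import numpy as np

    def spurious_local_minima(f, lo=-3.0, hi=3.0, n=10001, tol=1e-6):
        x = np.linspace(lo, hi, n)
        y = f(x)
        global_min = y.min()
        # Interior grid points strictly lower than both neighbors.
        is_loc = (y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])
        locs, vals = x[1:-1][is_loc], y[1:-1][is_loc]
        # A local minimum is "spurious" if its value exceeds the global one.
        return [(xl, yl) for xl, yl in zip(locs, vals) if yl > global_min + tol]

    print(spurious_local_minima(np.abs))                            # [] -> global function
    print(spurious_local_minima(lambda x: (x**2 - 1)**2 + 0.3 * x)) # one spurious minimum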
How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?
When the linear measurements of an instance of low-rank matrix recovery
satisfy a restricted isometry property (RIP)---i.e. they are approximately
norm-preserving---the problem is known to contain no spurious local minima, so
exact recovery is guaranteed. In this paper, we show that moderate RIP is not
enough to eliminate spurious local minima, so existing results can only hold
for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that
every $x$ is the spurious local minimum of a rank-1 instance of matrix recovery
that satisfies RIP. One specific counterexample has RIP constant $\delta = 1/2$,
but causes randomly initialized stochastic gradient descent (SGD) to fail 12%
of the time. SGD is frequently able to avoid and escape spurious local minima,
but this empirical result shows that it can occasionally be defeated by their
existence. Hence, while exact recovery guarantees will likely require a proof
of no spurious local minima, arguments based solely on norm preservation will
only be applicable to a narrow set of nearly-isotropic instances.
Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018)
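For intuition, here is a toy sketch (not the paper's counterexample; the dimensions, seed, and Gaussian measurement model are mine) of gradient descent on the rank-1 recovery objective $f(x) = \frac{1}{m}\sum_k (\langle A_k, xx^\top\rangle - b_k)^2$; with enough random measurements the landscape is typically benign and the planted factor is recovered up to sign.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 5, 50
    z = rng.normal(size=n); z /= np.linalg.norm(z)   # planted unit factor
    A = rng.normal(size=(m, n, n))
    A = (A + A.transpose(0, 2, 1)) / 2               # symmetric measurement matrices
    b = np.einsum('kij,i,j->k', A, z, z)             # noiseless b_k = <A_k, zz^T>

    def grad(x):                                     # gradient of the mean of
        r = np.einsum('kij,i,j->k', A, x, x) - b     # the squared residuals
        return (4.0 / m) * np.einsum('k,kij,j->i', r, A, x)

    x = rng.normal(size=n); x /= np.linalg.norm(x)   # random initialization
    for _ in range(5000):
        x -= 0.02 * grad(x)

    # Error up to the inherent sign ambiguity x -> -x; for this generic
    # instance it should be near zero.
    print(min(np.linalg.norm(x - z), np.linalg.norm(x + z)))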
Global Optimality in Distributed Low-rank Matrix Factorization
We study the convergence of a variant of distributed gradient descent (DGD)
on a distributed low-rank matrix approximation problem wherein some
optimization variables are used for consensus (as in classical DGD) and some
optimization variables appear only locally at a single node in the network. We
term the resulting algorithm DGD+LOCAL. Using algorithmic connections to
gradient descent and geometric connections to the well-behaved landscape of the
centralized low-rank matrix approximation problem, we identify sufficient
conditions where DGD+LOCAL is guaranteed to converge with exact consensus to a
global minimizer of the original centralized problem. For the distributed
low-rank matrix approximation problem, these guarantees are stronger---in terms
of consensus and optimality---than what appears in the literature for classical
DGD and more general problems.
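A minimal sketch of a DGD+LOCAL-style iteration as described above, on a toy instance of my own making: each node holds a column block $Y_i = U^* V_i^{*\top}$ of a low-rank matrix, the factor $U$ is a consensus variable, $V_i$ stays local, and a complete-graph mixing matrix is assumed.

    import numpy as np

    rng = np.random.default_rng(2)
    N, rows, cols, r = 4, 8, 5, 2                    # nodes, block size, rank
    U_star = rng.normal(size=(rows, r))              # shared planted factor
    Y = [U_star @ rng.normal(size=(r, cols)) for _ in range(N)]  # node i's data

    W = np.full((N, N), 1.0 / N)                     # doubly stochastic mixing
    U = rng.normal(size=(N, rows, r))                # consensus copies of U
    V = [rng.normal(size=(cols, r)) for _ in range(N)]  # purely local factors
    alpha = 0.01
    for _ in range(3000):
        R = [U[i] @ V[i].T - Y[i] for i in range(N)]     # local residuals
        gU = np.stack([R[i] @ V[i] for i in range(N)])   # grad of f_i w.r.t. U
        gV = [R[i].T @ U[i] for i in range(N)]           # grad of f_i w.r.t. V_i
        U = np.einsum('ij,jkl->ikl', W, U) - alpha * gU  # mix with neighbors, then step
        V = [V[i] - alpha * gV[i] for i in range(N)]     # local-only step

    # Both quantities should approach zero: exact consensus and a global fit.
    print(max(np.linalg.norm(U[i] - U[0]) for i in range(N)))
    print(max(np.linalg.norm(U[i] @ V[i].T - Y[i]) for i in range(N)))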
Efficiently testing local optimality and escaping saddles for ReLU networks
We provide a theoretical algorithm for checking local optimality and escaping
saddles at nondifferentiable points of empirical risks of two-layer ReLU
networks. Our algorithm receives any parameter value and returns: local
minimum, second-order stationary point, or a strict descent direction. The
presence of $M$ data points on the nondifferentiability of the ReLU divides the
parameter space into at most $2^M$ regions, which makes analysis difficult. By
exploiting polyhedral geometry, we reduce the total computation down to one
convex quadratic program (QP) for each hidden node, $O(M)$ (in)equality tests,
and one (or a few) nonconvex QP. For the last QP, we show that our specific
problem can be solved efficiently, in spite of nonconvexity. In the benign
case, we solve one equality constrained QP, and we prove that projected
gradient descent solves it exponentially fast. In the bad case, we have to
solve a few more inequality constrained QPs, but we prove that the time
complexity is exponential only in the number of inequality constraints. Our
experiments show that either benign case or bad case with very few inequality
constraints occurs, implying that our algorithm is efficient in most cases.
Comment: 23 pages, appeared at ICLR 2019
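The paper's QP-based test is far more refined, but the underlying object is easy to compute in a toy case (construction and data are mine): at a nondifferentiable point of a one-node ReLU risk, one-sided directional derivatives are exact, and sampling directions can already certify "not a local minimum".

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(6, 2))              # 6 data points in R^2
    y = np.array([0., 1., 0., 1., 0., 1.])
    w = np.zeros(2)                          # every point sits on the ReLU kink here

    def dir_deriv(w, d):
        """One-sided derivative of mean_i (relu(x_i.w) - y_i)^2 along d."""
        z = X @ w
        act = (z > 0) | ((z == 0) & (X @ d > 0))   # active side selected by d
        r = np.maximum(z, 0) - y
        return np.mean(2 * r * act * (X @ d))

    # Sample unit directions and keep the best one-sided slope; a negative
    # value certifies a strict descent direction at the kink.
    dirs = rng.normal(size=(1000, 2))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    slopes = np.array([dir_deriv(w, d) for d in dirs])
    print(slopes.min())   # < 0 here: w = 0 is not a local minimum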
Deterministic control of randomly-terminated processes
We consider both discrete and continuous "uncertain horizon" deterministic
control processes, for which the termination time is a random variable. We
examine the dynamic programming equations for the value function of such
processes, explore their connections to infinite-horizon and optimal-stopping
problems, and derive sufficient conditions for the applicability of
non-iterative (label-setting) methods. In the continuous case, the resulting
PDE has a free boundary, on which all characteristic curves originate. The
causal properties of "uncertain horizon" problems can be exploited to design
efficient numerical algorithms: we derive causal semi-Lagrangian and Eulerian
discretizations for the isotropic randomly-terminated problems, and use them to
build a modified version of the Fast Marching Method. We illustrate our
approach using numerical examples from optimal idle-time processing and
expected response-time minimization.
Comment: 35 pages; 8 figures. Accepted for publication in "Interfaces and Free Boundaries"
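A discrete toy version of the uncertain-horizon setup (my construction): if the process terminates with probability $p$ after each move, paying a terminal cost $q$ at the new node, the value function satisfies $u(s) = \min_{s'} [\, c(s,s') + p\,q(s') + (1-p)\,u(s')\,]$, and the $(1-p)$ factor makes value iteration a contraction.

    import numpy as np

    p = 0.2                                   # termination probability per step
    # Edge costs c[s][s'] on a tiny 4-node graph; q is the terminal cost.
    c = {0: {1: 1.0, 2: 4.0}, 1: {2: 1.0, 3: 5.0}, 2: {3: 1.0}, 3: {0: 2.0}}
    q = np.array([3.0, 2.0, 1.0, 0.0])

    u = np.zeros(4)
    for _ in range(200):                      # value iteration to the fixed point
        u = np.array([min(cost + p * q[t] + (1 - p) * u[t]
                          for t, cost in c[s].items())
                      for s in range(4)])
    print(np.round(u, 4))                     # expected total cost from each node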
Convergence to Second-Order Stationarity for Constrained Non-Convex Optimization
We consider the problem of finding an approximate second-order stationary
point of a constrained non-convex optimization problem. We first show that,
unlike the gradient descent method for unconstrained optimization, the vanilla
projected gradient descent algorithm may converge to a strict saddle point even
when there is only a single linear constraint. We then provide a hardness
result by showing that checking $(\epsilon, \gamma)$-second order
stationarity is NP-hard even in the presence of linear constraints. Despite our
hardness result, we identify instances of the problem for which checking second
order stationarity can be done efficiently. For such instances, we propose a
dynamic second order Frank--Wolfe algorithm which converges to
$(\epsilon, \gamma)$-second order stationary points in polynomially many
iterations. The
proposed algorithm can be used in general constrained non-convex optimization
as long as the constrained quadratic sub-problem can be solved efficiently.
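A rough sketch of such a second-order step (not the paper's algorithm; the box constraint, toy saddle, and use of a local QP solver are my choices), illustrating both the idea and the caveat that the quadratic sub-problem is itself nonconvex.

    import numpy as np
    from scipy.optimize import minimize

    def second_order_step(x, g, H, lo, hi, rng):
        # Locally solve the constrained quadratic model
        #   min { g.d + 0.5 d.H.d : lo <= x + d <= hi }.
        # L-BFGS-B only finds a local solution of this nonconvex sub-problem.
        model = lambda d: g @ d + 0.5 * d @ H @ d
        jac = lambda d: g + H @ d
        d0 = 1e-3 * rng.normal(size=x.size)      # nudge off the stationary point
        res = minimize(model, d0, jac=jac, bounds=list(zip(lo - x, hi - x)))
        return x + res.x

    # f(x) = x0^2 - x1^2 has a strict saddle at the origin: the gradient is
    # zero there, but the model sees the negative curvature and escapes.
    g = np.zeros(2)                              # gradient of f at x = 0
    H = np.array([[2.0, 0.0], [0.0, -2.0]])      # Hessian of f at x = 0
    x = np.zeros(2)
    print(second_order_step(x, g, H, -np.ones(2), np.ones(2),
                            np.random.default_rng(4)))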
Certifiably Globally Optimal Extrinsic Calibration from Per-Sensor Egomotion
We present a certifiably globally optimal algorithm for determining the
extrinsic calibration between two sensors that are capable of producing
independent egomotion estimates. This problem has been previously solved using
a variety of techniques, including local optimization approaches that have no
formal global optimality guarantees. We use a quadratic objective function to
formulate calibration as a quadratically constrained quadratic program (QCQP).
By leveraging recent advances in the optimization of QCQPs, we are able to use
existing semidefinite program (SDP) solvers to obtain a certified global
optimum via the Lagrangian dual problem. Our problem formulation can be
globally optimized by existing general-purpose solvers in less than a second,
regardless of the number of measurements available and the noise level. This
enables a variety of robotic platforms to rapidly and robustly compute and
certify a globally optimal set of calibration parameters without a prior
estimate or operator intervention. We compare the performance of our approach
with a local solver on extensive simulations and multiple real datasets.
Finally, we present necessary observability conditions that connect our
approach to recent theoretical results and analytically support the empirical
performance of our system.
Comment: 8 pages, 8 figures
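The machinery the abstract leans on is the standard Shor relaxation of a QCQP. A generic example (not the calibration formulation; assumes the cvxpy package) where the relaxation happens to be tight:

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(5)
    n = 4
    Q = rng.normal(size=(n, n)); Q = Q @ Q.T    # toy objective matrix
    # QCQP: min x'Qx  s.t. ||x||^2 = 1.  Shor relaxation: replace xx' by a
    # PSD matrix X and drop the (nonconvex) rank-one requirement.
    X = cp.Variable((n, n), PSD=True)
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), [cp.trace(X) == 1])
    prob.solve()

    # Here the optimal X is rank one, so the relaxation is tight and the
    # globally optimal x is read off the top eigenpair; a rank-one solution
    # is exactly what certifies global optimality.
    w, V = np.linalg.eigh(X.value)
    x_hat = np.sqrt(max(w[-1], 0)) * V[:, -1]
    print(prob.value, x_hat @ Q @ x_hat)        # the two values agree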
On the loss landscape of a class of deep neural networks with no bad local valleys
We identify a class of over-parameterized deep neural networks with standard
activation functions and cross-entropy loss which provably have no bad local
valley, in the sense that from any point in parameter space there exists a
continuous path on which the cross-entropy loss is non-increasing and gets
arbitrarily close to zero. This implies that these networks have no sub-optimal
strict local minima.
Comment: Accepted at ICLR 2019
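One way to state the property formally (my paraphrase of the abstract, not necessarily the paper's exact definition), with $L$ the cross-entropy loss over parameters $\theta \in \mathbb{R}^p$:

    \[
      \forall\,\theta_0 \in \mathbb{R}^p \;\;\exists\,
      \theta : [0,1) \to \mathbb{R}^p \ \text{continuous:}\quad
      \theta(0) = \theta_0,\qquad
      t \mapsto L(\theta(t)) \ \text{is non-increasing},\qquad
      \inf_{t \in [0,1)} L(\theta(t)) = 0 .
    \]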
Optimality Conditions for Nonlinear Semidefinite Programming via Squared Slack Variables
In this work, we derive second-order optimality conditions for nonlinear
semidefinite programming (NSDP) problems by reformulating them as ordinary
nonlinear programming problems using squared slack variables. We first consider
the correspondence between Karush-Kuhn-Tucker points and regularity conditions
for the general NSDP and its reformulation via slack variables. Then, we obtain
a pair of "no-gap" second-order optimality conditions that are essentially
equivalent to the ones already considered in the literature. We conclude with
the analysis of some computational prospects of the squared slack variables
approach for NSDP.
Comment: 20 pages, 3 figures
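Written out for a single constraint, the squared slack variable device is the standard one: positive semidefiniteness is encoded by an exact matrix factorization, at the price of a new variable $Y$ and a nonconvex equality constraint,

    \[
      \min_{x}\; f(x) \ \ \text{s.t.}\ \ G(x) \succeq 0
      \qquad\Longleftrightarrow\qquad
      \min_{x,\,Y}\; f(x) \ \ \text{s.t.}\ \ G(x) = Y Y^{\top},
    \]

since a symmetric matrix is positive semidefinite exactly when it admits a factorization $YY^{\top}$.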
Constrained optimization through fixed point techniques
We introduce an alternative approach for constrained mathematical programming
problems. It rests on two main aspects: an efficient way to compute optimal
solutions for unconstrained problems, and multipliers regarded as variables for
a certain map. Contrary to typical dual strategies, optimal vectors of
multipliers are sought as fixed points of that map. Two distinctive features
are worth highlighting: the simplicity and flexibility of its implementation,
and its convergence properties.
Comment: 14 pages, 2 figures
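On a problem simple enough to solve in closed form, the fixed-point viewpoint looks as follows (an Uzawa/dual-ascent style map, chosen only for illustration; the paper's map may differ).

    # For min 0.5||x - c||^2 s.t. a.x = b, the inner problem has the closed
    # form x(lam) = c - lam*a, and the multiplier map
    #   T(lam) = lam + rho*(a.x(lam) - b)
    # has the optimal multiplier as its unique fixed point.
    import numpy as np

    c = np.array([2.0, 1.0]); a = np.array([1.0, 1.0]); b = 1.0
    rho = 0.5                              # converges: |1 - rho*||a||^2| < 1

    lam = 0.0
    for _ in range(100):
        x = c - lam * a                    # unconstrained argmin of the Lagrangian
        lam = lam + rho * (a @ x - b)      # fixed-point update on the multiplier

    print(x, a @ x - b, lam)               # feasible x and the optimal multiplier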