How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?
When the linear measurements of an instance of low-rank matrix recovery
satisfy a restricted isometry property (RIP)---i.e. they are approximately
norm-preserving---the problem is known to contain no spurious local minima, so
exact recovery is guaranteed. In this paper, we show that moderate RIP is not
enough to eliminate spurious local minima, so existing results can only hold
for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that
every $x$ is the spurious local minimum of a rank-1 instance of matrix recovery
that satisfies RIP. One specific counterexample has RIP constant $\delta = 1/2$,
but causes randomly initialized stochastic gradient descent (SGD) to fail 12%
of the time. SGD is frequently able to avoid and escape spurious local minima,
but this empirical result shows that it can occasionally be defeated by their
existence. Hence, while exact recovery guarantees will likely require a proof
of no spurious local minima, arguments based solely on norm preservation will
only be applicable to a narrow set of nearly-isotropic instances.
Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018)
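As an illustration of the setup this abstract studies, here is a minimal NumPy sketch of rank-1 matrix recovery from RIP-style measurements solved by randomly initialized gradient descent; the dimensions, step size, and Gaussian measurement model are illustrative assumptions, not the paper's counterexample construction.

import numpy as np

# Hypothetical rank-1 matrix sensing instance: recover z from b_k = <A_k, z z^T>.
# Gaussian measurement matrices satisfy RIP with high probability for large m.
rng = np.random.default_rng(0)
n, m = 10, 200
z = rng.standard_normal(n); z /= np.linalg.norm(z)   # unit-norm ground truth
A = rng.standard_normal((m, n, n))
b = np.einsum('kij,i,j->k', A, z, z)                 # measurements <A_k, z z^T>

def grad(x):
    # gradient of f(x) = (1/2m) * sum_k (<A_k, x x^T> - b_k)^2
    r = np.einsum('kij,i,j->k', A, x, x) - b
    Asym = A + A.transpose(0, 2, 1)                  # symmetrize each A_k
    return np.einsum('k,kij,j->i', r, Asym, x) / m

x = rng.standard_normal(n) / np.sqrt(n)              # random initialization
for _ in range(2000):
    x -= 0.1 * grad(x)

# exact recovery up to the unavoidable sign ambiguity x -> -x
print(min(np.linalg.norm(x - z), np.linalg.norm(x + z)))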
Global Optimality in Low-rank Matrix Optimization
This paper considers the minimization of a general objective function $f(X)$
over the set of rectangular matrices that have rank at most $r$. To
reduce the computational burden, we factorize the variable $X$ into a product
of two smaller matrices and optimize over these two matrices instead of $X$.
Despite the resulting nonconvexity, recent studies in matrix completion and
sensing have shown that the factored problem has no spurious local minima and
obeys the so-called strict saddle property (the function has a directional
negative curvature at all critical points but local minima). We analyze the
global geometry for a general and yet well-conditioned objective function
$f(X)$ whose restricted strong convexity and restricted strong smoothness
constants are comparable. In particular, we show that the reformulated
objective function has no spurious local minima and obeys the strict saddle
property. These geometric properties imply that a number of iterative
optimization algorithms (such as gradient descent) can provably solve the
factored problem with global convergence.
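A small sketch of the factorization idea described above, using the toy well-conditioned objective f(X) = 0.5*||X - M||_F^2 (whose restricted strong convexity and smoothness constants coincide); the sizes and step size are assumptions for illustration.

import numpy as np

# Factored reformulation: replace min_{rank(X) <= r} f(X) with
# min_{U, V} f(U V^T), U in R^{n x r}, V in R^{m x r}.
rng = np.random.default_rng(1)
n, m, r = 30, 20, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # rank-r target

U = rng.standard_normal((n, r)) * 0.1   # small random initialization
V = rng.standard_normal((m, r)) * 0.1
lr = 0.01
for _ in range(5000):
    G = U @ V.T - M                     # gradient of f at X = U V^T
    U, V = U - lr * (G @ V), V - lr * (G.T @ U)

# benign landscape: plain gradient descent reaches the global optimum
print(np.linalg.norm(U @ V.T - M))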
On the Landscape of Synchronization Networks: A Perspective from Nonconvex Optimization
Studying the landscape of nonconvex cost functions is key to a better
understanding of optimization algorithms widely used in signal processing,
statistics, and machine learning. Meanwhile, the famous Kuramoto model has been
an important mathematical model to study the synchronization phenomena of
coupled oscillators over various network topologies. In this paper, we bring
together these two seemingly unrelated objects by investigating the
optimization landscape of a nonlinear function
$E(\theta) = \frac{1}{2}\sum_{i,j} a_{ij}(1-\cos(\theta_i - \theta_j))$
associated with an underlying network and exploring the relationship between the
existence of local minima and network topology. This function arises naturally
in the Burer-Monteiro method applied to synchronization as well as
matrix completion on the torus. Moreover, it corresponds to the energy function
of the homogeneous Kuramoto model on complex networks for coupled oscillators.
We prove that the minimizer of the energy function is unique up to a global
translation for deterministic dense graphs and Erd\H{o}s-R\'enyi random
graphs with tools from optimization and random matrix theory. Consequently, the
stable equilibrium of the corresponding homogeneous Kuramoto model is unique
and the basin of attraction for the synchronous state of these coupled
oscillators is the whole phase space minus a set of measure zero. In addition,
our results address when the Burer-Monteiro method recovers the ground truth
exactly from highly incomplete observations in synchronization
and shed light on the robustness of nonconvex optimization algorithms against
certain types of so-called monotone adversaries. Numerical simulations are
performed to illustrate our results.
Comment: 27 pages, 6 figures
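A short numerical sketch of the landscape result: gradient descent on the homogeneous Kuramoto energy of a dense Erdős–Rényi graph, where the theory predicts convergence to the synchronized state from almost every initialization (the graph size, density, and step size below are illustrative assumptions).

import numpy as np

# Energy of the homogeneous Kuramoto model on a graph with adjacency A:
# E(theta) = 0.5 * sum_{i,j} A_ij * (1 - cos(theta_i - theta_j)).
rng = np.random.default_rng(2)
n, p = 50, 0.5
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric, no self-loops

def grad(theta):
    D = theta[:, None] - theta[None, :]         # pairwise phase differences
    return np.sum(A * np.sin(D), axis=1)        # dE/dtheta_i

theta = rng.uniform(0, 2 * np.pi, n)            # random initial phases
for _ in range(5000):
    theta -= 0.01 * grad(theta)

# order parameter |mean(exp(i*theta))| -> 1 at the synchronous state
print(np.abs(np.mean(np.exp(1j * theta))))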
Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Rank-1 Robust Principal Component Analysis
This work is concerned with non-negative rank-1 robust principal
component analysis (RPCA), where the goal is to precisely recover the dominant
non-negative principal components of a data matrix even when a number of the
measurements are grossly corrupted with sparse and arbitrarily large noise.
Most known techniques for solving RPCA rely on convex relaxation
methods that lift the problem to a higher dimension, significantly
increasing the number of variables. As an alternative, the well-known
Burer-Monteiro approach can be used to cast RPCA as a non-convex and
non-smooth optimization problem with a significantly smaller number of
variables. In this work, we show that the low-dimensional formulation of the
symmetric and asymmetric positive rank-1 RPCA based on the Burer-Monteiro
approach has a benign landscape: 1) it does not have any spurious local
solutions, 2) it has a unique global solution, and 3) its global solution
coincides with the true components. An implication of this result is that
simple local search algorithms are guaranteed to achieve a zero global
optimality gap when directly applied to the low-dimensional formulation.
Furthermore, we provide strong deterministic and probabilistic guarantees for
the exact recovery of the true principal components. In particular, it is shown
that a constant fraction of the measurements could be grossly corrupted and yet
they would not create any spurious local solutions.
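To make the low-dimensional formulation concrete, the following sketch applies a projected subgradient method to the symmetric non-negative rank-1 Burer-Monteiro form min_{u >= 0} ||u u^T - Y||_1; the corruption model, step sizes, and solver choice are assumptions for illustration rather than the paper's algorithm.

import numpy as np

# Y = z z^T + S, with z >= 0 and S sparse gross corruption.
rng = np.random.default_rng(3)
n = 40
z = np.abs(rng.standard_normal(n))
S = (rng.random((n, n)) < 0.1) * rng.standard_normal((n, n)) * 10.0
Y = np.outer(z, z) + (S + S.T) / 2

u = np.abs(rng.standard_normal(n))              # feasible initialization
for t in range(1, 5001):
    R = np.sign(np.outer(u, u) - Y)             # subgradient of the l1 loss
    g = (R + R.T) @ u                           # chain rule through u u^T
    u = np.maximum(u - (0.02 / np.sqrt(t)) * g, 0.0)   # project onto u >= 0

print(np.linalg.norm(u - z) / np.linalg.norm(z))       # relative recovery error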
A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization
We study the set of continuous functions that admit no spurious local optima
(i.e. local minima that are not global minima) which we term \textit{global
functions}. They satisfy various powerful properties for analyzing nonconvex
and nonsmooth optimization problems. For instance, they satisfy a theorem akin
to the fundamental uniform limit theorem from analysis regarding continuous
functions. Global functions are also endowed with useful properties regarding
the composition of functions and change of variables. Using these new results,
we show that a class of nonconvex and nonsmooth optimization problems arising
in tensor decomposition applications are global functions. This is the first
result concerning nonconvex methods for nonsmooth objective functions. Our
result provides a theoretical guarantee for the widely used $\ell_1$ norm to
avoid outliers in nonconvex optimization.
Comment: 22 pages, 13 figures
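Stated schematically, and suppressing the paper's technical conditions, the uniform-limit property mentioned above reads:

% If each f_n has no spurious local minima ("is global") and f_n -> f
% uniformly, then f has no spurious local minima either.
\[
  f_n \ \text{global for every } n, \qquad
  \sup_x |f_n(x) - f(x)| \longrightarrow 0
  \quad \Longrightarrow \quad
  f \ \text{global}.
\]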
Model-free Nonconvex Matrix Completion: Local Minima Analysis and Applications in Memory-efficient Kernel PCA
This work studies low-rank approximation of a positive semidefinite matrix
from partial entries via nonconvex optimization. We characterize how well
local-minimum based low-rank factorization approximates a fixed positive
semidefinite matrix without any assumptions on rank-matching, the condition
number, or the eigenspace incoherence parameter. Furthermore, under certain
assumptions on rank-matching and well-boundedness of condition numbers and
eigenspace incoherence parameters, a corollary of our main theorem improves the
state-of-the-art sampling rate results for nonconvex matrix completion with no
spurious local minima in Ge et al. [2016, 2017]. In addition, we investigate
when the proposed nonconvex optimization results in accurate low-rank
approximations even in presence of large condition numbers, large incoherence
parameters, or rank mismatching. We also propose to apply the nonconvex
optimization to memory-efficient Kernel PCA. Compared to the well-known
Nystr\"{o}m methods, numerical experiments indicate that the proposed nonconvex
optimization approach yields more stable results in both low-rank approximation
and clustering.
Comment: Main theorem improved
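The memory-efficient kernel PCA idea can be sketched as completing a low-rank factor from a subsample of kernel entries; everything below (kernel choice, sampling rate, step size) is an illustrative assumption, and the full kernel matrix is formed only to validate the toy example.

import numpy as np

rng = np.random.default_rng(4)
n, d, r, p = 200, 2, 5, 0.2
pts = rng.standard_normal((n, d))
# RBF kernel matrix (formed in full here only to check the result)
K = np.exp(-0.25 * np.sum((pts[:, None] - pts[None, :]) ** 2, axis=2))
mask = rng.random((n, n)) < p
mask = mask | mask.T                            # symmetric observation pattern

X = rng.standard_normal((n, r)) * 0.1           # low-rank factor, K ~ X X^T
lr = 0.2 / (p * n)
for _ in range(2000):
    G = mask * (X @ X.T - K)                    # residual on observed entries
    X -= lr * (2 * G @ X)                       # gradient of 0.5*||P(XX^T - K)||_F^2

print(np.linalg.norm(X @ X.T - K) / np.linalg.norm(K))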
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Substantial progress has been made recently on developing provably accurate
and efficient algorithms for low-rank matrix factorization via nonconvex
optimization. While conventional wisdom often takes a dim view of nonconvex
optimization algorithms due to their susceptibility to spurious local minima,
simple iterative methods such as gradient descent have been remarkably
successful in practice. The theoretical footings, however, had been largely
lacking until recently.
In this tutorial-style overview, we highlight the important role of
statistical models in enabling efficient nonconvex optimization with
performance guarantees. We review two contrasting approaches: (1) two-stage
algorithms, which consist of a tailored initialization step followed by
successive refinement; and (2) global landscape analysis and
initialization-free algorithms. Several canonical matrix factorization problems
are discussed, including but not limited to matrix sensing, phase retrieval,
matrix completion, blind deconvolution, robust principal component analysis,
phase synchronization, and joint alignment. Special care is taken to illustrate
the key technical insights underlying their analyses. This article serves as a
testament that the integrated consideration of optimization and statistics
leads to fruitful research findings.
Comment: Invited overview article
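As a concrete instance of the first (two-stage) approach surveyed here, the sketch below runs spectral initialization followed by gradient refinement for real-valued phase retrieval; the dimensions, step size, and iteration counts are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(5)
n, m = 20, 400
z = rng.standard_normal(n); z /= np.linalg.norm(z)
A = rng.standard_normal((m, n))
y = (A @ z) ** 2                                # phaseless measurements

# Stage 1: spectral initialization -- top eigenvector of (1/m) sum_k y_k a_k a_k^T
Y = (A.T * y) @ A / m
w, V = np.linalg.eigh(Y)
x = V[:, -1] * np.sqrt(np.mean(y))              # rescale to the signal energy

# Stage 2: gradient refinement on f(x) = (1/4m) sum_k ((a_k^T x)^2 - y_k)^2
for _ in range(1000):
    r = (A @ x) ** 2 - y
    x -= 0.05 * (A.T @ (r * (A @ x))) / m

print(min(np.linalg.norm(x - z), np.linalg.norm(x + z)))  # sign ambiguity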
No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis
In this paper we develop a new framework that captures the common landscape
underlying non-convex low-rank matrix problems, including matrix sensing,
matrix completion, and robust PCA. In particular, we show for all of the above
problems (including asymmetric cases) that: 1) all local minima are also
globally optimal; 2) no high-order saddle points exist. These results explain
why simple algorithms such as stochastic gradient descent converge globally and
efficiently optimize these non-convex objective functions in practice. Our
framework connects and simplifies the existing analyses on optimization
landscapes for matrix sensing and symmetric matrix completion. The framework
naturally leads to new results for asymmetric matrix completion and robust PCA.
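The two landscape properties can be checked numerically on a tiny instance; below, assuming the simplest symmetric rank-1 objective f(u) = 0.25*||u u^T - z z^T||_F^2, a finite-difference Hessian at the critical point u = 0 exhibits the strictly negative curvature that lets local search escape the saddle.

import numpy as np

rng = np.random.default_rng(6)
n = 6
z = rng.standard_normal(n)
M = np.outer(z, z)

def f(u):
    return 0.25 * np.linalg.norm(np.outer(u, u) - M) ** 2

def hessian(u, eps=1e-4):
    # finite-difference Hessian of f at u
    H, I = np.zeros((n, n)), np.eye(n)
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(u + eps*I[i] + eps*I[j]) - f(u + eps*I[i] - eps*I[j])
                       - f(u - eps*I[i] + eps*I[j]) + f(u - eps*I[i] - eps*I[j])) / (4 * eps**2)
    return H

# smallest eigenvalue is approximately -||z||^2 < 0: a strict saddle at u = 0
print(np.linalg.eigvalsh(hessian(np.zeros(n))).min())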
An equivalence between stationary points for rank constraints versus low-rank factorizations
Two common approaches in low-rank optimization problems are either working
directly with a rank constraint on the matrix variable, or optimizing over a
low-rank factorization so that the rank constraint is implicitly ensured. In
this paper, we study the natural connection between the rank-constrained and
factorized approaches. We show that all second-order stationary points of the
factorized objective function correspond to stationary points of projected
gradient descent run on the original problem (where the projection step
enforces the rank constraint). This result allows us to unify many existing
optimization guarantees that have been proved specifically in either the
rank-constrained or the factorized setting, and leads to new results for
certain settings of the problem.
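A compact sketch of the two formulations the paper connects, using the toy objective f(X) = 0.5*||X - M||_F^2 (sizes and step sizes are assumptions): projected gradient descent enforces the rank constraint with a truncated SVD, while the factored iteration works on X = U V^T directly.

import numpy as np

rng = np.random.default_rng(7)
n, m, r = 15, 12, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

def project_rank(X, r):
    # projection onto {rank(X) <= r} via truncated SVD
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# (a) projected gradient descent on the rank-constrained problem
X = np.zeros((n, m))
for _ in range(200):
    X = project_rank(X - 0.5 * (X - M), r)

# (b) gradient descent on the factorized problem
U, V = rng.standard_normal((n, r)) * 0.1, rng.standard_normal((m, r)) * 0.1
for _ in range(5000):
    G = U @ V.T - M
    U, V = U - 0.01 * (G @ V), V - 0.01 * (G.T @ U)

# both approaches reach the global optimum on this benign toy instance
print(np.linalg.norm(X - M), np.linalg.norm(U @ V.T - M))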
Spurious Local Minima are Common in Two-Layer ReLU Neural Networks
We consider the optimization problem associated with training simple ReLU
neural networks of the form $x \mapsto \sum_{i=1}^{k} \max\{0, w_i^\top x\}$
with respect to the squared loss. We provide a computer-assisted proof that
even if the input distribution is standard Gaussian, even if the dimension is
arbitrarily large, and even if the target values are generated by such a
network, with orthonormal parameter vectors, the problem can still have
spurious local minima once $6 \le k \le 20$. By a concentration of measure
argument, this implies that in high
input dimensions, \emph{nearly all} target networks of the relevant sizes lead
to spurious local minima. Moreover, we conduct experiments which show that the
probability of hitting such local minima is quite high, and increasing with the
network size. On the positive side, mild over-parameterization appears to
drastically reduce such local minima, indicating that an over-parameterization
assumption is necessary to get a positive result in this setting.
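A Monte Carlo sketch of the training problem in this abstract: fitting x -> sum_i max(0, w_i^T x) with orthonormal target weights over Gaussian inputs by plain gradient descent (the sample size, initialization scale, and step size are assumptions); depending on the random seed, the final loss may stay bounded away from zero, which is exactly the spurious-minima phenomenon studied.

import numpy as np

rng = np.random.default_rng(8)
d, k, N = 20, 6, 20000                          # k in the small-width regime examined
W_star = np.linalg.qr(rng.standard_normal((d, k)))[0].T   # orthonormal target rows
X = rng.standard_normal((N, d))                 # standard Gaussian inputs
y = np.maximum(X @ W_star.T, 0).sum(axis=1)     # target network outputs

W = rng.standard_normal((k, d)) * 0.5           # random initialization
for _ in range(2000):
    Z = X @ W.T                                 # pre-activations
    resid = np.maximum(Z, 0).sum(axis=1) - y
    # gradient of the (1/2N) * sum-of-squares loss w.r.t. W
    G = ((Z > 0) * resid[:, None]).T @ X / N
    W -= 0.1 * G

# final squared loss; a value bounded away from 0 indicates a spurious minimum
print(0.5 * np.mean((np.maximum(X @ W.T, 0).sum(axis=1) - y) ** 2))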