1,118 research outputs found
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Substantial progress has been made recently on developing provably accurate
and efficient algorithms for low-rank matrix factorization via nonconvex
optimization. While conventional wisdom often takes a dim view of nonconvex
optimization algorithms due to their susceptibility to spurious local minima,
simple iterative methods such as gradient descent have been remarkably
successful in practice. The theoretical footings, however, had been largely
lacking until recently.
In this tutorial-style overview, we highlight the important role of
statistical models in enabling efficient nonconvex optimization with
performance guarantees. We review two contrasting approaches: (1) two-stage
algorithms, which consist of a tailored initialization step followed by
successive refinement; and (2) global landscape analysis and
initialization-free algorithms. Several canonical matrix factorization problems
are discussed, including but not limited to matrix sensing, phase retrieval,
matrix completion, blind deconvolution, robust principal component analysis,
phase synchronization, and joint alignment. Special care is taken to illustrate
the key technical insights underlying their analyses. This article serves as a
testament that the integrated consideration of optimization and statistics
leads to fruitful research findings.Comment: Invited overview articl
Scalable Robust Matrix Factorization with Nonconvex Loss
Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often
outperforms standard matrix factorization using the $\ell_2$-loss, particularly
when outliers are present. The state-of-the-art RMF solver is the RMF-MM
algorithm, which, however, cannot utilize data sparsity. Moreover, sometimes
even the (convex) $\ell_1$-loss is not robust enough. In this paper, we propose
the use of nonconvex loss to enhance robustness. To address the resultant
difficult optimization problem, we use majorization-minimization (MM)
optimization and propose a new MM surrogate. To improve scalability, we exploit
data sparsity and optimize the surrogate via its dual with the accelerated
proximal gradient algorithm. The resultant algorithm has low time and space
complexities and is guaranteed to converge to a critical point. Extensive
experiments demonstrate its superiority over the state-of-the-art in terms of
both accuracy and scalability.
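To make the MM pattern concrete, here is a minimal sketch of majorization-minimization for robust factorization, using a Welsch loss and an alternating weighted least-squares surrogate as stand-ins; this is not the paper's RMF-MM surrogate or its accelerated dual solver, and all constants are assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, sigma = 100, 80, 3, 1.0
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
X += np.where(rng.random(X.shape) < 0.05,           # 5% gross outliers
              20 * rng.standard_normal(X.shape), 0.0)
obs = rng.random(X.shape) < 0.5                     # observed-entry mask

U, V = rng.standard_normal((m, r)), rng.standard_normal((n, r))
for _ in range(50):
    E = X - U @ V.T
    # MM step: the Welsch loss rho(e) = 1 - exp(-e^2/sigma^2) is majorized at
    # the current residual by a quadratic with weight w = exp(-e^2/sigma^2),
    # so each iteration solves a weighted least-squares surrogate in which
    # large residuals (outliers) receive nearly zero weight.
    W = np.exp(-(E / sigma) ** 2) * obs
    for i in range(m):
        Vi = V * W[i][:, None]
        U[i] = np.linalg.solve(Vi.T @ V + 1e-6 * np.eye(r), Vi.T @ X[i])
    for j in range(n):
        Uj = U * W[:, j][:, None]
        V[j] = np.linalg.solve(Uj.T @ U + 1e-6 * np.eye(r), Uj.T @ X[:, j])

print(np.median(np.abs((X - U @ V.T)[obs])))  # robust fit error on observed entries
```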
Exploiting the structure effectively and efficiently in low rank matrix recovery
Low-rank models arise in a wide range of applications, including machine
learning, signal processing, computer algebra, computer vision, and imaging
science. Low rank matrix recovery is about reconstructing a low rank matrix
from incomplete measurements. In this survey we review recent developments on
low rank matrix recovery, focusing on three typical scenarios: matrix sensing,
matrix completion and phase retrieval. An overview of effective and efficient
approaches for the problem is given, including nuclear norm minimization,
projected gradient descent based on matrix factorization, and Riemannian
optimization based on the embedded manifold of low rank matrices. Numerical
recipes for the different approaches are emphasized, accompanied by the
corresponding theoretical recovery guarantees.
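As one minimal numerical recipe in this family, the sketch below implements projected gradient descent with an SVD-based projection onto the rank-$r$ set (singular value projection) for matrix completion; the sampling rate and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, r = 60, 50, 2
M = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))  # ground truth
mask = rng.random((n1, n2)) < 0.4                                # observed entries

X = np.zeros((n1, n2))
for _ in range(200):
    G = mask * (M - X)                 # gradient only touches observed entries
    Y = X + G / mask.mean()            # step size ~ 1/p rescales the sampled gradient
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    X = (U[:, :r] * s[:r]) @ Vt[:r]    # project back onto the rank-r set

print(np.linalg.norm(X - M) / np.linalg.norm(M))   # relative recovery error
```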
Global Optimality in Low-rank Matrix Optimization
This paper considers the minimization of a general objective function $f(X)$
over the set of rectangular matrices that have rank at most $r$. To
reduce the computational burden, we factorize the variable $X$ into a product
$UV^\top$ of two smaller matrices and optimize over these two matrices instead of $X$.
Despite the resulting nonconvexity, recent studies in matrix completion and
sensing have shown that the factored problem has no spurious local minima and
obeys the so-called strict saddle property (the function has a direction of
negative curvature at every critical point that is not a local minimum). We analyze the
global geometry for a general and yet well-conditioned objective function
whose restricted strong convexity and restricted strong smoothness
constants are comparable. In particular, we show that the reformulated
objective function has no spurious local minima and obeys the strict saddle
property. These geometric properties imply that a number of iterative
optimization algorithms (such as gradient descent) can provably solve the
factored problem with global convergence.
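The strict saddle property can be checked numerically on a toy instance. The sketch below (our illustration, not from the paper) verifies that the critical point $U = 0$ of $f(U) = \|UU^\top - M\|_F^2$ has a direction of strictly negative curvature along the top eigenvector of $M$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 20, 3
B = rng.standard_normal((n, r))
M = B @ B.T                                    # PSD target of rank r

f = lambda U: np.linalg.norm(U @ U.T - M) ** 2

# The gradient 4 (UU^T - M) U vanishes at U = 0, so the origin is critical.
# Along a direction D, f(tD) = ||M||_F^2 - 2 t^2 <DD^T, M> + O(t^4), so the
# curvature at 0 along D is -4 <DD^T, M>: negative whenever D aligns with M.
w, V = np.linalg.eigh(M)
D = np.zeros((n, r))
D[:, 0] = V[:, -1]                             # top eigenvector as escape direction
t = 1e-3
curvature = (f(t * D) - 2 * f(np.zeros((n, r))) + f(-t * D)) / t ** 2
print(curvature)                               # ~ -4 * lambda_max(M) < 0
```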
Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation
Low-rank modeling plays a pivotal role in signal processing and machine
learning, with applications ranging from collaborative filtering, video
surveillance, medical imaging, to dimensionality reduction and adaptive
filtering. Many modern high-dimensional data and interactions thereof can be
modeled as lying approximately in a low-dimensional subspace or manifold,
possibly with additional structures, and their proper exploitation leads to
significant reductions in the costs of sensing, computation and storage. In recent
years, there has been a plethora of progress in understanding how to exploit low-rank
structures using computationally efficient procedures in a provable manner,
including both convex and nonconvex approaches. On one side, convex relaxations
such as nuclear norm minimization often lead to statistically optimal
procedures for estimating low-rank matrices, where first-order methods are
developed to address the computational challenges; on the other side, there is
emerging evidence that properly designed nonconvex procedures, such as
projected gradient descent, often provide globally optimal solutions with a
much lower computational cost in many problems. This survey article will
provide a unified overview of these recent advances on low-rank matrix
estimation from incomplete measurements. Attention is paid to rigorous
characterization of the performance of these algorithms, and to problems where
the low-rank matrix has additional structural properties that require new
algorithmic designs and theoretical analysis.
Comment: To appear in IEEE Signal Processing Magazine
Model-free Nonconvex Matrix Completion: Local Minima Analysis and Applications in Memory-efficient Kernel PCA
This work studies low-rank approximation of a positive semidefinite matrix
from partial entries via nonconvex optimization. We characterize how well
local-minimum based low-rank factorization approximates a fixed positive
semidefinite matrix without any assumptions on rank matching, the condition
number, or the eigenspace incoherence parameter. Furthermore, under certain
assumptions on rank-matching and well-boundedness of condition numbers and
eigenspace incoherence parameters, a corollary of our main theorem improves the
state-of-the-art sampling rate results for nonconvex matrix completion with no
spurious local minima in Ge et al. [2016, 2017]. In addition, we investigate
when the proposed nonconvex optimization results in accurate low-rank
approximations even in the presence of large condition numbers, large incoherence
parameters, or rank mismatch. We also propose to apply the nonconvex
optimization to memory-efficient Kernel PCA. Compared to the well-known
Nyström methods, numerical experiments indicate that the proposed nonconvex
optimization approach yields more stable results in both low-rank approximation
and clustering.
Comment: Main theorem improved
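A toy rendition of the memory-efficient idea, under assumptions of our own (an RBF kernel, Bernoulli-sampled entries, a spectral initialization): factor the kernel matrix as $UU^\top$ and fit only the sampled entries, so that only $U$ and the samples need to be stored.

```python
import numpy as np

rng = np.random.default_rng(5)
pts = rng.standard_normal((200, 5))
sq = ((pts[:, None] - pts[None]) ** 2).sum(-1)
K = np.exp(-sq / 2)                          # RBF kernel matrix (PSD); formed in
                                             # full here only to sample and evaluate

n, r, p = K.shape[0], 10, 0.3
obs = np.triu(rng.random((n, n)) < p)
obs = obs | obs.T                            # symmetric Bernoulli sample of entries

# Spectral initialization from the rescaled sampled matrix
S = obs * K / p
vals, vecs = np.linalg.eigh((S + S.T) / 2)
U = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0))

# Gradient descent on ||P_Omega(UU^T - K)||_F^2, rescaled by 1/p
eta = 0.05 / np.linalg.norm(K, 2)
for _ in range(500):
    U -= eta * 4 * ((obs * (U @ U.T - K)) @ U) / p

print(np.linalg.norm(U @ U.T - K) / np.linalg.norm(K))  # rank-r approximation error
```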
The Global Optimization Geometry of Low-Rank Matrix Optimization
This paper considers general rank-constrained optimization problems that
minimize a general objective function $f(X)$ over the set of $n \times m$ rectangular
matrices that have rank at most $r$. To tackle the rank constraint
and also to reduce the computational burden, we factorize $X$ into $UV^\top$, where
$U$ and $V$ are $n \times r$ and $m \times r$ matrices, respectively, and then
optimize over the small matrices $U$ and $V$. We characterize the global
optimization geometry of the nonconvex factored problem and show that the
corresponding objective function satisfies the robust strict saddle property as
long as the original objective function satisfies restricted strong
convexity and smoothness properties, ensuring global convergence of many local
search algorithms (such as noisy gradient descent) in polynomial time for
solving the factored problem. We also provide a comprehensive analysis for the
optimization geometry of a matrix factorization problem where we aim to find
$n \times r$ and $m \times r$ matrices $U$ and $V$ such that $UV^\top$ approximates
a given matrix $M$. Aside from the robust strict saddle property, we show
that the objective function of the matrix factorization problem has no spurious
local minima and obeys the strict saddle property not only for the
exact-parameterization case where $r = \operatorname{rank}(M)$, but also for the
over-parameterization case where $r > \operatorname{rank}(M)$ and the
under-parameterization case where $r < \operatorname{rank}(M)$. These geometric
properties imply that a number of iterative optimization algorithms (such as
gradient descent) converge to a global solution with random initialization.
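As a generic illustration of why such saddle properties help local search (this is not the paper's specific algorithm or constants), the sketch below starts gradient descent exactly at the saddle $U = 0$ of a factored PSD objective and escapes it by injecting a small random perturbation whenever the gradient vanishes.

```python
import numpy as np

rng = np.random.default_rng(6)
n, r = 20, 2
B = rng.standard_normal((n, r))
M = B @ B.T                               # PSD target of rank r

U = np.zeros((n, r))                      # start exactly at the saddle U = 0
eta = 1.0 / (10 * np.linalg.norm(M, 2))   # conservative step size
for _ in range(2000):
    grad = 4 * (U @ U.T - M) @ U
    if np.linalg.norm(grad) < 1e-8:       # stalled at a critical point: perturb;
        U += 1e-3 * rng.standard_normal(U.shape)  # strict saddle means noise
    U -= eta * grad                               # finds a descent direction

print(np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))  # ~0 after escaping
```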
Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent
We address the rectangular matrix completion problem by lifting the unknown
matrix to a positive semidefinite matrix in higher dimension, and optimizing a
nonconvex objective over the semidefinite factor using a simple gradient
descent scheme. With a number of random observations scaling polynomially in
the rank $r$, the condition number $\kappa$, the incoherence parameter $\mu$, and
the dimension $n = \max(n_1, n_2)$ of the $\mu$-incoherent target matrix, the
algorithm linearly converges to the global optimum with high probability.
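The lifting can be sketched as follows, under assumptions of our own (the paper's exact objective, regularizer, and initialization may differ): stack $W = [U; V]$ so that the off-diagonal block of $WW^\top$ equals $UV^\top$, and descend on the observed-entry residual plus a standard balancing term $\tfrac{1}{4}\|U^\top U - V^\top V\|_F^2$.

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2, r = 50, 40, 2
M = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
mask = rng.random((n1, n2)) < 0.4

U, V = rng.standard_normal((n1, r)), rng.standard_normal((n2, r))
eta = 0.2 / np.linalg.norm(M, 2)
for _ in range(500):
    R = mask * (U @ V.T - M)              # residual on observed entries
    C = U.T @ U - V.T @ V                 # imbalance between the two factors
    # Gradients of 0.5||P_Omega(UV^T - M)||_F^2 + (1/4)||U^T U - V^T V||_F^2
    U, V = U - eta * (R @ V + U @ C), V - eta * (R.T @ U - V @ C)

print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))   # relative error
```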
Noisy Matrix Completion: Understanding Statistical Guarantees for Convex Relaxation via Nonconvex Optimization
This paper studies noisy low-rank matrix completion: given partial and noisy
entries of a large low-rank matrix, the goal is to estimate the underlying
matrix faithfully and efficiently. Arguably one of the most popular paradigms
to tackle this problem is convex relaxation, which achieves remarkable efficacy
in practice. However, the theoretical support of this approach is still far
from optimal in the noisy setting, falling short of explaining its empirical
success.
We make progress towards demystifying the practical efficacy of convex
relaxation vis-à-vis random noise. When the rank and the condition number of
the unknown matrix are bounded by a constant, we demonstrate that the convex
programming approach achieves near-optimal estimation errors, in terms of
the Euclidean loss, the entrywise loss, and the spectral norm loss, for a
wide range of noise levels. All of this is enabled by bridging convex
relaxation with the nonconvex Burer-Monteiro approach, a seemingly distinct
algorithmic paradigm that is provably robust against noise. More specifically,
we show that an approximate critical point of the nonconvex formulation serves
as an extremely tight approximation of the convex solution, thus allowing us to
transfer the desired statistical guarantees of the nonconvex approach to its
convex counterpart.
How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?
When the linear measurements of an instance of low-rank matrix recovery
satisfy a restricted isometry property (RIP), i.e., they are approximately
norm-preserving, the problem is known to contain no spurious local minima, so
exact recovery is guaranteed. In this paper, we show that moderate RIP is not
enough to eliminate spurious local minima, so existing results can only hold
for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that
every $x$ is the spurious local minimum of a rank-1 instance of matrix recovery
that satisfies RIP. One specific counterexample has RIP constant $\delta = 1/2$,
but causes randomly initialized stochastic gradient descent (SGD) to fail 12%
of the time. SGD is frequently able to avoid and escape spurious local minima,
but this empirical result shows that it can occasionally be defeated by their
existence. Hence, while exact recovery guarantees will likely require a proof
of no spurious local minima, arguments based solely on norm preservation will
only be applicable to a narrow set of nearly-isotropic instances.
Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018)
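For intuition about what the RIP constant measures, here is a companion experiment of our own (not the paper's counterexample): it estimates the empirical spread of $\|\mathcal{A}(X)\|_2^2 / \|X\|_F^2$ over random rank-1 matrices for a Gaussian measurement operator, the quantity an RIP constant $\delta$ confines to $[1-\delta, 1+\delta]$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, trials = 20, 200, 2000
A = rng.standard_normal((m, n, n)) / np.sqrt(m)    # normalized measurement operator

ratios = []
for _ in range(trials):
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    X = np.outer(u, v)                             # random rank-1 test matrix
    ratios.append((np.einsum('kij,ij->k', A, X) ** 2).sum() / (X ** 2).sum())

print(min(ratios), max(ratios))   # empirical lower/upper isometry bounds near 1
```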