4 research outputs found
The Global Geometry of Centralized and Distributed Low-rank Matrix Recovery without Regularization
Low-rank matrix recovery is a fundamental problem in signal processing and
machine learning. A popular recent approach to recovering a low-rank
matrix X is to factorize it as a product of two smaller matrices, i.e., X =
UV^T, and then optimize over U and V instead of X. Despite the resulting
non-convexity, recent results have shown that many factorized objective
functions actually have benign global geometry---with no spurious local minima
and satisfying the so-called strict saddle property---ensuring convergence to a
global minimum for many local-search algorithms. Such results hold whenever the
original objective function is restricted strongly convex and smooth. However,
most of these results actually consider a modified cost function that includes
a balancing regularizer. While useful for deriving theory, this balancing
regularizer does not appear to be necessary in practice. In this work, we close
this theory-practice gap by proving that the unaltered factorized non-convex
problem, without the balancing regularizer, also has similar benign global
geometry. Moreover, we extend our theoretical results to the setting of
distributed optimization.
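A minimal sketch of the unregularized factorized approach described above: plain gradient descent on f(U, V) = 0.5 * ||U V^T - M||_F^2, with no balancing regularizer. All problem sizes, the step size, and the iteration count are hypothetical toy choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy low-rank target (hypothetical sizes; not from the paper).
n, m, r = 10, 8, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# Factorized variables: seek X = U V^T, starting from a small random init.
U = 0.1 * rng.standard_normal((n, r))
V = 0.1 * rng.standard_normal((m, r))

eta = 0.01  # step size (assumed small relative to the smoothness constant)
for _ in range(2000):
    R = U @ V.T - M            # residual of f(U, V) = 0.5 * ||U V^T - M||_F^2
    gU, gV = R @ V, R.T @ U    # gradients with respect to U and V
    U -= eta * gU
    V -= eta * gV

print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))  # relative recovery error
```

Note that nothing in the loop keeps ||U|| and ||V|| balanced; the benign-geometry result is what suggests plain local search like this can still reach a global minimum.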
PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization
Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points (SOSPs) with a global sublinear rate. Existing analyses of A-GD-type algorithms either guarantee only convergence to first-order solutions, or convergence to second-order solutions asymptotically (without rates). To the best of our knowledge, this is the first alternating-type algorithm that takes O(polylog(d)/ϵ^2) iterations to achieve an (ϵ, √ϵ)-SOSP with high probability, where polylog(d) denotes a polynomial in the logarithm of the problem dimension d.
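As an illustration of the idea (not the paper's exact procedure, rates, or constants), PA-GD can be sketched on a toy two-block objective: alternate gradient steps over the two blocks, and inject a small random perturbation whenever both block gradients are nearly zero, so the iterates escape strict saddle points instead of stalling at them. The objective, step size, and perturbation radius below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy objective with a strict saddle at (0, 0):
# f(u, v) = 0.5 * (u.v - 1)^2, with blocks u and v updated alternately.
def grad_u(u, v):
    return (u @ v - 1.0) * v

def grad_v(u, v):
    return (u @ v - 1.0) * u

u = np.zeros(2)   # start exactly at the saddle point
v = np.zeros(2)
eta, eps, radius = 0.1, 1e-3, 1e-2

for _ in range(500):
    if (np.linalg.norm(grad_u(u, v)) < eps
            and np.linalg.norm(grad_v(u, v)) < eps):
        # Perturbation step: both block gradients are nearly zero, so this
        # may be a saddle; jump randomly to escape it with high probability.
        u = u + radius * rng.standard_normal(u.shape)
        v = v + radius * rng.standard_normal(v.shape)
    u = u - eta * grad_u(u, v)   # block 1: gradient step with v fixed
    v = v - eta * grad_v(u, v)   # block 2: gradient step using the updated u

print(u @ v)  # near 1 at a second-order stationary point
```

Plain A-GD started at (0, 0) would never move, since both block gradients vanish there; the perturbation is what lets the iterates leave the saddle and descend toward the manifold u.v = 1.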