4 research outputs found

    The Global Geometry of Centralized and Distributed Low-rank Matrix Recovery without Regularization

    Full text link
    Low-rank matrix recovery is a fundamental problem in signal processing and machine learning. A recent very popular approach to recovering a low-rank matrix X is to factorize it as a product of two smaller matrices, i.e., X = UV^T, and then optimize over U, V instead of X. Despite the resulting non-convexity, recent results have shown that many factorized objective functions actually have benign global geometry---with no spurious local minima and satisfying the so-called strict saddle property---ensuring convergence to a global minimum for many local-search algorithms. Such results hold whenever the original objective function is restricted strongly convex and smooth. However, most of these results actually consider a modified cost function that includes a balancing regularizer. While useful for deriving theory, this balancing regularizer does not appear to be necessary in practice. In this work, we close this theory-practice gap by proving that the unaltered factorized non-convex problem, without the balancing regularizer, also has similar benign global geometry. Moreover, we also extend our theoretical results to the field of distributed optimization

    PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization

    Get PDF
    Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points (SOSPs) with a global sublinear rate. Existing analysis on A-GD type algorithm either only guarantees convergence to first-order solutions, or converges to second-order solutions asymptotically (without rates). To the best of our knowledge, this is the first alternating type algorithm that takes O(polylog(d)/ϵ2) iterations to achieve an (ϵ,ϵ√)-SOSP with high probability, where polylog(d) denotes the polynomial of the logarithm with respect to problem dimension d
    corecore