
    IHT dies hard: Provable accelerated Iterative Hard Thresholding

    We study, both in theory and in practice, the use of momentum in classic iterative hard thresholding (IHT) methods. By simply modifying plain IHT, we investigate its convergence behavior on convex optimization criteria with non-convex constraints, under standard assumptions. In diverse scenarios, we observe that acceleration in IHT leads to significant improvements compared to state-of-the-art projected gradient descent and Frank-Wolfe variants. As a byproduct of our inspection, we study the impact of selecting the momentum parameter: similar to convex settings, two modes of behavior are observed, "rippling" and linear, depending on the level of momentum. Comment: accepted to AISTATS 2018.
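    As a concrete illustration of the scheme described above, here is a minimal numpy sketch of IHT with a heavy-ball style momentum term for sparse least squares. The step size rule, the fixed momentum weight `beta`, and all names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x; zero out the rest."""
    z = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    z[idx] = x[idx]
    return z

def accelerated_iht(A, y, k, beta=0.9, iters=200):
    """IHT with momentum for min 0.5*||Ax - y||^2 s.t. ||x||_0 <= k.
    beta plays the role of the momentum parameter discussed in the
    abstract; the 'rippling' vs. linear behavior depends on its level."""
    mu = 1.0 / np.linalg.norm(A, 2) ** 2   # conservative step size
    x = x_prev = np.zeros(A.shape[1])
    for _ in range(iters):
        v = x + beta * (x - x_prev)        # momentum extrapolation
        g = A.T @ (A @ v - y)              # gradient at extrapolated point
        x_prev, x = x, hard_threshold(v - mu * g, k)
    return x
```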

    Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation

    Low-rank modeling plays a pivotal role in signal processing and machine learning, with applications ranging from collaborative filtering, video surveillance, and medical imaging to dimensionality reduction and adaptive filtering. Many modern high-dimensional data and interactions thereof can be modeled as lying approximately in a low-dimensional subspace or manifold, possibly with additional structures, and proper exploitation of these structures leads to significant reductions in the costs of sensing, computation, and storage. In recent years, there has been a plethora of progress in understanding how to exploit low-rank structures using computationally efficient procedures in a provable manner, including both convex and nonconvex approaches. On one side, convex relaxations such as nuclear norm minimization often lead to statistically optimal procedures for estimating low-rank matrices, where first-order methods are developed to address the computational challenges; on the other side, there is emerging evidence that properly designed nonconvex procedures, such as projected gradient descent, often provide globally optimal solutions with a much lower computational cost in many problems. This survey article provides a unified overview of these recent advances on low-rank matrix estimation from incomplete measurements. Attention is paid to rigorous characterization of the performance of these algorithms, and to problems where the low-rank matrix has additional structural properties that require new algorithmic designs and theoretical analysis. Comment: To appear in IEEE Signal Processing Magazine.
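    To make the convex relaxation route concrete, below is a minimal sketch of singular value thresholding, the proximal operator of the nuclear norm at the core of many first-order methods, inside a proximal gradient loop for matrix completion. The weight `tau`, the step size, and the function names are placeholder assumptions.

```python
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete_nuclear(M_obs, mask, tau=1.0, step=1.0, iters=300):
    """Proximal gradient on 0.5*||mask*(X - M_obs)||_F^2 + tau*||X||_*."""
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        X = svt(X - step * mask * (X - M_obs), step * tau)
    return X
```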

    Cubic Regularization with Momentum for Nonconvex Optimization

    Momentum is a popular technique to accelerate convergence in practical training, and its impact on convergence guarantees has been well studied for first-order algorithms. However, such a successful acceleration technique has not yet been proposed for second-order algorithms in nonconvex optimization. In this paper, we apply the momentum scheme to the cubic regularized (CR) Newton's method and explore the potential for acceleration. Our numerical experiments on various nonconvex optimization problems demonstrate that the momentum scheme can substantially facilitate the convergence of cubic regularization, and performs even better than Nesterov's acceleration scheme for CR. Theoretically, we prove that CR under momentum achieves the best possible convergence rate to a second-order stationary point for nonconvex optimization. Moreover, we study the proposed algorithm for solving problems satisfying an error bound condition and establish a local quadratic convergence rate. Then, particularly for finite-sum problems, we show that the proposed algorithm allows computational inexactness that reduces the overall sample complexity without degrading the convergence rate.
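    A hedged sketch of the overall structure the abstract describes: a cubic-regularized Newton step computed at a momentum-extrapolated point. The crude inner solver, the step sizes, and the regularization constant `M` are illustrative guesses, not the paper's actual subproblem solver or momentum schedule.

```python
import numpy as np

def cr_step(g, H, M, inner_iters=200, lr=1e-2):
    """Approximately minimize the cubic model
    m(s) = g^T s + 0.5 s^T H s + (M/6)||s||^3 by gradient descent."""
    s = np.zeros_like(g)
    for _ in range(inner_iters):
        s -= lr * (g + H @ s + 0.5 * M * np.linalg.norm(s) * s)
    return s

def cr_momentum(grad, hess, x0, M=10.0, beta=0.5, iters=50):
    """Cubic regularization with a heavy-ball style extrapolation.
    grad and hess are callables returning the gradient and Hessian."""
    x = x_prev = x0.copy()
    for _ in range(iters):
        v = x + beta * (x - x_prev)          # momentum extrapolation
        x_prev, x = x, v + cr_step(grad(v), hess(v), M)
    return x
```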

    Exploiting the structure effectively and efficiently in low rank matrix recovery

    Low-rank models arise in a wide range of applications, including machine learning, signal processing, computer algebra, computer vision, and imaging science. Low-rank matrix recovery is about reconstructing a low-rank matrix from incomplete measurements. In this survey we review recent developments on low-rank matrix recovery, focusing on three typical scenarios: matrix sensing, matrix completion, and phase retrieval. An overview of effective and efficient approaches to the problem is given, including nuclear norm minimization, projected gradient descent based on matrix factorization, and Riemannian optimization based on the embedded manifold of low-rank matrices. Numerical recipes for the different approaches are emphasized and accompanied by the corresponding theoretical recovery guarantees.
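    As one instance of the "projected gradient descent based on matrix factorization" route mentioned above, here is a minimal gradient descent sketch on the factors L and R for matrix completion; the initialization scale, step size, and target rank `r` are assumptions for illustration (a practical solver would also balance the factors and tune the step).

```python
import numpy as np

def factored_completion(M_obs, mask, r, step=0.01, iters=500, seed=0):
    """Gradient descent on 0.5*||mask*(L @ R.T - M_obs)||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = M_obs.shape
    L = 0.1 * rng.standard_normal((m, r))
    R = 0.1 * rng.standard_normal((n, r))
    for _ in range(iters):
        E = mask * (L @ R.T - M_obs)       # residual on observed entries
        L, R = L - step * (E @ R), R - step * (E.T @ L)
    return L @ R.T
```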

    On the Suboptimality of Proximal Gradient Descent for $\ell^0$ Sparse Approximation

    We study the proximal gradient descent (PGD) method for the $\ell^0$ sparse approximation problem, as well as its accelerated optimization with randomized algorithms. We first offer a theoretical analysis of PGD showing a bounded gap between the sub-optimal solution found by PGD and the globally optimal solution of the $\ell^0$ sparse approximation problem, under conditions weaker than the Restricted Isometry Property widely used in the compressive sensing literature. Moreover, we propose randomized algorithms to accelerate PGD using randomized low-rank matrix approximation (PGD-RMA) and randomized dimension reduction (PGD-RDR). Our randomized algorithms substantially reduce the computational cost of the original PGD for the $\ell^0$ sparse approximation problem, and the resultant sub-optimal solution still enjoys provable suboptimality: the sub-optimal solution to the reduced problem still has a bounded gap to the globally optimal solution of the original problem.
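    For reference, the baseline PGD iteration for the $\ell^0$-regularized least squares objective looks roughly as follows; the randomized accelerations PGD-RMA and PGD-RDR are not sketched here, and `lam` and the iteration count are placeholders.

```python
import numpy as np

def prox_l0(x, t):
    """Prox of t*||x||_0: hard thresholding at level sqrt(2t)."""
    return np.where(x ** 2 > 2 * t, x, 0.0)

def pgd_l0(A, y, lam, iters=300):
    """Proximal gradient descent for 0.5*||Ax - y||^2 + lam*||x||_0."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = prox_l0(x - step * A.T @ (A @ x - y), step * lam)
    return x
```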

    Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

    Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings. Comment: Invited overview article.
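    A toy instance of the two-stage recipe, in the matrix sensing setting with measurements $y_i = \langle A_i, X \rangle$: a spectral initialization followed by gradient refinement on the factors. The surrogate matrix, loss scaling, and step size below are standard textbook choices, not any single paper's algorithm.

```python
import numpy as np

def spectral_init(As, y, r):
    """Stage 1: top-r factors of the surrogate Y = (1/m) sum_i y_i A_i."""
    Y = sum(yi * Ai for yi, Ai in zip(y, As)) / len(y)
    U, s, Vt = np.linalg.svd(Y)
    return U[:, :r] * np.sqrt(s[:r]), Vt[:r].T * np.sqrt(s[:r])

def matrix_sensing_gd(As, y, r, step=0.2, iters=200):
    """Stage 2: refine X = L @ R.T on the least-squares sensing loss."""
    L, R = spectral_init(As, y, r)
    for _ in range(iters):
        X = L @ R.T
        res = [np.sum(Ai * X) - yi for Ai, yi in zip(As, y)]
        G = sum(ri * Ai for ri, Ai in zip(res, As)) / len(y)
        L, R = L - step * (G @ R), R - step * (G.T @ L)
    return L @ R.T
```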

    Solving systems of phaseless equations via Riemannian optimization with optimal sampling complexity

    A Riemannian gradient descent algorithm and a truncated variant are presented to solve systems of phaseless equations $|Ax|^2 = y$. The algorithms are developed by exploiting the inherent low-rank structure of the problem based on the embedded manifold of rank-1 positive semidefinite matrices. A theoretical recovery guarantee has been established for the truncated variant, showing that the algorithm is able to achieve successful recovery when the number of equations is proportional to the number of unknowns. Two key ingredients in the analysis are the restricted well-conditioned property and the restricted weak correlation property of the associated truncated linear operator. Empirical evaluations show that our algorithms are competitive with other state-of-the-art first-order nonconvex approaches with provable guarantees.
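    The sketch below is a simpler Euclidean relative of the method described above, a Wirtinger-flow style gradient descent with spectral initialization for the real-valued case, rather than the Riemannian algorithm on the rank-1 PSD manifold itself; the step-size scaling and constants are assumptions.

```python
import numpy as np

def phase_spectral_init(A, y):
    """Leading eigenvector of (1/m) A^T diag(y) A, scaled to the
    norm estimate sqrt(mean(y)) (valid for Gaussian sensing rows)."""
    Y = (A.T * y) @ A / len(y)
    _, V = np.linalg.eigh(Y)
    return V[:, -1] * np.sqrt(y.mean())

def phaseless_gd(A, y, step=0.1, iters=500):
    """Gradient descent on f(x) = (1/4m) sum_i ((a_i^T x)^2 - y_i)^2."""
    x = phase_spectral_init(A, y)
    lr = step / max(y.mean(), 1e-12)       # normalize by signal energy
    for _ in range(iters):
        Ax = A @ x
        x -= lr * A.T @ ((Ax ** 2 - y) * Ax) / len(y)
    return x   # recovers x up to a global sign
```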

    Provable quantum state tomography via non-convex methods

    With quantum processors steadily growing in size, new quantum tomography tools are needed that are tailored for high-dimensional systems. In this work, we describe such a computational tool, based on recent ideas from non-convex optimization. The algorithm excels in the compressed-sensing-like setting, where only a few data points are measured from a low-rank or highly pure quantum state of a high-dimensional system. We show that the algorithm can practically be used in quantum tomography problems that are beyond the reach of convex solvers, and, moreover, is faster than other state-of-the-art non-convex approaches. Crucially, we prove that, despite being a non-convex program, under mild conditions the algorithm is guaranteed to converge to the global minimum of the problem; thus, it constitutes a provable quantum state tomography protocol. Comment: 21 pages, 26 figures, code included.
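    A hedged sketch of one non-convex approach of this flavor: factored gradient descent on the density matrix, writing rho = U U^dagger and renormalizing to unit trace after each step. The measurement model (Hermitian observables `Ps` with outcomes y_i = Tr(P_i rho)), the renormalization, and the step size are illustrative simplifications, not the paper's exact algorithm.

```python
import numpy as np

def tomography_fgd(Ps, y, d, r, step=0.1, iters=300, seed=0):
    """Low-rank tomography sketch: rho = U @ U.conj().T with U d-by-r."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d, r)) + 1j * rng.standard_normal((d, r))
    U /= np.linalg.norm(U)                    # enforce trace(rho) = 1
    for _ in range(iters):
        rho = U @ U.conj().T
        res = [np.trace(P @ rho).real - yi for P, yi in zip(Ps, y)]
        G = sum(ri * P for ri, P in zip(res, Ps)) / len(y)
        U = U - step * 2.0 * (G @ U)          # gradient through rho = U U^H
        U /= np.linalg.norm(U)                # re-project onto unit trace
    return U @ U.conj().T
```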

    Introduction to Nonnegative Matrix Factorization

    In this paper, we introduce and provide a short overview of nonnegative matrix factorization (NMF). Several aspects of NMF are discussed, namely, its application in hyperspectral imaging, the geometry and uniqueness of NMF solutions, complexity, algorithms, and its link with extended formulations of polyhedra. In order to put NMF into perspective, the more general problem class of constrained low-rank matrix approximation problems is first briefly introduced. Comment: 18 pages, 4 figures.
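    Among the algorithms such an overview typically covers, the classical Lee-Seung multiplicative updates are perhaps the shortest to state; a minimal sketch for the Frobenius-norm objective follows, with initialization and iteration count as placeholder choices.

```python
import numpy as np

def nmf_multiplicative(X, r, iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||X - W @ H||_F^2
    subject to W >= 0, H >= 0 (one classical NMF algorithm)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update keeps H nonnegative
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update keeps W nonnegative
    return W, H
```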

    Analytical Convergence Regions of Accelerated Gradient Descent in Nonconvex Optimization under Regularity Condition

    There is a growing interest in using robust control theory to analyze and design optimization and machine learning algorithms. This paper studies a class of nonconvex optimization problems whose cost functions satisfy the so-called Regularity Condition (RC). Empirical studies show that accelerated gradient descent (AGD) algorithms (e.g., Nesterov's acceleration and heavy-ball) with proper initialization often work well in practice. However, the convergence of such AGD algorithms is largely unknown in the literature. The main contribution of this paper is the analytical characterization of the convergence regions of AGD under RC via robust control tools. Since such optimization problems arise frequently in many applications, such as phase retrieval, training of neural networks, and matrix sensing, our result shows the promise of robust control theory in these areas. Comment: Accepted to Automatica.
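    For concreteness, the heavy-ball iteration is one of the AGD schemes in question; its generic form is below, with `step` and `beta` the tunable parameters whose admissible region such an analysis characterizes (the values shown are placeholders).

```python
import numpy as np

def heavy_ball(grad, x0, step=0.05, beta=0.9, iters=500):
    """Heavy-ball method:
    x_{t+1} = x_t - step * grad(x_t) + beta * (x_t - x_{t-1})."""
    x = x_prev = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x_prev, x = x, x - step * grad(x) + beta * (x - x_prev)
    return x
```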