IHT dies hard: Provable accelerated Iterative Hard Thresholding
We study --both in theory and practice-- the use of momentum motions in
classic iterative hard thresholding (IHT) methods. By simply modifying plain
IHT, we investigate its convergence behavior on convex optimization criteria
with non-convex constraints, under standard assumptions. In diverse scenarios,
we observe that acceleration in IHT leads to significant improvements, compared
to state of the art projected gradient descent and Frank-Wolfe variants. As a
byproduct of our inspection, we study the impact of selecting the momentum
parameter: similar to convex settings, two modes of behavior are observed
--"rippling" and linear-- depending on the level of momentum.
Comment: accepted to AISTATS 201
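The idea of adding momentum to plain IHT can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes a least-squares loss, a fixed step size, and a fixed momentum parameter, and the names `hard_threshold`, `matvec`, `rmatvec`, and `accelerated_iht` are hypothetical helpers introduced here.

```python
def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out


def matvec(A, x):
    """Compute A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T @ r for a matrix stored as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def accelerated_iht(A, y, k, step, momentum, iters=200):
    """Momentum IHT sketch for min 0.5 * ||y - A x||^2 with x k-sparse.

    Assumptions (not taken from the paper): fixed step and momentum,
    zero initialization, and a plain least-squares objective.
    """
    n = len(A[0])
    x_prev, x = [0.0] * n, [0.0] * n
    for _ in range(iters):
        # Momentum step: extrapolate along the last displacement.
        u = [xi + momentum * (xi - pi) for xi, pi in zip(x, x_prev)]
        # Gradient step on the least-squares loss, evaluated at u.
        residual = [yi - ai for yi, ai in zip(y, matvec(A, u))]
        descent = rmatvec(A, residual)
        v = [ui + step * di for ui, di in zip(u, descent)]
        # Project back onto the non-convex set of k-sparse vectors.
        x_prev, x = x, hard_threshold(v, k)
    return x
```

Setting `momentum=0` recovers plain IHT, which makes it easy to compare the two behaviors on the same problem instance.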
Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation
Low-rank modeling plays a pivotal role in signal processing and machine
learning, with applications ranging from collaborative filtering, video
surveillance, medical imaging, to dimensionality reduction and adaptive
filtering. Many modern high-dimensional data and interactions thereof can be
modeled as lying approximately in a low-dimensional subspace or manifold,
possibly with additional structures, and their proper exploitation leads to a
significant reduction of costs in sensing, computation and storage. In recent
years, there has been a plethora of progress in understanding how to exploit low-rank
structures using computationally efficient procedures in a provable manner,
including both convex and nonconvex approaches. On one side, convex relaxations
such as nuclear norm minimization often lead to statistically optimal
procedures for estimating low-rank matrices, where first-order methods are
developed to address the computational challenges; on the other side, there is
emerging evidence that properly designed nonconvex procedures, such as
projected gradient descent, often provide globally optimal solutions with a
much lower computational cost in many problems. This survey article will
provide a unified overview of these recent advances on low-rank matrix
estimation from incomplete measurements. Attention is paid to rigorous
characterization of the performance of these algorithms, and to problems where
the low-rank matrix has additional structural properties that require new
algorithmic designs and theoretical analysis.
Comment: To appear in IEEE Signal Processing Magazin
Cubic Regularization with Momentum for Nonconvex Optimization
Momentum is a popular technique to accelerate the convergence in practical
training, and its impact on convergence guarantee has been well-studied for
first-order algorithms. However, such a successful acceleration technique has
not yet been proposed for second-order algorithms in nonconvex optimization. In
this paper, we apply the momentum scheme to cubic regularized (CR) Newton's
method and explore the potential for acceleration. Our numerical experiments on
various nonconvex optimization problems demonstrate that the momentum scheme
can substantially facilitate the convergence of cubic regularization, and
perform even better than Nesterov's acceleration scheme for CR.
Theoretically, we prove that CR under momentum achieves the best possible
convergence rate to a second-order stationary point for nonconvex optimization.
Moreover, we study the proposed algorithm for solving problems satisfying an
error bound condition and establish a local quadratic convergence rate. Then,
particularly for finite-sum problems, we show that the proposed algorithm can
allow computational inexactness that reduces the overall sample complexity
without degrading the convergence rate.
Exploiting the structure effectively and efficiently in low rank matrix recovery
Low rank models arise in a wide range of applications, including machine
learning, signal processing, computer algebra, computer vision, and imaging
science. Low rank matrix recovery is about reconstructing a low rank matrix
from incomplete measurements. In this survey we review recent developments on
low rank matrix recovery, focusing on three typical scenarios: matrix sensing,
matrix completion and phase retrieval. An overview of effective and efficient
approaches for the problem is given, including nuclear norm minimization,
projected gradient descent based on matrix factorization, and Riemannian
optimization based on the embedded manifold of low rank matrices. Numerical
recipes of different approaches are emphasized while accompanied by the
corresponding theoretical recovery guarantees.
On the Suboptimality of Proximal Gradient Descent for Sparse Approximation
We study the proximal gradient descent (PGD) method for the sparse
approximation problem as well as its accelerated optimization with randomized
algorithms in this paper. We first offer a theoretical analysis of PGD showing
the bounded gap between the sub-optimal solution by PGD and the globally
optimal solution for the sparse approximation problem under
conditions weaker than the Restricted Isometry Property widely used in the compressive
sensing literature. Moreover, we propose randomized algorithms to accelerate
the optimization by PGD using randomized low rank matrix approximation
(PGD-RMA) and randomized dimension reduction (PGD-RDR). Our randomized
algorithms substantially reduce the computational cost of the original PGD for
the sparse approximation problem, and the resultant sub-optimal
solution still enjoys provable suboptimality, namely, the sub-optimal solution
to the reduced problem still has a bounded gap to the globally optimal solution
to the original problem.
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Substantial progress has been made recently on developing provably accurate
and efficient algorithms for low-rank matrix factorization via nonconvex
optimization. While conventional wisdom often takes a dim view of nonconvex
optimization algorithms due to their susceptibility to spurious local minima,
simple iterative methods such as gradient descent have been remarkably
successful in practice. The theoretical footings, however, had been largely
lacking until recently.
In this tutorial-style overview, we highlight the important role of
statistical models in enabling efficient nonconvex optimization with
performance guarantees. We review two contrasting approaches: (1) two-stage
algorithms, which consist of a tailored initialization step followed by
successive refinement; and (2) global landscape analysis and
initialization-free algorithms. Several canonical matrix factorization problems
are discussed, including but not limited to matrix sensing, phase retrieval,
matrix completion, blind deconvolution, robust principal component analysis,
phase synchronization, and joint alignment. Special care is taken to illustrate
the key technical insights underlying their analyses. This article serves as a
testament that the integrated consideration of optimization and statistics
leads to fruitful research findings.
Comment: Invited overview articl
Solving systems of phaseless equations via Riemannian optimization with optimal sampling complexity
A Riemannian gradient descent algorithm and a truncated variant are presented
to solve systems of phaseless equations. The algorithms are
developed by exploiting the inherent low rank structure of the problem based on
the embedded manifold of rank- positive semidefinite matrices. A theoretical
recovery guarantee is established for the truncated variant, showing that
the algorithm is able to achieve successful recovery when the number of
equations is proportional to the number of unknowns. Two key ingredients in the
analysis are the restricted well conditioned property and the restricted weak
correlation property of the associated truncated linear operator. Empirical
evaluations show that our algorithms are competitive with other
state-of-the-art first order nonconvex approaches with provable guarantees.
Provable quantum state tomography via non-convex methods
With quantum processors steadily growing in size, new quantum tomography tools
tailored to high-dimensional systems are required. In
this work, we describe such a computational tool, based on recent ideas from
non-convex optimization. The algorithm excels in the compressed-sensing-like
setting, where only a few data points are measured from a low-rank or
highly-pure quantum state of a high-dimensional system. We show that the
algorithm can practically be used in quantum tomography problems that are
beyond the reach of convex solvers, and, moreover, is faster than other
state-of-the-art non-convex approaches. Crucially, we prove that, despite being
a non-convex program, under mild conditions, the algorithm is guaranteed to
converge to the global minimum of the problem; thus, it constitutes a provable
quantum state tomography protocol.
Comment: 21 pages, 26 figures, code include
Introduction to Nonnegative Matrix Factorization
In this paper, we introduce and provide a short overview of nonnegative
matrix factorization (NMF). Several aspects of NMF are discussed, namely, the
application in hyperspectral imaging, geometry and uniqueness of NMF solutions,
complexity, algorithms, and its link with extended formulations of polyhedra.
In order to put NMF into perspective, the more general problem class of
constrained low-rank matrix approximation problems is first briefly introduced.
Comment: 18 pages, 4 figure
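As a concrete instance of an NMF algorithm, the classical multiplicative updates for the Frobenius-norm objective can be sketched as below. This is one well-known scheme chosen for illustration; the abstract does not say which algorithms the paper covers in detail. The function names, the random initialization, and the small `eps` safeguard in the denominators are assumptions made here.

```python
import random


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(P):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]


def frobenius_error(X, W, H):
    """Squared Frobenius norm of X - W H."""
    WH = matmul(W, H)
    return sum((x - a) ** 2 for rx, ra in zip(X, WH) for x, a in zip(rx, ra))


def nmf_multiplicative(X, rank, iters=200, seed=0, eps=1e-12):
    """Approximate a nonnegative X (m x n) by W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Strictly positive random start, so the factors stay nonnegative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise.
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, the constraint is maintained without an explicit projection, which is the main appeal of this scheme for a first experiment.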
Analytical Convergence Regions of Accelerated Gradient Descent in Nonconvex Optimization under Regularity Condition
There is a growing interest in using robust control theory to analyze and
design optimization and machine learning algorithms. This paper studies a class
of nonconvex optimization problems whose cost functions satisfy the so-called
Regularity Condition (RC). Empirical studies show that accelerated gradient
descent (AGD) algorithms (e.g. Nesterov's acceleration and Heavy-ball) with
proper initializations often work well in practice. However, the convergence of
such AGD algorithms is largely unknown in the literature. The main contribution
of this paper is the analytical characterization of the convergence regions of
AGD under RC via robust control tools. Since such optimization problems arise
frequently in many applications such as phase retrieval, training of neural
networks and matrix sensing, our result shows promise of robust control theory
in these areas.
Comment: Accepted to Automatic
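For reference, the two AGD variants named in the abstract, Heavy-ball and Nesterov's acceleration, differ only in where the gradient is evaluated. The sketch below shows both update rules on a toy convex quadratic; it is an illustrative assumption and not a cost function satisfying the Regularity Condition as studied in the paper. The function names and the step and momentum values are hypothetical.

```python
def heavy_ball(grad, x0, step, momentum, iters=500):
    """Heavy-ball: gradient at the current iterate plus a momentum term."""
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        g = grad(x)
        x_next = [xi - step * gi + momentum * (xi - pi)
                  for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x


def nesterov(grad, x0, step, momentum, iters=500):
    """Nesterov: extrapolate first, then take the gradient at that point."""
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        y = [xi + momentum * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = grad(y)
        x_prev, x = x, [yi - step * gi for yi, gi in zip(y, g)]
    return x


def quadratic_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return [x[0], 10.0 * x[1]]
```

A call such as `heavy_ball(quadratic_grad, [5.0, -3.0], step=0.1, momentum=0.5)` drives the iterate toward the minimizer at the origin; whether a given step and momentum pair converges on a nonconvex problem is exactly the kind of question the paper's convergence regions address.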