A Riemannian low-rank method for optimization over semidefinite matrices with block-diagonal constraints
We propose a new algorithm to solve optimization problems of the form $\min f(X)$ for a smooth function $f$ under the constraints that $X$ is positive
semidefinite and the diagonal blocks of $X$ are small identity matrices. Such
problems often arise as the result of relaxing a rank constraint (lifting). In
particular, many estimation tasks involving phases, rotations, orthonormal
bases or permutations fit in this framework, and so do certain relaxations of
combinatorial problems such as Max-Cut. The proposed algorithm exploits the
facts that (1) such formulations admit low-rank solutions, and (2) their
rank-restricted versions are smooth optimization problems on a Riemannian
manifold. Combining insights from both the Riemannian and the convex geometries
of the problem, we characterize when second-order critical points of the smooth
problem reveal KKT points of the semidefinite problem. We compare against state
of the art, mature software and find that, on certain interesting problem
instances, what we call the staircase method is orders of magnitude faster, is
more accurate and scales better. Code is available. Comment: 37 pages, 3 figures.
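To make the low-rank idea behind the staircase method concrete, here is a minimal NumPy sketch of the rank-restricted problem in the simplest case of 1x1 diagonal blocks (the Max-Cut relaxation): minimize <C, YY^T> over matrices Y with unit-norm rows by Riemannian gradient descent. The step size, iteration count and random data are illustrative assumptions, not the paper's staircase algorithm (which also adapts the rank).

```python
import numpy as np

def maxcut_lowrank(C, k, iters=500, step=None):
    """Riemannian gradient descent for min <C, Y Y^T> over Y with unit-norm rows.

    Rows of Y live on the unit sphere, so X = Y Y^T is PSD with unit diagonal:
    the rank-k restriction of the Max-Cut SDP relaxation (1x1 identity blocks).
    """
    n = C.shape[0]
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((n, k))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    if step is None:
        step = 0.5 / (np.linalg.norm(C, 2) + 1e-12)            # crude step from ||C||_2
    for _ in range(iters):
        G = 2 * C @ Y                                          # Euclidean gradient
        rgrad = G - np.sum(G * Y, axis=1, keepdims=True) * Y   # project onto tangent space
        Y = Y - step * rgrad                                   # gradient step
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)          # retract rows back to the sphere
    return Y

# tiny usage example on a random symmetric cost matrix
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 20))
C = (A + A.T) / 2
Y = maxcut_lowrank(C, k=3)
print("objective <C, YY^T>:", np.trace(C @ Y @ Y.T))
```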
Concentration of the Kirchhoff index for Erdos-Renyi graphs
Given an undirected graph, the resistance distance between two nodes is the
resistance one would measure between these two nodes in an electrical network
if edges were resistors. Summing these distances over all pairs of nodes yields
the so-called Kirchhoff index of the graph, which measures its overall
connectivity. In this work, we consider Erdos-Renyi random graphs. Since the
graphs are random, their Kirchhoff indices are random variables. We give
formulas for the expected value of the Kirchhoff index and show it concentrates
around its expectation. We achieve this by studying the trace of the
pseudoinverse of the Laplacian of Erdos-Renyi graphs. For synchronization (a
class of estimation problems on graphs) our results imply that acquiring
pairwise measurements uniformly at random is a good strategy, even if only a
vanishing proportion of the measurements can be acquired.
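For reference, the quantity studied here can be computed directly from the identity Kf(G) = n * trace(L^+), where L^+ is the pseudoinverse of the graph Laplacian. A minimal NumPy sketch on a sampled Erdos-Renyi graph (parameter values are illustrative):

```python
import numpy as np

def kirchhoff_index(n=100, p=0.1, seed=0):
    """Sample an Erdos-Renyi graph G(n, p) and return its Kirchhoff index.

    Kf(G) = sum of resistance distances over all pairs = n * trace(L^+),
    where L^+ is the Moore-Penrose pseudoinverse of the graph Laplacian
    (the index is finite only if the sampled graph is connected).
    """
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, 1)      # independent edges above the diagonal
    A = (upper | upper.T).astype(float)             # symmetric adjacency, no self-loops
    L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian
    return n * np.trace(np.linalg.pinv(L))

print(kirchhoff_index())
```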
Near-optimal bounds for phase synchronization
The problem of phase synchronization is to estimate the phases (angles) of a
complex unit-modulus vector $z$ from their noisy pairwise relative measurements $C = zz^* + \sigma W$, where $W$ is a complex-valued Gaussian random matrix.
The maximum likelihood estimator (MLE) is a solution to a unit-modulus
constrained quadratic programming problem, which is nonconvex. Existing works
have proposed polynomial-time algorithms such as a semidefinite relaxation
(SDP) approach or the generalized power method (GPM) to solve it. Numerical
experiments suggest both of these methods succeed with high probability for
$\sigma$ up to $\tilde{O}(\sqrt{n})$, yet existing analyses only confirm this observation for $\sigma$ up to $O(n^{1/4})$. In this
paper, we bridge the gap by proving SDP is tight for $\sigma = O(\sqrt{n/\log n})$, and GPM converges to the global optimum under
the same regime. Moreover, we establish a linear convergence rate for GPM, and
derive a tighter $\ell_\infty$ bound for the MLE. A novel technique we develop
in this paper is to track (theoretically) closely related sequences of
iterates, in addition to the sequence of iterates GPM actually produces. As a
by-product, we obtain an $\ell_\infty$ perturbation bound for leading
eigenvectors. Our result also confirms intuitions that use techniques from
statistical mechanics. Comment: 34 pages, 1 figure.
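A minimal NumPy sketch of the generalized power method in the model described above (a spectral start followed by the entrywise phase iteration z <- phase(Cz)); the noise level and instance size are illustrative assumptions:

```python
import numpy as np

def generalized_power_method(C, iters=100):
    """GPM for phase synchronization: spectral start, then z <- phase(C z) entrywise."""
    w, V = np.linalg.eigh(C)          # leading eigenvector as initialization
    z = V[:, -1]
    z = z / np.abs(z)                 # project entries onto the unit circle
    for _ in range(iters):
        y = C @ z
        z = y / np.abs(y)             # entrywise phase (generalized power iteration)
    return z

# synthetic instance: C = z z^* + sigma * W with Hermitian Gaussian noise W
rng = np.random.default_rng(0)
n, sigma = 200, 1.0
z_true = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = (W + W.conj().T) / 2
C = np.outer(z_true, z_true.conj()) + sigma * W
z_hat = generalized_power_method(C)
# correlation is the right metric: the phases are only defined up to a global shift
print("correlation |<z_hat, z>| / n:", np.abs(z_hat.conj() @ z_true) / n)
```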
Computational Complexity versus Statistical Performance on Sparse Recovery Problems
We show that several classical quantities controlling compressed sensing
performance directly match classical parameters controlling algorithmic
complexity. We first describe linearly convergent restart schemes on
first-order methods solving a broad range of compressed sensing problems, where
sharpness at the optimum controls convergence speed. We show that for sparse
recovery problems, this sharpness can be written as a condition number, given
by the ratio between true signal sparsity and the largest signal size that can
be recovered by the observation matrix. In a similar vein, Renegar's condition
number is a data-driven complexity measure for convex programs, generalizing
classical condition numbers for linear systems. We show that for a broad class
of compressed sensing problems, the worst case value of this algorithmic
complexity measure taken over all signals matches the restricted singular value
of the observation matrix which controls robust recovery performance. Overall,
this means in both cases that, in compressed sensing problems, a single
parameter directly controls both computational complexity and recovery
performance. Numerical experiments illustrate these points using several
classical algorithms. Comment: Final version, to appear in Information and Inference.
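As a rough illustration of a restart scheme wrapped around an accelerated first-order method, here is a sketch using FISTA on the LASSO as a stand-in for the compressed sensing solvers discussed above; the fixed restart schedule, step size and regularization parameter are assumptions for illustration, not the paper's scheme.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista_with_restarts(A, b, lam, restarts=5, inner_iters=200):
    """FISTA for the LASSO min 0.5*||Ax - b||^2 + lam*||x||_1,
    restarted from the current iterate every `inner_iters` steps.

    Under a sharpness (error-bound) condition at the optimum, such restart
    schedules turn the accelerated sublinear rate into linear convergence.
    """
    n = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the smooth part
    x = np.zeros(n)
    for _ in range(restarts):
        y, t = x.copy(), 1.0                       # restart: reset the momentum sequence
        for _ in range(inner_iters):
            x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)
            t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
            y = x_new + ((t - 1) / t_new) * (x_new - x)
            x, t = x_new, t_new
    return x

# small compressed sensing instance with a sparse ground truth
rng = np.random.default_rng(0)
m, n, s = 60, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
b = A @ x_true
x_hat = fista_with_restarts(A, b, lam=0.01)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```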
Non-Convex Phase Retrieval from STFT Measurements
The problem of recovering a one-dimensional signal from its Fourier transform
magnitude, called Fourier phase retrieval, is ill-posed in most cases. We
consider the closely-related problem of recovering a signal from its phaseless
short-time Fourier transform (STFT) measurements. This problem arises naturally
in several applications, such as ultra-short laser pulse characterization and
ptychography. The redundancy offered by the STFT enables unique recovery under
mild conditions. We show that in some cases the unique solution can be obtained
by the principal eigenvector of a matrix, constructed as the solution of a
simple least-squares problem. When these conditions are not met, we suggest
using the principal eigenvector of this matrix to initialize non-convex local
optimization algorithms and propose two such methods. The first is based on
minimizing the empirical risk loss function, while the second maximizes a
quadratic function on the manifold of phases. We prove that under appropriate
conditions, the proposed initialization is close to the underlying signal. We
then analyze the geometry of the empirical risk loss function and show
numerically that both gradient algorithms converge to the underlying signal
even with small redundancy in the measurements. In addition, the algorithms are
robust to noise.
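A minimal sketch of the empirical-risk route in the simplest setting: squared-magnitude STFT measurements with circularly shifted Gaussian windows and plain (Wirtinger) gradient descent, with an initialization near the signal standing in for the spectral initialization described above. Window shape, hop, step size and iteration count are illustrative assumptions.

```python
import numpy as np

def stft_mag2(x, windows):
    """Squared-magnitude STFT: one FFT of the signal times each shifted window."""
    return np.abs(np.fft.fft(windows * x, axis=1)) ** 2

def empirical_risk(x, windows, b):
    return np.sum((stft_mag2(x, windows) - b) ** 2)

def empirical_risk_grad(x, windows, b):
    """Wirtinger gradient of sum_l || |F(w_l . x)|^2 - b_l ||^2 (real windows)."""
    n = x.size
    Y = np.fft.fft(windows * x, axis=1)             # y_l = F (w_l . x)
    R = np.abs(Y) ** 2 - b                          # real residuals
    # the adjoint of np.fft.fft is n * np.fft.ifft
    return 4 * np.sum(windows * (n * np.fft.ifft(R * Y, axis=1)), axis=0)

# synthetic instance: circularly shifted Gaussian windows, complex signal
rng = np.random.default_rng(0)
n, L, hop = 64, 16, 4
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = np.exp(-0.5 * ((np.arange(n) - n // 2) / (n / 8)) ** 2)
windows = np.stack([np.roll(w, l * hop - n // 2) for l in range(L)])
b = stft_mag2(x_true, windows)

# gradient descent on the empirical risk, started near the signal
x = x_true + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
step = 1.0 / (4 * L * n * np.max(b))                # conservative step from a crude curvature bound
print("risk before:", empirical_risk(x, windows, b))
for _ in range(3000):
    x = x - step * empirical_risk_grad(x, windows, b)
print("risk after: ", empirical_risk(x, windows, b))
```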
Smoothed analysis of the low-rank approach for smooth semidefinite programs
We consider semidefinite programs (SDPs) of size n with equality constraints.
In order to overcome scalability issues, Burer and Monteiro proposed a
factorized approach based on optimizing over a matrix $Y$ of size $n \times k$ such
that $X = YY^*$ is the SDP variable. The advantages of such a formulation are
twofold: the dimension of the optimization variable is reduced and positive
semidefiniteness is naturally enforced. However, the problem in Y is
non-convex. In prior work, it has been shown that, when the constraints on the
factorized variable regularly define a smooth manifold, provided k is large
enough, for almost all cost matrices, all second-order stationary points
(SOSPs) are optimal. Importantly, in practice, one can only compute points
which approximately satisfy necessary optimality conditions, leading to the
question: are such points also approximately optimal? To this end, and under
similar assumptions, we use smoothed analysis to show that approximate SOSPs
for a randomly perturbed objective function are approximate global optima, with
k scaling like the square root of the number of constraints (up to log
factors). Moreover, we bound the optimality gap at the approximate solution of
the perturbed problem with respect to the original problem. We particularize
our results to an SDP relaxation of phase retrieval.
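A minimal sketch of the setup analyzed here, under assumptions: pick the factorization rank k slightly above the square root of the number of constraints m, perturb the cost matrix with a small random symmetric Gaussian matrix, and hand the factorized objective to a local solver (the solver itself, e.g. a Riemannian method when the constraints define a smooth manifold, is not shown).

```python
import numpy as np

def smoothed_burer_monteiro_setup(C, m, sigma, seed=0):
    """Set up a Burer-Monteiro factorized problem with a smoothed (perturbed) cost.

    For an SDP min <C, X> with m equality constraints, the result above says:
    with rank k of order sqrt(m) (up to log factors) and a small random
    perturbation of the cost, approximate second-order stationary points of the
    factorized problem f(Y) = <C_pert, Y Y^T> are approximately optimal.
    """
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, n))
    C_pert = C + sigma * (G + G.T) / np.sqrt(2)       # symmetric Gaussian perturbation
    k = int(np.ceil(np.sqrt(2 * m))) + 1              # just above the Barvinok-Pataki rank bound
    Y0 = rng.standard_normal((n, k))                  # initial factor; X = Y Y^T is n x n and PSD
    cost = lambda Y: np.sum(C_pert * (Y @ Y.T))       # factorized objective <C_pert, Y Y^T>
    egrad = lambda Y: 2 * C_pert @ Y                  # its Euclidean gradient
    # the m constraints <A_i, Y Y^T> = b_i are left to the local solver,
    # e.g. a Riemannian method when they define a smooth manifold
    return cost, egrad, Y0
```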
Fast convergence of trust-regions for non-isolated minima via analysis of CG on indefinite matrices
Trust-region methods (TR) can converge quadratically to minima where the
Hessian is positive definite. However, if the minima are not isolated, then the
Hessian there cannot be positive definite. The weaker
Polyak–Łojasiewicz (PŁ) condition is compatible with
non-isolated minima, and it is enough for many algorithms to preserve good
local behavior. Yet, TR with an exact subproblem solver lacks even
basic features such as a capture theorem under PŁ.
In practice, a popular subproblem solver is the truncated
conjugate gradient method (tCG). Empirically, TR-tCG exhibits super-linear
convergence under PŁ. We confirm this theoretically.
The main mathematical obstacle is that, under PŁ, at points arbitrarily
close to minima, the Hessian has vanishingly small, possibly negative
eigenvalues. Thus, tCG is applied to ill-conditioned, indefinite systems. Yet,
the core theory underlying tCG is that of CG, which assumes a positive definite
operator. Accordingly, we develop new tools to analyze the dynamics of CG in
the presence of small eigenvalues of any sign, for the regime of interest to
TR-tCG.
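For reference, a minimal NumPy sketch of the Steihaug-Toint truncated conjugate gradient (tCG) subproblem solver discussed above: CG on the Newton system, truncated when it detects negative curvature or hits the trust-region boundary. Tolerances and the usage example are illustrative.

```python
import numpy as np

def truncated_cg(H, g, radius, tol=1e-8, max_iters=None):
    """Steihaug-Toint truncated CG for the trust-region subproblem
    min_p  g^T p + 0.5 p^T H p   subject to  ||p|| <= radius.

    CG is run on H p = -g; it stops early when it detects negative curvature
    or crosses the trust-region boundary -- exactly the ill-conditioned,
    possibly indefinite regime discussed above.
    """
    max_iters = max_iters or g.size
    p = np.zeros_like(g)
    r = g.copy()                         # residual of H p + g
    d = -r                               # search direction
    for _ in range(max_iters):
        Hd = H @ d
        dHd = d @ Hd
        if dHd <= 0:                     # negative curvature: go to the boundary along d
            return p + _to_boundary(p, d, radius) * d
        alpha = (r @ r) / dHd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= radius:     # left the region: truncate at the boundary
            return p + _to_boundary(p, d, radius) * d
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) <= tol:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p

def _to_boundary(p, d, radius):
    """Positive t such that ||p + t d|| = radius."""
    a, b, c = d @ d, 2 * (p @ d), p @ p - radius ** 2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

# usage example on an ill-conditioned (nearly singular) positive semidefinite model
rng = np.random.default_rng(0)
Q = rng.standard_normal((30, 30))
H = Q @ Q.T
H -= 0.999 * np.linalg.eigvalsh(H)[0] * np.eye(30)   # push the smallest eigenvalue toward 0
g = rng.standard_normal(30)
p = truncated_cg(H, g, radius=1.0)
print("model decrease:", -(g @ p + 0.5 * p @ H @ p))
```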