The power of sum-of-squares for detecting hidden structures
We study planted problems---finding hidden structures in random noisy
inputs---through the lens of the sum-of-squares semidefinite programming
hierarchy (SoS). This family of powerful semidefinite programs has recently
yielded many new algorithms for planted problems, often achieving the best
known polynomial-time guarantees in terms of accuracy of recovered solutions
and robustness to noise. One theme in recent work is the design of spectral
algorithms which match the guarantees of SoS algorithms for planted problems.
Classical spectral algorithms are often unable to accomplish this: the twist in
these new spectral algorithms is the use of spectral structure of matrices
whose entries are low-degree polynomials of the input variables. We prove that
for a wide class of planted problems, including refuting random constraint
satisfaction problems, tensor and sparse PCA, densest-k-subgraph, community
detection in stochastic block models, planted clique, and others, eigenvalues
of degree-d matrix polynomials are as powerful as SoS semidefinite programs of
roughly degree d. For such problems it is therefore always possible to match
the guarantees of SoS without solving a large semidefinite program. Using
related ideas on SoS algorithms and low-degree matrix polynomials (and inspired
by recent work on SoS and the planted clique problem by Barak et al.), we prove
new nearly-tight SoS lower bounds for the tensor and sparse principal component
analysis problems. Our lower bounds for sparse principal component analysis are
the first to suggest that going beyond existing algorithms for this problem may
require sub-exponential time.
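The simplest instance of the spectral approach described above is the classical degree-1 case: for planted clique, the test statistic is the top eigenvalue of a matrix whose entries are (degree-1) polynomials of the input edges. The following numpy sketch is an illustration, not the paper's construction; the graph size, clique size, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def signed_adjacency(n, clique=None):
    """Symmetric +/-1 adjacency matrix of G(n, 1/2), with an optional planted clique."""
    A = np.where(rng.random((n, n)) < 0.5, 1.0, -1.0)
    A = np.triu(A, 1)
    A = A + A.T  # symmetric, zero diagonal
    if clique is not None:
        A[np.ix_(clique, clique)] = 1.0  # force all clique edges present
        np.fill_diagonal(A, 0.0)
    return A

n, k = 400, 80
null_stat = np.linalg.eigvalsh(signed_adjacency(n))[-1]
planted_stat = np.linalg.eigvalsh(signed_adjacency(n, clique=np.arange(k)))[-1]

# Under the null the top eigenvalue concentrates near 2*sqrt(n) (about 40 here),
# while a planted k-clique pushes it up to roughly k, so for k >> sqrt(n)
# the eigenvalue alone distinguishes the two distributions.
print(null_stat, planted_stat)
```

The paper's point is that this phenomenon persists at higher degree: eigenvalues of matrices whose entries are degree-d polynomials of the input match the guarantees of degree-d SoS, so no large semidefinite program needs to be solved.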
Partial recovery bounds for clustering with the relaxed K-means
We investigate the clustering performance of the relaxed K-means in the
setting of the sub-Gaussian Mixture Model (sGMM) and the Stochastic Block Model
(SBM). After identifying the appropriate signal-to-noise ratio (SNR), we prove
that the misclassification error decays exponentially fast with respect to this SNR.
These partial recovery bounds for the relaxed K-means improve upon results
currently known in the sGMM setting. In the SBM setting, applying the relaxed
K-means SDP makes it possible to handle general connection probabilities, whereas
other SDPs investigated in the literature are restricted to the assortative case
(where within-group probabilities are larger than between-group probabilities).
Again, this partial recovery bound complements the state-of-the-art results.
Altogether, these results put forward the versatility of the relaxed K-means.
Relax, descend and certify: optimization techniques for typically tractable data problems
In this thesis we explore different mathematical techniques for extracting information from data. In particular, we focus on machine learning problems such as clustering and data cloud alignment. Both problems are intractable in the "worst case", but we show that convex relaxations can efficiently find the exact or almost exact solution for classes of "typical" instances.
We study different roles that optimization techniques can play in understanding and processing data. These include efficient algorithms with mathematical guarantees, a posteriori methods for quality evaluation of solutions, and algorithmic relaxation of mathematical models.
We develop probabilistic and data-driven techniques to model data and evaluate the performance of algorithms for data problems.
Fundamental Limits of Low-Rank Matrix Estimation with Diverging Aspect Ratios
We consider the problem of estimating the factors of a low-rank matrix
corrupted by additive Gaussian noise. A special example of
our setting corresponds to clustering mixtures of Gaussians with equal (known)
covariances. Simple spectral methods do not take into account the distribution
of the entries of these factors and are therefore often suboptimal. Here, we
characterize the asymptotics of the minimum estimation error under the
assumption that the distribution of the entries is known to the statistician.
Our results apply to the high-dimensional regime in which both dimensions n and d
of the matrix diverge with a diverging aspect ratio, n/d → ∞ (or n/d → 0), and
generalize earlier work that focused on the proportional asymptotics in which
n/d converges to a positive constant.
We outline an interesting signal-strength regime in which partial recovery is
possible for the left singular vectors while impossible for the right singular
vectors.
We illustrate the general theory by deriving consequences for Gaussian
mixture clustering and carrying out a numerical study on genomics data.
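The quantities at stake in the setting above can be sketched numerically. The snippet below uses the plain spectral (PCA) estimator, which the abstract notes is generally suboptimal, to measure how well the top left and right singular vectors align with the planted rank-one signal in a large-aspect-ratio instance; the dimensions, signal strengths, and seed are illustrative assumptions, and the information-theoretically asymmetric regime concerns optimal, not spectral, estimators.

```python
import numpy as np

rng = np.random.default_rng(2)

def overlaps(n, d, beta):
    """Rank-one signal beta * u v^T plus i.i.d. N(0,1) noise; return the
    squared overlaps of the top left/right singular vectors with the truth."""
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    Y = beta * np.outer(u, v) + rng.standard_normal((n, d))
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, 0] @ u) ** 2, (Vt[0] @ v) ** 2

# Aspect ratio n/d = 20; the spectral threshold for this model scales
# like (n*d)**0.25, so beta is set well above and at zero signal.
n, d = 2000, 100
print(overlaps(n, d, 3 * (n * d) ** 0.25))  # strong signal: both overlaps large
print(overlaps(n, d, 0.0))                  # no signal: both overlaps near zero
```

Above the spectral threshold both singular vectors correlate with the truth, while below it neither does; the paper's contribution is to characterize the optimal error, and the asymmetric partial-recovery regime, beyond what this simple spectral baseline achieves.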