The power of sum-of-squares for detecting hidden structures
We study planted problems---finding hidden structures in random noisy
inputs---through the lens of the sum-of-squares semidefinite programming
hierarchy (SoS). This family of powerful semidefinite programs has recently
yielded many new algorithms for planted problems, often achieving the best
known polynomial-time guarantees in terms of accuracy of recovered solutions
and robustness to noise. One theme in recent work is the design of spectral
algorithms which match the guarantees of SoS algorithms for planted problems.
Classical spectral algorithms are often unable to accomplish this: the twist in
these new spectral algorithms is the use of spectral structure of matrices
whose entries are low-degree polynomials of the input variables. We prove that
for a wide class of planted problems, including refuting random constraint
satisfaction problems, tensor and sparse PCA, densest-k-subgraph, community
detection in stochastic block models, planted clique, and others, eigenvalues
of degree-d matrix polynomials are as powerful as SoS semidefinite programs of
roughly degree d. For such problems it is therefore always possible to match
the guarantees of SoS without solving a large semidefinite program. Using
related ideas on SoS algorithms and low-degree matrix polynomials (and inspired
by recent work on SoS and the planted clique problem by Barak et al.), we prove
new nearly-tight SoS lower bounds for the tensor and sparse principal component
analysis problems. Our lower bounds for sparse principal component analysis are
the first to suggest that going beyond existing algorithms for this problem may
require sub-exponential time.
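The simplest instance of the spectral approach described above is the classical degree-1 case: for planted clique, the test statistic is the top eigenvalue of a matrix whose entries are (degree-1) polynomials of the input edges. The following numpy sketch is an illustration, not the paper's construction; the graph size, clique size, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def signed_adjacency(n, clique=None):
    """Symmetric +/-1 adjacency matrix of G(n, 1/2), with an optional planted clique."""
    A = np.where(rng.random((n, n)) < 0.5, 1.0, -1.0)
    A = np.triu(A, 1)
    A = A + A.T  # symmetric, zero diagonal
    if clique is not None:
        A[np.ix_(clique, clique)] = 1.0  # force all clique edges present
        np.fill_diagonal(A, 0.0)
    return A

n, k = 400, 80
null_stat = np.linalg.eigvalsh(signed_adjacency(n))[-1]
planted_stat = np.linalg.eigvalsh(signed_adjacency(n, clique=np.arange(k)))[-1]

# Under the null the top eigenvalue concentrates near 2*sqrt(n) (about 40 here),
# while a planted k-clique pushes it up to roughly k, so for k >> sqrt(n)
# the eigenvalue alone distinguishes the two distributions.
print(null_stat, planted_stat)
```

The paper's point is that this phenomenon persists at higher degree: eigenvalues of matrices whose entries are degree-d polynomials of the input match the guarantees of degree-d SoS, so no large semidefinite program needs to be solved.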
Partial recovery bounds for clustering with the relaxed K-means
We investigate the clustering performance of the relaxed K-means in the
setting of the sub-Gaussian Mixture Model (sGMM) and the Stochastic Block Model
(SBM). After identifying the appropriate signal-to-noise ratio (SNR), we prove
that the misclassification error decays exponentially fast with respect to this SNR.
These partial recovery bounds for the relaxed K-means improve upon results
currently known in the sGMM setting. In the SBM setting, applying the relaxed
K-means SDP makes it possible to handle general connection probabilities, whereas
other SDPs investigated in the literature are restricted to the assortative case
(where within-group probabilities are larger than between-group probabilities).
Again, this partial recovery bound complements the state-of-the-art results.
Altogether, these results put forward the versatility of the relaxed K-means.
Relax, descend and certify: optimization techniques for typically tractable data problems
In this thesis we explore different mathematical techniques for extracting information from data. In particular, we focus on machine learning problems such as clustering and data cloud alignment. Both problems are intractable in the "worst case", but we show that convex relaxations can efficiently find the exact or almost exact solution for classes of "typical" instances.
We study different roles that optimization techniques can play in understanding and processing data. These include efficient algorithms with mathematical guarantees, a posteriori methods for quality evaluation of solutions, and algorithmic relaxation of mathematical models.
We develop probabilistic and data-driven techniques to model data and evaluate the performance of algorithms for data problems.
Fundamental Limits of Low-Rank Matrix Estimation with Diverging Aspect Ratios
We consider the problem of estimating the factors of a low-rank matrix
corrupted by additive Gaussian noise. A special example of
our setting corresponds to clustering mixtures of Gaussians with equal (known)
covariances. Simple spectral methods do not take into account the distribution
of the entries of these factors and are therefore often suboptimal. Here, we
characterize the asymptotics of the minimum estimation error under the
assumption that the distribution of the entries is known to the statistician.
Our results apply to the high-dimensional regime in which both dimensions n and d
of the matrix diverge with a diverging aspect ratio, n/d → ∞ (or n/d → 0), and
generalize earlier work that focused on the proportional asymptotics in which
n/d converges to a positive constant.
We outline an interesting signal-strength regime in which partial recovery is
possible for the left singular vectors while impossible for the right singular
vectors.
We illustrate the general theory by deriving consequences for Gaussian
mixture clustering and carrying out a numerical study on genomics data.
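The quantities at stake in the setting above can be sketched numerically. The snippet below uses the plain spectral (PCA) estimator, which the abstract notes is generally suboptimal, to measure how well the top left and right singular vectors align with the planted rank-one signal in a large-aspect-ratio instance; the dimensions, signal strengths, and seed are illustrative assumptions, and the information-theoretically asymmetric regime concerns optimal, not spectral, estimators.

```python
import numpy as np

rng = np.random.default_rng(2)

def overlaps(n, d, beta):
    """Rank-one signal beta * u v^T plus i.i.d. N(0,1) noise; return the
    squared overlaps of the top left/right singular vectors with the truth."""
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    Y = beta * np.outer(u, v) + rng.standard_normal((n, d))
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, 0] @ u) ** 2, (Vt[0] @ v) ** 2

# Aspect ratio n/d = 20; the spectral threshold for this model scales
# like (n*d)**0.25, so beta is set well above and at zero signal.
n, d = 2000, 100
print(overlaps(n, d, 3 * (n * d) ** 0.25))  # strong signal: both overlaps large
print(overlaps(n, d, 0.0))                  # no signal: both overlaps near zero
```

Above the spectral threshold both singular vectors correlate with the truth, while below it neither does; the paper's contribution is to characterize the optimal error, and the asymmetric partial-recovery regime, beyond what this simple spectral baseline achieves.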