Community Detection in Hypergraphs, Spiked Tensor Models, and Sum-of-Squares
We study the problem of community detection in hypergraphs under a stochastic
block model. Similarly to how the stochastic block model in graphs suggests
studying spiked random matrices, our model motivates investigating statistical
and computational limits of exact recovery in a certain spiked tensor model. In
contrast with the matrix case, the spiked model naturally arising from
community detection in hypergraphs is different from the one arising in the
so-called tensor Principal Component Analysis model. We investigate the
effectiveness of algorithms in the Sum-of-Squares hierarchy on these models.
Interestingly, our results suggest that these two apparently similar models
exhibit significantly different computational-to-statistical gaps.
Comment: In proceedings of the 2017 International Conference on Sampling Theory and Applications (SampTA).
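As a rough illustration of the kind of model discussed above (our own sketch, not code from the paper; all parameter names are ours), a spiked symmetric 3-tensor arising from two hidden communities can be generated as follows:

```python
import numpy as np

# Illustrative sketch (not from the paper): a spiked tensor of the kind
# arising from 3-uniform hypergraph community detection,
# T = (lam/n) * x⊗x⊗x + G, with x in {±1}^n the hidden labels and G
# symmetric Gaussian noise. Parameter names here are our own.
rng = np.random.default_rng(0)
n, lam = 30, 5.0

x = rng.choice([-1.0, 1.0], size=n)                 # hidden community labels
spike = (lam / n) * np.einsum("i,j,k->ijk", x, x, x)

G = rng.standard_normal((n, n, n))
G = sum(G.transpose(p) for p in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2),
         (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0    # symmetrize the noise

T = spike + G
```

Note that here the signal direction is a ±1 labeling rather than an arbitrary unit vector, which is one way the model differs from the tensor PCA model mentioned in the abstract.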
Computational Hardness of Certifying Bounds on Constrained PCA Problems
Given a random n×n symmetric matrix W drawn from the Gaussian orthogonal ensemble (GOE), we consider the problem of certifying an upper bound on the maximum value of the quadratic form x⊤Wx over all vectors x in a constraint set S ⊂ ℝⁿ. For a certain class of normalized constraint sets S we show that, conditional on certain complexity-theoretic assumptions, there is no polynomial-time algorithm certifying a better upper bound than the largest eigenvalue of W. A notable special case included in our results is the hypercube S = {±1/√n}ⁿ, which corresponds to the problem of certifying bounds on the Hamiltonian of the Sherrington-Kirkpatrick spin glass model from statistical physics.
Our proof proceeds in two steps. First, we give a reduction from the detection problem in the negatively-spiked Wishart model to the above certification problem. We then give evidence that this Wishart detection problem is computationally hard below the classical spectral threshold, by showing that no low-degree polynomial can (in expectation) distinguish the spiked and unspiked models. This method for identifying computational thresholds was proposed in a sequence of recent works on the sum-of-squares hierarchy, and is believed to be correct for a large class of problems. Our proof can be seen as constructing a distribution over symmetric matrices that appears computationally indistinguishable from the GOE, yet is supported on matrices whose maximum quadratic form over x ∈ S is much larger than that of a GOE matrix.
ISSN: 1868-896
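As a toy illustration of the certification baseline in this abstract (our own sketch, not the paper's code), the largest eigenvalue of a GOE matrix already certifies a valid upper bound on the quadratic form over the hypercube, since every point of S = {±1/√n}ⁿ is a unit vector:

```python
import numpy as np

# Our toy sketch: for W ~ GOE (scaled so lambda_max is roughly 2), the
# spectral certificate lambda_max(W) upper-bounds x^T W x over the
# hypercube S = {±1/sqrt(n)}^n, because every x in S has unit norm.
rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2 * n)                  # GOE scaling

lam_max = np.linalg.eigvalsh(W)[-1]             # certified upper bound

x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)   # a hypercube point
assert x @ W @ x <= lam_max
```

The hardness result above says that, under the stated assumptions, no polynomial-time certifier can improve on this spectral bound of roughly 2, even though the true maximum over S is believed from statistical physics to be strictly smaller.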
Exact Partitioning of High-order Planted Models with a Tensor Nuclear Norm Constraint
We study the problem of efficient exact partitioning of the hypergraphs
generated by high-order planted models. A high-order planted model assumes some
underlying cluster structures, and simulates high-order interactions by placing
hyperedges among nodes. Example models include the disjoint hypercliques, the
densest subhypergraphs, and the hypergraph stochastic block models. We show
that exact partitioning of high-order planted models (an NP-hard problem in
general) is achievable through solving a computationally efficient convex
optimization problem with a tensor nuclear norm constraint. Our analysis
provides the conditions for our approach to succeed in recovering the true
underlying cluster structures, with high probability.
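For concreteness, a minimal generator for one of the example models above, a 3-uniform hypergraph stochastic block model, could look as follows (our own illustration with hypothetical parameter names; the convex recovery step with the tensor nuclear norm constraint is not shown, since it requires an optimization solver):

```python
import numpy as np
from itertools import combinations

# Hedged sketch of a high-order planted model: a 3-uniform hypergraph
# SBM in which a hyperedge {i, j, k} appears with probability p when all
# three nodes share a cluster and with probability q < p otherwise.
rng = np.random.default_rng(2)
n, p, q = 12, 0.9, 0.1
labels = np.array([0] * (n // 2) + [1] * (n - n // 2))  # planted clusters

hyperedges = [
    (i, j, k)
    for i, j, k in combinations(range(n), 3)
    if rng.random() < (p if labels[i] == labels[j] == labels[k] else q)
]
```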
Statistical limits of graphical channel models and a semidefinite programming approach
Thesis: Ph.D., Massachusetts Institute of Technology, Department of Mathematics, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 205-213).
Community recovery is a major challenge in data science and computer science. The goal in community recovery is to find the hidden clusters from given relational data, which is often represented as a labeled hypergraph where nodes correspond to items needing to be labeled and edges correspond to observed relations between the items. We investigate the problem of exact recovery in the class of statistical models which can be expressed in terms of graphical channels. In a graphical channel model, we observe noisy measurements of the relations between k nodes while the true labeling is unknown to us, and the goal is to recover the labels correctly. This generalizes both the stochastic block models and spiked tensor models for principal component analysis, which have gained much interest over the last decade. We focus on two aspects of exact recovery: statistical limits and efficient algorithms achieving the statistical limit. For the statistical limits, we show that the achievability of exact recovery is essentially determined by whether we can recover the label of one node given the other nodes' labels with fairly high probability. This phenomenon was observed by Abbe et al. for generic stochastic block models, and called "local-to-global amplification". We confirm that local-to-global amplification indeed holds for generic graphical channel models, under some regularity assumptions. As a corollary, the threshold for exact recovery is explicitly determined. For algorithmic concerns, we consider two examples of graphical channel models: (i) the spiked tensor model with additive Gaussian noise, and (ii) the generalization of the stochastic block model to k-uniform hypergraphs. We propose a strategy which we call "truncate-and-relax", based on a standard semidefinite relaxation technique.
We show that in these two models, the algorithm based on this strategy achieves exact recovery up to a threshold which order-wise matches the statistical threshold. We complement this by showing the limitations of the algorithm.
by Chiheon Kim. Ph.D.
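The semidefinite relaxation itself needs an SDP solver, so as a hedged stand-in here is the closely related spectral relaxation for a two-community graph SBM; it conveys the same "relax the combinatorial labeling constraint, then round" idea, though it is not the thesis's truncate-and-relax algorithm, and all names and parameters are ours:

```python
import numpy as np

# Spectral relaxation for a two-community SBM (a simplified analogue of
# the SDP-based approach described above, not the thesis's algorithm):
# recenter the adjacency matrix and read labels off the top eigenvector.
rng = np.random.default_rng(3)
n, p, q = 200, 0.5, 0.1
labels = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

probs = np.where(np.outer(labels, labels) > 0, p, q)
A = np.triu(rng.random((n, n)) < probs, 1).astype(float)
A += A.T                                        # symmetric adjacency matrix

B = A - (p + q) / 2                             # recenter to expose the spike
_, vecs = np.linalg.eigh(B)
guess = np.sign(vecs[:, -1])                    # relaxed solution, rounded

accuracy = max(np.mean(guess == labels), np.mean(guess == -labels))
```

With this strong signal (p well above q) the rounded eigenvector recovers essentially all labels, up to a global sign flip.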
Exact Recovery for a Family of Community-Detection Generative Models
Generative models for networks with communities have been studied extensively
for being a fertile ground to establish information-theoretic and computational
thresholds. In this paper we propose a new toy model for planted generative
models called planted Random Energy Model (REM), inspired by Derrida's REM. For
this model we provide the asymptotic behaviour of the probability of error for
the maximum likelihood estimator and hence the exact recovery threshold. As an
application, we further consider the 2 non-equally sized community Weighted
Stochastic Block Model (2-WSBM) on uniform hypergraphs, which is equivalent
to the P-REM on both sides of the spectrum, for high and low edge
cardinality. We provide upper and lower bounds on exact recoverability for
any edge cardinality, mapping these problems to the aforementioned P-REM. To
the best of our knowledge these are the first consistency results for the
2-WSBM on graphs and on hypergraphs with non-equally sized communities.
The power of sum-of-squares for detecting hidden structures
We study planted problems---finding hidden structures in random noisy
inputs---through the lens of the sum-of-squares semidefinite programming
hierarchy (SoS). This family of powerful semidefinite programs has recently
yielded many new algorithms for planted problems, often achieving the best
known polynomial-time guarantees in terms of accuracy of recovered solutions
and robustness to noise. One theme in recent work is the design of spectral
algorithms which match the guarantees of SoS algorithms for planted problems.
Classical spectral algorithms are often unable to accomplish this: the twist in
these new spectral algorithms is the use of spectral structure of matrices
whose entries are low-degree polynomials of the input variables. We prove that
for a wide class of planted problems, including refuting random constraint
satisfaction problems, tensor and sparse PCA, densest-k-subgraph, community
detection in stochastic block models, planted clique, and others, eigenvalues
of degree-d matrix polynomials are as powerful as SoS semidefinite programs of
roughly degree d. For such problems it is therefore always possible to match
the guarantees of SoS without solving a large semidefinite program. Using
related ideas on SoS algorithms and low-degree matrix polynomials (and inspired
by recent work on SoS and the planted clique problem by Barak et al.), we prove
new nearly-tight SoS lower bounds for the tensor and sparse principal component
analysis problems. Our lower bounds for sparse principal component analysis are
the first to suggest that going beyond existing algorithms for this problem may
require sub-exponential time.
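A hedged sketch of the kind of spectral algorithm described above, in the tensor PCA setting (our toy code and parameters, chosen well above the recovery threshold): the Gram matrix of the tensor's unfolding has entries that are degree-2 polynomials of the input, and its leading eigenvector correlates with the planted signal.

```python
import numpy as np

# Low-degree matrix polynomial as a spectral algorithm (our sketch):
# unfold a spiked n x n x n tensor into an n x n^2 matrix M; the entries
# of M @ M.T are degree-2 polynomials in the input entries, and its
# leading eigenvector aligns with the planted spike when lam is large.
rng = np.random.default_rng(4)
n, lam = 20, 30.0

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                          # planted unit-norm spike
T = lam * np.einsum("i,j,k->ijk", x, x, x) + rng.standard_normal((n, n, n))

M = T.reshape(n, n * n)                         # mode-1 unfolding
_, vecs = np.linalg.eigh(M @ M.T)               # degree-2 matrix polynomial
estimate = vecs[:, -1]

correlation = abs(estimate @ x)                 # alignment with the spike
```

This is only the classical unfolding method; the point of the paper is that eigenvalues of such low-degree matrix polynomials capture the power of much larger SoS semidefinite programs across a wide class of planted problems.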