Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems
Given a large data matrix $A \in \mathbb{R}^{n \times n}$, we consider the
problem of determining whether its entries are i.i.d. with some known marginal
distribution $A_{ij} \sim P_0$, or instead $A$ contains a principal submatrix $A_{Q,Q}$
whose entries have a different marginal distribution $A_{ij} \sim P$. As a special case, the hidden (or planted) clique problem
requires finding a planted clique in an otherwise uniformly random graph.
Assuming unbounded computational resources, this hypothesis testing problem
is statistically solvable provided $|Q| \ge C \log n$ for a suitable
constant $C$. However, despite substantial effort, no polynomial time algorithm
is known that succeeds with high probability when $|Q| = o(\sqrt{n})$.
Recently, Meka and Wigderson \cite{meka2013association} proposed a method to
establish lower bounds within the Sum of Squares (SOS) semidefinite hierarchy.
Here we consider the degree-$4$ SOS relaxation, and study the construction of
\cite{meka2013association} to prove that SOS fails unless the hidden set satisfies $|Q| \ge C\, n^{1/3}/\log n$. An argument presented by Barak implies that this lower bound
cannot be substantially improved unless the witness construction in the
proof is changed. Our proof uses the moment method to bound the spectrum of a certain
random association scheme, i.e. a symmetric random matrix whose rows and
columns are indexed by the edges of an Erd\H{o}s--R\'enyi random graph.

Comment: 40 pages, 1 table, conference
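The hidden clique instance discussed above is easy to sample. The sketch below is our own illustrative code, not from the paper: off the planted set the entries are i.i.d. Bernoulli(1/2), and the principal submatrix indexed by the planted set is all ones.

```python
import numpy as np

def planted_clique(n, k, seed=0):
    """Sample G(n, 1/2) and plant a clique on k random vertices.

    Illustrative sketch of the hidden clique model: outside the planted
    set Q, edges are i.i.d. Bernoulli(1/2); inside Q, all pairs are
    connected.
    """
    rng = np.random.default_rng(seed)
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = A + A.T                      # symmetric 0/1 adjacency matrix
    Q = rng.choice(n, size=k, replace=False)
    A[np.ix_(Q, Q)] = 1              # plant the clique ...
    np.fill_diagonal(A, 0)           # ... keeping the diagonal zero
    return A, np.sort(Q)

A, Q = planted_clique(200, 30)
```

Detecting the planted set from `A` alone is the hard part; for $k = o(\sqrt{n})$ no polynomial-time method is known, which is exactly the regime the SOS lower bound addresses.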
Information-theoretically Optimal Sparse PCA
Sparse Principal Component Analysis (PCA) is a dimensionality reduction
technique wherein one seeks a low-rank representation of a data matrix with
additional sparsity constraints on the obtained representation. We consider two
probabilistic formulations of sparse PCA: the spiked Wigner and the spiked Wishart
(or spiked covariance) models. We analyze an Approximate Message Passing (AMP)
algorithm to estimate the underlying signal and show, in the high dimensional
limit, that the AMP estimates are information-theoretically optimal. As an
immediate corollary, our results demonstrate that the posterior expectation of
the underlying signal, which is often intractable to compute, can be obtained
using a polynomial-time scheme. Our results also effectively provide a
single-letter characterization of the sparse PCA problem.

Comment: 5 pages, 1 figure, conference
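The spiked Wigner formulation can be illustrated with a short sketch. The code below is our own (function names and the sparse-spike parametrization are assumptions, not the paper's notation), and it uses a naive spectral estimate as a baseline rather than the AMP algorithm analyzed in the paper.

```python
import numpy as np

def spiked_wigner(n, eps, lam, seed=0):
    """Sample Y = (lam/n) v v^T + W with a sparse spike.

    v has roughly a fraction eps of nonzero entries, scaled so that
    E[v_i^2] = 1; W is symmetric Gaussian (Wigner) noise with entries
    of variance 1/n. These conventions are ours, for illustration.
    """
    rng = np.random.default_rng(seed)
    v = rng.binomial(1, eps, size=n) / np.sqrt(eps)
    W = rng.normal(size=(n, n))
    W = (W + W.T) / np.sqrt(2 * n)
    Y = (lam / n) * np.outer(v, v) + W
    return Y, v

# Naive spectral baseline (not the paper's AMP): top eigenvector of Y.
Y, v = spiked_wigner(n=500, eps=0.1, lam=3.0)
w, U = np.linalg.eigh(Y)             # eigenvalues in ascending order
vhat = U[:, -1]                      # eigenvector of the largest eigenvalue
overlap = abs(vhat @ v) / (np.linalg.norm(vhat) * np.linalg.norm(v))
```

The spectral estimate correlates with the signal above the spectral threshold, but it does not exploit sparsity; the point of the paper is that AMP, which does, achieves the information-theoretically optimal accuracy.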
Asymptotic Mutual Information for the Two-Groups Stochastic Block Model
We develop an information-theoretic view of the stochastic block model, a
popular statistical model for the large-scale structure of complex networks. A
graph from such a model is generated by first assigning vertex labels at
random from a finite alphabet, and then connecting vertices with edge
probabilities depending on the labels of the endpoints. In the case of the
symmetric two-group model, we establish an explicit `single-letter'
characterization of the per-vertex mutual information between the vertex labels
and the graph.
The explicit expression of the mutual information is intimately related to
estimation-theoretic quantities, and --in particular-- reveals a phase
transition at the critical point for community detection. Below the critical
point the per-vertex mutual information is asymptotically the same as if edges
were independent. Correspondingly, no algorithm can estimate the partition
better than random guessing. Conversely, above the threshold, the per-vertex
mutual information is strictly smaller than the independent-edges upper bound.
In this regime there exists a procedure that estimates the vertex labels better
than random guessing.

Comment: 41 pages, 3 pdf figures
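The generative model described above can be sampled in a few lines. This is our own sketch; the sparse parametrization with within-group edge probability $a/n$ and across-group probability $b/n$ is an assumption chosen for illustration, not notation taken from the abstract.

```python
import numpy as np

def two_group_sbm(n, a, b, seed=0):
    """Symmetric two-group stochastic block model.

    Vertex labels are +1/-1 uniformly at random; an edge appears with
    probability a/n within a group and b/n across groups (sparse
    parametrization, our convention).
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice([-1, 1], size=n)
    same = np.equal.outer(labels, labels)
    p = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < p, 1)
    A = (upper | upper.T).astype(int)
    return A, labels

# With a=8, b=2 we are above the two-group detection threshold
# (a - b)^2 > 2(a + b), the regime where partial recovery is possible.
A, labels = two_group_sbm(1000, a=8.0, b=2.0)
```

Below that threshold the phase transition described in the abstract kicks in: the per-vertex mutual information matches the independent-edges value and no estimator beats random guessing.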
Agreement and Statistical Efficiency in Bayesian Perception Models
Bayesian models of group learning have been studied in Economics since the 1970s
and, more recently, in computational linguistics. The models from Economics
postulate that agents maximize utility in their communication and actions. These
models do not explain the ``probability matching'' phenomena
observed in many experimental studies. To address these observations, Bayesian
models that do not formally fit into the economic utility maximization
framework were introduced. In these models, individuals sample from their
posteriors in communication. In this work we study the asymptotic behavior of
such models on connected networks with repeated communication. Perhaps
surprisingly, despite the fact that individual agents are not utility
maximizers in the classical sense, we establish that the individuals ultimately
agree and furthermore show that the limiting posterior is Bayes optimal.

Comment: 15 pages, 2 figures