Optimal Testing for Planted Satisfiability Problems
We study the problem of detecting planted solutions in a random
satisfiability formula. Adopting the formalism of hypothesis testing in
statistical analysis, we describe the minimax optimal rates of detection. Our
analysis relies on the study of the number of satisfying assignments, for which
we prove new results. We also address algorithmic issues, and give a
computationally efficient test with optimal statistical performance. This
result is compared to an average-case hypothesis on the hardness of refuting
satisfiability of random formulas.
Spectral Detection on Sparse Hypergraphs
We consider the problem of the assignment of nodes into communities from a
set of hyperedges, where every hyperedge is a noisy observation of the
community assignment of the adjacent nodes. We focus in particular on the
sparse regime where the number of edges is of the same order as the number of
vertices. We propose a spectral method based on a generalization of the
non-backtracking Hashimoto matrix into hypergraphs. We analyze its performance
on a planted generative model and compare it with other spectral methods and
with Bayesian belief propagation (which was conjectured to be asymptotically
optimal for this model). We conclude that the proposed spectral method detects
communities whenever belief propagation does, while having the important
advantages of being simpler, entirely nonparametric, and able to learn the
rule according to which the hyperedges were generated without prior
information.
Comment: 8 pages, 5 figures
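As a point of reference for the abstract above, the following is a minimal sketch of the standard non-backtracking (Hashimoto) matrix for an ordinary graph. It is not the hypergraph generalization the paper proposes, whose definition the abstract does not give. The function name `non_backtracking_matrix` and the edge-list input format are assumptions made for illustration, and the input is assumed to be a simple graph with no repeated edges.

```python
def non_backtracking_matrix(edges):
    """Build the non-backtracking matrix of a simple undirected graph.

    Rows and columns are indexed by directed edges (u -> v). The entry for
    the pair ((u -> v), (x -> w)) is 1 when x == v and w != u, i.e. when the
    second edge continues the first without immediately turning back.
    """
    # Each undirected edge contributes two directed edges.
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    m = len(directed)
    B = [[0] * m for _ in range(m)]
    for i, (u, v) in enumerate(directed):
        for j, (x, w) in enumerate(directed):
            if x == v and w != u:
                B[i][j] = 1
    return directed, B


# Usage on a triangle: every directed edge has exactly one continuation.
directed, B = non_backtracking_matrix([(0, 1), (1, 2), (0, 2)])
```

In spectral methods of this kind, community structure is typically read off from the leading eigenvectors of such a matrix; that step is omitted here.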
Asymptotic Mutual Information for the Two-Groups Stochastic Block Model
We develop an information-theoretic view of the stochastic block model, a
popular statistical model for the large-scale structure of complex networks. A
graph from such a model is generated by first assigning vertex labels at
random from a finite alphabet, and then connecting vertices with edge
probabilities depending on the labels of the endpoints. In the case of the
symmetric two-group model, we establish an explicit `single-letter'
characterization of the per-vertex mutual information between the vertex labels
and the graph.
The explicit expression of the mutual information is intimately related to
estimation-theoretic quantities and, in particular, reveals a phase
transition at the critical point for community detection. Below the critical
point the per-vertex mutual information is asymptotically the same as if edges
were independent. Correspondingly, no algorithm can estimate the partition
better than random guessing. Conversely, above the threshold, the per-vertex
mutual information is strictly smaller than the independent-edges upper bound.
In this regime there exists a procedure that estimates the vertex labels better
than random guessing.
Comment: 41 pages, 3 pdf figures
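The generative procedure described in this abstract (random vertex labels, then edges drawn with label-dependent probabilities) can be sketched as follows for the symmetric two-group case. The function name `sample_two_group_sbm` and the parameters `p_in` and `p_out` (edge probabilities within and across groups) are illustrative assumptions, not notation from the paper.

```python
import random


def sample_two_group_sbm(n, p_in, p_out, seed=None):
    """Sample vertex labels and an edge list from a two-group block model."""
    rng = random.Random(seed)
    # Step 1: assign each vertex a label uniformly at random from {0, 1}.
    labels = [rng.randrange(2) for _ in range(n)]
    # Step 2: connect each pair of vertices independently, with a probability
    # that depends only on whether the two endpoint labels agree.
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges


# Usage: a small graph with denser connections inside each group.
labels, edges = sample_two_group_sbm(n=200, p_in=0.10, p_out=0.02, seed=0)
```

The quadratic loop over vertex pairs keeps the sketch short; it is only practical for small `n`.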
Local convergence of random graph colorings
Let $G$ be a random graph whose average degree is below the
$k$-colorability threshold. If we sample a $k$-coloring $\sigma$ of $G$
uniformly at random, what can we say about the correlations between the colors
assigned to vertices that are far apart? According to a prediction from
statistical physics, for average degrees below the so-called condensation
threshold, the colors assigned to far away vertices are
asymptotically independent [Krzakala et al.: Proc. National Academy of Sciences
2007]. We prove this conjecture for $k$ exceeding a certain constant $k_0$.
More generally, we investigate the joint distribution of the $k$-colorings that
$\sigma$ induces locally on the bounded-depth neighborhoods of any fixed number
of vertices. In addition, we point out an implication for the reconstruction
problem.
Strong replica symmetry in high-dimensional optimal Bayesian inference
We consider generic optimal Bayesian inference, namely, models of signal
reconstruction where the posterior distribution and all hyperparameters are
known. Under a standard assumption on the concentration of the free energy, we
show how replica symmetry in the strong sense of concentration of all
multioverlaps can be established as a consequence of the Franz-de Sanctis
identities; the identities themselves in the current setting are obtained via a
novel perturbation coming from exponentially distributed "side-observations" of
the signal. Concentration of multioverlaps means that asymptotically the
posterior distribution has a particularly simple structure encoded by a random
probability measure (or, in the case of binary signal, a non-random probability
measure). We believe that such strong control of the model should be key in the
study of inference problems with underlying sparse graphical structure (error
correcting codes, block models, etc.) and, in particular, in the rigorous
derivation of replica symmetric formulas for the free energy and mutual
information in this context.