Non-linear Log-Sobolev inequalities for the Potts semigroup and applications to reconstruction problems
Consider a Markov process with state space , which jumps continuously to
a new state chosen uniformly at random and regardless of the previous state.
The collection of transition kernels (indexed by time ) is the Potts
semigroup. Diaconis and Saloff-Coste computed the maximum of the ratio of the
relative entropy and the Dirichlet form obtaining the constant in
the -log-Sobolev inequality (-LSI). In this paper, we obtain the best
possible non-linear inequality relating entropy and the Dirichlet form (i.e.,
-NLSI, ). As an example, we show . The more precise NLSIs have been shown by Polyanskiy and Samorodnitsky to
imply various geometric and Fourier-analytic results.
Beyond the Potts semigroup, we also analyze Potts channels -- Markov
transition matrices constant on and off diagonal. (Potts
semigroup corresponds to a (ferromagnetic) subset of matrices with positive
second eigenvalue). By integrating the -NLSI we obtain the new strong data
processing inequality (SDPI), which in turn allows us to improve results on
reconstruction thresholds for Potts models on trees. A special case is the
problem of reconstructing color of the root of a -colored tree given
knowledge of colors of all the leaves. We show that to have a non-trivial
reconstruction probability the branching number of the tree should be at least
This extends previous
results (of Sly and Bhatnagar et al.) to general trees, and avoids the need for
any specialized arguments. Similarly, we improve the state-of-the-art on
reconstruction threshold for the stochastic block model with balanced
groups, for all . These improvements advocate information-theoretic
methods as a useful complement to the conventional techniques originating from
statistical physics.
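As a small illustration of the phrase "Markov transition matrices constant on and off diagonal", the sketch below builds such a matrix and reads off its second eigenvalue. The parametrization by a single number `lam` (so that the matrix is `lam * I + (1 - lam) / k * J`, with `J` the all-ones matrix) and the function names are assumptions made for this example, not notation taken from the paper.

```python
def potts_channel(k, lam):
    """Return a k x k row-stochastic matrix that is constant on and off the diagonal.

    Hypothetical parametrization: lam * I + (1 - lam) / k * J.
    """
    if k < 2:
        raise ValueError("need at least two states")
    off = (1.0 - lam) / k   # every off-diagonal entry
    diag = lam + off        # every diagonal entry
    if off < 0 or diag < 0:
        raise ValueError("lam must lie in [-1/(k-1), 1]")
    return [[diag if i == j else off for j in range(k)] for i in range(k)]


def second_eigenvalue(P):
    """For a matrix constant on and off the diagonal, the only eigenvalue
    other than 1 is (diagonal entry) - (off-diagonal entry)."""
    return P[0][0] - P[0][1]


def is_ferromagnetic(P):
    """Ferromagnetic case: positive second eigenvalue."""
    return second_eigenvalue(P) > 0
```

With this parametrization the second eigenvalue equals `lam`, so the ferromagnetic matrices mentioned in the abstract are those with `lam > 0`.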
Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and generally provides fertile ground to study the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, both with respect to
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a., detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest to
achieve these limits, in particular two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, and classical and
nonbacktracking spectral methods. A few open problems are also discussed.
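To make "a random graph model with planted clusters" concrete, here is a minimal sampling sketch. It assumes the simplest symmetric variant: each vertex gets one of `k` labels uniformly at random, and an edge appears with probability `p_in` inside a community and `p_out` across communities. The function name and parameters are illustrative choices, not taken from the survey.

```python
import random


def sample_sbm(n, k, p_in, p_out, seed=0):
    """Sample a symmetric stochastic block model on n vertices.

    Returns (labels, edges): labels[v] is the planted community of vertex v,
    edges is a list of pairs (u, v) with u < v.
    """
    rng = random.Random(seed)
    # Plant the communities: one uniform label per vertex.
    labels = [rng.randrange(k) for _ in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Edge probability depends only on whether the labels agree.
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges
```

Community detection then asks how well `labels` can be recovered from `edges` alone, up to a relabeling of the communities.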
Broadcasting on Random Directed Acyclic Graphs
We study a generalization of the well-known model of broadcasting on trees.
Consider a directed acyclic graph (DAG) with a unique source vertex , and
suppose all other vertices have indegree . Let the vertices at
distance from be called layer . At layer , is given a random
bit. At layer , each vertex receives bits from its parents in
layer , which are transmitted along independent binary symmetric channel
edges, and combines them using a -ary Boolean processing function. The goal
is to reconstruct with probability of error bounded away from using
the values of all vertices at an arbitrarily deep layer. This question is
closely related to models of reliable computation and storage, and information
flow in biological networks.
In this paper, we analyze randomly constructed DAGs, for which we show that
broadcasting is only possible if the noise level is below a certain degree- and
function-dependent critical threshold. For , and random DAGs with
layer sizes and majority processing functions, we identify the
critical threshold. For , we establish a similar result for NAND
processing functions. We also prove a partial converse for odd
illustrating that the identified thresholds are impossible to improve by
selecting different processing functions if the decoder is restricted to using
a single vertex.
Finally, for any noise level, we construct explicit DAGs (using expander
graphs) with bounded degree and layer sizes admitting
reconstruction. In particular, we show that such DAGs can be generated in
deterministic quasi-polynomial time or randomized polylogarithmic time in the
depth. These results portray a doubly-exponential advantage for storing a bit
in DAGs compared to trees, where but layer sizes must grow exponentially
with depth in order to enable broadcasting.
Comment: 33 pages, double column format. arXiv admin note: text overlap with
arXiv:1803.0752
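The broadcasting model described in this abstract can be simulated in a few lines. The sketch below is a toy version under stated assumptions: every layer after the source has the same size `layer_size`, each vertex picks its `d` parents uniformly at random with replacement from the previous layer, each edge is a binary symmetric channel with crossover probability `delta`, and every vertex applies majority. The names and the constant layer size are choices made for this example and are not the paper's construction.

```python
import random


def majority(bits):
    """Majority vote of a list of 0/1 values (use odd lengths to avoid ties)."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def broadcast(depth, layer_size, d, delta, source_bit, seed=0):
    """Propagate source_bit through `depth` random layers; return the last layer."""
    rng = random.Random(seed)
    layer = [source_bit]  # layer 0 holds only the source vertex
    for _ in range(depth):
        new_layer = []
        for _ in range(layer_size):
            # Assumption: parents are drawn uniformly with replacement.
            parents = [rng.choice(layer) for _ in range(d)]
            # Each edge flips its bit independently with probability delta.
            received = [b ^ (rng.random() < delta) for b in parents]
            new_layer.append(majority(received))
        layer = new_layer
    return layer


def decode(layer):
    """A simple decoder: majority vote over all vertices of the given layer."""
    return majority(layer)
```

Running `decode(broadcast(...))` over many seeds for several values of `delta` gives a rough empirical picture of how the error probability depends on the noise level.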
Efficient size estimation and impossibility of termination in uniform dense population protocols
We study uniform population protocols: networks of anonymous agents whose
pairwise interactions are chosen at random, where each agent uses an identical
transition algorithm that does not depend on the population size . Many
existing polylog time protocols for leader election and majority
computation are nonuniform: to operate correctly, they require all agents to be
initialized with an approximate estimate of (specifically, the exact value
). Our first main result is a uniform protocol for
calculating with high probability in time and
states ( bits of memory). The protocol is
converging but not terminating: it does not signal when the estimate is close
to the true value of . If it could be made terminating, this would
allow composition with protocols, such as those for leader election or
majority, that require a size estimate initially, to make them uniform (though
with a small probability of failure). We do show how our main protocol can be
indirectly composed with others in a simple and elegant way, based on the
leaderless phase clock, demonstrating that those protocols can in fact be made
uniform. However, our second main result implies that the protocol cannot be
made terminating, a consequence of a much stronger result: a uniform protocol
for any task requiring more than constant time cannot be terminating even with
probability bounded above 0, if infinitely many initial configurations are
dense: any state present initially occupies agents. (In particular,
no leader is allowed.) Crucially, the result holds no matter the memory or time
permitted. Finally, we show that with an initial leader, our size-estimation
protocol can be made terminating with high probability, with the same
asymptotic time and space bounds.
Comment: Using leaderless phase cloc
Incompatibility boundaries for properties of community partitions
We prove the incompatibility of certain desirable properties of community
partition quality functions. Our results generalize the impossibility result of
[Kleinberg 2003] by considering sets of weaker properties. In particular, we
use an alternative notion to solve the central issue of the consistency
property. (The latter means that modifying the graph in a way consistent with a
partition should not have counterintuitive effects.) Our results clearly show
that community partition methods should not be expected to perfectly satisfy
all ideally desired properties.
We then proceed to show that this incompatibility no longer holds when
slightly relaxed versions of the properties are considered, and we in fact
provide examples of simple quality functions satisfying these relaxed properties.
An experimental study of these quality functions shows a behavior comparable to
established methods in some situations, but more debatable results in others.
This suggests that defining a notion of good partition in communities probably
requires imposing additional properties.
Comment: 17 pages, 3 figures
Edge Label Inference in Generalized Stochastic Block Models: from Spectral Theory to Impossibility Results
The classical setting of community detection consists of networks exhibiting
a clustered structure. To more accurately model real systems we consider a
class of networks (i) whose edges may carry labels and (ii) which may lack a
clustered structure. Specifically we assume that nodes possess latent
attributes drawn from a general compact space and edges between two nodes are
randomly generated and labeled according to some unknown distribution as a
function of their latent attributes. Our goal is then to infer the edge label
distributions from a partially observed network. We propose a computationally
efficient spectral algorithm and show it allows for asymptotically correct
inference even when the average node degree is as low as logarithmic in the
total number of nodes. Conversely, if the average node degree is below a
specific constant threshold, we show that no algorithm can achieve better
inference than guessing without using the observations. As a byproduct of our
analysis, we show that our model provides a general procedure to construct
random graph models with a spectrum asymptotic to a pre-specified eigenvalue
distribution such as a power-law distribution.
Comment: 17 pages