Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and generally provides fertile ground to study the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, with respect to both
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a. detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest to
achieve these limits, in particular two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, and classical and
non-backtracking spectral methods. A few open problems are also discussed.
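The thresholds above are easiest to make concrete in the symmetric two-community case. The sketch below is a toy illustration, not code from the survey (the function name and parameters are my own): it samples such an SBM with intra-cluster edge probability a/n and inter-cluster probability b/n, and evaluates the Kesten-Stigum signal-to-noise ratio (a-b)^2 / (2(a+b)), which must exceed 1 for weak recovery to be possible.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sbm(n, a, b):
    """Sample a symmetric two-community SBM: intra-cluster edge
    probability a/n, inter-cluster edge probability b/n."""
    labels = rng.integers(0, 2, size=n)               # planted communities
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # edges for i < j only
    adj = (upper | upper.T).astype(int)               # symmetric, no self-loops
    return adj, labels

n, a, b = 2000, 8.0, 2.0
A, labels = sample_sbm(n, a, b)

# Kesten-Stigum condition for weak recovery in this symmetric
# two-community setting: SNR = (a - b)^2 / (2 (a + b)) > 1.
snr = (a - b) ** 2 / (2 * (a + b))
print(f"mean degree = {A.sum() / n:.2f}, Kesten-Stigum SNR = {snr:.2f}")
```

With a = 8 and b = 2 the SNR is 1.8, so this instance sits above the weak-recovery threshold even though the average degree is only about 5.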
Clustering from Sparse Pairwise Measurements
We consider the problem of grouping items into clusters based on a few random
pairwise comparisons between the items. We introduce three closely related
algorithms for this task: a belief propagation algorithm approximating the
Bayes optimal solution, and two spectral algorithms based on the
non-backtracking and Bethe Hessian operators. For the case of two symmetric
clusters, we conjecture that these algorithms are asymptotically optimal in
that they detect the clusters as soon as it is information theoretically
possible to do so. We substantiate this claim for one of the spectral
approaches we introduce.
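The Bethe Hessian approach mentioned above admits a compact dense-matrix sketch. The operator H(r) = (r^2 - 1)I - rA + D and the heuristic choice r = sqrt(mean degree) follow the Bethe Hessian literature the abstract refers to; the function name and the dense implementation are my own, and a practical version would use sparse matrices.

```python
import numpy as np

def bethe_hessian_labels(A):
    """Two-way clustering from the Bethe Hessian H(r) = (r^2 - 1) I - r A + D.

    A is a dense 0/1 adjacency matrix; r is set to sqrt(mean degree), a
    common heuristic. The eigenvector of the second-smallest eigenvalue
    is thresholded at zero to produce two groups.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)
    r = np.sqrt(deg.mean())
    H = (r**2 - 1.0) * np.eye(n) - r * A + np.diag(deg)
    _, vecs = np.linalg.eigh(H)          # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)  # sign split of 2nd eigenvector
```

On a graph with two well-separated groups, the second-smallest eigenvalue of H(r) is negative and its eigenvector changes sign between the groups, which is what the threshold at zero exploits.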
Global and Local Information in Clustering Labeled Block Models
The stochastic block model is a classical cluster-exhibiting random graph
model that has been widely studied in statistics, physics and computer science.
In its simplest form, the model is a random graph with two equal-sized
clusters, with intra-cluster edge probability p, and inter-cluster edge
probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is
practically more relevant and also mathematically more challenging. A
conjecture of Decelle, Krzakala, Moore and Zdeborova, based on ideas from
statistical physics, predicted a specific threshold for clustering. The
negative direction of the conjecture was proved by Mossel, Neeman and Sly
(2012), and more recently the positive direction was proved independently by
Massoulie and by Mossel, Neeman, and Sly.
In many real network clustering problems, nodes contain information as well.
We study the interplay between node and network information in clustering by
studying a labeled block model, where in addition to the edge information, the
true cluster labels of a small fraction of the nodes are revealed. In the case
of two clusters, we show that below the threshold, a small amount of node
information does not affect recovery. On the other hand, we show that for any
small amount of information efficient local clustering is achievable as long as
the number of clusters is sufficiently large (as a function of the amount of
revealed information).
Comment: 24 pages, 2 figures. A short abstract describing these results will appear in proceedings of RANDOM 201
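As a toy illustration of a purely local use of revealed node labels (this is not the paper's algorithm; the function name and the tie-breaking rule are my own), one can guess each node's cluster by a majority vote over the revealed labels of its direct neighbors:

```python
import numpy as np

def one_hop_majority(A, revealed):
    """Guess each node's cluster from the revealed labels of its direct
    neighbors; return -1 where no neighbor is revealed.

    A: 0/1 adjacency matrix; revealed: dict node -> label in {0, 1}.
    The rule is purely local: node v never looks past its own neighborhood.
    """
    n = A.shape[0]
    guess = np.full(n, -1)
    for v in range(n):
        votes = [revealed[u] for u in np.flatnonzero(A[v]) if u in revealed]
        if votes:
            guess[v] = int(sum(votes) * 2 >= len(votes))  # majority; ties -> 1
    return guess
```

Such one-hop rules only see the revealed fraction within a single neighborhood, which is why, below the threshold, a small amount of revealed information cannot help on its own.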
Sparse Estimation with the Swept Approximated Message-Passing Algorithm
Approximate Message Passing (AMP) has been shown to be a superior method for
inference problems, such as the recovery of signals from sets of noisy,
lower-dimensionality measurements, both in terms of reconstruction accuracy and
in computational efficiency. However, AMP suffers from serious convergence
issues in contexts that do not exactly match its assumptions. We propose a new
approach to stabilizing AMP in these contexts by applying AMP updates to
individual coefficients rather than in parallel. Our results show that this
change to the AMP iteration can provide theoretically expected, but hitherto
unobtainable, performance for problems on which the standard AMP iteration
diverges. Additionally, we find that the computational cost of this swept
coefficient update scheme is not unduly burdensome, allowing it to be applied
efficiently to signals of large dimensionality.
Comment: 11 pages, 3 figures, implementation available at https://github.com/eric-tramel/SwAMP-Dem
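The stabilizing effect of sequential coefficient updates is easiest to see in a simpler linear fixed-point setting. The toy below is an analogy, not AMP itself (the matrix and function names are my own): the same update rule diverges when all coordinates move in parallel (Jacobi) yet converges when swept one coordinate at a time (Gauss-Seidel), mirroring the parallel-versus-swept distinction the abstract describes.

```python
import numpy as np

# Toy linear system where the parallel (Jacobi) fixed-point iteration
# diverges but the sequential per-coordinate sweep (Gauss-Seidel) converges.
A = np.array([[2.0, -1.0, 1.0],
              [2.0, 2.0, 2.0],
              [-1.0, -1.0, 2.0]])
x_true = np.array([1.0, 1.0, 1.0])
b = A @ x_true

def jacobi(A, b, iters=50):
    """Update every coordinate in parallel from the previous iterate."""
    d = np.diag(A)
    R = A - np.diag(d)
    x = np.zeros_like(b)
    for _ in range(iters):
        x = (b - R @ x) / d
    return x

def gauss_seidel(A, b, iters=50):
    """Sweep coordinates one at a time, using fresh values immediately."""
    n = len(b)
    x = np.zeros_like(b)
    for _ in range(iters):
        for i in range(n):
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

err_parallel = np.linalg.norm(jacobi(A, b) - x_true)
err_swept = np.linalg.norm(gauss_seidel(A, b) - x_true)
print(f"parallel error: {err_parallel:.2e}, swept error: {err_swept:.2e}")
```

For this matrix the Jacobi iteration operator has spectral radius about 1.12, so the parallel error grows each step, while the Gauss-Seidel operator has spectral radius 0.5 and the swept error shrinks geometrically.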
Blending Learning and Inference in Structured Prediction
In this paper we derive an efficient algorithm to learn the parameters of
structured predictors in general graphical models. This algorithm blends the
learning and inference tasks, which results in a significant speedup over
traditional approaches, such as conditional random fields and structured
support vector machines. For this purpose we utilize the structures of the
predictors to describe a low dimensional structured prediction task which
encourages local consistencies within the different structures while learning
the parameters of the model. Convexity of the learning task provides the means
to enforce the consistencies between the different parts. The
inference-learning blending algorithm that we propose is guaranteed to converge
to the optimum of the low dimensional primal and dual programs. Unlike many of
the existing approaches, the inference-learning blending allows us to
efficiently learn high-order graphical models, over regions of any size, with
very large numbers of parameters. We demonstrate the effectiveness of our approach,
while presenting state-of-the-art results in stereo estimation, semantic
segmentation, shape reconstruction, and indoor scene understanding.
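The paper's blended primal-dual algorithm is not reproduced here, but the general pattern of running inference inside every learning update can be sketched with a structured perceptron on a toy chain model (a different, simpler algorithm; all names and the data are illustrative):

```python
import numpy as np

def viterbi(x, W, T):
    """Max-scoring label sequence for observations x under unary scores
    W[obs, label] and transition scores T[prev, next]."""
    n, k = len(x), W.shape[1]
    delta = np.zeros((n, k))
    back = np.zeros((n, k), dtype=int)
    delta[0] = W[x[0]]
    for t in range(1, n):
        cand = delta[t - 1][:, None] + T   # cand[prev, next]
        back[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) + W[x[t]]
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def perceptron_train(x, y, n_obs=2, k=2, epochs=5):
    """Structured perceptron: inference (Viterbi) runs inside every
    learning update, the simplest form of blending the two tasks."""
    W = np.zeros((n_obs, k))
    T = np.zeros((k, k))
    for _ in range(epochs):
        pred = viterbi(x, W, T)
        for t, (g, p) in enumerate(zip(y, pred)):
            W[x[t], g] += 1.0        # reward gold features
            W[x[t], p] -= 1.0        # penalize predicted features
            if t > 0:
                T[y[t - 1], g] += 1.0
                T[pred[t - 1], p] -= 1.0
    return W, T

x = [0, 1, 0, 1]
y = [0, 1, 0, 1]
W, T = perceptron_train(x, y)
```

Each epoch calls the decoder before touching a single parameter, so inference and learning are interleaved rather than run as separate stages; the paper's contribution is a convex, provably convergent refinement of this pattern for general graphical models.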