The minimum bisection in the planted bisection model
In the planted bisection model a random graph $G(n, p_+, p_-)$ with $n$
vertices is created by partitioning the vertices randomly into two classes of
equal size (up to $\pm 1$). Any two vertices that belong to the same class are
linked by an edge with probability $p_+$ and any two that belong to different
classes with probability $p_- < p_+$, independently. The planted bisection model
has been used extensively to benchmark graph partitioning algorithms. If
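the within-class and between-class edge probabilities are $p_+ = 2d_+/n$ and
$p_- = 2d_-/n$ for numbers $0 \le d_- < d_+$ that remain fixed as
$n \to \infty$, then w.h.p. the ``planted'' bisection (the one used to construct
the graph) will not be a minimum bisection. In this paper we derive an
asymptotic formula for the minimum bisection width under the assumption that
$d_+ - d_- > c\sqrt{d_+ \ln d_+}$ for a certain constant $c > 0$.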
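As a rough illustration of the construction described in this abstract, the following Python sketch samples a planted bisection graph and measures the width of the planted cut. The names `sample_planted_bisection`, `cut_width`, `p_in` and `p_out` are illustrative choices, not taken from the paper, and the quadratic loop over vertex pairs is only suitable for small graphs.

```python
import random
from itertools import combinations


def sample_planted_bisection(n, p_in, p_out, seed=None):
    """Sample a planted bisection graph on vertices 0..n-1.

    Returns (labels, edges): labels[v] is 0 or 1, and edges is a list of
    pairs (u, v) with u < v.
    """
    rng = random.Random(seed)
    # Random partition into two classes whose sizes differ by at most one.
    vertices = list(range(n))
    rng.shuffle(vertices)
    labels = [0] * n
    for v in vertices[n // 2:]:
        labels[v] = 1
    edges = []
    for u, v in combinations(range(n), 2):
        # Same class: probability p_in; different classes: probability p_out.
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return labels, edges


def cut_width(edges, labels):
    """Number of edges whose endpoints lie in different classes."""
    return sum(1 for u, v in edges if labels[u] != labels[v])


if __name__ == "__main__":
    labels, edges = sample_planted_bisection(200, p_in=0.10, p_out=0.02, seed=0)
    print("edges:", len(edges), "planted cut width:", cut_width(edges, labels))
```

The value returned by `cut_width` for the planted labels is only an upper bound on the minimum bisection width; finding the minimum itself is a separate and much harder computation that this sketch does not attempt.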
The Geometric Block Model
To capture the inherent geometric features of many community detection
problems, we propose to use a new random graph model of communities that we
call a Geometric Block Model. The geometric block model generalizes the random
geometric graphs in the same way that the well-studied stochastic block model
generalizes the Erdos-Renyi random graphs. It is also a natural extension of
random community models inspired by the recent theoretical and practical
advances in community detection. While the model is of fundamental
theoretical interest, our main contribution is to show that many practical
community structures are better explained by the geometric block model. We also
show that a simple triangle-counting algorithm to detect communities in the
geometric block model is near-optimal. Indeed, even in the regime where the
average degree of the graph grows only logarithmically with the number of
vertices (sparse-graph), we show that this algorithm performs extremely well,
both theoretically and practically. In contrast, the triangle-counting
algorithm is far from optimal for the stochastic block model. We simulate
our results on both real and synthetic datasets to show superior performance of
both the new model and our algorithm.
Comment: A shorter version of this paper has appeared in the 32nd AAAI Conference
on Artificial Intelligence. The AAAI proceedings version as well as the
previous version on arXiv contained some errors that have been corrected in
this version.
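To make the triangle-counting idea concrete, here is a hedged Python sketch. It assumes a one-dimensional variant of the model that the abstract does not spell out: each vertex gets a uniform random position on a circle of circumference 1, and two vertices are joined when their circular distance is at most `r_same` (same community) or `r_diff` (different communities). The parameter names, the sampler, and the helper functions are assumptions made for illustration. The decision rule that the paper applies to the counts is not reproduced here.

```python
import random
from itertools import combinations


def circular_distance(x, y):
    """Distance between two points on a circle of circumference 1."""
    d = abs(x - y)
    return min(d, 1.0 - d)


def sample_gbm(n, r_same, r_diff, seed=None):
    """Sample labels and adjacency sets under the assumed 1-D model."""
    rng = random.Random(seed)
    labels = [v % 2 for v in range(n)]          # two balanced communities
    pos = [rng.random() for _ in range(n)]      # positions on the circle
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        radius = r_same if labels[u] == labels[v] else r_diff
        if circular_distance(pos[u], pos[v]) <= radius:
            adj[u].add(v)
            adj[v].add(u)
    return labels, adj


def triangles_per_edge(adj):
    """Map each edge (u, v), u < v, to its number of common neighbors."""
    return {
        (u, v): len(adj[u] & adj[v])
        for u in range(len(adj))
        for v in adj[u]
        if u < v
    }


if __name__ == "__main__":
    labels, adj = sample_gbm(400, r_same=0.10, r_diff=0.03, seed=0)
    counts = triangles_per_edge(adj)
    intra = [c for (u, v), c in counts.items() if labels[u] == labels[v]]
    inter = [c for (u, v), c in counts.items() if labels[u] != labels[v]]
    print("mean common neighbors, intra-community edges:", sum(intra) / len(intra))
    print("mean common neighbors, inter-community edges:", sum(inter) / len(inter))
```

Comparing the two printed averages gives an informal sense of how much signal the per-edge triangle counts carry in this toy setting.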
Consistency Thresholds for the Planted Bisection Model
The planted bisection model is a random graph model in which the nodes are
divided into two equal-sized communities and then edges are added randomly in a
way that depends on the community membership. We establish necessary and
sufficient conditions for the asymptotic recoverability of the planted
bisection in this model. When the bisection is asymptotically recoverable, we
give an efficient algorithm that successfully recovers it. We also show that
the planted bisection is recoverable asymptotically if and only if with high
probability every node belongs to the same community as the majority of its
neighbors.
Our algorithm for finding the planted bisection runs in time almost linear in
the number of edges. It has three stages: spectral clustering to compute an
initial guess, a "replica" stage to get almost every vertex correct, and then
some simple local moves to finish the job. An independent work by Abbe,
Bandeira, and Hall establishes similar (slightly weaker) results but only in
the case of logarithmic average degree.
Comment: latest version contains an erratum, addressing an error pointed out
by Jan van Waai
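The majority condition in this abstract is easy to state in code. The sketch below, with the hypothetical helper `nodes_violating_majority`, lists the nodes that do not have a strict majority of their neighbors in their own community. How ties and isolated nodes should be treated is an assumption here (they are counted as violations), not something the abstract specifies.

```python
def nodes_violating_majority(edges, labels):
    """Return nodes without a strict majority of same-community neighbors.

    edges: iterable of pairs (u, v); labels: list with labels[v] in {0, 1}.
    Ties and isolated nodes count as violations (an assumption of this sketch).
    """
    same = [0] * len(labels)
    diff = [0] * len(labels)
    for u, v in edges:
        if labels[u] == labels[v]:
            same[u] += 1
            same[v] += 1
        else:
            diff[u] += 1
            diff[v] += 1
    return [v for v in range(len(labels)) if same[v] <= diff[v]]


if __name__ == "__main__":
    # Two triangles joined by a single edge: every node agrees with its majority.
    labels = [0, 0, 0, 1, 1, 1]
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print(nodes_violating_majority(edges, labels))  # prints []
```

An empty result on a sampled graph corresponds to the event that every node sides with the majority of its neighbors.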
Stochastic Block Model and Community Detection in the Sparse Graphs: A spectral algorithm with optimal rate of recovery
In this paper, we present and analyze a simple and robust spectral algorithm
for the stochastic block model with $k$ blocks, for any fixed $k$. Our
algorithm works with graphs having constant edge density, under an optimal
condition on the gap between the density inside a block and the density between
the blocks. As a by-product, we settle an open question posed by Abbe et al.
concerning censor block models.
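For orientation only, the following Python sketch shows a generic spectral approach on a two-block toy instance: it centers the adjacency matrix, runs power iteration to approximate the leading eigenvector, and splits vertices by sign. This is not the algorithm analyzed in the paper, which handles any fixed number of blocks at constant edge density. The function names, the dense toy parameters, and the fixed iteration count are all assumptions of the sketch.

```python
import math
import random
from itertools import combinations


def sample_two_block(n, p_in, p_out, seed=None):
    """Toy two-block graph: vertices 0..n//2-1 form block 0, the rest block 1."""
    rng = random.Random(seed)
    labels = [0 if v < n // 2 else 1 for v in range(n)]
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < (p_in if labels[u] == labels[v] else p_out):
            adj[u].append(v)
            adj[v].append(u)
    return labels, adj


def spectral_bisect(adj, iterations=200, seed=None):
    """Split vertices by the sign of the leading eigenvector of A - c*J.

    A is the adjacency matrix, J the all-ones matrix and c the mean entry
    of A, so the subtraction removes the dominant "average degree" direction.
    """
    n = len(adj)
    rng = random.Random(seed)
    c = sum(len(nb) for nb in adj) / (n * n)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        total = sum(x)
        # One multiplication by (A - c*J), using adjacency lists for A.
        y = [sum(x[w] for w in adj[v]) - c * total for v in range(n)]
        norm = math.sqrt(sum(t * t for t in y))
        if norm == 0.0:
            break  # degenerate input (e.g. empty graph); keep current x
        x = [t / norm for t in y]
    return [0 if t >= 0 else 1 for t in x]


def agreement(guess, labels):
    """Fraction of matching labels, up to swapping the two block names."""
    hits = sum(g == t for g, t in zip(guess, labels))
    return max(hits, len(labels) - hits) / len(labels)


if __name__ == "__main__":
    labels, adj = sample_two_block(200, p_in=0.5, p_out=0.05, seed=0)
    guess = spectral_bisect(adj, seed=1)
    print("agreement with planted blocks:", agreement(guess, labels))
```

The toy parameters are deliberately dense. At constant edge density, plain spectral methods of this kind typically need extra regularization steps, which the sketch omits.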
Recovery, detection and confidence sets of communities in a sparse stochastic block model
Posterior distributions for community assignment in the planted bi-section
model are shown to achieve frequentist exact recovery and detection under sharp
lower bounds on sparsity. Assuming posterior recovery (or detection), one may
interpret credible sets (or enlarged credible sets) as consistent confidence
sets. If credible levels grow to one quickly enough, credible sets can be
interpreted as frequentist confidence sets without conditions on the
parameters. In the regime where within-class and between-class
edge-probabilities are very close, credible sets may be enlarged to achieve
frequentist asymptotic coverage. The diameters of credible sets are controlled
and match rates of posterior convergence.
Comment: 22 pp., 2 fi
Global and Local Information in Clustering Labeled Block Models
The stochastic block model is a classical cluster-exhibiting random graph
model that has been widely studied in statistics, physics and computer science.
In its simplest form, the model is a random graph with two equal-sized
clusters, with intra-cluster edge probability p, and inter-cluster edge
probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is
practically more relevant and also mathematically more challenging. A
conjecture of Decelle, Krzakala, Moore and Zdeborova, based on ideas from
statistical physics, predicted a specific threshold for clustering. The
negative direction of the conjecture was proved by Mossel, Neeman and Sly
(2012), and more recently the positive direction was proved independently by
Massoulie and by Mossel, Neeman and Sly.
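The abstract does not state the threshold itself. In the usual parametrization $p = a/n$, $q = b/n$ (the symbols $a$ and $b$ are introduced here, not in the source), the threshold from this line of work is commonly written as $(a-b)^2 > 2(a+b)$. A small Python helper, given as a sketch under that assumption, makes the arithmetic explicit:

```python
def above_clustering_threshold(a, b):
    """Check (a - b)^2 > 2 (a + b) for a sparse two-cluster block model.

    a and b are the scaled edge parameters, i.e. p = a / n and q = b / n.
    """
    return (a - b) ** 2 > 2 * (a + b)


if __name__ == "__main__":
    print(above_clustering_threshold(5, 1))  # (5-1)^2 = 16 > 12 -> True
    print(above_clustering_threshold(3, 1))  # (3-1)^2 = 4 <= 8  -> False
```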
In many real network clustering problems, nodes contain information as well.
We study the interplay between node and network information in clustering by
studying a labeled block model, where in addition to the edge information, the
true cluster labels of a small fraction of the nodes are revealed. In the case
of two clusters, we show that below the threshold, a small amount of node
information does not affect recovery. On the other hand, we show that for any
small amount of information efficient local clustering is achievable as long as
the number of clusters is sufficiently large (as a function of the amount of
revealed information).
Comment: 24 pages, 2 figures. A short abstract describing these results will
appear in proceedings of RANDOM 201