
Community detection thresholds and the weak Ramanujan property

Author: Krzakala F.
Nadakuditi R. R.
Publication date: 13/11/2013

Decelle et al.\cite{Decelle11} conjectured the existence of a sharp threshold for community detection in sparse random graphs drawn from the stochastic block model. Mossel et al.\cite{Mossel12} established the negative part of the conjecture, proving the impossibility of meaningful detection below the threshold. However, the positive part of the conjecture had so far remained elusive. Here we solve the positive part of the conjecture. We introduce a modified adjacency matrix $B$ that counts self-avoiding paths of a given length $\ell$ between pairs of nodes, and prove that, for logarithmic $\ell$, the leading eigenvectors of this modified matrix provide non-trivial detection, thereby settling the conjecture. A key step in the proof is to establish a {\em weak Ramanujan property} of matrix $B$. Namely, the spectrum of $B$ consists of two leading eigenvalues, $\rho(B)$ and $\lambda_2$, and $n-2$ eigenvalues of lower order $O(n^{\epsilon}\sqrt{\rho(B)})$ for all $\epsilon>0$, where $\rho(B)$ denotes the spectral radius of $B$.
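What $B$ counts can be illustrated with a brute-force sketch, assuming a small graph given as a dictionary of adjacency sets. The function name `self_avoiding_path_matrix` is hypothetical, not from the paper. It enumerates every self-avoiding path with exactly `ell` edges by depth-first search, which takes time exponential in `ell`; it only shows the definition of $B$ and does not reproduce the paper's analysis or its eigenvector-based detection step.

```python
from typing import Dict, List, Set


def self_avoiding_path_matrix(adj: Dict[int, Set[int]], n: int, ell: int) -> List[List[int]]:
    """Hypothetical helper: B[i][j] = number of self-avoiding paths with exactly
    `ell` edges from node i to node j (all ell + 1 visited nodes distinct)."""
    B = [[0] * n for _ in range(n)]

    def extend(start: int, node: int, visited: Set[int], length: int) -> None:
        if length == ell:
            B[start][node] += 1  # a complete path start -> ... -> node
            return
        for nxt in adj[node]:
            if nxt not in visited:  # never revisit a node
                visited.add(nxt)
                extend(start, nxt, visited, length + 1)
                visited.remove(nxt)  # backtrack before trying the next neighbor

    for i in range(n):
        extend(i, i, {i}, 0)
    return B


# Usage: on the 4-cycle 0-1-2-3-0 there are two length-2 paths from 0 to 2
# (0-1-2 and 0-3-2), and none from 0 back to itself.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(self_avoiding_path_matrix(cycle, 4, 2)[0])  # [0, 0, 2, 0]
```

Excluding repeated nodes is what distinguishes $B$ from the plain power $A^\ell$ of the adjacency matrix, whose entries also count walks that revisit nodes.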

$d$-regular graphs are Ramanujan when their second eigenvalue satisfies $|\lambda|\le 2\sqrt{d-1}$. Random $d$-regular graphs have a second largest eigenvalue $\lambda$ of $2\sqrt{d-1}+o(1)$ (see Friedman\cite{friedman08}), thus being {\em almost} Ramanujan. Erdős-Rényi graphs with average degree $d$ at least logarithmic ($d=\Omega(\log n)$) have a second eigenvalue of $O(\sqrt{d})$ (see Feige and Ofek\cite{Feige05}), a slightly weaker version of the Ramanujan property. However, this spectral separation property fails for sparse ($d=O(1)$) Erdős-Rényi graphs. Our result thus shows that by constructing matrix $B$ through neighborhood expansion, we regularize the original adjacency matrix to eventually recover a weak form of the Ramanujan property.
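The Ramanujan bound can be checked numerically on a small example. The sketch below assumes a connected, non-bipartite $d$-regular graph given as adjacency lists (for bipartite graphs the eigenvalue $-d$ would also have to be excluded). It uses plain power iteration, a generic numerical method swapped in here and not taken from the paper, restricted to vectors orthogonal to the all-ones vector, which is the eigenvector of the trivial eigenvalue $d$. The names `nontrivial_spectral_radius` and `is_ramanujan` are hypothetical.

```python
import math
import random
from typing import List


def nontrivial_spectral_radius(adj: List[List[int]], iters: int = 500, seed: int = 0) -> float:
    """Estimate max |lambda| over the adjacency eigenvalues of a connected,
    non-bipartite d-regular graph, excluding the trivial eigenvalue d (whose
    eigenvector is all-ones), via power iteration on the orthogonal complement."""
    n = len(adj)
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimate = 0.0
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]  # project out the all-ones direction
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]  # normalize so that ||x|| = 1
        y = [sum(x[j] for j in adj[i]) for i in range(n)]  # y = A x
        estimate = math.sqrt(sum(v * v for v in y))  # ||A x|| tends to max |lambda|
        x = y
    return estimate


def is_ramanujan(adj: List[List[int]], d: int, tol: float = 1e-9) -> bool:
    """Check the Ramanujan bound |lambda| <= 2 sqrt(d - 1)."""
    return nontrivial_spectral_radius(adj) <= 2 * math.sqrt(d - 1) + tol


# Petersen graph: 3-regular, adjacency eigenvalues 3, 1 (x5) and -2 (x4).
petersen = [[(i + 1) % 5, (i - 1) % 5, i + 5] for i in range(5)] + [
    [i, 5 + (i + 2) % 5, 5 + (i + 3) % 5] for i in range(5)
]
print(round(nontrivial_spectral_radius(petersen), 6))  # 2.0
print(is_ramanujan(petersen, 3))  # True, since 2 <= 2*sqrt(2)
```

Using the norm $\|Ax\|$ rather than a Rayleigh quotient means the estimate converges to $\max|\lambda|$ even when eigenvalues $\lambda$ and $-\lambda$ are both present.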

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Global and Local Information in Clustering Labeled Block Models

Author: Kanade Varun
Mossel Elchanan
Schramm Tselil
Publication date: 01/01/2014

The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intra-cluster edge probability p and inter-cluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore and Zdeborová, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman and Sly (2012), and more recently the positive direction was proven independently by Massoulié and by Mossel, Neeman, and Sly. In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by analyzing a labeled block model, where, in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information, efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).
Comment: 24 pages, 2 figures. A short abstract describing these results will appear in proceedings of RANDOM 201
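The abstract does not spell out the threshold. Under the standard sparse parametrization p = a/n and q = b/n (an assumption, not stated above), the conjectured threshold for two symmetric clusters is (a - b)^2 > 2(a + b), also known as the Kesten-Stigum threshold. The sketch below checks that condition and samples a labeled block model in which each node's true label is revealed independently with probability `reveal`. The function names and the independent-reveal mechanism are hypothetical illustrations, not the paper's exact setup.

```python
import random
from typing import List, Optional, Set, Tuple


def above_ks_threshold(a: float, b: float) -> bool:
    """Two symmetric clusters with p = a/n, q = b/n: detection is possible
    iff (a - b)^2 > 2 (a + b)."""
    return (a - b) ** 2 > 2 * (a + b)


def sample_labeled_sbm(
    n: int, a: float, b: float, reveal: float, seed: int = 0
) -> Tuple[List[int], Set[Tuple[int, int]], List[Optional[int]]]:
    """Hypothetical sampler: two equal clusters (n even), intra-cluster edge
    probability a/n, inter-cluster edge probability b/n; each true label is
    revealed independently with probability `reveal` (None means hidden)."""
    rng = random.Random(seed)
    labels = [0] * (n // 2) + [1] * (n - n // 2)
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once, stored as (i, j) with i < j
            prob = a / n if labels[i] == labels[j] else b / n
            if rng.random() < prob:
                edges.add((i, j))
    revealed = [lab if rng.random() < reveal else None for lab in labels]
    return labels, edges, revealed


print(above_ks_threshold(5, 1), above_ks_threshold(3, 1))  # True False
labels, edges, revealed = sample_labeled_sbm(1000, 5, 1, reveal=0.05)
```

The quadratic loop over node pairs keeps the sketch simple; it is only practical for small n.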

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Community detection and stochastic block models: recent developments

Author: Abbe Emmanuel
Publication date: 29/03/2017

The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and generally provides fertile ground to study the statistical and computational tradeoffs that arise in network and data sciences. This note surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial and weak recovery (a.k.a. detection). The main results discussed are the phase transitions for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial recovery, the learning of the SBM parameters and the gap between information-theoretic and computational thresholds. The note also covers some of the algorithms developed in the quest to achieve these limits, in particular two-round algorithms via graph-splitting, semi-definite programming, linearized belief propagation, and classical and nonbacktracking spectral methods. A few open problems are also discussed.
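Nonbacktracking spectral methods work with a matrix indexed by directed edges, where the entry for (u→v, x→y) is 1 when v = x and y ≠ u, so a walk may continue from v to any neighbor except straight back to u. The sketch below builds only the sparsity pattern of that matrix as successor lists, assuming a simple undirected graph given as an edge list. The name `nonbacktracking_successors` is hypothetical, and the eigenvector computation that the spectral methods then perform is not shown.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def nonbacktracking_successors(edges: List[Edge]) -> Dict[Edge, List[Edge]]:
    """For each directed edge (u, v) of a simple undirected graph, list the
    directed edges (v, w) with w != u: the nonzero entries of row (u, v)
    of the nonbacktracking matrix."""
    nbrs: Dict[int, List[int]] = {}
    for u, v in edges:  # each undirected edge yields both directions
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    return {
        (u, v): [(v, w) for w in nbrs[v] if w != u]
        for u in nbrs
        for v in nbrs[u]
    }


# Star with center 0: entering the center allows two continuations,
# while moving out to a leaf is a dead end.
succ = nonbacktracking_successors([(0, 1), (0, 2), (0, 3)])
print(succ[(1, 0)])  # [(0, 2), (0, 3)]
print(succ[(0, 1)])  # []
```

Because dead ends and immediate reversals are excluded, high-degree nodes in a sparse graph do not dominate the spectrum of this matrix the way they can dominate that of the adjacency matrix.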

arXiv.org e-Print Archive