Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and more generally provides fertile ground for studying the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, both with respect to
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a., detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest to
achieve these limits, in particular two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, and classical and
nonbacktracking spectral methods. A few open problems are also discussed.
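As a rough illustration of the model and thresholds named in this abstract, the following minimal sketch samples a graph with planted clusters and checks the two threshold conditions in their best-known special case: two balanced communities. The function names (`sample_sbm`, `above_kesten_stigum`, `above_exact_recovery`) and the parameters `p_in`, `p_out`, `a`, `b` are hypothetical and chosen for this example only. The general Chernoff-Hellinger condition for arbitrary community profiles is not reproduced here.

```python
import numpy as np

def sample_sbm(labels, p_in, p_out, seed=0):
    """Sample an undirected SBM adjacency matrix for the given planted labels.

    Nodes with equal labels connect with probability p_in, others with p_out.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = labels.size
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    # Draw each unordered pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)

def above_kesten_stigum(a, b):
    """Weak recovery condition for two balanced communities.

    Assumes sparse edge probabilities a/n (within) and b/n (across).
    """
    return (a - b) ** 2 > 2 * (a + b)

def above_exact_recovery(a, b):
    """Exact recovery condition for two balanced communities.

    Assumes edge probabilities a*log(n)/n (within) and b*log(n)/n (across).
    """
    return abs(np.sqrt(a) - np.sqrt(b)) > np.sqrt(2)

# Example: 40 nodes with alternating planted labels.
adj = sample_sbm([0, 1] * 20, p_in=0.5, p_out=0.1)
```

For instance, `above_kesten_stigum(5, 1)` holds because 16 > 12, whereas `above_kesten_stigum(3, 2)` does not because 1 < 10.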
Hierarchical community structure in networks
Modular and hierarchical structures are pervasive in real-world complex
systems. A great deal of effort has gone into trying to detect and study these
structures. Important theoretical advances in the detection of modular, or
"community", structures have included identifying fundamental limits of
detectability by formally defining community structure using probabilistic
generative models. Detecting hierarchical community structure introduces
additional challenges alongside those inherited from community detection. Here
we present a theoretical study on hierarchical community structure in networks,
which has thus far not received the same rigorous attention. We address the
following questions: 1) How should we define a valid hierarchy of communities?
2) How should we determine if a hierarchical structure exists in a network? and
3) How can we detect hierarchical structure efficiently? We approach these
questions by introducing a definition of hierarchy based on the concept of
stochastic externally equitable partitions and their relation to probabilistic
models, such as the popular stochastic block model. We enumerate the challenges
involved in detecting hierarchies and, by studying the spectral properties of
hierarchical structure, present an efficient and principled method for
detecting them. Comment: 22 pages, 12 figures
A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks
This paper presents a novel spectral algorithm with additive clustering
designed to identify overlapping communities in networks. The algorithm is
based on geometric properties of the spectrum of the expected adjacency matrix
in a random graph model that we call stochastic blockmodel with overlap (SBMO).
An adaptive version of the algorithm, which does not require knowledge of
the number of hidden communities, is proved to be consistent under the SBMO
when the degrees in the graph are (slightly more than) logarithmic. The
algorithm is shown to perform well on simulated data and on real-world graphs
with known overlapping communities. Comment: Journal of Theoretical Computer Science (TCS), Elsevier, A Para\^itr
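The abstract above does not spell out the SBMO algorithm, so the sketch below does not attempt to reproduce it. It only illustrates the general idea of reading community structure off the spectrum of an adjacency matrix, using a much simpler stand-in: two non-overlapping communities recovered from the sign of the second leading eigenvector. The function name `spectral_two_communities` and all parameter values are assumptions made for this example.

```python
import numpy as np

def spectral_two_communities(adj):
    """Split nodes into two groups by the sign of the second leading eigenvector.

    This is a basic spectral method for NON-overlapping communities,
    not the additive-clustering algorithm described in the abstract.
    """
    # eigh returns eigenvalues in ascending order, so column -2 belongs
    # to the second-largest eigenvalue of the symmetric matrix.
    _, vecs = np.linalg.eigh(adj)
    return (vecs[:, -2] > 0).astype(int)

# Dense two-community test graph (edge probabilities are illustrative).
rng = np.random.default_rng(0)
n = 200
labels = np.arange(n) % 2
same = labels[:, None] == labels[None, :]
upper = np.triu(rng.random((n, n)) < np.where(same, 0.6, 0.1), k=1)
adj = (upper | upper.T).astype(float)

guess = spectral_two_communities(adj)
agreement = np.mean(guess == labels)
# Community names are arbitrary, so score up to a label swap.
accuracy = max(agreement, 1 - agreement)
```

Handling overlapping memberships requires more than a sign split, which is where the additive clustering step of the paper comes in.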
A generative model for reciprocity and community detection in networks
We present a probabilistic generative model and efficient algorithm to model
reciprocity in directed networks. Unlike other methods that address this
problem, such as exponential random graphs, it assigns latent variables as
community memberships to nodes and a reciprocity parameter to the whole network
rather than fitting order statistics. It formalizes the assumption that a
directed interaction is more likely to occur if an individual has already
observed an interaction towards her. It provides a natural framework for
relaxing the common assumption in network generative models of conditional
independence between edges, and it can be used to perform inference tasks such
as predicting the existence of an edge given the observation of an edge in the
reverse direction. Inference is performed using an efficient
expectation-maximization algorithm that exploits the sparsity of the network,
leading to an efficient and scalable implementation. We illustrate these
findings by analyzing synthetic and real data, including social networks,
academic citations and the Erasmus student exchange program. Our method
outperforms others in both predicting edges and generating networks that
reflect the reciprocity values observed in real data, while at the same time
inferring an underlying community structure. We provide an open-source
implementation of the code online.
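For readers unfamiliar with the quantity, the following minimal sketch computes the usual descriptive reciprocity of a directed network: the fraction of directed edges whose reverse edge is also present. This is the common summary statistic, assumed here for illustration, and is not the model's reciprocity parameter or its expectation-maximization inference. The function name `reciprocity` and the toy edge list are hypothetical.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) also exists.

    Self-loops and duplicate edges are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum((v, u) in edge_set for u, v in edge_set)
    return mutual / len(edge_set)

# Toy network: a<->b is reciprocated, a->c and c->d are not.
example = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
value = reciprocity(example)  # 2 of 4 edges are reciprocated
```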