Spectral Thresholds in the Bipartite Stochastic Block Model
We consider a bipartite stochastic block model on vertex sets $V_1$ and $V_2$,
with planted partitions in each, and ask at what densities efficient
algorithms can recover the partition of the smaller vertex set.
When $|V_2| \gg |V_1|$, multiple thresholds emerge. We first locate a sharp
threshold for detection of the partition, in the sense of the results of
\cite{mossel2012stochastic,mossel2013proof} and \cite{massoulie2014community}
for the stochastic block model. We then show that at a higher edge density, the
singular vectors of the rectangular biadjacency matrix exhibit a localization /
delocalization phase transition, giving recovery above the threshold and no
recovery below. Nevertheless, we propose a simple spectral algorithm, Diagonal
Deletion SVD, which recovers the partition at a nearly optimal edge density.
The bipartite stochastic block model studied here was used by
\cite{feldman2014algorithm} to give a unified algorithm for recovering planted
partitions and assignments in random hypergraphs and random $k$-SAT formulae
respectively. Our results give the best known bounds for the clause density at
which solutions can be found efficiently in these models as well as showing a
barrier to further improvement via this reduction to the bipartite block model.
Comment: updated version, will appear in COLT 201
Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and generally provides fertile ground to study the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, both with respect to
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a., detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest of
achieving the limits, in particular two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, classical and
nonbacktracking spectral methods. A few open problems are also discussed.
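As a rough illustration of the "random graph model with planted clusters" described in the abstract above, the following minimal sketch samples an undirected SBM graph. The function name `sample_sbm` and the parameters `sizes`, `p_in`, and `p_out` are hypothetical names chosen for this example and do not come from the survey; the sketch also assumes the simplest symmetric case, with one edge probability within communities and one across them.

```python
import random

def sample_sbm(sizes, p_in, p_out, seed=0):
    """Sample an undirected SBM graph (illustrative sketch, symmetric case).

    sizes : list of community sizes (hypothetical parameter name)
    p_in  : edge probability between two nodes in the same community
    p_out : edge probability between two nodes in different communities
    Returns (labels, edges), where labels[i] is the planted community of node i.
    """
    rng = random.Random(seed)
    # Planted partition: node labels laid out community by community.
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Each pair is connected independently, with a probability
            # that depends only on whether the labels agree.
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges

labels, edges = sample_sbm([50, 50], p_in=0.3, p_out=0.02)
within = sum(labels[u] == labels[v] for u, v in edges)
print(f"{len(edges)} edges, {within} within communities")
```

Community detection then asks how well `labels` can be reconstructed from `edges` alone; the thresholds surveyed above describe when this is possible as a function of quantities like `p_in` and `p_out`.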
Online Tensor Methods for Learning Latent Variable Models
We introduce an online tensor decomposition based approach for two latent
variable modeling problems namely, (1) community detection, in which we learn
the latent communities that the social actors in social networks belong to, and
(2) topic modeling, in which we infer hidden topics of text articles. We
consider decomposition of moment tensors using stochastic gradient descent. We
conduct optimization of multilinear operations in SGD and avoid directly
forming the tensors, to save computational and storage costs. We present
optimized algorithms on two platforms. Our GPU-based implementation exploits the
parallelism of SIMD architectures to allow for maximum speed-up by a careful
optimization of storage and data transfer, whereas our CPU-based implementation
uses efficient sparse matrix computations and is suitable for large sparse
datasets. For the community detection problem, we demonstrate accuracy and
computational efficiency on Facebook, Yelp and DBLP datasets, and for the topic
modeling problem, we also demonstrate good performance on the New York Times
dataset. We compare our results to the state-of-the-art algorithms such as the
variational method, and report a gain of accuracy and a gain of several orders
of magnitude in the execution time.
Comment: JMLR 201
Compressing networks with super nodes
Community detection is a commonly used technique for identifying groups in a
network based on similarities in connectivity patterns. To facilitate community
detection in large networks, we recast the network to be partitioned into a
smaller network of 'super nodes', each super node comprising one or more nodes
in the original network. To define the seeds of our super nodes, we apply the
'CoreHD' ranking from dismantling and decycling. We test our approach through
the analysis of two common methods for community detection: modularity
maximization with the Louvain algorithm and maximum likelihood optimization for
fitting a stochastic block model. Our results highlight that applying community
detection to the compressed network of super nodes is significantly faster
while successfully producing partitions that are more aligned with the local
network connectivity, more stable across multiple (stochastic) runs within and
between community detection algorithms, and overlap well with the results
obtained using the full network.
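The compression step in the abstract above, recasting the network into a smaller network of super nodes, can be sketched as a simple graph contraction. This sketch assumes the node-to-super-node assignment is already given; the hand-written mapping `super_of` and the function name `compress` are hypothetical, and the seed selection via the 'CoreHD' ranking is not reproduced here.

```python
from collections import Counter

def compress(edges, super_of):
    """Contract an edge list into a weighted super-node network.

    edges    : iterable of (u, v) pairs in the original network
    super_of : dict mapping each original node to its super node
               (hypothetical input; the abstract derives it from CoreHD seeds)
    Returns a Counter mapping (a, b) with a <= b to the number of original
    edges between super nodes a and b; (a, a) counts edges internal to a.
    """
    weights = Counter()
    for u, v in edges:
        a, b = sorted((super_of[u], super_of[v]))
        weights[(a, b)] += 1
    return weights

# Toy network: six nodes grouped into two super nodes "A" and "B".
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (2, 5)]
super_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
compressed = compress(toy_edges, super_of)
print(dict(compressed))
```

A community detection method can then be run on the much smaller weighted network in `compressed`, with each original node inheriting the community of its super node.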
Efficient method for estimating the number of communities in a network
While there exist a wide range of effective methods for community detection
in networks, most of them require one to know in advance how many communities
one is looking for. Here we present a method for estimating the number of
communities in a network using a combination of Bayesian inference with a novel
prior and an efficient Monte Carlo sampling scheme. We test the method
extensively on both real and computer-generated networks, showing that it
performs accurately and consistently, even in cases where groups are widely
varying in size or structure.
Comment: 13 pages, 4 figures