A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks
This paper presents a novel spectral algorithm with additive clustering
designed to identify overlapping communities in networks. The algorithm is
based on geometric properties of the spectrum of the expected adjacency matrix
in a random graph model that we call stochastic blockmodel with overlap (SBMO).
An adaptive version of the algorithm, which does not require knowledge of
the number of hidden communities, is proved to be consistent under the SBMO
when the degrees in the graph are (slightly more than) logarithmic. The
algorithm is shown to perform well on simulated data and on real-world graphs
with known overlapping communities.
Comment: Journal of Theoretical Computer Science (TCS), Elsevier, A Para\^itr
Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and generally provides fertile ground to study the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, both with respect to
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a., detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
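For orientation, both phase transitions have a simple closed form in the symmetric two-community case. The parametrization below is an assumption made for illustration (the symbols $a$, $b$ and $n$ do not appear in the abstract): $n$ nodes are split into two equal communities, and edges are placed independently with one probability inside a community and another across communities.

```latex
% Sparse regime: p_in = a/n, p_out = b/n.
% Weak recovery (detection) is possible if and only if the
% Kesten-Stigum condition holds:
\[
  (a - b)^2 > 2\,(a + b).
\]
% Logarithmic regime: p_in = a \log n / n, p_out = b \log n / n.
% Exact recovery is possible if
\[
  \left|\sqrt{a} - \sqrt{b}\right| > \sqrt{2},
\]
% and impossible if the inequality is reversed.
```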
The note also covers some of the algorithms developed in the quest to
achieve these limits, in particular two-round algorithms via graph-splitting,
semidefinite programming, linearized belief propagation, and classical and
nonbacktracking spectral methods. A few open problems are also discussed.
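As a rough illustration of the setting this abstract describes, the sketch below samples a dense two-community SBM and recovers the communities with a classical spectral method (the sign of the eigenvector for the second-largest eigenvalue of the adjacency matrix). It is a minimal example under assumed parameters, not one of the algorithms analyzed in the note; the function names, the graph size and the edge probabilities are all hypothetical choices.

```python
import numpy as np


def sample_sbm(labels, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model."""
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    # Draw each unordered pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float)


def spectral_two_communities(adj):
    """Split nodes by the sign of the eigenvector of the second-largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(adj)  # eigenvalues are returned in ascending order
    second = eigvecs[:, -2]
    return (second > 0).astype(int)


def agreement(truth, guess):
    """Fraction of correctly labeled nodes, up to swapping the two labels."""
    acc = float(np.mean(truth == guess))
    return max(acc, 1.0 - acc)


# Assumed illustrative parameters: 200 nodes, dense and well-separated blocks.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
adj = sample_sbm(labels, p_in=0.5, p_out=0.1, rng=rng)
guess = spectral_two_communities(adj)
accuracy = agreement(labels, guess)
```

In this dense regime the second eigenvalue is well separated from the noise, so plain adjacency-matrix spectral clustering is expected to work well; in the sparse regimes the survey focuses on, it can fail, which is one motivation for the nonbacktracking variants mentioned above.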
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provably optimal recovery using the algorithm is shown analytically
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references
MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel
This paper considers probabilistic estimation of a low-rank matrix from
non-linear element-wise measurements of its elements. We derive the
corresponding approximate message passing (AMP) algorithm and its state
evolution. Relying on non-rigorous but standard assumptions motivated by
statistical physics, we characterize the minimum mean squared error (MMSE)
achievable information theoretically and with the AMP algorithm. Unlike in
related problems of linear estimation, in the present setting the MMSE depends
on the output channel only through a single parameter: its Fisher information.
We illustrate this striking finding by analysis of submatrix localization, and
of detection of communities hidden in a dense stochastic block model. For this
example we locate the computational and statistical boundaries, which do not
coincide for rank larger than four.
Comment: 10 pages, Allerton Conference on Communication, Control, and Computing 201
Community detection in overlapping weighted networks
Community detection in overlapping unweighted networks, in which nodes can
belong to multiple communities, has been one of the most popular topics in
network science over the last decade. However, community detection in
overlapping weighted networks, in which elements of adjacency matrices can be
any finite real values, remains a challenge. In this article, we propose a
degree-corrected mixed membership distribution-free (DCMMDF) model which
extends the degree-corrected mixed membership model from overlapping unweighted
networks to overlapping weighted networks. We address the community membership
estimation of the DCMMDF by an application of a spectral algorithm and
establish a theoretical guarantee of estimation consistency. The proposed model
is applied to simulated data and real-world data.
$\mathbb{T}$-Stochastic Graphs
Previous statistical approaches to hierarchical clustering for social network
analysis all construct an "ultrametric" hierarchy. While the assumption of
ultrametricity has been discussed and studied in the phylogenetics literature,
it has not yet been acknowledged in the social network literature. We show that
"non-ultrametric structure" in the network introduces significant instabilities
in the existing top-down recovery algorithms. To address this issue, we
introduce an instability diagnostic plot and use it to examine a collection of
empirical networks. These networks appear to violate the "ultrametric"
assumption. We propose a deceptively simple and yet general class of
probabilistic models called $\mathbb{T}$-Stochastic Graphs which impose no
topological restrictions on the latent hierarchy. To illustrate this model, we
propose six alternative forms of hierarchical network models and then show that
all six are equivalent to the $\mathbb{T}$-Stochastic Graph model. These
alternative models motivate a novel approach to hierarchical clustering that
combines spectral techniques with the well-known Neighbor-Joining algorithm
from phylogenetic reconstruction. We prove this spectral approach is
statistically consistent.