Search CORE

2,253 research outputs found

A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks

Author: A Anandkumar
B Ball
B Karrer
E Airoldi
G Palla
J Xie
K Rohe
ME Newman
P Latouche
PW Holland
RN Shepard
S Chatterjee
S Zhang
U Luxburg Von
Y Zhao
Publication venue
Publication date: 01/01/2017
Field of study

This paper presents a novel spectral algorithm with additive clustering designed to identify overlapping communities in networks. The algorithm is based on geometric properties of the spectrum of the expected adjacency matrix in a random graph model that we call stochastic blockmodel with overlap (SBMO). An adaptive version of the algorithm, that does not require the knowledge of the number of hidden communities, is proved to be consistent under the SBMO when the degrees in the graph are (slightly more than) logarithmic. The algorithm is shown to perform well on simulated data and on real-world graphs with known overlapping communities.Comment: Journal of Theoretical Computer Science (TCS), Elsevier, A Para\^itr

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Community detection and stochastic block models: recent developments

Author: Abbe Emmanuel
Publication venue
Publication date: 29/03/2017
Field of study

The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and provides generally a fertile ground to study the statistical and computational tradeoffs that arise in network and data sciences. This note surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial and weak recovery (a.k.a., detection). The main results discussed are the phase transitions for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial recovery, the learning of the SBM parameters and the gap between information-theoretic and computational thresholds. The note also covers some of the algorithms developed in the quest of achieving the limits, in particular two-round algorithms via graph-splitting, semi-definite programming, linearized belief propagation, classical and nonbacktracking spectral methods. A few open problems are also discussed

arXiv.org e-Print Archive

Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

Author: Korenblum Daniel
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018
Field of study

Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed.Comment: 13 figures, 35 reference

arXiv.org e-Print Archive

Directory of Open Access Journals

MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel

Author: Krzakala Florent
Lesieur Thibault
Zdeborová Lenka
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/10/2015
Field of study

This paper considers probabilistic estimation of a low-rank matrix from non-linear element-wise measurements of its elements. We derive the corresponding approximate message passing (AMP) algorithm and its state evolution. Relying on non-rigorous but standard assumptions motivated by statistical physics, we characterize the minimum mean squared error (MMSE) achievable information theoretically and with the AMP algorithm. Unlike in related problems of linear estimation, in the present setting the MMSE depends on the output channel only trough a single parameter - its Fisher information. We illustrate this striking finding by analysis of submatrix localization, and of detection of communities hidden in a dense stochastic block model. For this example we locate the computational and statistical boundaries that are not equal for rank larger than four.Comment: 10 pages, Allerton Conference on Communication, Control, and Computing 201

arXiv.org e-Print Archive

Crossref

HAL-CEA

Community detection in overlapping weighted networks

Author: Qing Huan
Publication venue
Publication date: 02/11/2022
Field of study

Community detection in overlapping unweighted networks in which nodes can belong to multiple communities is one of the most popular topics in modern network science during the last decade. However, community detection in overlapping weighted networks in which elements of adjacency matrices can be any finite real values remains a challenge. In this article, we propose a degree-corrected mixed membership distribution-free (DCMMDF) model which extends the degree-corrected mixed membership model from overlapping unweighted networks to overlapping weighted networks. We address the community membership estimation of the DCMMDF by an application of a spectral algorithm and establish a theoretical guarantee of estimation consistency. The proposed model is applied to simulated data and real-world data

arXiv.org e-Print Archive

$\mathbb{T}$ -Stochastic Graphs

Author: Fang Sijia
Rohe Karl
Publication venue
Publication date: 03/09/2023
Field of study

Previous statistical approaches to hierarchical clustering for social network analysis all construct an "ultrametric" hierarchy. While the assumption of ultrametricity has been discussed and studied in the phylogenetics literature, it has not yet been acknowledged in the social network literature. We show that "non-ultrametric structure" in the network introduces significant instabilities in the existing top-down recovery algorithms. To address this issue, we introduce an instability diagnostic plot and use it to examine a collection of empirical networks. These networks appear to violate the "ultrametric" assumption. We propose a deceptively simple and yet general class of probabilistic models called

\mathbb{T}

-Stochastic Graphs which impose no topological restrictions on the latent hierarchy. To illustrate this model, we propose six alternative forms of hierarchical network models and then show that all six are equivalent to the

\mathbb{T}

-Stochastic Graph model. These alternative models motivate a novel approach to hierarchical clustering that combines spectral techniques with the well-known Neighbor-Joining algorithm from phylogenetic reconstruction. We prove this spectral approach is statistically consistent

arXiv.org e-Print Archive