
    Consistency of community detection in networks under degree-corrected stochastic block models

    Community detection is a fundamental problem in network analysis, with applications in many diverse areas. The stochastic block model is a common tool for model-based community detection, and asymptotic tools for checking consistency of community detection under the block model have been recently developed. However, the block model is limited by its assumption that all nodes within a community are stochastically equivalent, and provides a poor fit to networks with hubs or highly varying node degrees within communities, which are common in practice. The degree-corrected stochastic block model was proposed to address this shortcoming and allows variation in node degrees within a community while preserving the overall block community structure. In this paper we establish general theory for checking consistency of community detection under the degree-corrected stochastic block model and compare several community detection criteria under both the standard and the degree-corrected models. We show which criteria are consistent under which models and constraints, and compare their relative performance in practice. We find that methods based on the degree-corrected block model, which includes the standard block model as a special case, are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not. On the other hand, in practice, the degree correction involves estimating many more parameters, and empirically we find it is only worth doing if the node degrees within communities are indeed highly variable. We illustrate the methods on simulated networks and on a network of political blogs. Comment: Published at http://dx.doi.org/10.1214/12-AOS1036 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). With correction.

    A spectral method for community detection in moderately-sparse degree-corrected stochastic block models

    We consider community detection in Degree-Corrected Stochastic Block Models (DC-SBM). We propose a spectral clustering algorithm based on a suitably normalized adjacency matrix. We show that this algorithm consistently recovers the block membership of all but a vanishing fraction of nodes, in the regime where the lowest degree is of order log(n) or higher. Recovery succeeds even for very heterogeneous degree distributions. The algorithm does not rely on parameters as input. In particular, it does not need to know the number of communities.

    Non-Backtracking Spectrum of Degree-Corrected Stochastic Block Models

    Motivated by community detection, we characterise the spectrum of the non-backtracking matrix $B$ in the Degree-Corrected Stochastic Block Model. Specifically, we consider a random graph on $n$ vertices partitioned into two equal-sized clusters. The vertices have i.i.d. weights $\{ \phi_u \}_{u=1}^n$ with second moment $\Phi^{(2)}$. The intra-cluster connection probability for vertices $u$ and $v$ is $\frac{\phi_u \phi_v}{n}a$ and the inter-cluster connection probability is $\frac{\phi_u \phi_v}{n}b$. We show that with high probability, the following holds: The leading eigenvalue of the non-backtracking matrix $B$ is asymptotic to $\rho = \frac{a+b}{2} \Phi^{(2)}$. The second eigenvalue is asymptotic to $\mu_2 = \frac{a-b}{2} \Phi^{(2)}$ when $\mu_2^2 > \rho$, but asymptotically bounded by $\sqrt{\rho}$ when $\mu_2^2 \leq \rho$. All the remaining eigenvalues are asymptotically bounded by $\sqrt{\rho}$. As a result, a clustering positively correlated with the true communities can be obtained based on the second eigenvector of $B$ in the regime where $\mu_2^2 > \rho$. In a previous work we obtained that detection is impossible when $\mu_2^2 < \rho$, meaning that a phase transition occurs in the sparse regime of the Degree-Corrected Stochastic Block Model. As a corollary, we obtain that Degree-Corrected Erdős-Rényi graphs asymptotically satisfy the graph Riemann hypothesis, a quasi-Ramanujan property. A by-product of our proof is a weak law of large numbers for local functionals on Degree-Corrected Stochastic Block Models, which could be of independent interest.
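    The quantities in the abstract above can be made concrete with a small sketch. It is an illustration only, not the authors' code: the function names `nb_spectrum_summary` and `sample_dcsbm` are hypothetical, the first half of the vertices is arbitrarily assigned to cluster 0, and edge probabilities are clipped at 1, which is an assumption added here for finite $n$. The first function evaluates $\rho$, $\mu_2$ and the condition $\mu_2^2 > \rho$; the second draws a graph with the stated intra- and inter-cluster connection probabilities.

```python
import math
import random


def nb_spectrum_summary(a, b, phi2):
    """Evaluate the asymptotic quantities stated in the abstract.

    a, b  : intra- and inter-cluster connection parameters
    phi2  : second moment of the vertex weights, Phi^(2)
    """
    rho = (a + b) / 2 * phi2   # asymptotic leading eigenvalue of B
    mu2 = (a - b) / 2 * phi2   # asymptotic second eigenvalue when mu2^2 > rho
    return {
        "rho": rho,
        "mu2": mu2,
        "sqrt_rho": math.sqrt(rho),      # bound on the remaining eigenvalues
        "above_threshold": mu2 ** 2 > rho,
    }


def sample_dcsbm(n, a, b, weights, seed=0):
    """Draw one two-cluster graph with edge probability phi_u * phi_v * c / n.

    c is a for same-cluster pairs and b otherwise. Assumptions made here:
    n is even, cluster 0 is the first n // 2 vertices, and probabilities
    are clipped at 1.
    """
    rng = random.Random(seed)
    labels = [0 if u < n // 2 else 1 for u in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            c = a if labels[u] == labels[v] else b
            p = min(1.0, weights[u] * weights[v] * c / n)
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges


# With unit weights (Phi^(2) = 1): a = 5, b = 1 gives rho = 3 and mu2 = 2,
# so mu2^2 = 4 > 3 and the condition holds; a = 3, b = 1 gives rho = 2 and
# mu2 = 1, so it does not.
strong = nb_spectrum_summary(5, 1, 1.0)
weak = nb_spectrum_summary(3, 1, 1.0)
```

    Note that both $\rho$ and $\mu_2$ scale linearly with $\Phi^{(2)}$, so $\mu_2^2$ grows quadratically in it while $\rho$ grows only linearly; under this formula, heavier-tailed weights make the condition easier to satisfy for fixed $a$ and $b$.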

    Joint Spectral Clustering in Multilayer Degree-Corrected Stochastic Blockmodels

    Modern network datasets are often composed of multiple layers, either as different views, time-varying observations, or independent sample units, resulting in collections of networks over the same set of vertices but with potentially different connectivity patterns on each network. These data require models and methods that are flexible enough to capture local and global differences across the networks, while at the same time being parsimonious and tractable to yield computationally efficient and theoretically sound solutions that are capable of aggregating information across the networks. This paper considers the multilayer degree-corrected stochastic blockmodel, where a collection of networks share the same community structure, but degree corrections and block connection probability matrices are permitted to be different. We establish the identifiability of this model and propose a spectral clustering algorithm for community detection in this setting. Our theoretical results demonstrate that the misclustering error rate of the algorithm improves exponentially with multiple network realizations, even in the presence of significant layer heterogeneity with respect to degree corrections, signal strength, and spectral properties of the block connection probability matrices. Simulation studies show that this approach improves on existing multilayer community detection methods in this challenging regime. Furthermore, in a case study of US airport data from January 2016 to September 2021, we find that this methodology identifies meaningful community structure and trends in airport popularity influenced by pandemic impacts on travel.

    Bayesian stochastic blockmodels for community detection in networks and community-structured covariance selection

    Networks have been widely used to describe interactions among objects in diverse fields. Given the interest in explaining a network by its structure, much attention has been drawn to finding clusters of nodes with dense connections within clusters but sparse connections between clusters. Such clusters are called communities, and identifying such clusters is known as community detection. Here, to perform community detection, I focus on stochastic blockmodels (SBM), a class of statistically-based generative models. I present a flexible SBM that represents different types of data as well as node attributes under a Bayesian framework. The proposed models explicitly capture community behavior by guaranteeing that connections are denser within communities than between communities. First, I present a degree-corrected SBM based on a logistic regression formulation to model binary networks. To fit the model, I obtain posterior samples via Gibbs sampling based on Polya-Gamma latent variables. I conduct inference based on a novel, canonically mapped centroid estimator that formally addresses label non-identifiability and captures representative community assignments. Next, to accommodate large-scale datasets, I further extend the degree-corrected SBM to a broader family of generalized linear models with group correction terms. To conduct exact inference efficiently, I develop an iteratively-reweighted least squares procedure that implicitly updates sufficient statistics on the network to obtain maximum a posteriori (MAP) estimators. I demonstrate the proposed model and estimation on simulated benchmark networks and various real-world datasets. Finally, I develop a Bayesian SBM for community-structured covariance selection. Here, I assume that the data at each node are Gaussian and that there is a latent network in which two nodes are not connected if their observations are conditionally independent given the observations of the other nodes. In the context of biological and social applications, I expect that this latent network shows a block dependency structure that represents community behavior. Thus, to identify the latent network and detect communities, I propose a hierarchical prior in two levels: a spike-and-slab prior on off-diagonal entries of the concentration matrix for variable selection and a degree-corrected SBM to capture community behavior. I develop an efficient routine based on ridge regularization and MAP estimation to conduct inference.

    Community Detection and Classification Guarantees Using Embeddings Learned by Node2Vec

    Embedding the nodes of a large network into a Euclidean space is a common objective in modern machine learning, with a variety of tools available. These embeddings can then be used as features for tasks such as community detection/node clustering or link prediction, where they achieve state-of-the-art performance. With the exception of spectral clustering methods, there is little theoretical understanding of other commonly used approaches to learning embeddings. In this work we examine the theoretical properties of the embeddings learned by node2vec. Our main result shows that the use of k-means clustering on the embedding vectors produced by node2vec gives weakly consistent community recovery for the nodes in (degree-corrected) stochastic block models. We also discuss the use of these embeddings for node and link prediction tasks. We demonstrate this result empirically, and examine how this relates to other embedding tools for network data.