4,136 research outputs found

    Fast Detection of Community Structures using Graph Traversal in Social Networks

    Full text link
    Finding community structures in social networks is considered to be a challenging task as many of the proposed algorithms are computationally expensive and does not scale well for large graphs. Most of the community detection algorithms proposed till date are unsuitable for applications that would require detection of communities in real-time, especially for massive networks. The Louvain method, which uses modularity maximization to detect clusters, is usually considered to be one of the fastest community detection algorithms even without any provable bound on its running time. We propose a novel graph traversal-based community detection framework, which not only runs faster than the Louvain method but also generates clusters of better quality for most of the benchmark datasets. We show that our algorithms run in O(|V | + |E|) time to create an initial cover before using modularity maximization to get the final cover. Keywords - community detection; Influenced Neighbor Score; brokers; community nodes; communitiesComment: 29 pages, 9 tables, and 13 figures. Accepted in "Knowledge and Information Systems", 201

    Distributed Community Detection in Dynamic Graphs

    Full text link
    Inspired by the increasing interest in self-organizing social opportunistic networks, we investigate the problem of distributed detection of unknown communities in dynamic random graphs. As a formal framework, we consider the dynamic version of the well-studied \emph{Planted Bisection Model} \sdG(n,p,q) where the node set [n][n] of the network is partitioned into two unknown communities and, at every time step, each possible edge (u,v)(u,v) is active with probability pp if both nodes belong to the same community, while it is active with probability qq (with q<<pq<<p) otherwise. We also consider a time-Markovian generalization of this model. We propose a distributed protocol based on the popular \emph{Label Propagation Algorithm} and prove that, when the ratio p/qp/q is larger than nbn^{b} (for an arbitrarily small constant b>0b>0), the protocol finds the right "planted" partition in O(logn)O(\log n) time even when the snapshots of the dynamic graph are sparse and disconnected (i.e. in the case p=Θ(1/n)p=\Theta(1/n)).Comment: Version I

    Community detection and stochastic block models: recent developments

    Full text link
    The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and provides generally a fertile ground to study the statistical and computational tradeoffs that arise in network and data sciences. This note surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial and weak recovery (a.k.a., detection). The main results discussed are the phase transitions for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial recovery, the learning of the SBM parameters and the gap between information-theoretic and computational thresholds. The note also covers some of the algorithms developed in the quest of achieving the limits, in particular two-round algorithms via graph-splitting, semi-definite programming, linearized belief propagation, classical and nonbacktracking spectral methods. A few open problems are also discussed

    Global and Local Information in Clustering Labeled Block Models

    Get PDF
    The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intra-cluster edge probability p, and inter-cluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore and Zdeborova, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman and Sly (2012), and more recently the positive direction was proven independently by Massoulie and Mossel, Neeman, and Sly. In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).Comment: 24 pages, 2 figures. A short abstract describing these results will appear in proceedings of RANDOM 201
    corecore