2,738 research outputs found

    Fast Detection of Community Structures using Graph Traversal in Social Networks

    Full text link
    Finding community structures in social networks is considered to be a challenging task as many of the proposed algorithms are computationally expensive and does not scale well for large graphs. Most of the community detection algorithms proposed till date are unsuitable for applications that would require detection of communities in real-time, especially for massive networks. The Louvain method, which uses modularity maximization to detect clusters, is usually considered to be one of the fastest community detection algorithms even without any provable bound on its running time. We propose a novel graph traversal-based community detection framework, which not only runs faster than the Louvain method but also generates clusters of better quality for most of the benchmark datasets. We show that our algorithms run in O(|V | + |E|) time to create an initial cover before using modularity maximization to get the final cover. Keywords - community detection; Influenced Neighbor Score; brokers; community nodes; communitiesComment: 29 pages, 9 tables, and 13 figures. Accepted in "Knowledge and Information Systems", 201

    Community Detection via Maximization of Modularity and Its Variants

    Full text link
    In this paper, we first discuss the definition of modularity (Q) used as a metric for community quality and then we review the modularity maximization approaches which were used for community detection in the last decade. Then, we discuss two opposite yet coexisting problems of modularity optimization: in some cases, it tends to favor small communities over large ones while in others, large communities over small ones (so called the resolution limit problem). Next, we overview several community quality metrics proposed to solve the resolution limit problem and discuss Modularity Density (Qds) which simultaneously avoids the two problems of modularity. Finally, we introduce two novel fine-tuned community detection algorithms that iteratively attempt to improve the community quality measurements by splitting and merging the given network community structure. The first of them, referred to as Fine-tuned Q, is based on modularity (Q) while the second one is based on Modularity Density (Qds) and denoted as Fine-tuned Qds. Then, we compare the greedy algorithm of modularity maximization (denoted as Greedy Q), Fine-tuned Q, and Fine-tuned Qds on four real networks, and also on the classical clique network and the LFR benchmark networks, each of which is instantiated by a wide range of parameters. The results indicate that Fine-tuned Qds is the most effective among the three algorithms discussed. Moreover, we show that Fine-tuned Qds can be applied to the communities detected by other algorithms to significantly improve their results

    Evaluating Overfit and Underfit in Models of Network Community Structure

    Full text link
    A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared.Comment: 22 pages, 13 figures, 3 table

    Enhancing community detection using a network weighting strategy

    Full text link
    A community within a network is a group of vertices densely connected to each other but less connected to the vertices outside. The problem of detecting communities in large networks plays a key role in a wide range of research areas, e.g. Computer Science, Biology and Sociology. Most of the existing algorithms to find communities count on the topological features of the network and often do not scale well on large, real-life instances. In this article we propose a strategy to enhance existing community detection algorithms by adding a pre-processing step in which edges are weighted according to their centrality w.r.t. the network topology. In our approach, the centrality of an edge reflects its contribute to making arbitrary graph tranversals, i.e., spreading messages over the network, as short as possible. Our strategy is able to effectively complements information about network topology and it can be used as an additional tool to enhance community detection. The computation of edge centralities is carried out by performing multiple random walks of bounded length on the network. Our method makes the computation of edge centralities feasible also on large-scale networks. It has been tested in conjunction with three state-of-the-art community detection algorithms, namely the Louvain method, COPRA and OSLOM. Experimental results show that our method raises the accuracy of existing algorithms both on synthetic and real-life datasets.Comment: 28 pages, 2 figure

    Router-level community structure of the Internet Autonomous Systems

    Get PDF
    The Internet is composed of routing devices connected between them and organized into independent administrative entities: the Autonomous Systems. The existence of different types of Autonomous Systems (like large connectivity providers, Internet Service Providers or universities) together with geographical and economical constraints, turns the Internet into a complex modular and hierarchical network. This organization is reflected in many properties of the Internet topology, like its high degree of clustering and its robustness. In this work, we study the modular structure of the Internet router-level graph in order to assess to what extent the Autonomous Systems satisfy some of the known notions of community structure. We show that the modular structure of the Internet is much richer than what can be captured by the current community detection methods, which are severely affected by resolution limits and by the heterogeneity of the Autonomous Systems. Here we overcome this issue by using a multiresolution detection algorithm combined with a small sample of nodes. We also discuss recent work on community structure in the light of our results
    • …
    corecore