
    Statistical Mechanics of Community Detection

    Starting from a general \textit{ansatz}, we show how community detection can be interpreted as finding the ground state of an infinite range spin glass. Our approach applies to weighted and directed networks alike. It contains the \textit{ad hoc} introduced quality function from \cite{ReichardtPRL} and the modularity $Q$ as defined by Newman and Girvan \cite{Girvan03} as special cases. The community structure of the network is interpreted as the spin configuration that minimizes the energy of the spin glass, with the spin states being the community indices. We elucidate the properties of the ground state configuration to give a concise definition of communities as cohesive subgroups in networks that is adaptive to the specific class of network under study. Further, we show how hierarchies and overlap in the community structure can be detected. Computationally effective local update rules for optimization procedures to find the ground state are given. We show how the \textit{ansatz} may be used to discover the community around a given node without detecting all communities in the full network, and we give benchmarks for the performance of this extension. Finally, we give expectation values for the modularity of random graphs, which can be used in the assessment of statistical significance of community structure.
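
    As an illustration of the modularity mentioned in this abstract, the following is a minimal sketch of the Newman-Girvan modularity for the simplest case only: an undirected, unweighted graph with a fixed partition. It does not implement the general spin glass formulation, the weighted or directed variants, or any optimization procedure from the paper. The function name `modularity` and the example variables are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    (Both argument conventions are assumptions made for this sketch.)
    """
    m = 0                            # total number of edges
    degree_sum = defaultdict(int)    # summed node degrees per community
    internal = defaultdict(int)      # edges with both ends in the community
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    if m == 0:
        raise ValueError("graph has no edges")
    # Q = sum over communities c of [ e_c / m - (d_c / (2 m))^2 ]
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridging edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)   # 5/14, about 0.357
q_single = modularity(edges, one_group)   # 0 for the trivial partition
```

    Splitting the two triangles into separate communities scores higher than placing all nodes in one community, which matches the intuition that communities have more internal edges than expected at random.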

    Evaluating Overfit and Underfit in Models of Network Community Structure

    A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over- or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over- and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over- and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared. Comment: 22 pages, 13 figures, 3 tables

    Bayesian stochastic blockmodeling

    This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via increased Bayesian hierarchies, and we contrast the task of sampling network partitions from the posterior distribution with finding the single point estimate that maximizes it, while describing efficient algorithms to perform either one. We also show how inferring the SBM can be used to predict missing and spurious links, and shed light on the fundamental limitations of the detectability of modular structures in networks. Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool at https://graph-tool.skewed.de . See also the HOWTO at https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm

    Topological Feature Based Classification

    There has been a lot of interest in developing algorithms to extract clusters or communities from networks. This work proposes a method, based on blockmodelling, for leveraging communities and other topological features for use in a predictive classification task. Motivated by the issues faced by the field of community detection and inspired by recent advances in Bayesian topic modelling, the presented model automatically discovers topological features relevant to a given classification task. In this way, rather than attempting to identify some universal best set of clusters for an undefined goal, the aim is to find the best set of clusters for a particular purpose. Using this method, topological features can be validated and assessed within a given context by their predictive performance. The proposed model differs from other relational and semi-supervised learning models in that it identifies topological features to explain the classification decision. In a demonstration on a number of real networks, the predictive capability of the topological features is shown to rival the performance of content-based relational learners. Additionally, the model is shown to outperform graph-based semi-supervised methods on directed and approximately bipartite networks. Comment: Awarded 3rd Best Student Paper at 14th International Conference on Information Fusion 201