Statistical Mechanics of Community Detection
Starting from a general \textit{ansatz}, we show how community detection can
be interpreted as finding the ground state of an infinite range spin glass. Our
approach applies to weighted and directed networks alike. It contains the
\textit{ad hoc} introduced quality function from \cite{ReichardtPRL} and the
modularity as defined by Newman and Girvan \cite{Girvan03} as special
cases. The community structure of the network is interpreted as the spin
configuration that minimizes the energy of the spin glass with the spin states
being the community indices. We elucidate the properties of the ground state
configuration to give a concise definition of communities as cohesive subgroups
in networks that is adaptive to the specific class of network under study.
Further, we show how hierarchies and overlap in the community structure can be
detected. Computationally effective local update rules for optimization
procedures to find the ground state are given. We show how the \textit{ansatz}
may be used to discover the community around a given node without detecting all
communities in the full network and we give benchmarks for the performance of
this extension. Finally, we give expectation values for the modularity of
random graphs, which can be used in the assessment of statistical significance
of community structure.
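To make the spin-glass reading concrete, here is a minimal sketch of the Newman-Girvan modularity special case for an undirected, unweighted graph. Each node's community label plays the role of its spin state, and maximizing the returned value corresponds, up to a negative constant factor, to minimizing the energy. The function name `modularity`, the edge-list input format, and the `gamma` resolution parameter are assumptions made for this illustration, not notation taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community, gamma=1.0):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label (the "spin state").
    gamma: resolution parameter (assumed here); gamma = 1 gives the
           standard Newman-Girvan modularity.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")

    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    internal = defaultdict(int)    # number of edges with both ends in a community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    # Observed fraction of internal edges minus the fraction expected
    # when edges are rewired at random with node degrees held fixed.
    return sum(
        internal[c] / m - gamma * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
print(modularity(edges, two_groups))  # about 0.357 (exactly 5/14)
print(modularity(edges, one_group))   # 0.0
```

Splitting the graph into its two triangles scores higher than lumping all nodes together, which is the behavior an energy-minimizing spin configuration should show on this toy input.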
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will overfit or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of
overfitting and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluating overfitting and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
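Grouping algorithms by the similarity of their outputs requires some way to score how alike two partitions of the same network are. The sketch below uses normalized mutual information (NMI) for that purpose. The choice of NMI and the function name `normalized_mutual_information` are assumptions made for illustration; the abstract does not say which similarity measure the authors used.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two partitions given as equal-length label sequences.

    Position i in each sequence holds the community label that one
    algorithm assigned to node i. Returns a value in [0, 1].
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("partitions must be non-empty and cover the same nodes")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a + h_b == 0:
        return 1.0  # both partitions place every node in a single group

    mutual = sum(
        c / n * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    return 2 * mutual / (h_a + h_b)


# Same grouping under different label names, then an unrelated grouping.
print(normalized_mutual_information([0, 0, 1, 1], ["x", "x", "y", "y"]))
print(normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

The first call compares two partitions that differ only in label names and so scores at the top of the range; the second compares statistically independent partitions and scores at the bottom. A matrix of such pairwise scores is one possible input to a clustering of algorithms.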
Bayesian stochastic blockmodeling
This chapter provides a self-contained introduction to the use of Bayesian
inference to extract large-scale modular structures from network data, based on
the stochastic blockmodel (SBM), as well as its degree-corrected and
overlapping generalizations. We focus on nonparametric formulations that allow
their inference in a manner that prevents overfitting, and enables model
selection. We discuss aspects of the choice of priors, in particular how to
avoid underfitting via increased Bayesian hierarchies, and we contrast the task
of sampling network partitions from the posterior distribution with finding the
single point estimate that maximizes it, while describing efficient algorithms
to perform either one. We also show how inferring the SBM can be used to
predict missing and spurious links, and shed light on the fundamental
limitations of the detectability of modular structures in networks.
Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool
at https://graph-tool.skewed.de . See also the HOWTO at
https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
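As a rough illustration of why the chapter stresses nonparametric Bayesian formulations, the sketch below evaluates the plain maximum-likelihood fit of a Bernoulli SBM for a fixed partition of a simple undirected graph. This is a deliberately simpler stand-in, not the chapter's method and not graph-tool's API; the function name `sbm_log_likelihood` and the input format are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def sbm_log_likelihood(edges, block):
    """Maximized log-likelihood of a Bernoulli SBM for a fixed partition.

    edges: iterable of (u, v) pairs for a simple undirected graph
           (no self-loops, no multi-edges), each edge listed once.
    block: dict mapping every node -> block label; labels must be
           mutually comparable (for example all ints or all strings).
    """
    sizes = Counter(block.values())  # number of nodes in each block
    counts = Counter()               # number of edges per unordered block pair
    for u, v in edges:
        r, s = sorted((block[u], block[v]))
        counts[(r, s)] += 1

    total = 0.0
    for r, s in combinations_with_replacement(sorted(sizes), 2):
        # Number of node pairs that could carry an edge between r and s.
        if r == s:
            pairs = sizes[r] * (sizes[r] - 1) // 2
        else:
            pairs = sizes[r] * sizes[s]
        e = counts[(r, s)]
        # Block pairs that are empty or fully connected contribute zero.
        if 0 < e < pairs:
            p = e / pairs  # maximum-likelihood edge probability
            total += e * math.log(p) + (pairs - e) * math.log(1 - p)
    return total


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
planted = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
single = {node: 0 for node in range(6)}
singletons = {node: node for node in range(6)}
print(sbm_log_likelihood(edges, planted))
print(sbm_log_likelihood(edges, single))
print(sbm_log_likelihood(edges, singletons))
```

The two-triangle partition fits better than a single block, but placing every node in its own block reaches the maximum possible value of zero. Raw likelihood therefore always favors the most fragmented partition, which is the overfitting that priors and description-length penalties are meant to counter.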
Topological Feature Based Classification
There has been a lot of interest in developing algorithms to extract clusters
or communities from networks. This work proposes a method, based on
blockmodelling, for leveraging communities and other topological features for
use in a predictive classification task. Motivated by the issues faced by the
field of community detection and inspired by recent advances in Bayesian topic
modelling, the presented model automatically discovers topological features
relevant to a given classification task. In this way, rather than attempting to
identify some universal best set of clusters for an undefined goal, the aim is
to find the best set of clusters for a particular purpose.
Using this method, topological features can be validated and assessed within
a given context by their predictive performance.
The proposed model differs from other relational and semi-supervised learning
models as it identifies topological features to explain the classification
decision. In a demonstration on a number of real networks the predictive
capability of the topological features is shown to rival the performance of
content based relational learners. Additionally, the model is shown to
outperform graph-based semi-supervised methods on directed and approximately
bipartite networks.
Comment: Awarded 3rd Best Student Paper at 14th International Conference on
Information Fusion 201