1,476 research outputs found
Stochastic fluctuations and the detectability limit of network communities
We have analyzed the detectability limits of network communities in the
framework of the popular Girvan and Newman benchmark. By carefully taking into
account the inevitable stochastic fluctuations that affect the construction of
each and every instance of the benchmark, we conclude that the
native, putative partition of the network is completely lost even before the
in-degree/out-degree ratio becomes equal to that of a structureless
Erdős-Rényi network. We develop a simple iterative scheme, analytically
well described by an infinite branching process, to provide an estimate of the
true detectability limit. Using various algorithms based on modularity
optimization, we show that all of them behave (semi-quantitatively) in the same
way, with the same functional form of the detectability threshold as a function
of the network parameters. Because the same behavior has also been found for
further modularity-optimization methods and for methods based on different
heuristic implementations, we conclude that a correct definition of the
detectability limit must take into account the stochastic fluctuations of the
network construction.
Comment: 5 pages, 5 figures, correction of typos, improvement of the
bibliography and of the notation, general compressio
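The role of stochastic fluctuations described in this abstract can be illustrated with a small sketch. It assumes the commonly used benchmark settings (128 nodes, 4 groups of 32, expected internal plus external degree equal to 16), which the abstract itself does not state, and the function names are hypothetical. The statistic computed here is only an illustration of how individual nodes lose their planted group in a single random instance; it is not the iterative branching-process scheme of the paper.

```python
import random


def gn_benchmark(z_in, z_out, n_groups=4, group_size=32, seed=0):
    """Sample one instance of a Girvan-Newman-style planted-partition graph.

    z_in and z_out are the *expected* internal and external degrees;
    the realized degrees of each node fluctuate around them.
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    p_in = z_in / (group_size - 1)      # link probability inside a group
    p_out = z_out / (n - group_size)    # link probability across groups
    group = [i // group_size for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if group[i] == group[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return group, edges


def fraction_misassigned(group, edges, n_groups=4):
    """Fraction of nodes with at least as many links to some other
    single group as to their own planted group."""
    n = len(group)
    links = [[0] * n_groups for _ in range(n)]
    for i, j in edges:
        links[i][group[j]] += 1
        links[j][group[i]] += 1
    bad = 0
    for i in range(n):
        own = links[i][group[i]]
        best_other = max(links[i][g] for g in range(n_groups) if g != group[i])
        if best_other >= own:
            bad += 1
    return bad / n


if __name__ == "__main__":
    for z_out in (2, 4, 6, 8, 10, 12):
        group, edges = gn_benchmark(16 - z_out, z_out)
        print(z_out, fraction_misassigned(group, edges))
```

In this sketch the fraction of such nodes already becomes sizeable before the Erdős-Rényi point (z_out = 12 under the assumed settings), which is the qualitative effect the abstract attributes to fluctuations.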
Spectral redemption: clustering sparse networks
Spectral algorithms are classic approaches to clustering and community
detection in networks. However, for sparse networks the standard versions of
these algorithms are suboptimal, in some cases completely failing to detect
communities even when other algorithms such as belief propagation can do so.
Here we introduce a new class of spectral algorithms based on a
non-backtracking walk on the directed edges of the graph. The spectrum of this
operator is much better-behaved than that of the adjacency matrix or other
commonly used matrices, maintaining a strong separation between the bulk
eigenvalues and the eigenvalues relevant to community structure even in the
sparse case. We show that our algorithm is optimal for graphs generated by the
stochastic block model, detecting communities all the way down to the
theoretical limit. We also show the spectrum of the non-backtracking operator
for some real-world networks, illustrating its advantages over traditional
spectral clustering.
Comment: 11 pages, 6 figures. Clarified to what extent our claims are
rigorous, and to what extent they are conjectures; also added an
interpretation of the eigenvectors of the 2n-dimensional version of the
non-backtracking matri
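As a rough illustration of the operator this abstract refers to, the sketch below builds the non-backtracking matrix on the directed edges of a small undirected graph, using the usual definition: the entry for the pair (u -> v, w -> x) is 1 when v = w and x != u. The function names are hypothetical, the implementation is dense and pure Python for readability, and the clustering step itself (reading communities off the leading eigenvectors) is not shown.

```python
def non_backtracking_matrix(edges):
    """Return (directed_edges, B) for an undirected simple graph.

    B[i][j] = 1 when directed edge i = (u -> v) can be followed by
    directed edge j = (v -> w) with w != u, and 0 otherwise.
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    size = len(directed)
    B = [[0] * size for _ in range(size)]
    for i, (u, v) in enumerate(directed):
        for j, (w, x) in enumerate(directed):
            if v == w and x != u:
                B[i][j] = 1
    return directed, B


def leading_eigenvalue(B, iterations=200):
    """Estimate the largest eigenvalue of B by power iteration.

    A sketch only: convergence is assumed, which holds for the small
    regular example below but not for every graph.
    """
    size = len(B)
    vec = [1.0] * size
    value = 0.0
    for _ in range(iterations):
        new = [sum(B[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = max(abs(x) for x in new)
        if norm == 0:
            return 0.0  # nilpotent case, e.g. a tree
        vec = [x / norm for x in new]
        value = norm
    return value


if __name__ == "__main__":
    # Complete graph on 4 nodes: every node has degree 3.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    directed, B = non_backtracking_matrix(k4)
    print(len(directed), leading_eigenvalue(B))
```

Each row of the matrix sums to the degree of the edge's target node minus one, so for a d-regular graph the leading eigenvalue is d - 1.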
Spatial correlations in attribute communities
Community detection is an important tool for exploring and classifying the
properties of large complex networks and should be of great help for spatial
networks. Indeed, in addition to their location, nodes in spatial networks can
have attributes such as the language spoken by individuals, or any other
socioeconomic feature that we would like to identify in communities. In this
paper we discuss a crucial aspect that was not considered in previous
studies: the possible existence of correlations between space and
attributes. Introducing a simple toy model in which both space and node
attributes are considered, we discuss the effect of space-attribute
correlations on the results of various community detection methods proposed for
spatial networks in this paper and in previous studies. When space is
irrelevant, our model is equivalent to the stochastic block model which has
been shown to display a detectability/non-detectability transition. In the
regime where space dominates the link formation process, most methods can fail
to recover the communities, an effect which is particularly marked when
space-attribute correlations are strong. In this latter case, community
detection methods which remove the spatial component of the network can miss a
large part of the community structure and can lead to incorrect results.
Comment: 10 pages and 7 figure
Detecting communities using asymptotical Surprise
Nodes in real-world networks are repeatedly observed to form dense clusters,
often referred to as communities. Methods to detect these groups of nodes
usually maximize an objective function, which implicitly contains the
definition of a community. Here we analyze a recently proposed measure called
surprise, which assesses the quality of the partition of a network into
communities. In its current form, the formulation of surprise is rather
difficult to analyze. We therefore develop an accurate asymptotic
approximation. This allows for the development of an efficient algorithm for
optimizing surprise. Incidentally, this leads to a straightforward extension of
surprise to weighted graphs. Additionally, the approximation makes it possible
to analyze surprise more closely and compare it to other methods, especially
modularity. We show that surprise is (nearly) unaffected by the well-known
resolution limit, a particular problem for modularity. However, surprise may
tend to overestimate the number of communities, whereas they may be
underestimated by modularity. In short, surprise works well in the limit of
many small communities, whereas modularity works better in the limit of few
large communities. In this sense, surprise is more discriminative than
modularity, and may find communities where modularity fails to discern any
structure.
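The abstract does not give the approximation itself. The sketch below assumes the form in which asymptotic surprise is usually quoted, S ≈ m · D(q ‖ ⟨q⟩), where m is the number of edges, q is the fraction of edges that fall inside communities, ⟨q⟩ is the fraction of node pairs that fall inside communities, and D is the binary Kullback-Leibler divergence. That formula, and the function names, are assumptions here and should be checked against the paper before use.

```python
from math import comb, log


def binary_kl(x, y):
    """Kullback-Leibler divergence between Bernoulli(x) and Bernoulli(y)."""
    total = 0.0
    if x > 0:
        total += x * log(x / y)
    if x < 1:
        total += (1 - x) * log((1 - x) / (1 - y))
    return total


def asymptotic_surprise(edges, membership):
    """Assumed asymptotic surprise of a partition of a simple graph.

    membership[i] is the community label of node i.
    """
    n = len(membership)
    m = len(edges)
    m_int = sum(1 for u, v in edges if membership[u] == membership[v])
    sizes = {}
    for c in membership:
        sizes[c] = sizes.get(c, 0) + 1
    pairs_int = sum(comb(s, 2) for s in sizes.values())
    pairs_total = comb(n, 2)
    q = m_int / m                       # observed internal edge fraction
    q_expected = pairs_int / pairs_total  # internal fraction of node pairs
    return m * binary_kl(q, q_expected)


if __name__ == "__main__":
    # Two triangles joined by a single edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(asymptotic_surprise(edges, [0, 0, 0, 1, 1, 1]))
    print(asymptotic_surprise(edges, [0, 0, 0, 0, 0, 0]))
```

Under this assumed form, placing all nodes in one community scores zero, while the two-triangle split scores higher because its internal edge fraction exceeds what the community sizes alone would predict.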
Bayesian stochastic blockmodeling
This chapter provides a self-contained introduction to the use of Bayesian
inference to extract large-scale modular structures from network data, based on
the stochastic blockmodel (SBM), as well as its degree-corrected and
overlapping generalizations. We focus on nonparametric formulations that allow
inference in a manner that prevents overfitting and enables model
selection. We discuss aspects of the choice of priors, in particular how to
avoid underfitting via increased Bayesian hierarchies, and we contrast the task
of sampling network partitions from the posterior distribution with finding the
single point estimate that maximizes it, while describing efficient algorithms
to perform either one. We also show how inferring the SBM can be used to
predict missing and spurious links, and shed light on the fundamental
limitations of the detectability of modular structures in networks.
Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool
at https://graph-tool.skewed.de . See also the HOWTO at
https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
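To see why the chapter stresses formulations that prevent overfitting, the sketch below computes the plain maximum-likelihood score of a Bernoulli SBM for a given partition. This is not the nonparametric Bayesian method of the chapter and does not use graph-tool; it is a deliberately naive baseline with hypothetical function names, meant only to show that likelihood alone always prefers more groups.

```python
from itertools import combinations_with_replacement
from math import log


def sbm_max_log_likelihood(edges, membership):
    """Maximized log-likelihood of a Bernoulli SBM for a fixed partition.

    For each pair of groups (r, s) the link probability is set to its
    maximum-likelihood value e_rs / n_rs, where e_rs is the number of
    edges and n_rs the number of node pairs between the groups.
    """
    sizes = {}
    for c in membership:
        sizes[c] = sizes.get(c, 0) + 1
    counts = {}
    for u, v in edges:
        key = tuple(sorted((membership[u], membership[v])))
        counts[key] = counts.get(key, 0) + 1
    total = 0.0
    for r, s in combinations_with_replacement(sorted(sizes), 2):
        if r == s:
            pairs = sizes[r] * (sizes[r] - 1) // 2
        else:
            pairs = sizes[r] * sizes[s]
        e = counts.get((r, s), 0)
        if pairs == 0 or e == 0 or e == pairs:
            continue  # deterministic block, contributes log(1) = 0
        p = e / pairs
        total += e * log(p) + (pairs - e) * log(1 - p)
    return total


if __name__ == "__main__":
    # Two triangles joined by a single edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    for labels in ([0] * 6, [0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5]):
        print(labels, sbm_max_log_likelihood(edges, labels))
```

The partition with every node in its own group reaches the maximum possible value of zero, even though it explains nothing. A penalty on model complexity, such as the priors discussed in the chapter, is what makes the two-group partition preferable.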