215 research outputs found
Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities
Many complex networks display a mesoscopic structure with groups of nodes
sharing many links with the other nodes in their group and comparatively few
with nodes of different groups. This feature is known as community structure
and encodes precious information about the organization and the function of the
nodes. Many algorithms have been proposed but it is not yet clear how they
should be tested. Recently we have proposed a general class of undirected and
unweighted benchmark graphs, with heterogeneous distributions of node degree and
community size. Increasing attention has recently been devoted to developing
algorithms able to consider the direction and the weight of the links, which
require suitable benchmark graphs for testing. In this paper we extend the
basic ideas behind our previous benchmark to generate directed and weighted
networks with built-in community structure. We also consider the possibility
that nodes belong to more than one community, a feature occurring in real systems
such as social networks. As a practical application, we show how
modularity optimization performs on our new benchmark.
Comment: 9 pages, 13 figures. Final version published in Physical Review E.
The code to create the benchmark graphs can be freely downloaded from
http://santo.fortunato.googlepages.com/inthepress
Finding local community structure in networks
Although the inference of global community structure in networks has recently
become a topic of great interest in the physics community, all such algorithms
require that the graph be completely known. Here, we define both a measure of
local community structure and an algorithm that infers the hierarchy of
communities that enclose a given vertex by exploring the graph one vertex at a
time. This algorithm runs in time O(d*k^2) for general graphs when d is the
mean degree and k is the number of vertices to be explored. For graphs where
exploring a new vertex is time-consuming, the running time is linear, O(k). We
show that on computer-generated graphs this technique compares favorably to
algorithms that require global knowledge. We also use this algorithm to extract
meaningful local clustering information in the large recommender network of an
online retailer and show the existence of mesoscopic structure.
Comment: 7 pages, 6 figures
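The idea of growing a community around a seed by exploring the graph one vertex at a time can be sketched as follows. This is a minimal illustration only: the abstract does not state the paper's local measure, so the quality function used here (the fraction of edges touching the community that stay inside it) is a stand-in assumption, and the names `local_quality` and `grow_local_community` are hypothetical.

```python
def local_quality(adj, community):
    """Fraction of edges touching the community that are internal to it.

    Stand-in measure chosen for illustration; not the measure from the paper.
    """
    internal_endpoints = 0
    external = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                internal_endpoints += 1  # each internal edge is seen twice
            else:
                external += 1
    internal = internal_endpoints // 2
    total = internal + external
    return internal / total if total else 0.0


def grow_local_community(adj, seed, k):
    """Greedily add up to k vertices to the community, one at a time."""
    community = {seed}
    order = [seed]
    for _ in range(k):
        # Only neighbours of already explored vertices are visible.
        frontier = {v for u in community for v in adj[u]} - community
        if not frontier:
            break
        best = max(sorted(frontier),
                   key=lambda v: local_quality(adj, community | {v}))
        community.add(best)
        order.append(best)
    return order


# Two 4-cliques joined by the single edge 3-4.
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
found = grow_local_community(adj, seed=0, k=3)
```

Starting from vertex 0 and exploring three more vertices, the sketch stays inside the seed's clique because adding the bridge vertex early would increase the number of outgoing edges.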
Finding community structure in very large networks
The discovery and analysis of community structure in networks is a topic of
considerable recent interest within the physics community, but most methods
proposed so far are unsuitable for very large networks because of their
computational cost. Here we present a hierarchical agglomeration algorithm for
detecting community structure which is faster than many competing algorithms:
its running time on a network with n vertices and m edges is O(m d log n) where
d is the depth of the dendrogram describing the community structure. Many
real-world networks are sparse and hierarchical, with m ~ n and d ~ log n, in
which case our algorithm runs in essentially linear time, O(n log^2 n). As an
example of the application of this algorithm we use it to analyze a network of
items for sale on the web-site of a large online retailer, items in the network
being linked if they are frequently purchased by the same buyer. The network
has more than 400,000 vertices and 2 million edges. We show that our algorithm
can extract meaningful communities from this network, revealing large-scale
patterns present in the purchasing habits of customers.
Identifying network communities with a high resolution
Community structure is an important property of complex networks. An
automatic discovery of such structure is a fundamental task in many
disciplines, including sociology, biology, engineering, and computer science.
Recently, several community discovery algorithms have been proposed based on
the optimization of a quantity called modularity (Q). However, the problem of
modularity optimization is NP-hard, and the existing approaches often suffer
from prohibitively long running time or poor quality. Furthermore, it has been
recently pointed out that algorithms based on optimizing Q will have a
resolution limit, i.e., communities below a certain scale may not be detected.
In this research, we first propose an efficient heuristic algorithm, Qcut,
which combines spectral graph partitioning and local search to optimize Q.
Using both synthetic and real networks, we show that Qcut can find higher
modularities and is more scalable than the existing algorithms. Furthermore,
using Qcut as an essential component, we propose a recursive algorithm, HQcut,
to solve the resolution limit problem. We show that HQcut can successfully
detect communities at a much finer scale and with a higher accuracy than the
existing algorithms. Finally, we apply Qcut and HQcut to study a
protein-protein interaction network, and show that the combination of the two
algorithms can reveal interesting biological results that may be otherwise
undetectable.
Comment: 14 pages, 5 figures. 1 supplemental file at
http://cic.cs.wustl.edu/qcut/supplemental.pd
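For readers unfamiliar with the quantity Q mentioned in the abstract above, here is a minimal sketch of the standard Newman-Girvan modularity of a partition of an undirected, unweighted graph: for each community, the fraction of edges inside it minus the square of its share of the total degree. This is not the Qcut algorithm itself, only the objective such methods optimize; the function name `modularity` and the toy graph are illustrative assumptions.

```python
def modularity(edges, membership):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    membership: membership[u] is the community label of node u.
    """
    m = len(edges)
    internal = {}    # edges with both endpoints in the community
    degree_sum = {}  # total degree of the nodes in the community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two triangles joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_whole = modularity(edges, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Splitting the toy graph into its two triangles gives Q = 5/14, while putting every node into a single community gives Q = 0.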
Benchmark graphs for testing community detection algorithms
Community structure is one of the most important features of real networks
and reveals the internal organization of the nodes. Many algorithms have been
proposed but the crucial issue of testing, i.e. the question of how good an
algorithm is, with respect to others, is still open. Standard tests include the
analysis of simple artificial graphs with a built-in community structure, that
the algorithm has to recover. However, the special graphs adopted in actual
tests have a structure that does not reflect the real properties of nodes and
communities found in real networks. Here we introduce a new class of benchmark
graphs, that account for the heterogeneity in the distributions of node degrees
and of community sizes. We use this new benchmark to test two popular methods
of community detection, modularity optimization and Potts model clustering. The
results show that the new benchmark poses a much more severe test to algorithms
than standard benchmarks, revealing limits that may not be apparent at a first
analysis.
Comment: 6 pages, 8 figures. Extended version published in Physical Review E.
The code to build the new benchmark graphs can be downloaded from
http://santo.fortunato.googlepages.com/inthepress
Coexistence of opposite opinions in a network with communities
The Majority Rule is applied to a topology that consists of two coupled
random networks, thereby mimicking the modular structure observed in social
networks. We calculate analytically the asymptotic behaviour of the model and
derive a phase diagram that depends on the frequency of random opinion flips
and on the inter-connectivity between the two communities. It is shown that
three regimes may take place: a disordered regime, where no collective
phenomenon takes place; a symmetric regime, where the nodes in both communities
reach the same average opinion; an asymmetric regime, where the nodes in each
community reach an opposite average opinion. The transition from the asymmetric
regime to the symmetric regime is shown to be discontinuous.
Comment: 14 pages, 4 figures
An evolving network model with community structure
Many social and biological networks consist of communities, that is, groups of nodes within which connections are dense but between which connections are sparser. Recently, there has been considerable interest in designing algorithms for detecting community structures in real-world complex networks. In this paper, we propose an evolving network model which exhibits community structure. The network model is based on the inner-community preferential attachment and inter-community preferential attachment mechanisms. The degree distributions of this network model are analysed based on a mean-field method. Theoretical results and numerical simulations indicate that this network model has community structure and scale-free properties.
Finding community structure in networks using the eigenvectors of matrices
We consider the problem of detecting communities or modules in networks,
groups of vertices with a higher-than-average density of edges connecting them.
Previous work indicates that a robust approach to this problem is the
maximization of the benefit function known as "modularity" over possible
divisions of a network. Here we show that this maximization process can be
written in terms of the eigenspectrum of a matrix we call the modularity
matrix, which plays a role in community detection similar to that played by the
graph Laplacian in graph partitioning calculations. This result leads us to a
number of possible algorithms for detecting community structure, as well as
several other results, including a spectral measure of bipartite structure in
networks and a new centrality measure that identifies those vertices that
occupy central positions within the communities to which they belong. The
algorithms and measures proposed are illustrated with applications to a variety
of real-world complex networks.
Comment: 22 pages, 8 figures, minor corrections in this version
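The spectral view described in the abstract above can be illustrated with a small sketch. It assumes the usual definition of the modularity matrix for an undirected graph, B_ij = A_ij - k_i k_j / (2m), and splits the network in two by the signs of the leading eigenvector of B. The shifted power iteration used to find that eigenvector is an implementation choice made here to keep the example dependency-free, not something taken from the paper, and all function names are hypothetical.

```python
import math


def modularity_matrix(n, edges):
    """B[i][j] = A[i][j] - k_i * k_j / (2m) for an undirected graph."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] += 1.0
        A[v][u] += 1.0
    k = [sum(row) for row in A]
    two_m = sum(k)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)]
            for i in range(n)]


def leading_eigenvector(B, iterations=500):
    """Power iteration on B + shift*I, which has no negative eigenvalues."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)  # Gershgorin bound
    v = [float(i + 1) for i in range(n)]  # arbitrary non-uniform start
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bisection(n, edges):
    """Assign each vertex +1 or -1 by the sign of its eigenvector entry."""
    v = leading_eigenvector(modularity_matrix(n, edges))
    return [1 if x >= 0 else -1 for x in v]


def bisection_modularity(B, s, m):
    """Q = s^T B s / (4m) for a division encoded by s_i = +1 or -1."""
    n = len(B)
    return sum(B[i][j] * s[i] * s[j]
               for i in range(n) for j in range(n)) / (4 * m)


# Two triangles joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
s = spectral_bisection(6, edges)
q = bisection_modularity(modularity_matrix(6, edges), s, len(edges))
```

On this toy graph the sign pattern separates the two triangles, and the resulting Q equals 5/14, the value obtained by counting edges directly.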
Local Causal States and Discrete Coherent Structures
Coherent structures form spontaneously in nonlinear spatiotemporal systems
and are found at all spatial scales in natural phenomena from laboratory
hydrodynamic flows and chemical reactions to ocean, atmosphere, and planetary
climate dynamics. Phenomenologically, they appear as key components that
organize the macroscopic behaviors in such systems. Despite a century of
effort, they have eluded rigorous analysis and empirical prediction, with
progress being made only recently. As a step in this, we present a formal
theory of coherent structures in fully-discrete dynamical field theories. It
builds on the notion of structure introduced by computational mechanics,
generalizing it to a local spatiotemporal setting. The analysis' main tool
employs the local causal states, which are used to uncover a system's hidden
spatiotemporal symmetries and which identify coherent structures as
spatially-localized deviations from those symmetries. The approach is
behavior-driven in the sense that it does not rely on directly analyzing
spatiotemporal equations of motion; rather, it considers only the spatiotemporal
fields a system generates. As such, it offers an unsupervised approach to
discover and describe coherent structures. We illustrate the approach by
analyzing coherent structures generated by elementary cellular automata,
comparing the results with an earlier, dynamic-invariant-set approach that
decomposes fields into domains, particles, and particle interactions.
Comment: 27 pages, 10 figures;
http://csc.ucdavis.edu/~cmg/compmech/pubs/dcs.ht
Maximal planar networks with large clustering coefficient and power-law degree distribution
In this article, we propose a simple rule that generates scale-free networks
with very large clustering coefficient and very small average distance. These
networks are called Random Apollonian Networks (RAN), as they can be
considered a variation of Apollonian networks. We obtain analytic
results for the power-law exponent and the clustering coefficient,
which agree very well with the
simulation results. We prove that the average distance of RAN grows
slightly more slowly than the logarithm of the number of nodes in RAN.
Since most real-life networks are both scale-free and small-world networks, RAN
may perform well in mimicking reality. RAN possess a hierarchical
structure, in accord with the observations of many
real-life networks. In addition, we prove that RAN are maximal planar networks,
which are of particular practical use for the layout of printed circuits and so
on. The percolation and epidemic spreading processes are also studied, and the
comparison between RAN and Barabási-Albert (BA) as well as Newman-Watts (NW)
networks is shown. We find that, when the network order (the total number
of nodes) is relatively small, the performance of RAN under
intentional attack is not sensitive to the network order, while that of BA
networks is strongly affected by it. Diseases also spread more slowly in RAN
than in BA networks during outbreaks, indicating that a large clustering
coefficient may slow the spreading velocity, especially during outbreaks.
Comment: 13 pages, 10 figures
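The abstract above does not spell out the generating rule, so the following sketch assumes the construction commonly associated with random Apollonian networks: start from a triangle and, at each step, pick one existing triangular face uniformly at random, place a new node inside it, and connect that node to the face's three vertices. The function name and parameters are hypothetical.

```python
import random


def random_apollonian_network(n, seed=None):
    """Return the edge set of a random Apollonian network with n nodes.

    Assumed rule: a uniformly chosen triangular face is subdivided by a
    new node that is linked to the three vertices of that face.
    """
    if n < 3:
        raise ValueError("need at least 3 nodes")
    rng = random.Random(seed)
    edges = {(0, 1), (0, 2), (1, 2)}
    faces = [(0, 1, 2)]
    for new in range(3, n):
        idx = rng.randrange(len(faces))
        a, b, c = faces[idx]
        edges.update({(a, new), (b, new), (c, new)})
        # The chosen face is replaced by the three smaller faces.
        faces[idx] = (a, b, new)
        faces.append((a, c, new))
        faces.append((b, c, new))
    return edges


ran_edges = random_apollonian_network(50, seed=1)
```

Each new node adds exactly three edges, so a network with n nodes has 3 + 3(n - 3) = 3n - 6 edges, which is the maximum possible for a planar graph and is consistent with the maximal planarity stated in the abstract.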