Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities
Many complex networks display a mesoscopic structure with groups of nodes
sharing many links with the other nodes in their group and comparatively few
with nodes of different groups. This feature is known as community structure
and encodes precious information about the organization and the function of the
nodes. Many algorithms have been proposed but it is not yet clear how they
should be tested. Recently we have proposed a general class of undirected and
unweighted benchmark graphs, with heterogeneous distributions of node degree and
community size. Increasing attention has recently been devoted to developing
algorithms that account for the direction and the weight of the links, and these
require suitable benchmark graphs for testing. In this paper we extend the
basic ideas behind our previous benchmark to generate directed and weighted
networks with built-in community structure. We also consider the possibility
that nodes belong to more than one community, a feature occurring in real
systems such as social networks. As a practical application, we show how
modularity optimization performs on our new benchmark.
Comment: 9 pages, 13 figures. Final version published in Physical Review E.
The code to create the benchmark graphs can be freely downloaded from
http://santo.fortunato.googlepages.com/inthepress
Finding local community structure in networks
Although the inference of global community structure in networks has recently
become a topic of great interest in the physics community, all such algorithms
require that the graph be completely known. Here, we define both a measure of
local community structure and an algorithm that infers the hierarchy of
communities that enclose a given vertex by exploring the graph one vertex at a
time. This algorithm runs in time O(d*k^2) for general graphs, where d is the
mean degree and k is the number of vertices to be explored. For graphs where
exploring a new vertex is time-consuming, the running time is linear, O(k). We
show that on computer-generated graphs this technique compares favorably to
algorithms that require global knowledge. We also use this algorithm to extract
meaningful local clustering information in the large recommender network of an
online retailer and show the existence of mesoscopic structure.
Comment: 7 pages, 6 figures
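The abstract above does not spell out its local measure, so the following is only a minimal sketch under an assumption: it uses a local modularity R defined as the fraction of edges touching the community's boundary vertices that stay inside the community, and it grows the community greedily one vertex at a time. The function names and the toy graph are hypothetical, and this naive version recomputes R from scratch at each step rather than achieving the running time quoted in the abstract.

```python
def local_modularity(adj, community):
    """R = (boundary edges that stay inside) / (all edges touching the boundary).

    adj maps each vertex to an iterable of neighbours (undirected graph).
    The boundary is the set of community vertices with a neighbour outside.
    """
    boundary = {v for v in community if any(u not in community for u in adj[v])}
    seen = set()
    inside = total = 0
    for v in boundary:
        for u in adj[v]:
            edge = frozenset((u, v))
            if edge in seen:
                continue  # count each undirected edge once
            seen.add(edge)
            total += 1
            if u in community:
                inside += 1
    # No boundary means nothing leaks out of the community.
    return inside / total if total else 1.0


def grow_local_community(adj, start, k):
    """Greedily add, k - 1 times, the neighbouring vertex that maximises R."""
    community = {start}
    order = [start]
    while len(community) < k:
        frontier = {u for v in community for u in adj[v]} - community
        if not frontier:
            break
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(frontier),
                   key=lambda u: local_modularity(adj, community | {u}))
        community.add(best)
        order.append(best)
    return order


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(grow_local_community(adj, 0, 3))
```

The order in which vertices join gives a nested sequence of candidate communities around the start vertex, which is the sense in which such a procedure yields a hierarchy.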
Finding community structure in very large networks
The discovery and analysis of community structure in networks is a topic of
considerable recent interest within the physics community, but most methods
proposed so far are unsuitable for very large networks because of their
computational cost. Here we present a hierarchical agglomeration algorithm for
detecting community structure which is faster than many competing algorithms:
its running time on a network with n vertices and m edges is O(m d log n) where
d is the depth of the dendrogram describing the community structure. Many
real-world networks are sparse and hierarchical, with m ~ n and d ~ log n, in
which case our algorithm runs in essentially linear time, O(n log^2 n). As an
example of the application of this algorithm we use it to analyze a network of
items for sale on the web-site of a large online retailer, items in the network
being linked if they are frequently purchased by the same buyer. The network
has more than 400,000 vertices and 2 million edges. We show that our algorithm
can extract meaningful communities from this network, revealing large-scale
patterns present in the purchasing habits of customers.
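The step from the general bound to the "essentially linear" bound in the abstract above can be written out as a short substitution. This is only a restatement of the abstract's own assumptions (m ~ n and d ~ log n); the symbol T(n) for the running time is introduced here for convenience.

```latex
\begin{align*}
T(n) &= O(m\, d \log n) \\
     &= O(n \cdot \log n \cdot \log n)
        && \text{substituting } m \sim n,\ d \sim \log n \\
     &= O(n \log^2 n).
\end{align*}
```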
Identifying network communities with a high resolution
Community structure is an important property of complex networks. An
automatic discovery of such structure is a fundamental task in many
disciplines, including sociology, biology, engineering, and computer science.
Recently, several community discovery algorithms have been proposed based on
the optimization of a quantity called modularity (Q). However, the problem of
modularity optimization is NP-hard, and the existing approaches often suffer
from prohibitively long running time or poor quality. Furthermore, it has been
recently pointed out that algorithms based on optimizing Q will have a
resolution limit, i.e., communities below a certain scale may not be detected.
In this research, we first propose an efficient heuristic algorithm, Qcut,
which combines spectral graph partitioning and local search to optimize Q.
Using both synthetic and real networks, we show that Qcut can find higher
modularities and is more scalable than the existing algorithms. Furthermore,
using Qcut as an essential component, we propose a recursive algorithm, HQcut,
to solve the resolution limit problem. We show that HQcut can successfully
detect communities at a much finer scale and with a higher accuracy than the
existing algorithms. Finally, we apply Qcut and HQcut to study a
protein-protein interaction network, and show that the combination of the two
algorithms can reveal interesting biological results that may be otherwise
undetectable.
Comment: 14 pages, 5 figures. 1 supplemental file at
http://cic.cs.wustl.edu/qcut/supplemental.pd
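The abstract above refers to modularity (Q) without giving its formula. As a hedged illustration, the sketch below assumes the standard Newman-Girvan definition for undirected, unweighted graphs: for each community, the fraction of edges that fall inside it minus the square of the fraction of edge ends attached to it. It is not Qcut or HQcut, only the quantity such methods try to optimize; the function name and example graph are hypothetical.

```python
def modularity(edges, membership):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping each vertex to its community label.
    Q = sum over communities c of  e_c / m - (d_c / (2 m))^2,
    where e_c counts edges inside c and d_c sums the degrees in c.
    """
    m = len(edges)
    internal = {}    # community -> number of internal edges
    degree_sum = {}  # community -> total degree of its vertices
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Placing every vertex in a single community always gives Q = 0, which is why a positive Q is read as evidence of structure beyond what the degrees alone would produce.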
Local Causal States and Discrete Coherent Structures
Coherent structures form spontaneously in nonlinear spatiotemporal systems
and are found at all spatial scales in natural phenomena from laboratory
hydrodynamic flows and chemical reactions to ocean, atmosphere, and planetary
climate dynamics. Phenomenologically, they appear as key components that
organize the macroscopic behaviors in such systems. Despite a century of
effort, they have eluded rigorous analysis and empirical prediction, with
progress being made only recently. As a step in this, we present a formal
theory of coherent structures in fully-discrete dynamical field theories. It
builds on the notion of structure introduced by computational mechanics,
generalizing it to a local spatiotemporal setting. The analysis' main tool
employs the local causal states, which are used to uncover a system's hidden
spatiotemporal symmetries and which identify coherent structures as
spatially-localized deviations from those symmetries. The approach is
behavior-driven in the sense that it does not rely on directly analyzing
spatiotemporal equations of motion; rather, it considers only the spatiotemporal
fields a system generates. As such, it offers an unsupervised approach to
discover and describe coherent structures. We illustrate the approach by
analyzing coherent structures generated by elementary cellular automata,
comparing the results with an earlier, dynamic-invariant-set approach that
decomposes fields into domains, particles, and particle interactions.
Comment: 27 pages, 10 figures;
http://csc.ucdavis.edu/~cmg/compmech/pubs/dcs.ht
Coexistence of opposite opinions in a network with communities
The Majority Rule is applied to a topology that consists of two coupled
random networks, thereby mimicking the modular structure observed in social
networks. We calculate analytically the asymptotic behaviour of the model and
derive a phase diagram that depends on the frequency of random opinion flips
and on the inter-connectivity between the two communities. It is shown that
three regimes may occur: a disordered regime, where no collective
phenomenon takes place; a symmetric regime, where the nodes in both communities
reach the same average opinion; an asymmetric regime, where the nodes in each
community reach an opposite average opinion. The transition from the asymmetric
regime to the symmetric regime is shown to be discontinuous.
Comment: 14 pages, 4 figures
Maximal planar networks with large clustering coefficient and power-law degree distribution
In this article, we propose a simple rule that generates scale-free networks
with very large clustering coefficient and very small average distance. These
networks are called Random Apollonian Networks (RAN) as they can be
considered as a variation of Apollonian networks. We obtain the analytic
results of power-law exponent and clustering coefficient, which agree very well with the
simulation results. We prove that the increasing tendency of average distance
of RAN is a little slower than the logarithm of the number of nodes in RAN.
Since most real-life networks are both scale-free and small-world networks, RAN
may perform well in mimicking reality. RAN possess a hierarchical
structure, in accord with the observations of many
real-life networks. In addition, we prove that RAN are maximal planar networks,
which are of particular practicability for layout of printed circuits and so
on. The percolation and epidemic spreading processes are also studied, and
comparisons between RAN and Barabási-Albert (BA) as well as Newman-Watts (NW)
networks are shown. We find that, when the network order (the total number
of nodes) is relatively small, the performance of RAN under
intentional attack is not sensitive to the network order, while that of BA
networks is strongly affected by it. Diseases also spread more slowly in RAN
than in BA networks during the outbreaks, indicating that a large clustering
coefficient may slow the spreading velocity, especially during outbreaks.
Comment: 13 pages, 10 figures
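The abstract above mentions "a simple rule" without stating it. The sketch below assumes the construction usually described for random Apollonian networks: start from one triangle, then repeatedly pick a triangular face uniformly at random, place a new node inside it, and connect the new node to the face's three corners. The function name, the seed parameter, and the choice to track only interior faces are assumptions made for illustration.

```python
import random


def random_apollonian_network(n, seed=None):
    """Return the edge set of a random Apollonian network on n >= 3 nodes.

    Assumed rule: choose a triangular face at random, insert a new node,
    and link it to the three corners; the face splits into three new faces.
    """
    if n < 3:
        raise ValueError("need at least 3 nodes")
    rng = random.Random(seed)
    edges = {(0, 1), (0, 2), (1, 2)}  # initial triangle
    faces = [(0, 1, 2)]               # faces still available for subdivision
    for new in range(3, n):
        i = rng.randrange(len(faces))
        a, b, c = faces[i]
        # new is larger than every existing label, so tuples stay ordered.
        edges.update({(a, new), (b, new), (c, new)})
        # Replace the chosen face by the three faces that subdivide it.
        faces[i] = (a, b, new)
        faces.extend([(a, c, new), (b, c, new)])
    return edges


edges = random_apollonian_network(50, seed=1)
print(len(edges))  # every step adds 3 edges, so the count is 3 * 50 - 6
```

Each step adds one node and three edges, so the graph always has 3n - 6 edges, the maximum possible for a planar graph on n nodes; this is consistent with the abstract's statement that RAN are maximal planar.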
Finding and evaluating community structure in networks
We propose and study a set of algorithms for discovering community structure
in networks -- natural divisions of network nodes into densely connected
subgroups. Our algorithms all share two definitive features: first, they
involve iterative removal of edges from the network to split it into
communities, the edges removed being identified using one of a number of
possible "betweenness" measures, and second, these measures are, crucially,
recalculated after each removal. We also propose a measure for the strength of
the community structure found by our algorithms, which gives us an objective
metric for choosing the number of communities into which a network should be
divided. We demonstrate that our algorithms are highly effective at discovering
community structure in both computer-generated and real-world network data, and
show how they can be used to shed light on the sometimes dauntingly complex
structure of networked systems.
Comment: 16 pages, 13 figures
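The two features named in the abstract above (remove the edge with the highest betweenness, then recalculate before the next removal) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes shortest-path edge betweenness on an unweighted, undirected graph, computed with a breadth-first accumulation scheme, and it stops at the first split instead of building the full hierarchy. All function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness; adj maps vertex -> set of neighbours."""
    score = {}
    for s in adj:
        dist, paths = {s: 0}, {s: 1.0}
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    paths[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        credit = {v: 0.0 for v in order}
        for w in reversed(order):  # push path credit back toward s
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + credit[w])
                edge = (min(v, w), max(v, w))
                score[edge] = score.get(edge, 0.0) + share
                credit[v] += share
    return score


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def split_once(adj):
    """Remove highest-betweenness edges, recalculating each time, until a split."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        score = edge_betweenness(adj)  # recalculated after every removal
        if not score:
            break  # no edges left to remove
        u, v = max(sorted(score), key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(split_once(adj))
```

Calling `split_once` again on each resulting component would continue the division; choosing where to stop is the role of the strength measure that the abstract proposes.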
Finding community structure in networks using the eigenvectors of matrices
We consider the problem of detecting communities or modules in networks,
groups of vertices with a higher-than-average density of edges connecting them.
Previous work indicates that a robust approach to this problem is the
maximization of the benefit function known as "modularity" over possible
divisions of a network. Here we show that this maximization process can be
written in terms of the eigenspectrum of a matrix we call the modularity
matrix, which plays a role in community detection similar to that played by the
graph Laplacian in graph partitioning calculations. This result leads us to a
number of possible algorithms for detecting community structure, as well as
several other results, including a spectral measure of bipartite structure in
networks and a new centrality measure that identifies those vertices that
occupy central positions within the communities to which they belong. The
algorithms and measures proposed are illustrated with applications to a variety
of real-world complex networks.
Comment: 22 pages, 8 figures, minor corrections in this version
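As a rough illustration of the idea in the abstract above, the sketch below builds a modularity matrix and splits the network in two by the signs of its leading eigenvector. Several details are assumptions rather than statements from the abstract: the matrix is taken to be B_ij = A_ij - k_i k_j / (2m) for an undirected, unweighted graph, and the eigenvector is found by plain power iteration on a shifted matrix so that the most positive eigenvalue dominates. The function name and parameters are hypothetical.

```python
def leading_eigenvector_split(adj_matrix, iterations=1000):
    """Two-way split by the signs of the modularity matrix's leading eigenvector.

    adj_matrix: symmetric 0/1 adjacency matrix as a list of lists.
    Assumes B_ij = A_ij - k_i * k_j / (2m), with k the degree sequence.
    """
    n = len(adj_matrix)
    k = [sum(row) for row in adj_matrix]
    two_m = sum(k)
    B = [[adj_matrix[i][j] - k[i] * k[j] / two_m for j in range(n)]
         for i in range(n)]
    # Shifting by the largest absolute row sum makes every eigenvalue
    # non-negative, so power iteration converges to the most positive one.
    shift = max(sum(abs(x) for x in row) for row in B)
    # Non-uniform start: the all-ones vector is itself an eigenvector of B
    # (eigenvalue 0), so starting exactly there would never move.
    x = [1.0 + 0.01 * i for i in range(n)]
    for _ in range(iterations):
        y = [sum(B[i][j] * x[j] for j in range(n)) + shift * x[i]
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return [0 if v > 0 else 1 for v in x]


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(leading_eigenvector_split(A))
```

The overall sign of an eigenvector is arbitrary, so only the grouping matters, not which group is labelled 0. A production version would use a sparse eigensolver and check that the leading eigenvalue is positive before accepting the split.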
Comparing community structure identification
We compare recent approaches to community structure identification in terms
of sensitivity and computational cost. The recently proposed modularity measure
is revisited and the performance of the methods as applied to ad hoc networks
with known community structure, is compared. We find that the most accurate
methods tend to be more computationally expensive, and that both aspects need
to be considered when choosing a method for practical purposes. The work is
intended as an introduction as well as a proposal for a standard benchmark test
of community detection methods.
Comment: 10 pages, 3 figures, 1 table. v2: condensed, updated version as
appears in JSTA
