Community Detection via Maximization of Modularity and Its Variants
In this paper, we first discuss the definition of modularity (Q) used as a
metric for community quality and then we review the modularity maximization
approaches which were used for community detection in the last decade. Then, we
discuss two opposite yet coexisting problems of modularity optimization: in
some cases, it tends to favor small communities over large ones while in
others, large communities over small ones (the so-called resolution limit
problem). Next, we overview several community quality metrics proposed to solve
the resolution limit problem and discuss Modularity Density (Qds) which
simultaneously avoids the two problems of modularity. Finally, we introduce two
novel fine-tuned community detection algorithms that iteratively attempt to
improve the community quality measurements by splitting and merging the given
network community structure. The first of them, referred to as Fine-tuned Q, is
based on modularity (Q) while the second one is based on Modularity Density
(Qds) and denoted as Fine-tuned Qds. Then, we compare the greedy algorithm of
modularity maximization (denoted as Greedy Q), Fine-tuned Q, and Fine-tuned Qds
on four real networks, and also on the classical clique network and the LFR
benchmark networks, each of which is instantiated by a wide range of
parameters. The results indicate that Fine-tuned Qds is the most effective
among the three algorithms discussed. Moreover, we show that Fine-tuned Qds can
be applied to the communities detected by other algorithms to significantly
improve their results.
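
As an illustration of the modularity (Q) metric this abstract refers to, the following is a minimal sketch of the standard Newman-Girvan formula, Q = sum over communities c of [ l_c / m - (d_c / 2m)^2 ], where l_c is the number of intra-community edges, d_c the total degree of the community, and m the number of edges. It assumes an undirected, unweighted graph without self-loops; the function name `modularity` and its input format are illustrative choices, not taken from the paper, and the sketch does not cover Modularity Density (Qds) or the fine-tuned algorithms.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph
           (assumed to contain no self-loops or duplicate edges).
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    internal = defaultdict(int)    # l_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of nodes in c

    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Q = sum_c [ l_c / m - (d_c / (2m))^2 ]
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(edges, two_groups)   # 5/14, about 0.357
q_merged = modularity(edges, one_group)   # 0.0
```

Placing every node in a single community always yields Q = 0 under this formula, which is why the two-triangle split scores higher here.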
Fast Detection of Community Structures using Graph Traversal in Social Networks
Finding community structures in social networks is considered to be a
challenging task as many of the proposed algorithms are computationally
expensive and do not scale well for large graphs. Most of the community
detection algorithms proposed to date are unsuitable for applications that
would require detection of communities in real-time, especially for massive
networks. The Louvain method, which uses modularity maximization to detect
clusters, is usually considered to be one of the fastest community detection
algorithms even without any provable bound on its running time. We propose a
novel graph traversal-based community detection framework, which not only runs
faster than the Louvain method but also generates clusters of better quality
for most of the benchmark datasets. We show that our algorithms run in O(|V| +
|E|) time to create an initial cover before using modularity maximization to
get the final cover.
Keywords - community detection; Influenced Neighbor Score; brokers; community
nodes; communities
Comment: 29 pages, 9 tables, and 13 figures. Accepted in "Knowledge and
Information Systems", 201
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will overfit or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of
overfitting and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluate overfitting and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
Optimizing an Organized Modularity Measure for Topographic Graph Clustering: a Deterministic Annealing Approach
This paper proposes an organized generalization of Newman and Girvan's
modularity measure for graph clustering. Optimized via a deterministic
annealing scheme, this measure produces topologically ordered graph clusterings
that lead to faithful and readable graph representations based on
clustering-induced graphs. Topographic graph clustering provides an alternative to more
classical solutions in which a standard graph clustering method is applied to
build a simpler graph that is then represented with a graph layout algorithm. A
comparative study on four real-world graphs ranging from 34 to 1,133 vertices
shows the benefit of the proposed approach relative to classical solutions
and to self-organizing maps for graphs.
Community detection in temporal multilayer networks, with an application to correlation networks
Networks are a convenient way to represent complex systems of interacting
entities. Many networks contain "communities" of nodes that are more densely
connected to each other than to nodes in the rest of the network. In this
paper, we investigate the detection of communities in temporal networks
represented as multilayer networks. As a focal example, we study time-dependent
financial-asset correlation networks. We first argue that the use of the
"modularity" quality function (which is defined by comparing edge weights in
an observed network to expected edge weights in a "null network") is
application-dependent. We differentiate between "null networks" and "null
models" in our discussion of modularity maximization, and we highlight that the
same null network can correspond to different null models. We then investigate
a multilayer modularity-maximization problem to identify communities in
temporal networks. Our multilayer analysis only depends on the form of the
maximization problem and not on the specific quality function that one chooses.
We introduce a diagnostic to measure persistence of community structure
in a multilayer network partition. We prove several results that describe how
the multilayer maximization problem measures a trade-off between static
community structure within layers and larger values of persistence across
layers. We also discuss some computational issues that the popular "Louvain"
heuristic faces with temporal multilayer networks and suggest ways to mitigate
them.
Comment: 42 pages, many figures, final accepted version before typesetting
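
The abstract above does not define its persistence diagnostic. One plausible reading, sketched here purely as an assumption, counts the number of nodes whose community label is unchanged between each pair of consecutive layers. The function name `persistence` and the list-of-dicts input format are hypothetical and are not taken from the paper.

```python
def persistence(layers):
    """Count node-layer pairs whose community label is kept in the next layer.

    layers: list of dicts, one per time-ordered layer, each mapping
            node -> community label. A node missing from the following
            layer is treated as not persisting (an assumption).
    """
    total = 0
    for current, following in zip(layers, layers[1:]):
        for node, label in current.items():
            if node in following and following[node] == label:
                total += 1
    return total


# Three layers over the same three nodes.
layers = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},  # b switches community
    {"a": 0, "b": 1, "c": 0},  # c switches community
]
pers = persistence(layers)  # 2 nodes persist in each transition, 4 in total
```

Under this reading the value ranges from 0 (every node changes community at every step) to N * (number of layers - 1) for N nodes, so a partition that is identical in all layers attains the maximum.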