Community Detection via Maximization of Modularity and Its Variants
In this paper, we first discuss the definition of modularity (Q) used as a
metric for community quality and then we review the modularity maximization
approaches which were used for community detection in the last decade. Then, we
discuss two opposite yet coexisting problems of modularity optimization: in
some cases, it tends to favor small communities over large ones while in
others, large communities over small ones (the so-called resolution limit
problem). Next, we overview several community quality metrics proposed to solve
the resolution limit problem and discuss Modularity Density (Qds) which
simultaneously avoids the two problems of modularity. Finally, we introduce two
novel fine-tuned community detection algorithms that iteratively attempt to
improve the community quality measurements by splitting and merging the given
network community structure. The first of them, referred to as Fine-tuned Q, is
based on modularity (Q) while the second one is based on Modularity Density
(Qds) and denoted as Fine-tuned Qds. Then, we compare the greedy algorithm of
modularity maximization (denoted as Greedy Q), Fine-tuned Q, and Fine-tuned Qds
on four real networks, and also on the classical clique network and the LFR
benchmark networks, each of which is instantiated by a wide range of
parameters. The results indicate that Fine-tuned Qds is the most effective
among the three algorithms discussed. Moreover, we show that Fine-tuned Qds can
be applied to the communities detected by other algorithms to significantly
improve their results.
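The abstract above does not state the formula for Q, so the following is only a minimal sketch that assumes the standard Newman-Girvan definition for an undirected, unweighted graph: Q is the sum over communities c of (m_c / m) - (d_c / 2m)^2, where m is the total number of edges, m_c the number of edges inside c, and d_c the total degree of the nodes in c. The function name `modularity` and its arguments are hypothetical, and the sketch does not implement Modularity Density (Qds) or the fine-tuned algorithms.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q (assumed definition, see lead-in).

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to a community label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    internal = defaultdict(int)  # m_c: edges with both endpoints in c
    degree = defaultdict(int)    # d_c: sum of node degrees in c

    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Q = sum_c [ m_c / m - (d_c / (2m))^2 ]
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(edges, two_groups)   # 5/14 for this toy graph
q_merged = modularity(edges, one_group)   # 0.0: one community holds everything
```

On this toy graph the two-triangle partition scores higher than the single merged community, which is the behaviour a modularity-maximizing method exploits.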
Density-based rough set model for hesitant node clustering in overlapping community detection
Overlapping community detection in networks is a challenging problem that has attracted considerable attention in recent years. This paper proposes the notion of a hesitant node (HN). An HN is in contact with multiple communities, but these contacts are weak or even accidental, so the HN holds only an implicit community structure. HNs are nevertheless not rare in real-world networks. Identifying them is important because they can be efficient hubs that form the overlapping portions of communities, or simply nodes attached to some communities. Current approaches have difficulty identifying and clustering HNs. A density-based rough set model (DBRSM) is proposed that combines the strengths of density-based algorithms and rough set models. It incorporates the macro perspective of the community structure of the whole network and the micro perspective of the local information held by HNs, which facilitates the further 'growth' of HNs within communities. We offer theoretical support for this model in terms of the strength of the trust path. Experiments on real-world and synthetic datasets show the practical significance of analyzing and clustering HNs based on DBRSM. In addition, clustering based on DBRSM promotes modularity optimization.
Multiresolution community detection for megascale networks by information-based replica correlations
We use a Potts model community detection algorithm to accurately and
quantitatively evaluate the hierarchical or multiresolution structure of a
graph. Our multiresolution algorithm calculates correlations among multiple
copies ("replicas") of the same graph over a range of resolutions. Significant
multiresolution structures are identified by strongly correlated replicas. The
average normalized mutual information, the variation of information, and other
measures in principle give a quantitative estimate of the "best" resolutions
and indicate the relative strength of the structures in the graph. Because the
method is based on information comparisons, it can in principle be used with
any community detection model that can examine multiple resolutions. Our
approach may be extended to other optimization problems. As a local measure,
our Potts model avoids the "resolution limit" that affects other popular
models. With this model, our community detection algorithm has an accuracy that
ranks among the best of currently available methods. Using it, we can examine
graphs with over 40 million nodes and more than one billion edges. We further report
that the multiresolution variant of our algorithm can solve systems of at least
200000 nodes and 10 million edges on a single processor with exceptionally high
accuracy. For typical cases, we find a super-linear scaling, O(L^{1.3}) for
community detection and O(L^{1.3} log N) for the multiresolution algorithm
where L is the number of edges and N is the number of nodes in the system.
Comment: 19 pages, 14 figures, published version with minor change
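The information measures named in the abstract above can be illustrated with a small sketch. It assumes each replica's solution is given as a list of community labels over the same nodes, and it assumes the common normalization NMI = 2 I(A;B) / (H(A) + H(B)); the abstract does not say which normalization the authors use. The function names `compare_partitions` and `average_nmi` are hypothetical, and this is not the Potts model solver itself.

```python
from collections import Counter
from itertools import combinations
from math import log


def _entropy(counts, n):
    """Shannon entropy (in nats) of a label distribution given as counts."""
    return -sum((c / n) * log(c / n) for c in counts.values())


def compare_partitions(a, b):
    """Return (NMI, VI) for two label lists defined over the same nodes."""
    if len(a) != len(b) or not a:
        raise ValueError("partitions must be non-empty and of equal length")
    n = len(a)
    h_a = _entropy(Counter(a), n)
    h_b = _entropy(Counter(b), n)
    h_ab = _entropy(Counter(zip(a, b)), n)   # joint entropy H(A, B)
    mi = max(0.0, h_a + h_b - h_ab)          # mutual information I(A; B)
    vi = h_a + h_b - 2.0 * mi                # variation of information
    # Assumed convention: two trivial one-community partitions agree fully.
    nmi = 1.0 if h_a + h_b == 0 else 2.0 * mi / (h_a + h_b)
    return nmi, vi


def average_nmi(replicas):
    """Mean NMI over all pairs of replica partitions."""
    pairs = list(combinations(replicas, 2))
    return sum(compare_partitions(a, b)[0] for a, b in pairs) / len(pairs)


# Same grouping under different label names: perfect agreement.
same_nmi, same_vi = compare_partitions([0, 0, 1, 1], [1, 1, 0, 0])
# Statistically independent groupings: no shared information.
indep_nmi, indep_vi = compare_partitions([0, 0, 1, 1], [0, 1, 0, 1])
avg = average_nmi([[0, 0, 1, 1], [1, 1, 0, 0], [5, 5, 7, 7]])
```

In the spirit of the abstract, a resolution at which replicas agree strongly would show a high average NMI and a low VI; the code only measures agreement and leaves the choice of resolutions to the caller.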
Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases graphs are directed - in the sense that there is
directionality on the edges, making the semantics of the edges non-symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar while on the contrary, nodes across
communities present low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Therefore, naturally
there is a recent wealth of research production in the area of mining directed
graphs - with clustering being the primary method and tool for community
detection and evaluation. The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
relevant necessary methodological background and also related applications. The
survey commences by offering a concise review of the fundamental concepts and
methodological base on which graph clustering algorithms capitalize. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
of the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions.
Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear)