Hierarchical mutual information for the comparison of hierarchical community structures in complex networks
The quest for a quantitative characterization of community and modular
structure of complex networks produced a variety of methods and algorithms to
classify different networks. However, it is not clear if such methods provide
consistent, robust and meaningful results when considering hierarchies as a
whole. Part of the problem is the lack of a similarity measure for the
comparison of hierarchical community structures. In this work we contribute
by introducing the hierarchical mutual information, which is
a generalization of the traditional mutual information and allows the comparison of
hierarchical partitions and hierarchical community structures. The
normalized version of the hierarchical mutual information should behave
analogously to the traditional normalized mutual information. Here, the correct
behavior of the hierarchical mutual information is corroborated on an extensive
battery of numerical experiments. The experiments are performed on artificial
hierarchies, and on the hierarchical community structure of artificial and
empirical networks. Furthermore, the experiments illustrate some of the
practical applications of the hierarchical mutual information, namely the
comparison of different community detection methods and the study of the
consistency, robustness and temporal evolution of the hierarchical modular
structure of networks.
Comment: 14 pages and 12 figures
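As a point of reference for the abstract above, the following is a minimal sketch of the traditional normalized mutual information between two flat partitions, which the hierarchical measure is said to generalize. It does not reproduce the hierarchical mutual information itself, whose definition is not given in the abstract. The function names and the normalization 2I(A;B)/(H(A)+H(B)) are assumptions chosen for illustration; other normalizations are also in common use.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a partition given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two partitions of the same node set."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of raw counts
    return sum((c / n) * log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def normalized_mutual_information(a, b):
    """NMI with the (assumed) normalization 2 I(A;B) / (H(A) + H(B))."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 and hb == 0:
        return 1.0  # both partitions are trivial, hence identical
    return 2 * mutual_information(a, b) / (ha + hb)


# Two partitions that differ only in label names, and two independent ones
same = normalized_mutual_information([0, 0, 1, 1], [5, 5, 7, 7])
independent = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Partitions that coincide up to relabeling score 1, and statistically independent partitions score 0, which is the behavior the normalized hierarchical version is expected to mirror.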
Local multiresolution order in community detection
Community detection algorithms attempt to find the best clusters of nodes in
an arbitrary complex network. Multi-scale ("multiresolution") community
detection extends the problem to identify the best network scale(s) for these
clusters. The latter task is generally accomplished by analyzing community
stability simultaneously for all clusters in the network. In the current work,
we extend this general approach to define local multiresolution methods, which
enable the extraction of well-defined local communities even if the global
community structure is vaguely defined in an average sense. Toward this end, we
propose measures analogous to variation of information and normalized mutual
information that are used to quantitatively identify the best resolution(s) at
the community level based on correlations between clusters in
independently-solved systems. We demonstrate our method on two constructed
networks as well as a real network and draw inferences about local community
strength. Our approach is independent of the applied community detection
algorithm save for the inherent requirement that the method be able to identify
communities across different network scales, with appropriate changes to
account for how different resolutions are evaluated or defined in a particular
community detection method. It should, in principle, easily adapt to
alternative community comparison measures.
Comment: 19 pages, 11 figures
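The abstract above refers to measures analogous to the variation of information. As a hedged illustration, the sketch below computes the standard (global) variation of information between two partitions, VI = H(A|B) + H(B|A). The local, community-level analogues proposed by the authors are not specified in the abstract and are not reproduced here; the function name is an assumption.

```python
from collections import Counter
from math import log


def variation_of_information(a, b):
    """Standard variation of information (in nats) between two partitions.

    Computed as H(A|B) + H(B|A) from the joint label counts. This is the
    global measure, not the local analogue described in the abstract.
    """
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    vi = 0.0
    for (x, y), c in pab.items():
        # c / pa[x] = p(y|x) and c / pb[y] = p(x|y)
        vi -= (c / n) * (log(c / pa[x]) + log(c / pb[y]))
    return vi


# Identical partitions (up to relabeling) are at distance zero
vi_same = variation_of_information([0, 0, 1, 1], [3, 3, 4, 4])
# Independent two-way splits of four nodes are at distance 2 * log(2)
vi_indep = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike normalized mutual information, this quantity is a distance: lower values indicate more similar partitions.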
How is a data-driven approach better than random choice in label space division for multi-label classification?
We propose using five data-driven community detection approaches from social
networks to partition the label space for the task of multi-label
classification as an alternative to random partitioning into equal subsets as
performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector,
infomap, walktrap and label propagation algorithms. We construct a label
co-occurrence graph (both weighted and unweighted versions) based on training
data and perform community detection to partition the label set. We include
Binary Relevance and Label Powerset classification methods for comparison. We
use Gini-index-based decision trees as the base classifier. We compare educated
approaches to label space division against random baselines on 12 benchmark
data sets over five evaluation measures. We show that in almost all cases seven
educated guess approaches are more likely to outperform RAkELd than otherwise
in all measures but Hamming Loss. We show that fastgreedy and walktrap
community detection methods on weighted label co-occurrence graphs are 85-92%
more likely to yield better F1 scores than random partitioning. Infomap on the
unweighted label co-occurrence graphs is on average better than random
partitioning 90% of the time in terms of Subset Accuracy and 89% of the time
in terms of Jaccard similarity. Weighted fastgreedy is better on average than
RAkELd when it comes to Hamming Loss.
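The label co-occurrence graph described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the edge weight is assumed to be the number of training examples in which two labels appear together (the abstract does not define the weighting), and the function name and toy data are hypothetical. The community detection step that would then partition the label set is left out.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence_graph(label_sets):
    """Build a weighted label co-occurrence graph from training label sets.

    Returns a dict mapping an edge (u, v), with u < v, to the number of
    training examples whose label set contains both u and v. The weight
    definition is an assumption made for this sketch.
    """
    weights = Counter()
    for labels in label_sets:
        for u, v in combinations(sorted(set(labels)), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Hypothetical training label sets, one per training example
train = [{"a", "b"}, {"a", "b", "c"}, {"c"}, {"b", "c"}]

weighted = label_cooccurrence_graph(train)  # edge -> co-occurrence count
unweighted = set(weighted)                  # edge set only, weights dropped
```

The weighted and unweighted variants correspond to the two graph versions mentioned in the abstract; either can be handed to a community detection routine to obtain the label space division.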
Markov dynamics as a zooming lens for multiscale community detection: non clique-like communities and the field-of-view limit
In recent years, there has been a surge of interest in community detection
algorithms for complex networks. A variety of computational heuristics, some
with a long history, have been proposed for the identification of communities
or, alternatively, of good graph partitions. In most cases, the algorithms
maximize a particular objective function, thereby finding the 'right' split
into communities. Although a thorough comparison of algorithms is still
lacking, there has been an effort to design benchmarks, i.e., random graph
models with known community structure against which algorithms can be
evaluated. However, popular community detection methods and benchmarks normally
assume an implicit notion of community based on clique-like subgraphs, a form
of community structure that is not always characteristic of real networks.
Specifically, networks that emerge from geometric constraints can have natural
non clique-like substructures with large effective diameters, which can be
interpreted as long-range communities. In this work, we show that long-range
communities escape detection by popular methods, which are blinded by a
restricted 'field-of-view' limit, an intrinsic upper scale on the communities
they can detect. The field-of-view limit means that long-range communities tend
to be overpartitioned. We show how by adopting a dynamical perspective towards
community detection (Delvenne et al. (2010) PNAS:107: 12755-12760; Lambiotte et
al. (2008) arXiv:0812.1770), in which the evolution of a Markov process on the
graph is used as a zooming lens over the structure of the network at all
scales, one can detect both clique-like and non clique-like communities without
imposing an upper scale on the detection. Consequently, the performance of
algorithms on inherently low-diameter, clique-like benchmarks may not always be
indicative of equally good results in real networks with local, sparser
connectivity.
Comment: 20 pages, 6 figures
Finding network communities using modularity density
Many real-world complex networks exhibit a community structure, in which the modules correspond to actual functional units. Identifying these communities is a key challenge for scientists. A common approach is to search for the network partition that maximizes a quality function. Here, we present a detailed analysis of a recently proposed function, namely modularity density. We show that it does not incur the drawbacks suffered by traditional modularity, and that it can identify networks without ground-truth community structure, deriving its analytical dependence on link density in generic random graphs. In addition, we show that modularity density allows an easy comparison between networks of different sizes, and we also present some limitations that methods based on modularity density may suffer from. Finally, we introduce an efficient, quadratic community detection algorithm based on modularity density maximization, validating its accuracy against theoretical predictions and on a set of benchmark networks.
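To make the notion of a quality function concrete, the sketch below evaluates traditional (Newman) modularity for a given partition of a simple undirected graph. This is the baseline that the abstract contrasts with modularity density; it is not modularity density itself, whose formula is not stated in the abstract. The function name, the edge-list input format and the toy graph are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Traditional modularity Q of a partition of a simple undirected graph.

    edges:     list of (u, v) pairs, each undirected edge listed once,
               with no self-loops (an assumption of this sketch).
    community: dict mapping each node to its community label.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge, split into the two triangles
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, two_triangles)  # equals 5/14 for this graph
```

A maximization algorithm searches over partitions for the one with the highest value of such a function; placing all nodes in a single community always yields Q = 0 under this definition.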
Benchmark model to assess community structure in evolving networks
Detecting the time evolution of the community structure of networks is
crucial to identify major changes in the internal organization of many complex
systems, which may undergo important endogenous or exogenous events. This
analysis can be done in two ways: considering each snapshot as an independent
community detection problem or taking into account the whole evolution of the
network. In the first case, one can apply static methods on the temporal
snapshots, which correspond to configurations of the system in short time
windows, and match afterwards the communities across layers. Alternatively, one
can develop dedicated dynamic procedures, so that multiple snapshots are
simultaneously taken into account while detecting communities, which allows us
to keep memory of the flow. To check how well a method of any kind could
capture the evolution of communities, suitable benchmarks are needed. Here we
propose a model for generating simple dynamic benchmark graphs, based on
stochastic block models. In them, the time evolution consists of a periodic
oscillation of the system's structure between configurations with built-in
community structure. We also propose the extension of quality comparison
indices to the dynamic scenario.
Comment: 11 pages, 7 figures, 3 tables
Finding network communities using modularity density
This is the author's accepted manuscript. The final published version is available from IOP Publishing via the DOI in this record.
Many real-world complex networks exhibit a community structure, in which the modules correspond to actual functional units. Identifying these communities is a key challenge for scientists. A common approach is to search for the network partition that maximizes a quality function. Here, we present a detailed analysis of a recently proposed function, namely modularity density. We show that it does not incur the drawbacks suffered by traditional modularity, and that it can identify networks without ground-truth community structure, deriving its analytical dependence on link density in generic random graphs. In addition, we show that modularity density allows an easy comparison between networks of different sizes, and we also present some limitations that methods based on modularity density may suffer from. Finally, we introduce an efficient, quadratic community detection algorithm based on modularity density maximization, validating its accuracy against theoretical predictions and on a set of benchmark networks.
Engineering and Physical Sciences Research Council (EPSRC)
An unsupervised disease module identification technique in biological networks using novel quality metric based on connectivity, conductance and modularity
Disease processes are usually driven by several genes interacting in molecular modules or pathways leading to the disease. The identification of such modules in gene or protein networks is at the core of computational methods in biomedical research. With this motivation, the Disease Module Identification (DMI) DREAM Challenge was initiated as an effort to systematically assess module identification methods on a panel of 6 diverse genomic networks. In this paper, we propose a generic refinement method, based on merging and splitting the hierarchical tree obtained from any community detection technique, for constrained DMI in biological networks. The only constraint was that the size of each community lie in the range [3, 100]. We propose a novel model evaluation metric, called F-score, computed from several unsupervised quality metrics such as modularity, conductance and connectivity, to determine the quality of a graph partition at a given level of the hierarchy. We also propose a quality measure, namely Inverse Confidence, which ranks and prunes insignificant modules to obtain a curated list of candidate disease modules (DM) for a biological network. The predicted modules are evaluated on the basis of the total number of unique candidate modules that are associated with complex traits and diseases from over 200 genome-wide association study (GWAS) datasets. During the competition, we identified 42 modules, ranking 15th at the official false detection rate (FDR) cut-off of 0.05 for identifying statistically significant DM in the 6 benchmark networks. However, for the more stringent FDR cut-offs of 0.025 and 0.01, the proposed method identified 31 DM (rank 9) and 16 DM (rank 10), respectively. In additional analysis, our proposed approach detected a total of 44 DM in the networks, compared with 60 for the winner of the DREAM Challenge. Interestingly, for several individual benchmark networks, our performance was better than or competitive with that of the winner.