Detecting highly overlapping community structure by greedy clique expansion
In complex networks it is common for each node to belong to several
communities, implying a highly overlapping community structure. Recent advances
in benchmarking indicate that existing community assignment algorithms that are
capable of detecting overlapping communities perform well only when the extent
of community overlap is kept to modest levels. To overcome this limitation, we
introduce a new community assignment algorithm called Greedy Clique Expansion
(GCE). The algorithm identifies distinct cliques as seeds and expands these
seeds by greedily optimizing a local fitness function. We perform extensive
benchmarks on synthetic data to demonstrate that GCE's good performance is
robust across diverse graph topologies. Significantly, GCE is the only
algorithm to perform well on these synthetic graphs, in which every node
belongs to multiple communities. Furthermore, when put to the task of
identifying functional modules in protein interaction data, and college dorm
assignments in Facebook friendship data, we find that GCE performs
competitively.
Comment: 10 pages, 7 Figures. Implementation source and binaries available at
http://sites.google.com/site/greedycliqueexpansion
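The seed-and-expand idea described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not state the fitness function, so the sketch assumes a common local fitness of the form k_in / (k_in + k_out)^alpha, takes the seed as given rather than finding cliques, and uses hypothetical names (`fitness`, `expand_seed`, `adj`).

```python
def fitness(adj, community, alpha=1.0):
    """Assumed local fitness k_in / (k_in + k_out) ** alpha (not given in the abstract)."""
    k_in = 0   # internal edge endpoints (each internal edge is counted twice)
    k_out = 0  # edges leaving the community
    for u in community:
        for v in adj[u]:
            if v in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand_seed(adj, seed, alpha=1.0):
    """Greedily add the neighbouring node that most improves the fitness."""
    community = set(seed)
    current = fitness(adj, community, alpha)
    while True:
        # Candidate nodes: neighbours of the community that are not yet members.
        frontier = {v for u in community for v in adj[u]} - community
        best_node, best_score = None, current
        for v in sorted(frontier):  # sorted only to make tie-breaking deterministic
            score = fitness(adj, community | {v}, alpha)
            if score > best_score:
                best_node, best_score = v, score
        if best_node is None:  # no addition improves the fitness: stop
            return community
        community.add(best_node)
        current = best_score


# Two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(expand_seed(adj, {0, 1}))
```

On this toy graph the seed {0, 1} grows to the first triangle and stops, because adding node 3 would lower the assumed fitness.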
Node-Centric Detection of Overlapping Communities in Social Networks
We present NECTAR, a community detection algorithm that generalizes the Louvain
method's local search heuristic to overlapping community structures. NECTAR
chooses dynamically which objective function to optimize based on the network
on which it is invoked. Our experimental evaluation on both synthetic benchmark
graphs and real-world networks, based on ground-truth communities, shows that
NECTAR provides excellent results compared with state-of-the-art community
detection algorithms.
On Mining Biological Signals Using Correlation Networks
Correlation networks have been used in biological networks to analyze and model high-throughput biological data, such as gene expression from microarray or RNA-seq assays. Typically in biological network modeling, structures can be mined from these networks that represent biological functions; for example, a cluster of proteins in an interactome can represent a protein complex. In correlation networks built from high-throughput gene expression data, it has often been speculated or even assumed that clusters represent sets of genes that are coregulated. This research aims to validate this concept using network systems biology and data mining, by identifying correlation network clusters via multiple clustering approaches and cross-validating regulatory elements in these clusters via motif-finding software. The results show that the majority (81-100%) of genes in any given cluster share at least one predicted transcription factor binding site. With this in mind, new regulatory relationships can be proposed using known transcription factors and their binding sites by integrating regulatory information and the network model itself.
Inferring modules from human protein interactome classes
Background: The integration of protein-protein interaction networks derived from high-throughput screening approaches and complementary sources is a key topic in systems biology. Although integration of protein interaction data is conventionally performed, the effects of this procedure on the results of network analyses have not yet been examined. In particular, in order to optimize the fusion of heterogeneous interaction datasets, it is crucial to consider not only their degree of coverage and accuracy, but also their mutual dependencies and additional salient features. Results: We examined this issue based on the analysis of modules detected by network clustering methods applied to both integrated and individual (disaggregated) data sources, which we call interactome classes. Due to class diversity, we deal with variable dependencies of data features arising from structural specificities and biases, but also from possible overlaps. Since highly connected regions of the human interactome may point to potential protein complexes, we have focused on the concept of modularity, and elucidated the detection power of module extraction algorithms by independent validations based on GO, MIPS and KEGG. From the combination of protein interactions with gene expressions, a confidence scoring scheme has been proposed before proceeding via GO with further classification into permanent and transient modules. Conclusions: Disaggregated interactomes are shown to be informative for inferring modularity, thus contributing to an effective integrative analysis. Validation of the extracted modules by multiple annotations allows for the assessment of confidence measures assigned to the modules in a protein pathway context. Notably, the proposed multilayer confidence scheme can be used for network calibration by enabling a transition from unweighted to weighted interactomes based on biological evidence.
A systematic comparison of genome-scale clustering algorithms
Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results: Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods.
Conclusions: Validation of clusters against known gene classifications demonstrates that for this data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies are warranted.
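The scoring step in the Methods above can be illustrated with a short sketch. It assumes clusters and annotation sets are plain collections of gene identifiers, and it scores a single size bin only: the abstract gives neither the bin thresholds nor how the "best" score is chosen across parameter settings, so both are left out. The function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_jaccard(cluster, annotation_sets):
    """Highest Jaccard score of a cluster against any annotation set."""
    return max((jaccard(cluster, ann) for ann in annotation_sets), default=0.0)


def top_k_average(clusters, annotation_sets, k=5):
    """Average of the k best per-cluster scores within one size bin."""
    scores = sorted(
        (best_jaccard(c, annotation_sets) for c in clusters), reverse=True
    )
    top = scores[:k]
    return sum(top) / len(top) if top else 0.0


# Toy data with made-up gene identifiers.
clusters = [{"g1", "g2"}, {"g3", "g4"}]
annotations = [{"g1", "g2"}, {"g3"}]
print(top_k_average(clusters, annotations))
```

In the toy data the first cluster matches an annotation set exactly and the second shares one of two genes with its best match, so the bin average falls between the two per-cluster scores.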
Finding overlapping communities in networks by label propagation
We propose an algorithm for finding overlapping community structure in very
large networks. The algorithm is based on the label propagation technique of
Raghavan, Albert, and Kumara, but is able to detect communities that overlap.
Like the original algorithm, vertices have labels that propagate between
neighbouring vertices so that members of a community reach a consensus on their
community membership. Our main contribution is to extend the label and
propagation step to include information about more than one community: each
vertex can now belong to up to v communities, where v is the parameter of the
algorithm. Our algorithm can also handle weighted and bipartite networks. Tests
on an independently designed set of benchmarks, and on real networks, show the
algorithm to be highly effective in recovering overlapping communities. It is
also very fast and can process very large and dense networks in a short time.
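One propagation step of the kind described above might look like the following sketch. It is an interpretation, not the authors' code: each vertex is assumed to hold a map from community label to a belonging coefficient, a vertex averages its neighbours' coefficients, labels below 1/v are dropped, and the rest are renormalized. Ties are broken by smallest label here purely for determinism, and the names (`propagate_once`, `adj`, `labels`) are hypothetical.

```python
def propagate_once(adj, labels, v):
    """One synchronous update; labels maps vertex -> {community label: coefficient}."""
    new_labels = {}
    for x, neighbours in adj.items():
        if not neighbours:
            new_labels[x] = dict(labels[x])  # isolated vertex keeps its labels
            continue
        # Average the neighbours' belonging coefficients for each label.
        totals = {}
        for y in neighbours:
            for c, b in labels[y].items():
                totals[c] = totals.get(c, 0.0) + b / len(neighbours)
        # Keep only labels whose coefficient reaches the threshold 1/v.
        kept = {c: b for c, b in totals.items() if b >= 1.0 / v}
        if not kept:
            # Assumption: keep the single strongest label, smallest label on ties.
            best = max(sorted(totals), key=totals.get)
            kept = {best: totals[best]}
        norm = sum(kept.values())
        new_labels[x] = {c: b / norm for c, b in kept.items()}
    return new_labels


# Path 0 - 1 - 2, every vertex starting with its own label.
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {n: {n: 1.0} for n in adj}
print(propagate_once(adj, labels, v=2)[1])
```

With v = 2 the middle vertex of the path keeps both neighbours' labels, so it belongs to two communities after the step; with v = 1 it would keep only one.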
Overlapping Community Detection in Networks: the State of the Art and Comparative Study
This paper reviews the state of the art in overlapping community detection
algorithms, quality measures, and benchmarks. A thorough comparison of
different algorithms (a total of fourteen) is provided. In addition to
community level evaluation, we propose a framework for evaluating algorithms'
ability to detect overlapping nodes, which helps to assess over-detection and
under-detection. After considering community level detection performance
measured by Normalized Mutual Information, the Omega index, and node level
detection performance measured by F-score, we reached the following
conclusions. For low overlapping density networks, SLPA, OSLOM, Game and COPRA
offer better performance than the other tested algorithms. For networks with
high overlapping density and high overlapping diversity, both SLPA and Game
provide relatively stable performance. However, test results also suggest that
detection in such networks is not yet fully resolved. A common
feature observed by various algorithms in real-world networks is the relatively
small fraction of overlapping nodes (typically less than 30%), each of which
belongs to only 2 or 3 communities.
Comment: This paper (final version) was accepted in 2012. ACM Computing
Surveys, vol. 45, no. 4, 2013 (In press)
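The node-level evaluation mentioned in this abstract can be sketched as a binary classification of nodes into overlapping and non-overlapping. This is one plausible reading rather than the survey's exact procedure: a node is assumed to count as overlapping when it appears in more than one community, and the function names are hypothetical.

```python
def overlapping_nodes(communities):
    """Nodes that appear in more than one community."""
    counts = {}
    for community in communities:
        for node in set(community):
            counts[node] = counts.get(node, 0) + 1
    return {node for node, count in counts.items() if count > 1}


def overlap_f_score(true_communities, detected_communities):
    """F-score of detected overlapping nodes against ground-truth overlapping nodes."""
    truth = overlapping_nodes(true_communities)
    found = overlapping_nodes(detected_communities)
    hits = len(truth & found)
    if hits == 0:
        return 0.0
    precision = hits / len(found)  # low precision indicates over-detection
    recall = hits / len(truth)     # low recall indicates under-detection
    return 2 * precision * recall / (precision + recall)


truth = [{1, 2, 3}, {3, 4, 5}]
detected = [{1, 2, 3}, {2, 3, 4, 5}]
print(overlap_f_score(truth, detected))
```

In the example the detector reports one overlapping node too many, so recall is perfect while precision drops, which is the over-detection case the abstract refers to.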