k-clique Communities in the Internet AS-level
A significant challenge for researchers analysing the Internet AS-level topology graph is how to interpret the global organization of the graph as the coexistence of its structural blocks (communities) associated with more highly interconnected parts. While a huge number of papers have already been published on the issue of community detection, very little attention has so far been devoted to the discovery and interpretation of Internet communities at the various levels of abstraction. We believe that discovering and interpreting these a priori unknown building blocks (i.e. communities) will pave the way for new types of analysis that are crucial to understanding the structural and functional properties of the Internet, at least at the AS level of abstraction. We thus propose a novel type of analysis of the Internet AS-level topology graph by exploiting the k-clique community definition. First, we show that detected communities can be described by a tree representation. Then we show the presence of two classes of k-clique communities: those that are strictly affected by the nesting process which is embedded in the k-clique community definition, and those that appear as branches in the tree. We conclude our analysis by highlighting the properties that characterize k-clique communities with different k values by exploiting both geographical data and information related to IXPs.
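The k-clique community definition used above can be illustrated with a minimal sketch. It assumes the usual clique percolation definition: two k-cliques are adjacent when they share k-1 nodes, and a community is the union of all k-cliques reachable from one another through adjacent k-cliques. The function names and the toy graph are hypothetical, and the brute-force clique enumeration is only suitable for very small graphs, not for an AS-level topology.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """All k-node subsets in which every pair is connected (brute force)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def k_clique_communities(adj, k):
    """Union of k-cliques linked through chains of cliques sharing k-1 nodes."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


# Toy graph (an assumption for illustration): a 4-clique, plus two triangles.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (3, 5), (4, 5), (5, 6), (5, 7), (6, 7)]
adj = build_adj(edges)

k3 = k_clique_communities(adj, 3)
k4 = k_clique_communities(adj, 4)
print([sorted(c) for c in k3])  # [[1, 2, 3, 4, 5], [5, 6, 7]]
print([sorted(c) for c in k4])  # [[1, 2, 3, 4]]

# Nesting: every (k+1)-clique community lies inside some k-clique community.
print(all(any(c4 <= c3 for c3 in k3) for c4 in k4))  # True
```

In this toy example node 5 belongs to two 3-clique communities, and the single 4-clique community is nested inside the larger 3-clique community, which is the kind of parent-child relation a tree representation can record.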
Uncovering the overlapping community structure of complex networks in nature and society
Many complex systems in nature and society can be described in terms of
networks capturing the intricate web of connections among the units they are
made of. A key question is how to interpret the global organization of such
networks as the coexistence of their structural subunits (communities)
associated with more highly interconnected parts. Identifying these a priori
unknown building blocks (such as functionally related proteins, industrial
sectors and groups of people) is crucial to the understanding of the structural
and functional properties of networks. The existing deterministic methods used
for large networks find separated communities, whereas most of the actual
networks are made of highly overlapping cohesive groups of nodes. Here we
introduce an approach to analysing the main statistical features of the
interwoven sets of overlapping communities that makes a step towards uncovering
the modular structure of complex systems. After defining a set of new
characteristic quantities for the statistics of communities, we apply an
efficient technique for exploring overlapping communities on a large scale. We
find that overlaps are significant, and the distributions we introduce reveal
universal features of networks. Our studies of collaboration, word-association
and protein interaction graphs show that the web of communities has non-trivial
correlations and specific scaling properties. Comment: The free academic research software, CFinder, used for the
publication is available at the website of the publication:
http://angel.elte.hu/clusterin
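The characteristic quantities mentioned in this abstract can be sketched for a given set of overlapping communities. The sketch assumes the four quantities are the membership number of a node, the size of a community, the overlap size between two communities, and the community degree (the number of other communities a community overlaps with). The function name and the example cover are hypothetical, and this is not CFinder's interface.

```python
from collections import Counter
from itertools import combinations


def community_statistics(cover):
    """Basic statistics of a cover, given as a list of node sets.

    Returns (membership, sizes, overlap, degree):
      membership[node] : number of communities containing the node
      sizes[i]         : number of nodes in community i
      overlap[(i, j)]  : number of nodes shared by communities i and j (if > 0)
      degree[i]        : number of communities that overlap with community i
    """
    membership = Counter(node for com in cover for node in com)
    sizes = [len(com) for com in cover]
    overlap = {}
    degree = [0] * len(cover)
    for i, j in combinations(range(len(cover)), 2):
        shared = len(cover[i] & cover[j])
        if shared:
            overlap[(i, j)] = shared
            degree[i] += 1
            degree[j] += 1
    return membership, sizes, overlap, degree


# Hypothetical cover with three communities chained by single shared nodes.
cover = [{1, 2, 3, 4, 5}, {5, 6, 7}, {7, 8, 9}]
membership, sizes, overlap, degree = community_statistics(cover)
print(membership[5], sizes, overlap, degree)
```

The distributions of these four quantities over all nodes and communities are what a statistical analysis of the cover would then examine.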
Parallel k-Clique Community Detection on Large-Scale Networks
The analysis of real-world complex networks has been the focus of recent research. Detecting communities helps in uncovering their structural and functional organization. Valuable insight can be obtained by analyzing the dense, overlapping, and highly interwoven k-clique communities. However, their detection is challenging due to extensive memory requirements and execution time. In this paper, we present a novel, parallel k-clique community detection method, based on an innovative technique which enables connected components of a network to be obtained from those of its subnetworks. The novel method has an unbounded, user-configurable, and input-independent maximum degree of parallelism, and hence is able to make full use of computational resources. Theoretical tight upper bounds on its worst case time and space complexities are given as well. Experiments on real-world networks such as the Internet and the World Wide Web confirmed the almost optimal use of parallelism (i.e., a linear speedup). Comparisons with other state-of-the-art k-clique community detection methods show dramatic reductions in execution time and memory footprint. An open-source implementation of the method is also made publicly available
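The idea of obtaining the connected components of a network from those of its subnetworks can be sketched generically. This is not the paper's algorithm, only an illustration under simple assumptions: the edge list is split into arbitrary chunks, components are computed per chunk with union-find, and the per-chunk components are then merged because any two components sharing a node must belong together. All names are hypothetical, and nodes that appear in no edge are ignored.

```python
def components(edges):
    """Connected components of an edge list via union-find; list of node sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def merge_components(component_lists):
    """Combine components computed independently on several subnetworks.

    Each partial component is turned into a star of edges around one of its
    nodes, so components sharing a node end up in the same merged component.
    """
    star_edges = []
    for comps in component_lists:
        for comp in comps:
            hub = next(iter(comp))
            star_edges.extend((hub, node) for node in comp)
    return components(star_edges)


# Hypothetical edge list split into three subnetworks.
edges = [(1, 2), (2, 3), (4, 5), (3, 4), (6, 7)]
chunks = [edges[:2], edges[2:4], edges[4:]]

partial = [components(chunk) for chunk in chunks]  # independent per chunk
merged = merge_components(partial)
print(sorted(sorted(c) for c in merged))  # [[1, 2, 3, 4, 5], [6, 7]]
```

Because each chunk is processed independently, the per-chunk calls are the part that could be distributed across workers; only the final merge needs to see all partial results.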
Community structure and ethnic preferences in school friendship networks
Recently developed concepts and techniques of analyzing complex systems
provide new insight into the structure of social networks. Uncovering recurrent
preferences and organizational principles in such networks is a key issue to
characterize them. We investigate school friendship networks from the Add
Health database. Applying threshold analysis, we find that the friendship
networks do not form a single connected component through mutual strong
nominations within a school, while under weaker conditions such
interconnectedness is present. We extract the networks of overlapping
communities at the schools (c-networks) and find that they are scale free and
disassortative in contrast to the direct friendship networks, which have an
exponential degree distribution and are assortative. Based on the network
analysis we study the ethnic preferences in friendship selection. The clique
percolation method we use reveals that when in minority, the students tend to
build more densely interconnected groups of friends. We also find an asymmetry
in the behavior of black minorities in a white majority as compared to that of
white minorities in a black majority. Comment: submitted to Physica
A systematic comparison of genome-scale clustering algorithms
Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results: Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods.
Conclusions: Validation of clusters against known gene classifications demonstrates that for this data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
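The scoring procedure described under Methods can be sketched as follows. The sketch covers a single parameter setting of one clustering method: each cluster receives its highest Jaccard score over all annotation sets, clusters are binned by size, and the top five scores in each bin are averaged. The bin boundaries (`small_max`, `medium_max`), the function names, and the example data are assumptions, since the abstract does not give them.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of gene IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_score(cluster, annotation_sets):
    """Highest Jaccard score of a cluster against any annotation set."""
    return max((jaccard(cluster, ann) for ann in annotation_sets), default=0.0)


def top5_average_by_bin(clusters, annotation_sets, small_max=10, medium_max=100):
    """Average of the (up to) five best cluster scores in each size bin.

    The size thresholds are hypothetical placeholders. A bin without
    clusters maps to None.
    """
    bins = {"small": [], "medium": [], "large": []}
    for cluster in clusters:
        n = len(cluster)
        name = "small" if n <= small_max else "medium" if n <= medium_max else "large"
        bins[name].append(best_match_score(cluster, annotation_sets))

    result = {}
    for name, scores in bins.items():
        top = sorted(scores, reverse=True)[:5]
        result[name] = sum(top) / len(top) if top else None
    return result


# Hypothetical clusters and annotation sets over made-up gene identifiers.
clusters = [{"g1", "g2"}, {"g1", "g2", "g3"}]
annotations = [{"g1", "g2"}, {"g4"}]
print(top5_average_by_bin(clusters, annotations))
```

Repeating this over the tested parameter settings and keeping the best per-bin average would be one reading of how a "best average top 5" figure is obtained for a method.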
Beyond topological persistence: Starting from networks
Persistent homology enables fast and computable comparison of topological
objects. However, it is naturally limited to the analysis of topological
spaces. We extend the theory of persistence by guaranteeing robustness and
computability for significant data types such as simple graphs and quivers. We focus
on categorical persistence functions that allow us to study in full generality
strong kinds of connectedness such as clique communities, k-vertex and
k-edge connectedness directly on simple graphs and monic coherent categories. Comment: arXiv admin note: text overlap with arXiv:1707.0967
Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods
Background: Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data on which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets, but until now no level selection algorithm has been published for this method. Results: In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of cohesive clusters. We present results of our method on two data sets, a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection.
Conclusion: Using our new cluster selection method together with the method by Palla et al. provides a new and interesting clustering mechanism that allows overlapping clusters to be computed, which is especially valuable for biological and chemical data sets.