
    k-clique Communities in the Internet AS-level

    A significant challenge for researchers analysing the Internet AS-level topology graph is how to interpret the global organization of the graph as the coexistence of its structural blocks (communities) associated with more highly interconnected parts. While a huge number of papers have already been published on the issue of community detection, very little attention has so far been devoted to the discovery and interpretation of Internet communities at various levels of abstraction. We believe that discovering and interpreting these a priori unknown building blocks (i.e. communities) will pave the way for new types of analysis that are crucial to understanding the structural and functional properties of the Internet, at least at the AS level of abstraction. We thus propose a novel type of analysis of the Internet AS-level topology graph by exploiting the k-clique community definition. First, we show that detected communities can be described by a tree representation. Then we show the presence of two classes of k-clique communities: those that are strictly affected by the nesting process which is embedded in the k-clique community definition, and, on the other hand, those that appear as branches in the tree. We conclude our analysis by highlighting the properties that characterize k-clique communities with different k values by exploiting both geographical data and information related to IXPs.

    Uncovering the overlapping community structure of complex networks in nature and society

    Many complex systems in nature and society can be described in terms of networks capturing the intricate web of connections among the units they are made of. A key question is how to interpret the global organization of such networks as the coexistence of their structural subunits (communities) associated with more highly interconnected parts. Identifying these a priori unknown building blocks (such as functionally related proteins, industrial sectors and groups of people) is crucial to the understanding of the structural and functional properties of networks. The existing deterministic methods used for large networks find separated communities, whereas most of the actual networks are made of highly overlapping cohesive groups of nodes. Here we introduce an approach to analysing the main statistical features of the interwoven sets of overlapping communities that makes a step towards uncovering the modular structure of complex systems. After defining a set of new characteristic quantities for the statistics of communities, we apply an efficient technique for exploring overlapping communities on a large scale. We find that overlaps are significant, and the distributions we introduce reveal universal features of networks. Our studies of collaboration, word-association and protein interaction graphs show that the web of communities has non-trivial correlations and specific scaling properties. Comment: The free academic research software, CFinder, used for the publication is available at the website of the publication: http://angel.elte.hu/clusterin
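
To make the idea of overlapping communities concrete, here is a minimal Python sketch of clique percolation under the usual k-clique community definition: two k-cliques are adjacent when they share k-1 nodes, and a community is the union of all k-cliques reachable from one another through such adjacencies. It is a brute-force illustration for tiny graphs, not the CFinder implementation; the function name `k_clique_communities` and the sample edge list are assumptions made for this example.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Return k-clique communities as a list of frozensets of nodes.

    Brute-force sketch: enumerates every k-subset of nodes, so it is
    only practical for very small graphs.
    """
    # Build adjacency sets from the undirected edge list.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Keep every k-subset of nodes whose members are pairwise adjacent.
    nodes = sorted(adj)
    cliques = [
        frozenset(c)
        for c in combinations(nodes, k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over clique indices.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two k-cliques are adjacent when they share k-1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all connected cliques.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(g) for g in groups.values()]


# Hypothetical toy graph: two triangles sharing an edge, plus a third
# triangle attached through node 4 only.
sample_edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
for community in k_clique_communities(sample_edges, k=3):
    print(sorted(community))
```

In the toy graph, node 4 ends up in both communities, which is the kind of overlap that partition-based methods cannot express.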

    Parallel k-Clique Community Detection on Large-Scale Networks

    The analysis of real-world complex networks has been the focus of recent research. Detecting communities helps in uncovering their structural and functional organization. Valuable insight can be obtained by analyzing the dense, overlapping, and highly interwoven k-clique communities. However, their detection is challenging due to extensive memory requirements and execution time. In this paper, we present a novel, parallel k-clique community detection method, based on an innovative technique which enables connected components of a network to be obtained from those of its subnetworks. The novel method has an unbounded, user-configurable, and input-independent maximum degree of parallelism, and hence is able to make full use of computational resources. Theoretical tight upper bounds on its worst case time and space complexities are given as well. Experiments on real-world networks such as the Internet and the World Wide Web confirmed the almost optimal use of parallelism (i.e., a linear speedup). Comparisons with other state-of-the-art k-clique community detection methods show dramatic reductions in execution time and memory footprint. An open-source implementation of the method is also made publicly available
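
The abstract does not spell out how connected components of a network are obtained from those of its subnetworks, so the following Python sketch only illustrates the general principle and should not be read as the paper's algorithm. It assumes the network's edges are split into subnetworks, computes components of each independently (the part that could run in parallel), and then merges them with union-find. The function names `components` and `merge_components` are hypothetical.

```python
def components(edges):
    """Connected components of an undirected edge list, as a list of sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def merge_components(per_subnetwork):
    """Combine the component lists of several subnetworks into the
    components of their union."""
    links = []
    for comps in per_subnetwork:
        for comp in comps:
            members = iter(comp)
            first = next(members)
            # A self-link keeps singleton components in the result.
            links.append((first, first))
            # Linking every member to one representative preserves
            # the connectivity of the whole component.
            links.extend((first, other) for other in members)
    return components(links)


# Hypothetical split of one network's edges into two subnetworks.
part_a = [(1, 2), (3, 4)]
part_b = [(2, 3), (5, 6)]
merged = merge_components([components(part_a), components(part_b)])
print(sorted(sorted(c) for c in merged))
```

The merge step only needs one link per node of each component rather than the original edges, which is why per-subnetwork results can be combined cheaply.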

    Community structure and ethnic preferences in school friendship networks

    Recently developed concepts and techniques of analyzing complex systems provide new insight into the structure of social networks. Uncovering recurrent preferences and organizational principles in such networks is a key issue to characterize them. We investigate school friendship networks from the Add Health database. Applying threshold analysis, we find that the friendship networks do not form a single connected component through mutual strong nominations within a school, while under weaker conditions such interconnectedness is present. We extract the networks of overlapping communities at the schools (c-networks) and find that they are scale free and disassortative in contrast to the direct friendship networks, which have an exponential degree distribution and are assortative. Based on the network analysis we study the ethnic preferences in friendship selection. The clique percolation method we use reveals that when in minority, the students tend to build more densely interconnected groups of friends. We also find an asymmetry in the behavior of black minorities in a white majority as compared to that of white minorities in a black majority. Comment: submitted to Physica

    A systematic comparison of genome-scale clustering algorithms

    Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results: Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods. Conclusions: Validation of clusters against known gene classifications demonstrates that for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
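
The scoring procedure in the Methods paragraph can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code: the bin boundaries (`small_max`, `medium_max`) are hypothetical because the abstract does not give them, and the sketch returns one top-five average per size bin for a single parameter setting rather than the final BAT5 value chosen across settings.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections of genes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_score(cluster, annotation_sets):
    """Highest Jaccard score of a cluster against any annotation set."""
    return max((jaccard(cluster, s) for s in annotation_sets), default=0.0)


def size_bin(cluster, small_max=10, medium_max=100):
    """Assign a cluster to a size bin; the thresholds are assumptions."""
    n = len(cluster)
    if n <= small_max:
        return "small"
    if n <= medium_max:
        return "medium"
    return "large"


def top5_average_per_bin(clusters, annotation_sets, top_n=5):
    """Average the top_n best-match scores within each size bin."""
    scores = {}
    for cluster in clusters:
        score = best_match_score(cluster, annotation_sets)
        scores.setdefault(size_bin(cluster), []).append(score)
    return {
        name: sum(sorted(vals, reverse=True)[:top_n]) / min(len(vals), top_n)
        for name, vals in scores.items()
    }


# Hypothetical gene clusters and annotation sets.
clusters = [{"g1", "g2"}, {"g1", "g3"}]
annotations = [{"g1", "g2"}, {"g4"}]
print(top5_average_per_bin(clusters, annotations))
```

Bins with fewer than five clusters are averaged over the clusters they contain, which is a design choice of this sketch.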

    Beyond topological persistence: Starting from networks

    Persistent homology enables fast and computable comparison of topological objects. However, it is naturally limited to the analysis of topological spaces. We extend the theory of persistence by guaranteeing robustness and computability for significant data types such as simple graphs and quivers. We focus on categorical persistence functions that allow us to study in full generality strong kinds of connectedness such as clique communities, k-vertex and k-edge connectedness directly on simple graphs and monic coherent categories. Comment: arXiv admin note: text overlap with arXiv:1707.0967

    Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods

    Background: Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. To obtain a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data for which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets, but until now no level selection algorithm has been published for this method. Results: In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of cohesive clusters. We present results of our method on two data sets, a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection. Conclusion: Using our new cluster selection method together with the method by Palla et al. provides an interesting new clustering mechanism that allows overlapping clusters to be computed, which is especially valuable for biological and chemical data sets.