42 research outputs found

    On the Permanence of Vertices in Network Communities

    Despite the prevalence of community detection algorithms, relatively little work has been done on understanding whether a network is indeed modular and how resilient its community structure is under perturbations. To address this issue, we propose a new vertex-based metric called "permanence" that quantitatively estimates the community-like structure of the network. The central idea of permanence is based on the observation that the strength of membership of a vertex in a community depends on two factors: (i) the distribution of the vertex's external connectivity across individual communities, not its total external connectivity, and (ii) the strength of its internal connectivity, not just its total number of internal edges. In this paper, we demonstrate that, compared to other metrics, permanence (i) more accurately estimates how closely a derived community structure matches the ground-truth communities and (ii) is more sensitive to perturbations in the network. As a by-product of this study, we have also developed a community detection algorithm based on maximizing permanence. For a modular network structure, the results of our algorithm match well with ground-truth communities. Comment: 10 pages, 5 figures, 8 tables, Accepted in 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
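
The two factors named in the abstract can be combined into a per-vertex score. The abstract itself gives no formula, so the sketch below is an assumption: it uses the form usually quoted for permanence, (I / E_max) / D - (1 - c_in), where I is the number of internal neighbours, E_max the largest number of neighbours in any single external community, D the degree, and c_in the clustering coefficient among internal neighbours. The function name, the input layout, and the handling of edge cases (no external neighbours, fewer than two internal neighbours) are illustrative choices, not taken from the source.

```python
from itertools import combinations


def permanence(adj, community, v):
    """Permanence-style score of vertex v (illustrative sketch).

    adj: dict mapping each node to the set of its neighbours.
    community: dict mapping each node to a community label.
    """
    neighbours = adj[v]
    degree = len(neighbours)
    if degree == 0:
        return 0.0  # assumption: isolated vertices score 0

    internal = [u for u in neighbours if community[u] == community[v]]

    # Factor (i): external links are counted per community, not in total.
    external_counts = {}
    for u in neighbours:
        label = community[u]
        if label != community[v]:
            external_counts[label] = external_counts.get(label, 0) + 1
    # Assumption: with no external neighbours, E_max is taken as 1.
    e_max = max(external_counts.values(), default=1)

    # Factor (ii): how tightly the internal neighbours are linked to each other.
    k = len(internal)
    if k < 2:
        c_in = 0.0  # assumption: undefined clustering is treated as 0
    else:
        links = sum(1 for a, b in combinations(internal, 2) if b in adj[a])
        c_in = links / (k * (k - 1) / 2)

    return (k / e_max) / degree - (1.0 - c_in)
```

For example, in a triangle a, b, c assigned to one community, with a also linked to a vertex d in another community, b scores 1.0 while a scores lower because one of its three edges leaves the community.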

    A Parallel Graph Sampling Algorithm for Analyzing Gene Correlation Networks

    Efficient analysis of complex networks is often challenging because of their large size and the noise inherent in the system. One popular method of overcoming this problem is graph sampling, that is, extracting a representative subgraph from the larger network. The accuracy of the sample is validated by comparing the combinatorial properties of the subgraph and the original network. However, few studies have compared networks based on the applications that they represent. Furthermore, sampling methods are generally applied agnostically, without mapping to the requirements of the underlying analysis. In this paper, we introduce a parallel graph sampling algorithm focusing on gene correlation networks. Densely connected subgraphs indicate important functional units of gene products. In our sampling algorithm, we emphasize maintaining highly connected regions of the network through parallel sampling based on extracting the maximal chordal subgraph of the network. We validate our methods by comparing both combinatorial properties and functional units of the subgraphs and larger networks. Our results show that even with significant reduction of the network (on average 20% to 40%), we obtain reliable samplings and many of the relevant combinatorial and functional properties are retained in the subgraphs

    A Parallel Non-Alignment Based Approach to Efficient Sequence Comparison using Longest Common Subsequences

    Biological sequence comparison programs have revolutionized the practice of biochemistry and of molecular and evolutionary biology. Pairwise comparison of genomic sequences is a popular method for analyzing genetic sequence data. However, the quality of results from most sequence comparison methods is significantly affected by small perturbations in the data, and there is a dearth of computational tools for comparing sequences beyond a certain length. In this paper, we describe a parallel algorithm for comparing genetic sequences using an alignment-free method based on computing the Longest Common Subsequence (LCS) between genetic sequences. We validate the quality of our results by comparing the phylogenetic trees obtained from ClustalW and LCS. We also show, through complexity analysis of the isoefficiency and by empirical measurement of the running time, that our algorithm is very scalable
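
As a point of reference for the LCS-based comparison described above, here is a minimal sequential sketch of the standard dynamic-programming LCS length computation. It is not the parallel algorithm from the paper, and the `lcs_distance` normalisation is a hypothetical choice added only to show how an LCS length could feed a pairwise distance for tree building.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Textbook dynamic programme that keeps only two rows,
    so memory use is O(len(b)) and time is O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one character from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> float:
    """Hypothetical distance: 1 - LCS length / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest
```

Identical sequences get distance 0.0 under this normalisation, and sequences with no characters in common get 1.0.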

    A noise reducing sampling approach for uncovering critical properties in large scale biological networks

    A correlation network is a graph-based representation of relationships among genes or gene products, such as proteins. The advent of high-throughput bioinformatics has resulted in the generation of volumes of data that require sophisticated in silico models, such as the correlation network, for in-depth analysis. Each node in our network represents the expression levels of one gene across multiple samples, and an edge connecting two nodes reflects the correlation between the two corresponding genes, measured by the Pearson correlation coefficient. Biological networks built in this manner are generally found to have a scale-free structure, that is, they are modular and follow a power-law degree distribution. Filtering these structures to remove noise and coincidental edges is a necessity for network theorists because, when examining entire genomes at once, network size and complexity can act as a bottleneck for network manageability. Our previous work demonstrated that chordal graph based sampling of networks results in viable models. In this paper, we extend our research to investigate how different orderings affect the results of our sampling while maintaining the viability of the resulting network structures. Our results show that chordal graph based sampling not only conserves clusters that are present within the original networks but, by reducing noise, can also help uncover additional functional clusters that were previously not obtainable from the original network
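
The network construction described above can be sketched as follows: one node per gene, and an edge wherever the Pearson correlation between two expression profiles is strong enough. The function names, the dictionary input format, and the 0.9 threshold are assumptions made for illustration; the abstract does not state a cut-off, and this sketch does not include the chordal graph based sampling step.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlation_network(expression, threshold=0.9):
    """Build an edge dict {(gene_a, gene_b): r} for pairs with |r| >= threshold.

    expression: dict mapping gene name to its expression levels across samples.
    threshold: hypothetical cut-off, not taken from the source.
    """
    genes = sorted(expression)
    edges = {}
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            r = pearson(expression[gene_a], expression[gene_b])
            if abs(r) >= threshold:
                edges[(gene_a, gene_b)] = r
    return edges
```

The pairwise loop is quadratic in the number of genes, which is one reason whole-genome networks become hard to manage without filtering or sampling.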