146 research outputs found

    Biological network comparison using graphlet degree distribution

    Full text link
    Analogous to biological sequence comparison, comparing cellular networks is an important problem that could provide insight into biological understanding and therapeutics. For technical reasons, comparing large networks is computationally infeasible, and thus heuristics such as the degree distribution have been sought. It is easy to demonstrate that two networks are different by simply showing a short list of properties in which they differ. It is much harder to show that two networks are similar, as it requires demonstrating their similarity in all of their exponentially many properties. Clearly, it is computationally prohibitive to analyze all network properties, but the larger the number of constraints we impose in determining network similarity, the more likely it is that the networks will truly be similar. We introduce a new systematic measure of a network's local structure that imposes a large number of similarity constraints on networks being compared. In particular, we generalize the degree distribution, which measures the number of nodes 'touching' k edges, into distributions measuring the number of nodes 'touching' k graphlets, where graphlets are small connected non-isomorphic subgraphs of a large network. Our new measure of network local structure consists of 73 graphlet degree distributions (GDDs) of graphlets with 2-5 nodes, but it is easily extendible to a greater number of constraints (i.e. graphlets). Furthermore, we show a way to combine the 73 GDDs into a network 'agreement' measure. Based on this new network agreement measure, we show that almost all of the 14 eukaryotic PPI networks, including human, are better modeled by geometric random graphs than by Erdos-Reny, random scale-free, or Barabasi-Albert scale-free networks.Comment: Proceedings of the 2006 European Conference on Computational Biology, ECCB'06, Eilat, Israel, January 21-24, 200

    Topological network alignment uncovers biological function and phylogeny

    Full text link
    Sequence comparison and alignment has had an enormous impact on our understanding of evolution, biology, and disease. Comparison and alignment of biological networks will likely have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm based solely on network topology, that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein-protein interaction networks of two very different species--yeast and human--indicate that even distant species share a surprising amount of network topology with each other, suggesting broad similarities in internal cellular wiring across all life on Earth.Comment: Algorithm explained in more details. Additional analysis adde

    Graphettes: Constant-time determination of graphlet and orbit identity including (possibly disconnected) graphlets up to size 8.

    Get PDF
    Graphlets are small connected induced subgraphs of a larger graph G. Graphlets are now commonly used to quantify local and global topology of networks in the field. Methods exist to exhaustively enumerate all graphlets (and their orbits) in large networks as efficiently as possible using orbit counting equations. However, the number of graphlets in G is exponential in both the number of nodes and edges in G. Enumerating them all is already unacceptably expensive on existing large networks, and the problem will only get worse as networks continue to grow in size and density. Here we introduce an efficient method designed to aid statistical sampling of graphlets up to size k = 8 from a large network. We define graphettes as the generalization of graphlets allowing for disconnected graphlets. Given a particular (undirected) graphette g, we introduce the idea of the canonical graphette [Formula: see text] as a representative member of the isomorphism group Iso(g) of g. We compute the mapping [Formula: see text], in the form of a lookup table, from all 2k(k - 1)/2 undirected graphettes g of size k ≤ 8 to their canonical representatives [Formula: see text], as well as the permutation that transforms g to [Formula: see text]. We also compute all automorphism orbits for each canonical graphette. Thus, given any k ≤ 8 nodes in a graph G, we can in constant time infer which graphette it is, as well as which orbit each of the k nodes belongs to. Sampling a large number N of such k-sets of nodes provides an approximation of both the distribution of graphlets and orbits across G, and the orbit degree vector at each node

    Fitting a geometric graph to a protein-protein interaction network

    Get PDF
    Finding a good network null model for protein-protein interaction (PPI) networks is a fundamental issue. Such a model would provide insights into the interplay between network structure and biological function as well as into evolution. Also, network (graph) models are used to guide biological experiments and discover new biological features. It has been proposed that geometric random graphs are a good model for PPI networks. In a geometric random graph, nodes correspond to uniformly randomly distributed points in a metric space and edges (links) exist between pairs of nodes for which the corresponding points in the metric space are close enough according to some distance norm. Computational experiments have revealed close matches between key topological properties of PPI networks and geometric random graph models. In this work, we push the comparison further by exploiting the fact that the geometric property can be tested for directly. To this end, we develop an algorithm that takes PPI interaction data and embeds proteins into a low-dimensional Euclidean space, under the premise that connectivity information corresponds to Euclidean proximity, as in geometric-random graphs.We judge the sensitivity and specificity of the fit by computing the area under the Receiver Operator Characteristic (ROC) curve. The network embedding algorithm is based on multi-dimensional scaling, with the square root of the path length in a network playing the role of the Euclidean distance in the Euclidean space. The algorithm exploits sparsity for computational efficiency, and requires only a few sparse matrix multiplications, giving a complexity of O(N2) where N is the number of proteins.The algorithm has been verified in the sense that it successfully rediscovers the geometric structure in artificially constructed geometric networks, even when noise is added by re-wiring some links. Applying the algorithm to 19 publicly available PPI networks of various organisms indicated that: (a) geometric effects are present and (b) two-dimensional Euclidean space is generally as effective as higher dimensional Euclidean space for explaining the connectivity. Testing on a high-confidence yeast data set produced a very strong indication of geometric structure (area under the ROC curve of 0.89), with this network being essentially indistinguishable from a noisy geometric network. Overall, the results add support to the hypothesis that PPI networks have a geometric structure

    Fitting a geometric graph to a protein-protein interaction network

    Get PDF
    Finding a good network null model for protein-protein interaction (PPI) networks is a fundamental issue. Such a model would provide insights into the interplay between network structure and biological function as well as into evolution. Also, network (graph) models are used to guide biological experiments and discover new biological features. It has been proposed that geometric random graphs are a good model for PPI networks. In a geometric random graph, nodes correspond to uniformly randomly distributed points in a metric space and edges (links) exist between pairs of nodes for which the corresponding points in the metric space are close enough according to some distance norm. Computational experiments have revealed close matches between key topological properties of PPI networks and geometric random graph models. In this work, we push the comparison further by exploiting the fact that the geometric property can be tested for directly. To this end, we develop an algorithm that takes PPI interaction data and embeds proteins into a low-dimensional Euclidean space, under the premise that connectivity information corresponds to Euclidean proximity, as in geometric-random graphs.We judge the sensitivity and specificity of the fit by computing the area under the Receiver Operator Characteristic (ROC) curve. The network embedding algorithm is based on multi-dimensional scaling, with the square root of the path length in a network playing the role of the Euclidean distance in the Euclidean space. The algorithm exploits sparsity for computational efficiency, and requires only a few sparse matrix multiplications, giving a complexity of O(N2) where N is the number of proteins.The algorithm has been verified in the sense that it successfully rediscovers the geometric structure in artificially constructed geometric networks, even when noise is added by re-wiring some links. Applying the algorithm to 19 publicly available PPI networks of various organisms indicated that: (a) geometric effects are present and (b) two-dimensional Euclidean space is generally as effective as higher dimensional Euclidean space for explaining the connectivity. Testing on a high-confidence yeast data set produced a very strong indication of geometric structure (area under the ROC curve of 0.89), with this network being essentially indistinguishable from a noisy geometric network. Overall, the results add support to the hypothesis that PPI networks have a geometric structure

    Probabilistic graphlets capture biological function in probabilistic molecular networks

    Get PDF
    Motivation: Molecular interactions have been successfully modeled and analyzed as networks, where nodes represent molecules and edges represent the interactions between them. These networks revealed that molecules with similar local network structure also have similar biological functions. The most sensitive measures of network structure are based on graphlets. However, graphlet-based methods thus far are only applicable to unweighted networks, whereas real-world molecular networks may have weighted edges that can represent the probability of an interaction occurring in the cell. This information is commonly discarded when applying thresholds to generate unweighted networks, which may lead to information loss. Results: We introduce probabilistic graphlets as a tool for analyzing the local wiring patterns of probabilistic networks. To assess their performance compared to unweighted graphlets, we generate synthetic networks based on different well-known random network models and edge probability distributions and demonstrate that probabilistic graphlets outperform their unweighted counterparts in distinguishing network structures. Then we model different real-world molecular interaction networks as weighted graphs with probabilities as weights on edges and we analyze them with our new weighted graphlets-based methods. We show that due to their probabilistic nature, probabilistic graphlet-based methods more robustly capture biological information in these data, while simultaneously showing a higher sensitivity to identify condition-specific functions compared to their unweighted graphlet-based method counterparts.This work was supported by the European Research Council (ERC) Consolidator Grant 770827, the Serbian Ministry of Education and Science. Project III44006, the Slovenian Research Agency project J1-8155 and The Prostate Project.Peer ReviewedPostprint (author's final draft
    • …
    corecore