Biological network comparison using graphlet degree distribution
Analogous to biological sequence comparison, comparing cellular networks is
an important problem that could provide insight into biology and therapeutics.
Because exact comparison involves the NP-complete subgraph isomorphism problem,
comparing large networks is computationally infeasible, and thus heuristics such as the degree distribution
have been sought. It is easy to demonstrate that two networks are different by
simply showing a short list of properties in which they differ. It is much
harder to show that two networks are similar, as it requires demonstrating
their similarity in all of their exponentially many properties. Clearly, it is
computationally prohibitive to analyze all network properties, but the larger
the number of constraints we impose in determining network similarity, the more
likely it is that the networks will truly be similar.
We introduce a new systematic measure of a network's local structure that
imposes a large number of similarity constraints on networks being compared. In
particular, we generalize the degree distribution, which measures the number of
nodes 'touching' k edges, into distributions measuring the number of nodes
'touching' k graphlets, where graphlets are small connected non-isomorphic
subgraphs of a large network. Our new measure of network local structure
consists of 73 graphlet degree distributions (GDDs) of graphlets with 2-5
nodes, but it is easily extendible to a greater number of constraints (i.e.
graphlets). Furthermore, we show a way to combine the 73 GDDs into a network
'agreement' measure. Based on this new network agreement measure, we show that
almost all of the 14 eukaryotic protein-protein interaction (PPI) networks, including human, are better
modeled by geometric random graphs than by Erdős-Rényi, random scale-free, or
Barabási-Albert scale-free networks.
Comment: Proceedings of the 2006 European Conference on Computational Biology,
ECCB'06, Eilat, Israel, January 21-24, 200
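The GDD construction and the agreement score can be illustrated with a small sketch. The abstract does not give the formula for combining GDDs, so the scheme below is an assumption: each distribution d(k) is scaled by 1/k and normalized to sum to 1, two networks are compared per orbit by the Euclidean distance between their normalized GDDs divided by √2, agreement is 1 minus that distance, and the per-orbit agreements are averaged arithmetically. Only two of the 73 orbits are computed, by brute force: the edge orbit, whose GDD is the ordinary degree distribution, and the triangle orbit. All function names (`build_graph`, `gdd`, `orbit_agreement`, `gdd_agreement`, and so on) are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from itertools import combinations


def build_graph(edges) -> dict[int, set[int]]:
    """Build an undirected simple graph (node -> neighbours); self-loops are dropped."""
    g: dict[int, set[int]] = {}
    for u, v in edges:
        if u == v:
            continue
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def degree_orbit(g, v) -> int:
    """Edges touching v; the GDD of this orbit is the degree distribution."""
    return len(g[v])


def triangle_orbit(g, v) -> int:
    """Triangles touching v: pairs of v's neighbours that are themselves adjacent."""
    return sum(1 for a, b in combinations(g[v], 2) if b in g[a])


def gdd(g, orbit_count) -> Counter:
    """d(k) = number of nodes touching exactly k copies of the orbit, for k >= 1."""
    dist = Counter(orbit_count(g, v) for v in g)
    dist.pop(0, None)  # assumption: nodes touching no copy are left out
    return dist


def normalize(dist) -> dict[int, float]:
    """Scale d(k) by 1/k, then rescale so the values sum to 1 (empty if all zero)."""
    scaled = {k: c / k for k, c in dist.items()}
    total = sum(scaled.values())
    return {k: s / total for k, s in scaled.items()} if total else {}


def orbit_agreement(g, h, orbit_count) -> float:
    """1 - (Euclidean distance between normalized GDDs) / sqrt(2); lies in [0, 1]."""
    ng = normalize(gdd(g, orbit_count))
    nh = normalize(gdd(h, orbit_count))
    keys = set(ng) | set(nh)
    sq = sum((ng.get(k, 0.0) - nh.get(k, 0.0)) ** 2 for k in keys)
    return 1.0 - math.sqrt(sq) / math.sqrt(2)


def gdd_agreement(g, h, orbit_counts) -> float:
    """Arithmetic mean of the per-orbit agreements (assumed combination rule)."""
    scores = [orbit_agreement(g, h, oc) for oc in orbit_counts]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    triangle = build_graph([(0, 1), (1, 2), (0, 2)])
    path = build_graph([(0, 1), (1, 2)])
    print(orbit_agreement(triangle, path, degree_orbit))
    print(gdd_agreement(triangle, path, [degree_orbit, triangle_orbit]))
```

For instance, a triangle and a 3-node path agree only about 0.2 on the edge orbit, and identical graphs always score 1.0. Extending the sketch toward the full measure would mean adding one counting function per orbit of the 2- to 5-node graphlets.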
Identifying networks with common organizational principles
Many complex systems can be represented as networks, and the problem of
network comparison is becoming increasingly relevant. There are many techniques
for network comparison, from simply comparing network summary statistics to
sophisticated but computationally costly alignment-based approaches. Yet it
remains challenging to accurately cluster networks that differ in size and
density but are hypothesized to be structurally similar. In this paper, we
address this problem by introducing a new network comparison methodology that
is aimed at identifying common organizational principles in networks. The
methodology is simple, intuitive and applicable in a wide variety of settings
ranging from the functional classification of proteins to tracking the
evolution of a world trade network.
Comment: 26 pages, 7 figures
Graphettes: Constant-time determination of graphlet and orbit identity including (possibly disconnected) graphlets up to size 8.
Graphlets are small connected induced subgraphs of a larger graph G. Graphlets are now commonly used to quantify the local and global topology of networks. Methods exist to exhaustively enumerate all graphlets (and their orbits) in large networks as efficiently as possible using orbit counting equations. However, the number of graphlets in G is exponential in both the number of nodes and the number of edges in G. Enumerating them all is already unacceptably expensive on existing large networks, and the problem will only get worse as networks continue to grow in size and density. Here we introduce an efficient method designed to aid statistical sampling of graphlets up to size k = 8 from a large network. We define graphettes as the generalization of graphlets that allows for disconnected graphlets. Given a particular (undirected) graphette g, we introduce the idea of the canonical graphette of g as a representative member of the isomorphism group Iso(g) of g. We compute the mapping from each graphette to its canonical graphette, in the form of a lookup table, for all $2^{k(k-1)/2}$ undirected graphettes g of size k ≤ 8, as well as the permutation that transforms g into its canonical graphette. We also compute all automorphism orbits for each canonical graphette. Thus, given any k ≤ 8 nodes in a graph G, we can infer in constant time which graphette they induce, as well as which orbit each of the k nodes belongs to. Sampling a large number N of such k-sets of nodes provides an approximation of both the distribution of graphlets and orbits across G and the orbit degree vector at each node.
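The lookup-table idea can be sketched for small k. This is a hypothetical illustration, not the authors' implementation: it encodes the subgraph induced by k nodes as a bitmask over the k(k-1)/2 node pairs, canonicalizes every possible mask by brute force over all k! permutations (assuming the smallest relabeled mask as the canonical graphette), stores the permutation that produces it, and labels each node's automorphism orbit by the smallest node index reachable from it under the canonical graphette's automorphisms. Brute force is only practical for roughly k ≤ 5, far short of the k ≤ 8 tables described above. Orbit labels here are local node indices, not a standard global orbit numbering. The names `build_table`, `classify`, and `sample_graphettes` are assumptions.

```python
import random
from collections import Counter
from itertools import combinations, permutations


def pair_bits(k):
    """Assign one bit position to each unordered node pair (i, j), i < j."""
    return {pair: b for b, pair in enumerate(combinations(range(k), 2))}


def relabel(mask, perm, pairs):
    """Return the adjacency bitmask after renaming node i to perm[i]."""
    out = 0
    for (i, j), b in pairs.items():
        if mask >> b & 1:
            a, c = sorted((perm[i], perm[j]))
            out |= 1 << pairs[(a, c)]
    return out


def build_table(k):
    """For every k-node graphette mask, precompute (canonical mask, permutation),
    plus the automorphism orbits of each canonical graphette. Brute force."""
    pairs = pair_bits(k)
    perms = list(permutations(range(k)))
    # table[mask] = (canonical mask, p) with relabel(mask, p) == canonical mask
    table = [min((relabel(mask, p, pairs), p) for p in perms)
             for mask in range(1 << len(pairs))]
    orbits = {}
    for canon in {c for c, _ in table}:
        autos = [p for p in perms if relabel(canon, p, pairs) == canon]
        # Orbit of node i is {p[i] for p in autos}; label it by its smallest member.
        orbits[canon] = tuple(min(p[i] for p in autos) for i in range(k))
    return pairs, table, orbits


def classify(adj, nodes, pairs, table, orbits):
    """Identify the graphette induced by `nodes` and each node's orbit label using
    table lookups (O(k^2) work to build the mask, i.e. constant for fixed k)."""
    mask = 0
    for (i, j), b in pairs.items():
        if nodes[j] in adj[nodes[i]]:
            mask |= 1 << b
    canon, perm = table[mask]
    orb = orbits[canon]
    # nodes[i] plays the role of node perm[i] in the canonical graphette.
    return canon, tuple(orb[perm[i]] for i in range(len(nodes)))


def sample_graphettes(adj, k, n_samples, seed=0):
    """Estimate the graphette distribution by classifying random k-sets of nodes."""
    pairs, table, orbits = build_table(k)
    rng = random.Random(seed)
    nodes = sorted(adj)
    counts = Counter()
    for _ in range(n_samples):
        canon = classify(adj, rng.sample(nodes, k), pairs, table, orbits)[0]
        counts[canon] += 1
    return counts


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1}}
    pairs, table, orbits = build_table(3)
    print(classify(path, [0, 1, 2], pairs, table, orbits))  # (3, (1, 0, 1))
```

On the 3-node path 0-1-2, `classify` returns the canonical two-edge graphette with the middle node in one orbit and both end nodes in the other. For k = 3 and k = 4 the tables hold 4 and 11 canonical graphettes, matching the number of non-isomorphic graphs, connected or not, on 3 and 4 nodes.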
Topological network alignment uncovers biological function and phylogeny
Sequence comparison and alignment have had an enormous impact on our
understanding of evolution, biology, and disease. Comparison and alignment of
biological networks will likely have a similar impact. Existing network
alignments use information external to the networks, such as sequence, because
no good algorithm for purely topological alignment has yet been devised. In
this paper, we present a novel algorithm, based solely on network topology, that
can be used to align any two networks. We apply it to biological networks to
produce by far the most complete topological alignments of biological networks
to date. We demonstrate that both species phylogeny and detailed biological
function of individual proteins can be extracted from our alignments.
Topology-based alignments have the potential to provide a completely new,
independent source of phylogenetic information. Our alignment of the
protein-protein interaction networks of two very different species, yeast and
human, indicates that even distant species share a surprising amount of network
topology with each other, suggesting broad similarities in internal cellular
wiring across all life on Earth.
Comment: Algorithm explained in more detail. Additional analysis added.