SANA NetGO: A combinatorial approach to using Gene Ontology (GO) terms to score network alignments
Gene Ontology (GO) terms are frequently used to score alignments between
protein-protein interaction (PPI) networks. Methods exist to measure the GO
similarity between two proteins in isolation, but pairs of proteins in a
network alignment are not isolated: each pairing is implicitly dependent upon
every other pairing via the alignment itself. Current methods fail to take into
account the frequency of GO terms across the networks, and attempt to account
for common GO terms in an ad hoc fashion by imposing arbitrary rules on when to
"allow" GO terms based on their location in the GO hierarchy, rather than using
readily available frequency information in the PPI networks themselves. Here we
develop a new measure, NetGO, that naturally weighs infrequent, informative GO
terms more heavily than frequent, less informative GO terms, without requiring
arbitrary cutoffs. In particular, NetGO down-weights the score of frequent GO
terms according to their frequency in the networks being aligned. This is a
global measure applicable only to alignments, independent of pairwise GO
measures, in the same sense that the edge-based EC or S3 scores are global
measures of topological similarity independent of pairwise topological
similarities. We demonstrate the superiority of NetGO by creating alignments of
predetermined quality based on homologous pairs of nodes and show that NetGO
correlates with alignment quality much better than any existing GO-based
alignment measures. We also demonstrate that NetGO provides a measure of
taxonomic similarity between species, consistent with existing taxonomic
measures, a feature not shared with existing GO-based network alignment
measures. Finally, we re-score alignments produced by almost a dozen aligners
from a previous study and show that NetGO does a better job than existing
measures at separating good alignments from bad ones.
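To make the idea of frequency-based down-weighting concrete, here is a minimal sketch. It is not the NetGO definition from the paper, which the abstract does not spell out. It assumes a simple inverse-frequency weight (each shared GO term contributes 1 divided by the number of proteins annotated with it across both networks), and the function names and input layout are hypothetical.

```python
from collections import Counter


def term_frequencies(annotations_a, annotations_b):
    """Count how many proteins across both networks carry each GO term."""
    counts = Counter()
    for annotations in (annotations_a, annotations_b):
        for terms in annotations.values():
            counts.update(set(terms))
    return counts


def weighted_go_score(alignment, annotations_a, annotations_b):
    """Sum inverse-frequency weights of GO terms shared by aligned pairs.

    ASSUMPTION: the 1/frequency weight is illustrative only; it is not
    taken from the NetGO paper.
    """
    counts = term_frequencies(annotations_a, annotations_b)
    score = 0.0
    for u, v in alignment.items():
        shared = set(annotations_a.get(u, ())) & set(annotations_b.get(v, ()))
        # Rare terms (small count) contribute more than common ones.
        score += sum(1.0 / counts[g] for g in shared)
    return score
```

Because the weights are computed from the two networks being aligned, the score depends on the alignment as a whole rather than on protein pairs in isolation, and no cutoff on the GO hierarchy is needed.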
Fair Evaluation of Global Network Aligners
Biological network alignment identifies topologically and functionally
conserved regions between networks of different species. It encompasses two
algorithmic steps: node cost function (NCF), which measures similarities
between nodes in different networks, and alignment strategy (AS), which uses
these similarities to rapidly identify high-scoring alignments. Different
methods use both different NCFs and different ASs. Thus, it is unclear whether
the superiority of a method comes from its NCF, its AS, or both. We already
showed on MI-GRAAL and IsoRankN that combining the NCF of one method with the AS
of another can lead to a new superior method. Here, we evaluate MI-GRAAL
against the newer GHOST to potentially further improve alignment quality. Also, we
approach several important questions that have not been asked systematically
thus far. First, we ask how much of the node similarity information in NCF
should come from sequence data compared to topology data. Existing methods
determine this more or less arbitrarily, which could affect the resulting
alignment(s). Second, when topology is used in NCF, we ask how large the size
of the neighborhoods of the compared nodes should be. Existing methods assume
that larger neighborhood sizes are better.
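The first question, how much of the node similarity should come from sequence versus topology, can be pictured as a single mixing parameter. The sketch below assumes a convex combination controlled by a hypothetical parameter alpha. The abstract does not state the exact form used by MI-GRAAL or GHOST, so this is an illustration rather than either method's NCF.

```python
def combined_node_similarity(topology_sim, sequence_sim, alpha):
    """Blend topological and sequence similarity for each node pair.

    ASSUMPTION: a convex combination with weight alpha on topology and
    (1 - alpha) on sequence; pairs missing from one input score 0 there.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    pairs = set(topology_sim) | set(sequence_sim)
    return {
        pair: alpha * topology_sim.get(pair, 0.0)
        + (1.0 - alpha) * sequence_sim.get(pair, 0.0)
        for pair in pairs
    }
```

Sweeping alpha from 0 (sequence only) to 1 (topology only) and re-running the alignment strategy at each value is one way to study how the balance affects alignment quality.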
We find that MI-GRAAL's NCF is superior to GHOST's NCF, while the performance
of the methods' ASs is data-dependent. Thus, the combination of MI-GRAAL's NCF
and GHOST's AS could be a new superior method for certain data. Also, the
amount of sequence information used within NCF does not affect alignment
quality, while the inclusion of topological information is crucial. Finally,
larger neighborhood sizes are preferred, but often, it is the second largest
size that is superior, and using this size would decrease computational
complexity.
Together, our results give several general recommendations for a fair
evaluation of network alignment methods.
Comment: 19 pages. 10 figures. Presented at the 2014 ISMB Conference, July
13-15, Boston, M
An Introductory Guide to Aligning Networks Using SANA, the Simulated Annealing Network Aligner.
Sequence alignment has had an enormous impact on our understanding of biology, evolution, and disease. The alignment of biological networks holds similar promise. Biological networks generally model interactions between biomolecules such as proteins, genes, metabolites, or mRNAs. There is strong evidence that the network topology (the "structure" of the network) is correlated with the functions performed, so that network topology can be used to help predict or understand function. However, unlike sequence comparison and alignment, which is an essentially solved problem, network comparison and alignment is an NP-complete problem for which heuristic algorithms must be used. Here we introduce SANA, the Simulated Annealing Network Aligner. SANA is one of many algorithms proposed for biological network alignment. In the context of global network alignment, SANA stands out for its speed, memory efficiency, ease of use, and flexibility in producing alignments between two or more networks. SANA produces better alignments in minutes on a laptop than most other algorithms can produce in hours or days of CPU time on large server-class machines. We walk the user through how to use SANA for several types of biomolecular networks.
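For readers unfamiliar with simulated annealing, the following is a generic sketch of the technique applied to pairwise network alignment. It is not SANA's implementation or interface. The objective (the fraction of edges of the first network mapped onto edges of the second), the geometric cooling schedule, the two move types, and every function and parameter name are assumptions chosen for illustration.

```python
import math
import random


def edge_correctness(alignment, edges_a, edges_b):
    """Fraction of edges in network A that land on edges of network B.

    edges_b must be a set of frozensets, one per undirected edge.
    """
    if not edges_a:
        return 0.0
    conserved = sum(
        1 for u, v in edges_a
        if frozenset((alignment[u], alignment[v])) in edges_b
    )
    return conserved / len(edges_a)


def anneal_alignment(nodes_a, nodes_b, edges_a, edges_b,
                     steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for an injective map from nodes_a into nodes_b (illustrative)."""
    nodes_a, targets = list(nodes_a), list(nodes_b)
    if len(nodes_a) > len(targets):
        raise ValueError("network A must not have more nodes than network B")
    rng = random.Random(seed)
    edges_b = {frozenset(e) for e in edges_b}
    rng.shuffle(targets)
    alignment = dict(zip(nodes_a, targets))   # random initial alignment
    unused = targets[len(nodes_a):]           # nodes of B not yet aligned
    score = edge_correctness(alignment, edges_a, edges_b)
    best, best_score = dict(alignment), score

    for step in range(steps):
        # Geometric cooling from t_start down towards t_end (an assumption).
        temperature = t_start * (t_end / t_start) ** (step / steps)
        u = rng.choice(nodes_a)
        if unused and rng.random() < 0.5:
            # "Change" move: remap u to a currently unused node of B.
            i = rng.randrange(len(unused))
            alignment[u], unused[i] = unused[i], alignment[u]

            def undo(u=u, i=i):
                alignment[u], unused[i] = unused[i], alignment[u]
        else:
            # "Swap" move: exchange the images of two nodes of A.
            w = rng.choice(nodes_a)
            alignment[u], alignment[w] = alignment[w], alignment[u]

            def undo(u=u, w=w):
                alignment[u], alignment[w] = alignment[w], alignment[u]

        # The full score is recomputed here for clarity, not speed.
        new_score = edge_correctness(alignment, edges_a, edges_b)
        delta = new_score - score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / T), which shrinks as T cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            score = new_score
            if score > best_score:
                best, best_score = dict(alignment), score
        else:
            undo()
    return best, best_score
```

Recomputing the whole objective at every step keeps the sketch short. A fast aligner would more plausibly update the score incrementally from only the edges touched by each move.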
Unified Alignment of Protein-Protein Interaction Networks
Paralleling the increasing availability of protein-protein interaction (PPI) network data, several network alignment methods have been proposed. Network alignments have been used to uncover functionally conserved network parts and to transfer annotations. However, due to the computational intractability of the network alignment problem, aligners are heuristics providing divergent solutions, and no consensus exists on a gold standard or on which scoring scheme should be used to evaluate them. We comprehensively evaluate the alignment scoring schemes and global network aligners on large-scale PPI data and observe that three methods, HUBALIGN, L-GRAAL and NATALIE, regularly produce the most topologically and biologically coherent alignments. We study the collective behaviour of network aligners and observe that PPI networks are almost entirely aligned with a handful of aligners that we unify into a new tool, Ulign. Ulign enables complete alignment of two networks, which traditional global and local aligners fail to do. Also, multiple mappings of Ulign define biologically relevant soft clusterings of proteins in PPI networks, which may be used for refining the transfer of annotations across networks. Hence, PPI networks are already well investigated by current aligners, so to gain additional biological insights, a paradigm shift is needed. We propose such a shift come from aligning all available data types collectively rather than any particular data type in isolation from others.