Effective comparative analysis of protein-protein interaction networks by measuring the steady-state network flow using a Markov model
BACKGROUND: Comparative analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved functional network modules across different species. Such modules typically consist of orthologous proteins with conserved interactions, which can be exploited to computationally predict the modules through network comparison.
RESULTS: In this work, we propose a novel probabilistic framework for comparing PPI networks and effectively predicting the correspondence between proteins, represented as network nodes, that belong to conserved functional modules across the given PPI networks. The basic idea is to estimate the steady-state network flow between nodes that belong to different PPI networks based on a Markov random walk model. The random walker is designed to make random moves to adjacent nodes within a PPI network as well as cross-network moves between potential orthologous nodes with high sequence similarity. Based on this Markov random walk model, we estimate the steady-state network flow – or the long-term relative frequency of the transitions that the random walker makes – between nodes in different PPI networks, which can be used as a probabilistic score measuring their potential correspondence. Subsequently, the estimated scores can be used for detecting orthologous proteins in conserved functional modules through network alignment.
CONCLUSIONS: Through evaluations based on multiple real PPI networks, we demonstrate that the proposed scheme leads to improved alignment results that are biologically more meaningful at reduced computational cost, outperforming the current state-of-the-art algorithms. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/CUFID
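The steady-state flow idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the unit weight on within-network edges, the toy node names, and the plain row-normalised transition rule are all assumptions made for illustration, and the published method may define its transition probabilities differently. The sketch merges two small networks, adds cross-network pseudo edges weighted by a similarity score, finds the stationary distribution by power iteration, and reports the long-run frequency of transitions between each cross-network pair.

```python
def build_integrated_network(edges_a, edges_b, similarity):
    """Merge two networks and add weighted cross-network pseudo edges.

    Within-network edges get weight 1.0 (an assumption of this sketch);
    cross-network edges are weighted by the given similarity score.
    """
    weights = {}

    def add_edge(u, v, w):
        weights.setdefault(u, {})[v] = w
        weights.setdefault(v, {})[u] = w

    for u, v in list(edges_a) + list(edges_b):
        add_edge(u, v, 1.0)
    for (u, v), score in similarity.items():
        add_edge(u, v, score)
    return weights


def steady_state_flow(weights, pairs, max_iter=10000, tol=1e-12):
    """Long-run frequency of transitions (in either direction) for each pair."""
    nodes = sorted(weights)
    # Row-normalise edge weights into transition probabilities P[u][v].
    P = {}
    for u in nodes:
        total = sum(weights[u].values())
        P[u] = {v: w / total for v, w in weights[u].items()}
    # Power iteration on the "lazy" walk (stay put with probability 0.5).
    # It has the same stationary distribution as P but cannot oscillate.
    pi = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(max_iter):
        nxt = {u: 0.5 * pi[u] for u in nodes}
        for u in nodes:
            for v, p in P[u].items():
                nxt[v] += 0.5 * pi[u] * p
        change = max(abs(nxt[u] - pi[u]) for u in nodes)
        pi = nxt
        if change < tol:
            break
    # Flow between u and v: stationary probability times transition probability,
    # summed over both directions.
    return {
        (u, v): pi[u] * P[u].get(v, 0.0) + pi[v] * P[v].get(u, 0.0)
        for u, v in pairs
    }


# Toy example with hypothetical node names and similarity scores.
edges_a = [("a1", "a2"), ("a2", "a3")]
edges_b = [("b1", "b2"), ("b2", "b3")]
similarity = {("a1", "b1"): 0.9, ("a2", "b2"): 0.8,
              ("a3", "b3"): 0.7, ("a1", "b3"): 0.1}

network = build_integrated_network(edges_a, edges_b, similarity)
flow = steady_state_flow(network, similarity.keys())
for pair, score in sorted(flow.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 4))
```

With symmetric weights and this simple transition rule, the flow on an edge is proportional to its weight, so the toy scores merely track the input similarities. A transition rule that also rewards matching neighbours would make the flow reflect network context as well.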
Data-driven network alignment
Biological network alignment (NA) aims to find a node mapping between
species' molecular networks that uncovers similar network regions, thus
allowing for transfer of functional knowledge between the aligned nodes.
However, current NA methods do not end up aligning functionally related nodes.
A likely reason is that they assume it is topologically similar nodes that are
functionally related. However, we show that this assumption does not hold well.
So, a paradigm shift is needed in how the NA problem is approached. We
redefine NA as a data-driven framework, TARA (daTA-dRiven network Alignment),
which attempts to learn the relationship between topological relatedness and
functional relatedness without assuming that topological relatedness
corresponds to topological similarity, like traditional NA methods do. TARA
trains a classifier to predict whether two nodes from different networks are
functionally related based on their network topological patterns. We find that
TARA is able to make accurate predictions. TARA then takes each pair of nodes
that are predicted as related to be part of an alignment. Like traditional NA
methods, TARA uses this alignment for the across-species transfer of functional
knowledge. Clearly, TARA as currently implemented uses topological but not
protein sequence information for this task. We find that TARA outperforms
existing state-of-the-art NA methods that also use topological information,
WAVE and SANA, and even outperforms or complements a state-of-the-art NA method
that uses both topological and sequence information, PrimAlign. Hence, adding
sequence information to TARA, which is our future work, is likely to further
improve its performance.
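The data-driven idea in this abstract, learning from labelled node pairs which topological patterns indicate functional relatedness and then treating the predicted-related pairs as the alignment, can be sketched as follows. This is only an illustration under stated assumptions: the two-number feature vectors, the toy labels, the node names, and the use of a hand-written logistic regression are all hypothetical and are not taken from TARA itself.

```python
import math


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log loss with respect to z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict_related(model, x):
    """Return True when the pair is predicted to be functionally related."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b > 0.0


# Hypothetical training pairs: each row is (feature of node u, feature of node v).
# In this toy data, related pairs are topologically DISSIMILAR (high u, low v),
# so a similarity-based rule would fail but a learned rule can succeed.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1],
           [0.1, 0.9], [0.2, 0.8], [0.5, 0.5], [0.2, 0.2], [0.9, 0.9]]
y_train = [1, 1, 1, 0, 0, 0, 0, 0]

model = train_logistic(X_train, y_train)

# Candidate cross-network pairs (hypothetical names) with their features.
candidates = {("u1", "v1"): [0.85, 0.15], ("u2", "v2"): [0.3, 0.7]}

# Every pair predicted as related becomes part of the alignment.
alignment = [pair for pair, x in candidates.items() if predict_related(model, x)]
print(alignment)
```

Note that the resulting alignment is simply the set of predicted-related pairs, so it need not be one-to-one.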
Probabilistic Random Walk Models for Comparative Network Analysis
Graph-based systems and data analysis methods have become critical tools in many
fields as they can provide an intuitive way of representing and analyzing interactions between
variables. Due to the advances in measurement techniques, a massive amount of
labeled data that can be represented as nodes on a graph (or network) have been archived
in databases. Additionally, novel data without label information have been gradually generated
and archived. Labeling and identifying characteristics of novel data is an important
first step in utilizing the valuable data in an effective and meaningful way. Comparative
network analysis is an effective computational means to identify and predict the properties
of the unlabeled data by comparing the similarities and differences between well-studied
and less-studied networks. Comparative network analysis aims to identify the matching
nodes and conserved subnetworks across multiple networks to enable a prediction of the
properties of the nodes in the less-studied networks based on the properties of the matching
nodes in the well-studied networks (i.e., transferring knowledge between networks).
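As a minimal illustration of the knowledge-transfer step described above, the following Python sketch copies known labels from nodes in a well-studied network to their matched nodes in a less-studied network. The function name, node names, and labels are hypothetical, and a real pipeline would typically weigh or filter the transferred labels rather than copy them directly.

```python
def transfer_annotations(alignment, known_labels):
    """Copy labels from well-studied nodes to their matched less-studied nodes.

    alignment: iterable of (well_studied_node, less_studied_node) pairs.
    known_labels: dict mapping a well-studied node to a set of labels.
    """
    predicted = {}
    for well_studied, less_studied in alignment:
        for label in known_labels.get(well_studied, ()):
            predicted.setdefault(less_studied, set()).add(label)
    return predicted


# Hypothetical node mapping and annotations.
alignment = [("yeast_P1", "fly_Q1"), ("yeast_P2", "fly_Q2")]
known_labels = {"yeast_P1": {"kinase"}}

predicted = transfer_annotations(alignment, known_labels)
print(predicted)
```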
One of the fundamental and important questions in comparative network analysis is
how to accurately estimate node-to-node correspondence as it can be a critical clue in
analyzing the similarities and differences between networks. Node correspondence is a
comprehensive similarity that integrates various types of similarity measurements in a
balanced manner. However, there are several challenges in accurately estimating the node
correspondence for large-scale networks. First, the scale of the networks is a critical issue.
As networks generally include a large number of nodes, we have to examine an extremely
large space and it can pose a computational challenge due to the combinatorial nature of
the problem. Furthermore, although there are matching nodes and conserved subnetworks
in different networks, structural variations such as node insertions and deletions make it difficult to integrate a topological similarity.
In this dissertation, novel probabilistic random walk models are proposed to accurately
estimate node-to-node correspondence between networks. First, we propose a context-sensitive
random walk (CSRW) model. In the CSRW model, the random walker analyzes
the context of its current position and can switch its random
movement to either a simultaneous walk on both networks or an individual walk on one
of the networks. The context-sensitive nature of the random walker enables the method
to effectively integrate different types of similarities by dealing with structural variations.
Second, we propose the CUFID (Comparative network analysis Using the steady-state
network Flow to IDentify orthologous proteins) model. In the CUFID model, we construct
an integrated network by inserting pseudo edges between potential matching nodes in
different networks. Then, we design the random walk protocol so that the walker transitions more frequently
between potential matching nodes as their node similarity increases and as they have more
matching neighboring nodes. We apply the proposed random walk models to comparative
network analysis problems: global network alignment and network querying. Through
extensive performance evaluations, we demonstrate that the proposed random walk models
can accurately estimate node correspondence, which in turn leads to improved and reliable
network comparison results.
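Once node correspondence scores are available, one simple way to turn them into a one-to-one global alignment is greedy matching, sketched below in Python. The abstract does not say how the alignment is actually constructed, so the greedy strategy, the function name, and the toy scores are assumptions for illustration only.

```python
def greedy_alignment(scores):
    """Build a one-to-one node mapping by taking the best remaining pair first.

    scores: dict mapping (node_in_A, node_in_B) to a correspondence score.
    """
    used_a, used_b, alignment = set(), set(), []
    # Highest score first; ties are broken by node names for determinism.
    for (u, v), _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if u not in used_a and v not in used_b:
            alignment.append((u, v))
            used_a.add(u)
            used_b.add(v)
    return alignment


# Hypothetical correspondence scores between two small networks.
scores = {("a1", "b1"): 0.9, ("a1", "b2"): 0.8, ("a2", "b1"): 0.7,
          ("a2", "b2"): 0.2, ("a3", "b3"): 0.5}

matched = greedy_alignment(scores)
print(matched)
```

Greedy matching is fast but not guaranteed to maximise the total score; an optimal assignment method could be substituted where that matters.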