Probabilistic Random Walk Models for Comparative Network Analysis
Graph-based systems and data analysis methods have become critical tools in many
fields as they can provide an intuitive way of representing and analyzing interactions between
variables. Owing to advances in measurement techniques, a massive amount of
labeled data that can be represented as nodes on a graph (or network) has been archived
in databases. In addition, novel data without label information are steadily being generated
and archived. Labeling and identifying characteristics of novel data is an important
first step in utilizing the valuable data in an effective and meaningful way. Comparative
network analysis is an effective computational means to identify and predict the properties
of the unlabeled data by comparing the similarities and differences between well-studied
and less-studied networks. Comparative network analysis aims to identify the matching
nodes and conserved subnetworks across multiple networks to enable a prediction of the
properties of the nodes in the less-studied networks based on the properties of the matching
nodes in the well-studied networks (i.e., transferring knowledge between networks).
One of the fundamental and important questions in comparative network analysis is
how to accurately estimate node-to-node correspondence as it can be a critical clue in
analyzing the similarities and differences between networks. Node correspondence is a
comprehensive similarity that integrates various types of similarity measurements in a
balanced manner. However, there are several challenges in accurately estimating the node
correspondence for large-scale networks. First, the scale of the networks is a critical issue.
Because networks generally include a large number of nodes, the space of possible node
mappings is extremely large, and the combinatorial nature of the problem poses a
computational challenge. Second, although different networks contain matching nodes and
conserved subnetworks, structural variations such as node insertions and deletions make it
difficult to integrate topological similarity.
In this dissertation, novel probabilistic random walk models are proposed to accurately
estimate node-to-node correspondence between networks. First, we propose a context-sensitive
random walk (CSRW) model. In the CSRW model, the random walker examines
the context of its current position and switches its
movement between a simultaneous walk on both networks and an individual walk on one
of the networks. This context-sensitive behavior enables the method
to integrate different types of similarities effectively while coping with structural variations.
Second, we propose the CUFID (Comparative network analysis Using the steady-state
network Flow to IDentify orthologous proteins) model. In the CUFID model, we construct
an integrated network by inserting pseudo edges between potential matching nodes in
different networks. Then, we design the random walk protocol so that the walker transits more
frequently between potential matching nodes as their node similarity increases and as they
share more matching neighboring nodes. We apply the proposed random walk models to comparative
network analysis problems: global network alignment and network querying. Through
extensive performance evaluations, we demonstrate that the proposed random walk models
accurately estimate node correspondence, which leads to improved and more reliable
network comparison results.
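The integrated-network idea described in this abstract can be illustrated with a minimal Python sketch. It is not the dissertation's CUFID transition rule: the weighting `similarity * (1 + matching neighbor pairs)`, the lazy power iteration, and all function names are assumptions chosen only to show how pseudo edges and steady-state flow can rank candidate node pairs. Node names are assumed to be unique across the two networks.

```python
from collections import defaultdict


def build_integrated_network(edges_a, edges_b, similarity):
    """Merge two networks and add weighted pseudo edges between candidate matches."""
    adj = defaultdict(dict)   # weighted, undirected adjacency
    nbrs = defaultdict(set)   # neighbors within the original networks only
    for u, v in list(edges_a) + list(edges_b):
        adj[u][v] = adj[v][u] = 1.0
        nbrs[u].add(v)
        nbrs[v].add(u)
    for (a, b), s in similarity.items():
        # Hypothetical weighting: count neighbor pairs that are also candidate matches.
        shared = sum(1 for a2 in nbrs[a] for b2 in nbrs[b] if (a2, b2) in similarity)
        adj[a][b] = adj[b][a] = s * (1 + shared)
    return adj


def steady_state(adj, iters=1000, tol=1e-12):
    """Steady-state distribution of a lazy random walk (stays put with probability 0.5)."""
    nodes = sorted(adj)
    strength = {u: sum(adj[u].values()) for u in nodes}
    pi = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        nxt = {u: 0.5 * pi[u] for u in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                nxt[v] += 0.5 * pi[u] * w / strength[u]
        delta = max(abs(nxt[u] - pi[u]) for u in nodes)
        pi = nxt
        if delta < tol:
            break
    return pi


def pseudo_edge_flow(adj, pi, similarity):
    """Steady-state flow of the non-lazy walk across each pseudo edge, in both directions."""
    strength = {u: sum(adj[u].values()) for u in adj}
    return {
        (a, b): pi[a] * adj[a][b] / strength[a] + pi[b] * adj[b][a] / strength[b]
        for (a, b) in similarity
    }


if __name__ == "__main__":
    edges_a = [("a1", "a2"), ("a2", "a3")]
    edges_b = [("b1", "b2"), ("b2", "b3")]
    sim = {("a1", "b1"): 0.9, ("a2", "b2"): 0.8, ("a3", "b3"): 0.7, ("a1", "b3"): 0.2}
    adj = build_integrated_network(edges_a, edges_b, sim)
    flow = pseudo_edge_flow(adj, steady_state(adj), sim)
    for pair in sorted(flow, key=flow.get, reverse=True):
        print(pair, round(flow[pair], 4))
```

In this toy example, a pair with a lower raw similarity can still outrank a pair with a higher one when more of its neighbors are themselves candidate matches, which is the qualitative behavior the abstract describes.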
BioFabric Visualization of Network Alignments
Background: Dozens of global network alignment algorithms have been developed over the past fifteen years. Effective tools for visualizing the resulting alignments are lacking; such tools would enhance our ability to gain an intuitive understanding of the strengths and weaknesses of these algorithms.
Results: We have created a plugin to the existing network visualization tool BioFabric, called VISNAB: Visualization of Network Alignments using BioFabric. We leverage BioFabric's unique approach to layout (nodes are horizontal lines connected by vertical lines representing edges) to improve understanding of network alignment performance. Our visualization tool allows the user to clearly spot deficiencies in alignments that cannot be detected by simply evaluating and comparing standard numerical topological measures such as the Edge Coverage (EC) or Symmetric Substructure Score (S3). Furthermore, we provide new automatic layouts that allow researchers to identify problem areas in an alignment. Finally, the new definitions of node groups and link groups that arise from our visualization technique allow us to introduce novel numeric measures for assessing alignment quality.
Conclusions: Our new approach to visualizing network alignments will allow researchers to gain a new, and better, understanding of the strengths and shortcomings of the many available network alignment algorithms.
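For readers unfamiliar with the two topological measures named above, the following Python sketch computes them under their commonly used definitions, which are stated here as assumptions since the abstract does not spell them out: EC is the fraction of edges of the smaller network mapped onto edges of the larger one, and S3 divides the same count of conserved edges by the number of distinct edges in the overlay of the smaller network and the subgraph of the larger network induced on the aligned nodes. The function name and the toy networks are hypothetical, and the code is unrelated to the VISNAB plugin itself.

```python
def alignment_scores(edges1, edges2, mapping):
    """Return (EC, S3) for an injective node mapping from network 1 into network 2.

    Assumes simple undirected networks without self-loops.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    image = set(mapping.values())
    # Edges of network 1 whose endpoints map onto an edge of network 2.
    conserved = sum(1 for e in e1 if frozenset(mapping[u] for u in e) in e2)
    # Edges of network 2 with both endpoints among the aligned nodes.
    induced = sum(1 for e in e2 if e <= image)
    ec = conserved / len(e1)
    s3 = conserved / (len(e1) + induced - conserved)
    return ec, s3


if __name__ == "__main__":
    path = [(1, 2), (2, 3)]
    triangle_plus_tail = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")]
    print(alignment_scores(path, triangle_plus_tail, {1: "x", 2: "y", 3: "z"}))
```

Here a path is aligned onto a triangle: every edge of the path is conserved, so EC is perfect, while S3 is lower because the extra triangle edge between aligned nodes is counted against the alignment. A single number of either kind still hides where the mismatches occur, which is the gap a visualization aims to fill.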
Graphettes: Constant-time determination of graphlet and orbit identity including (possibly disconnected) graphlets up to size 8
Graphlets are small connected induced subgraphs of a larger graph $G$.
Graphlets are now commonly used to quantify local and global topology of
networks in the field. Methods exist to exhaustively enumerate all graphlets
(and their orbits) in large networks as efficiently as possible using orbit
counting equations. However, the number of graphlets in $G$ is exponential in
both the number of nodes and edges in $G$. Enumerating them all is already
unacceptably expensive on existing large networks, and the problem will only
get worse as networks continue to grow in size and density. Here we introduce
an efficient method designed to aid statistical sampling of graphlets up to
size $k=8$ from a large network. We define graphettes as the generalization of
graphlets allowing for disconnected graphlets. Given a particular (undirected)
graphette $g$, we introduce the idea of the canonical graphette $\mathcal{K}(g)$
as a representative member of the isomorphism group $Iso(g)$ of $g$. We compute
the mapping $\mathcal{K}$, in the form of a lookup table, from all
$2^{k(k-1)/2}$ undirected graphettes $g$ of size $k \le 8$ to their canonical
representatives $\mathcal{K}(g)$, as well as the permutation that transforms
$g$ to $\mathcal{K}(g)$. We also compute all automorphism orbits for each canonical
graphette. Thus, given any $k \le 8$ nodes in a graph $G$, we can in constant
time infer which graphette it is, as well as which orbit each of the $k$ nodes
belongs to. Sampling a large number $N$ of such $k$-sets of nodes provides an
approximation of both the distribution of graphlets and orbits across $G$, and
the orbit degree vector at each node.
Comment: 13 pages, 4 figures, 2 tables. Accepted to PLOS ON
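The lookup-table idea can be sketched in Python for small sizes. This is a brute-force illustration under stated assumptions, not the paper's implementation: each graphette on k nodes is encoded as a bitmask over its k(k-1)/2 node pairs, the canonical representative is taken to be the smallest bitmask over all node relabelings (an arbitrary choice made here), and the function names are hypothetical. The stored permutation and the automorphism orbits mentioned in the abstract are omitted. Trying every permutation is only practical for small k such as 4 or 5.

```python
from itertools import combinations, permutations


def build_canonical_table(k):
    """Map every bitmask-encoded graph on k nodes to a canonical bitmask."""
    pairs = list(combinations(range(k), 2))
    index = {p: i for i, p in enumerate(pairs)}  # node pair -> bit position
    perms = list(permutations(range(k)))
    table = []
    for mask in range(1 << len(pairs)):
        best = mask
        for perm in perms:
            relabeled = 0
            for i, (u, v) in enumerate(pairs):
                if (mask >> i) & 1:
                    a, b = sorted((perm[u], perm[v]))
                    relabeled |= 1 << index[(a, b)]
            best = min(best, relabeled)
        table.append(best)
    return table, index


def graphette_id(adj, nodes, table, index):
    """Encode the subgraph induced on `nodes` and look up its canonical form."""
    mask = 0
    for (i, j), bit in index.items():
        if nodes[j] in adj[nodes[i]]:
            mask |= 1 << bit
    return table[mask]  # a single list lookup once the mask is built


if __name__ == "__main__":
    table, index = build_canonical_table(4)
    print(len(set(table)), "isomorphism classes on 4 nodes")
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    print(graphette_id(path, ["a", "b", "c", "d"], table, index))
    print(graphette_id(path, ["d", "b", "a", "c"], table, index))
```

Because the table is built once, identifying a sampled node set costs only the time to read its k(k-1)/2 adjacency bits plus one array access, independent of the size of the surrounding network.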