Data-driven network alignment
Biological network alignment (NA) aims to find a node mapping between
species' molecular networks that uncovers similar network regions, thus
allowing for transfer of functional knowledge between the aligned nodes.
However, current NA methods do not end up aligning functionally related nodes.
A likely reason is that they assume it is topologically similar nodes that are
functionally related. However, we show that this assumption does not hold well.
So, a paradigm shift is needed in how the NA problem is approached. We
redefine NA as a data-driven framework, TARA (daTA-dRiven network Alignment),
which attempts to learn the relationship between topological relatedness and
functional relatedness without assuming that topological relatedness
corresponds to topological similarity, like traditional NA methods do. TARA
trains a classifier to predict whether two nodes from different networks are
functionally related based on their network topological patterns. We find that
TARA is able to make accurate predictions. TARA then includes each pair of
nodes predicted as related in an alignment. Like traditional NA
methods, TARA uses this alignment for the across-species transfer of functional
knowledge. Note that TARA, as currently implemented, uses topological but not
protein sequence information for this task. We find that TARA outperforms
existing state-of-the-art NA methods that also use topological information,
WAVE and SANA, and even outperforms or complements a state-of-the-art NA method
that uses both topological and sequence information, PrimAlign. Hence, adding
sequence information to TARA, which is our future work, is likely to further
improve its performance.
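A minimal sketch of the data-driven idea described above, under loudly stated assumptions: each node gets a small hypothetical topological feature vector (degree and local clustering coefficient; the abstract does not say which topological patterns TARA uses), a node pair is represented by concatenating the two vectors, a plain logistic-regression classifier stands in for TARA's classifier, and training labels are assumed to come from known functional annotations. Every pair scored above 0.5 joins the (possibly many-to-many) alignment, which is then used to copy annotations across networks. All function names here are illustrative, not TARA's code.

```python
import math
from itertools import product

# Hypothetical topological features; TARA's actual features are not specified here.
def clustering(adj, v):
    """Local clustering coefficient of v; adj is {node: set(neighbors)}, undirected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def node_features(adj, v):
    return [float(len(adj[v])), clustering(adj, v)]

def pair_features(adj1, u, adj2, v):
    # Concatenate both nodes' features; the trailing 1.0 is a bias term.
    return node_features(adj1, u) + node_features(adj2, v) + [1.0]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Stochastic-gradient logistic regression, a stand-in for the real classifier."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - score(w, xi)
            w = [wi + lr * err * xj for wi, xj in zip(w, xi)]
    return w

def align(adj1, adj2, w, threshold=0.5):
    """Every cross-network pair predicted as functionally related joins the alignment."""
    return {(u, v) for u, v in product(adj1, adj2)
            if score(w, pair_features(adj1, u, adj2, v)) > threshold}

def transfer(alignment, annotations):
    """Copy annotations of network-1 nodes onto their aligned network-2 partners."""
    predicted = {}
    for u, v in alignment:
        predicted.setdefault(v, set()).update(annotations.get(u, set()))
    return predicted
```

Because the classifier scores pairs from their raw features rather than from a hand-coded similarity measure, nothing forces "related" to mean "topologically similar", which is the point the abstract makes.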
Capturing Topology in Graph Pattern Matching
Graph pattern matching is often defined in terms of subgraph isomorphism, an
NP-complete problem. To lower its complexity, various extensions of graph
simulation have been considered instead. These extensions allow pattern
matching to be conducted in cubic time. However, they fall short of capturing
the topology of data graphs, i.e., graphs may have a structure drastically
different from pattern graphs they match, and the matches found are often too
large to understand and analyze. To rectify these problems, this paper proposes
a notion of strong simulation, a revision of graph simulation, for graph
pattern matching. (1) We identify a set of criteria for preserving the topology
of graphs matched. We show that strong simulation preserves the topology of
data graphs and finds a bounded number of matches. (2) We show that strong
simulation retains the same complexity as earlier extensions of simulation, by
providing a cubic-time algorithm for computing strong simulation. (3) We
present the locality property of strong simulation, which allows us to
effectively conduct pattern matching on distributed graphs. (4) We
experimentally verify the effectiveness and efficiency of these algorithms,
using real-life data and synthetic data. Comment: VLDB201
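A sketch of strong simulation as we read it, under these assumptions: a ball around a data node v is the subgraph induced by all nodes within undirected distance d_Q of v, where d_Q is the diameter of the (connected) pattern; inside each ball we compute the maximum dual simulation (each match of a pattern node must have matching children and matching parents); and a match is kept only if v itself is matched, reporting the connected component of the match graph that contains v. The fixpoint loop is a naive polynomial-time refinement, not the paper's cubic-time algorithm, and matches are returned as node sets only.

```python
from collections import deque

def undirected_dist(nodes, edges, src, limit=None):
    """BFS distances from src, ignoring edge direction; stop expanding at `limit` hops."""
    nbrs = {n: set() for n in nodes}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if limit is not None and dist[x] == limit:
            continue
        for y in nbrs[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def diameter(q_nodes, q_edges):
    # Assumes the pattern is connected when edge direction is ignored.
    return max(max(undirected_dist(q_nodes, q_edges, u).values()) for u in q_nodes)

def dual_simulation(q_labels, q_edges, g_labels, g_edges):
    """Maximum dual simulation by fixpoint pruning; None if a pattern node has no match."""
    sim = {u: {v for v, l in g_labels.items() if l == lu} for u, lu in q_labels.items()}
    succ = {v: set() for v in g_labels}
    pred = {v: set() for v in g_labels}
    for a, b in g_edges:
        succ[a].add(b)
        pred[b].add(a)
    changed = True
    while changed:
        changed = False
        for u1, u2 in q_edges:
            # Child condition: a match of u1 needs a successor that matches u2.
            keep = {v for v in sim[u1] if succ[v] & sim[u2]}
            if keep != sim[u1]:
                sim[u1], changed = keep, True
            # Parent condition: a match of u2 needs a predecessor that matches u1.
            keep = {v for v in sim[u2] if pred[v] & sim[u1]}
            if keep != sim[u2]:
                sim[u2], changed = keep, True
    return None if any(not s for s in sim.values()) else sim

def strong_simulation(q_labels, q_edges, g_labels, g_edges):
    """Node sets of strong-simulation matches, one per ball whose center is matched."""
    d_q = diameter(list(q_labels), q_edges)
    matches = set()
    for center in g_labels:
        ball = set(undirected_dist(list(g_labels), g_edges, center, limit=d_q))
        ball_edges = [(a, b) for a, b in g_edges if a in ball and b in ball]
        sim = dual_simulation(q_labels, q_edges, {v: g_labels[v] for v in ball}, ball_edges)
        if sim is None or not any(center in s for s in sim.values()):
            continue
        # Match graph: ball edges that realize some pattern edge under sim.
        m_edges = {(a, b) for u1, u2 in q_edges for a, b in ball_edges
                   if a in sim[u1] and b in sim[u2]}
        m_nodes = set().union(*sim.values())
        # Keep the connected component of the match graph that contains the center.
        matches.add(frozenset(undirected_dist(m_nodes, m_edges, center)))
    return matches
```

With a two-node cycle A⇄B as the pattern, a directed four-node cycle a1→b1→a2→b2→a1 is still matched by dual simulation over the whole graph, but every radius-1 ball breaks the cycle, so strong simulation reports no match; a genuine two-node cycle is matched. This is the kind of topology preservation the abstract describes.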
On Ranked Approximate Matching Of Large Attributed Graphs
Many emerging database applications, most evidently large-scale scientific applications, entail sophisticated graph-based query manipulation. To access the information embedded in graphs, efficient graph matching tools and algorithms have become of prime importance. Although the prohibitively expensive time complexity of exact subgraph isomorphism techniques has limited their efficacy in application domains, approximate yet efficient graph matching techniques have received much attention due to their pragmatic applicability. Since public-domain databases are noisy and incomplete, inexact graph matching techniques have proven more promising for inferring knowledge from numerous structural data repositories.
Contemporary algorithms for approximate graph matching incur substantial cost to generate candidates and then test and rank them as possible matches. Leading algorithms balance processing time and overall resource consumption by leveraging sophisticated data structures and graph properties to improve overall performance.
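To make the generate, test, and rank pattern concrete, here is a deliberately naive, hypothetical baseline (not any specific published algorithm, and not TraM or AtoM): candidates for each pattern node are filtered by label only, every injective combination is tested by the fraction of pattern edges it preserves, and the k best are kept. The function name and scoring rule are assumptions chosen for illustration.

```python
import heapq
from itertools import product

def top_k_matches(q_labels, q_edges, g_labels, g_edges, k):
    """Naive approximate matching: generate, test, and rank candidate mappings."""
    q_nodes = list(q_labels)
    g_edge_set = set(g_edges)
    # Generate: candidate data nodes for each pattern node, filtered by label only.
    cands = [[v for v, l in g_labels.items() if l == q_labels[u]] for u in q_nodes]
    scored = []
    for combo in product(*cands):
        if len(set(combo)) < len(combo):  # reject non-injective assignments
            continue
        m = dict(zip(q_nodes, combo))
        # Test: fraction of pattern edges preserved by this mapping (q_edges must be non-empty).
        kept = sum(1 for a, b in q_edges if (m[a], m[b]) in g_edge_set)
        scored.append((kept / len(q_edges), combo))
    # Rank: keep the k best-scoring candidate matches.
    return heapq.nlargest(k, scored)
```

The number of combinations is the product of the candidate-list sizes, which illustrates why candidate generation dominates the cost and why better filtering matters.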
In this dissertation, we propose two novel techniques for approximate graph matching: TraM (Top-k Graph Matching) and AtoM (Approximate Network Matching). While TraM offloads a significant amount of its processing onto the database, making the approach viable for large graphs, AtoM improves turnaround time by summarizing graphs prior to matching. The summarization process is aided by domain-sensitive similarity matrices, which in turn help improve matching performance. The vector space embedding of the graphs and efficient filtration of the search space enable computation of approximate graph similarity at a negligible cost. We combine domain similarity and topological similarity to obtain overall graph similarity and compare them with neighborhood-biased segments of the data graph to find proper matches. We show that our approach naturally supports the emerging trend in graph pattern queries, and we discuss its suitability for large networks, since it can be seamlessly adapted to the MapReduce framework.
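A sketch of combining domain similarity with topological similarity, under assumptions that are ours rather than the dissertation's: the domain-sensitive similarity matrix is a hypothetical dict keyed by label pairs, each node's topological "embedding" is simply the count vector of its neighbors' labels compared by cosine similarity, and the overall score is the weighted sum alpha · domain + (1 − alpha) · topological. A cheap domain-similarity threshold filters the search space before the costlier topological comparison, and the top k candidates are returned.

```python
import heapq
import math
from collections import Counter

def label_vector(adj, labels, v):
    """Hypothetical topological embedding: counts of neighbor labels."""
    return Counter(labels[w] for w in adj[v])

def cosine(c1, c2):
    dot = sum(c1[key] * c2[key] for key in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(x * x for x in c1.values()))
    n2 = math.sqrt(sum(x * x for x in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def rank_candidates(q_adj, q_labels, u, g_adj, g_labels, domain_sim,
                    k=3, alpha=0.5, min_domain=0.1):
    """Top-k data nodes for query node u by alpha*domain + (1-alpha)*topological similarity."""
    qv = label_vector(q_adj, q_labels, u)
    scored = []
    for v in g_adj:
        d = domain_sim.get((q_labels[u], g_labels[v]), 0.0)
        if d < min_domain:  # filtration: drop domain-incompatible nodes before the costlier step
            continue
        t = cosine(qv, label_vector(g_adj, g_labels, v))
        scored.append((alpha * d + (1 - alpha) * t, v))
    return heapq.nlargest(k, scored)
```

The weight alpha and the threshold min_domain are illustrative knobs; in a real system both would need tuning against the data.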
We have conducted thorough experiments on several synthetic and real data sets and have demonstrated the effectiveness and efficiency of the proposed method.
- …