742 research outputs found
Matched Filters for Noisy Induced Subgraph Detection
The problem of finding the vertex correspondence between two noisy graphs
with different number of vertices where the smaller graph is still large has
many applications in social networks, neuroscience, and computer vision. We
propose a solution to this problem via a graph matching matched filter:
centering and padding the smaller adjacency matrix and applying graph matching
methods to align it to the larger network. The centering and padding schemes
can be incorporated into any algorithm that matches using adjacency matrices.
Under a statistical model for correlated pairs of graphs, which yields a noisy
copy of the small graph within the larger graph, the resulting optimization
problem can be guaranteed to recover the true vertex correspondence between the
networks.
However, there are currently no efficient algorithms for solving this
problem. To illustrate the possibilities and challenges of such problems, we
use an algorithm that can exploit a partially known correspondence and show via
varied simulations and applications to {\it Drosophila} and human connectomes
that this approach can achieve good performance.Comment: 41 pages, 7 figure
Matched filters for noisy induced subgraph detection
First author draftWe consider the problem of finding the vertex correspondence between two graphs with different number of vertices where the smaller graph is still potentially large. We propose a solution to this problem via a graph matching matched filter: padding the smaller graph in different ways and then using graph matching methods to align it to the larger network. Under a statistical model for correlated pairs of graphs, which yields a noisy copy of the small graph within the larger graph, the resulting optimization problem can be guaranteed to recover the true vertex correspondence between the networks, though there are currently no efficient algorithms for solving this problem. We consider an approach that exploits a partially known correspondence and show via varied simulations and applications to the Drosophila connectome that in practice this approach can achieve good performance.https://arxiv.org/abs/1803.02423https://arxiv.org/abs/1803.0242
A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs
Cyber security is one of the most significant technical challenges in current
times. Detecting adversarial activities, prevention of theft of intellectual
properties and customer data is a high priority for corporations and government
agencies around the world. Cyber defenders need to analyze massive-scale,
high-resolution network flows to identify, categorize, and mitigate attacks
involving networks spanning institutional and national boundaries. Many of the
cyber attacks can be described as subgraph patterns, with prominent examples
being insider infiltrations (path queries), denial of service (parallel paths)
and malicious spreads (tree queries). This motivates us to explore subgraph
matching on streaming graphs in a continuous setting. The novelty of our work
lies in using the subgraph distributional statistics collected from the
streaming graph to determine the query processing strategy. We introduce a
"Lazy Search" algorithm where the search strategy is decided on a
vertex-to-vertex basis depending on the likelihood of a match in the vertex
neighborhood. We also propose a metric named "Relative Selectivity" that is
used to select between different query processing strategies. Our experiments
performed on real online news, network traffic stream and a synthetic social
network benchmark demonstrate 10-100x speedups over selectivity agnostic
approaches.Comment: in 18th International Conference on Extending Database Technology
(EDBT) (2015
Maximum common subgraph isomorphism algorithms for the matching of chemical structures
The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks
Capturing Topology in Graph Pattern Matching
Graph pattern matching is often defined in terms of subgraph isomorphism, an
NP-complete problem. To lower its complexity, various extensions of graph
simulation have been considered instead. These extensions allow pattern
matching to be conducted in cubic-time. However, they fall short of capturing
the topology of data graphs, i.e., graphs may have a structure drastically
different from pattern graphs they match, and the matches found are often too
large to understand and analyze. To rectify these problems, this paper proposes
a notion of strong simulation, a revision of graph simulation, for graph
pattern matching. (1) We identify a set of criteria for preserving the topology
of graphs matched. We show that strong simulation preserves the topology of
data graphs and finds a bounded number of matches. (2) We show that strong
simulation retains the same complexity as earlier extensions of simulation, by
providing a cubic-time algorithm for computing strong simulation. (3) We
present the locality property of strong simulation, which allows us to
effectively conduct pattern matching on distributed graphs. (4) We
experimentally verify the effectiveness and efficiency of these algorithms,
using real-life data and synthetic data.Comment: VLDB201
Subgraph covers -- An information theoretic approach to motif analysis in networks
Many real world networks contain a statistically surprising number of certain
subgraphs, called network motifs. In the prevalent approach to motif analysis,
network motifs are detected by comparing subgraph frequencies in the original
network with a statistical null model. In this paper we propose an alternative
approach to motif analysis where network motifs are defined to be connectivity
patterns that occur in a subgraph cover that represents the network using
minimal total information. A subgraph cover is defined to be a set of subgraphs
such that every edge of the graph is contained in at least one of the subgraphs
in the cover. Some recently introduced random graph models that can incorporate
significant densities of motifs have natural formulations in terms of subgraph
covers and the presented approach can be used to match networks with such
models. To prove the practical value of our approach we also present a
heuristic for the resulting NP-hard optimization problem and give results for
several real world networks.Comment: 10 pages, 7 tables, 1 Figur
Generic Strategies for Chemical Space Exploration
Computational approaches to exploring "chemical universes", i.e., very large
sets, potentially infinite sets of compounds that can be constructed by a
prescribed collection of reaction mechanisms, in practice suffer from a
combinatorial explosion. It quickly becomes impossible to test, for all pairs
of compounds in a rapidly growing network, whether they can react with each
other. More sophisticated and efficient strategies are therefore required to
construct very large chemical reaction networks.
Undirected labeled graphs and graph rewriting are natural models of chemical
compounds and chemical reactions. Borrowing the idea of partial evaluation from
functional programming, we introduce partial applications of rewrite rules.
Binding substrate to rules increases the number of rules but drastically prunes
the substrate sets to which it might match, resulting in dramatically reduced
resource requirements. At the same time, exploration strategies can be guided,
e.g. based on restrictions on the product molecules to avoid the explicit
enumeration of very unlikely compounds. To this end we introduce here a generic
framework for the specification of exploration strategies in graph-rewriting
systems. Using key examples of complex chemical networks from sugar chemistry
and the realm of metabolic networks we demonstrate the feasibility of a
high-level strategy framework.
The ideas presented here can not only be used for a strategy-based chemical
space exploration that has close correspondence of experimental results, but
are much more general. In particular, the framework can be used to emulate
higher-level transformation models such as illustrated in a small puzzle game
Frequent Pattern Finding in Integrated Biological Networks
Biomedical research is undergoing a revolution with the advance of high-throughput technologies. A major challenge in the post-genomic era is to understand how genes, proteins and small molecules are organized into signaling pathways and regulatory networks. To simplify the analysis of large complex molecular networks, strategies are sought to break them down into small yet relatively independent network modules, e.g. pathways and protein complexes.
In fulfillment of the motivation to find evolutionary origins of network modules, a novel strategy has been developed to uncover duplicated pathways and protein complexes. This search was first formulated into a computational problem which finds frequent patterns in integrated graphs. The whole framework was then successfully implemented as the software package BLUNT, which includes a parallelized version.
To evaluate the biological significance of the work, several large datasets were chosen, with each dataset targeting a different biological question. An application of BLUNT was performed on the yeast protein-protein interaction network, which is described. A large number of frequent patterns were discovered and predicted to be duplicated pathways. To explore how these pathways may have diverged since duplication, the differential regulation of duplicated pathways was studied at the transcriptional level, both in terms of time and location.
As demonstrated, this algorithm can be used as new data mining tool for large scale biological data in general. It also provides a novel strategy to study the evolution of pathways and protein complexes in a systematic way. Understanding how pathways and protein complexes evolve will greatly benefit the fundamentals of biomedical research
- …