Partitioning algorithms for induced subgraph problems
This dissertation introduces the MCSPLIT family of algorithms for two closely-related NP-hard problems that involve finding a large induced subgraph contained by each of two input graphs: the induced subgraph isomorphism problem and the maximum common induced subgraph problem.
The MCSPLIT algorithms resemble forward-checking constraint programming algorithms, but use problem-specific data structures that allow multiple, identical domains to be stored without duplication. These data structures enable fast, simple constraint propagation algorithms and very fast calculation of upper bounds. Versions of these algorithms for both sparse and dense graphs are described and implemented. The resulting algorithms are over an order of magnitude faster than the best existing algorithm for maximum common induced subgraph on unlabelled graphs, and outperform the state of the art on several classes of induced subgraph isomorphism instances.
A further advantage of the MCSPLIT data structures is that variables and values are treated identically; this allows us to choose to branch on variables representing vertices of either input graph with no overhead. An extensive set of experiments shows that such two-sided branching can be particularly beneficial if the two input graphs have very different orders or densities. Finally, we turn from subgraphs to supergraphs, tackling the problem of finding a small graph that contains every member of a given family of graphs as an induced subgraph. Exact and heuristic techniques are developed for this problem, in each case using a MCSPLIT algorithm as a subroutine. These algorithms allow us to add new terms to two entries of the On-Line Encyclopedia of Integer Sequences.
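The partitioned-domain idea in this abstract can be illustrated with a small sketch. The Python below is not the dissertation's implementation. It assumes the label-class formulation used in the published McSplit work, in which unmatched vertices of both graphs are grouped into paired classes so that vertices with identical domains share one entry. The function names (`bound`, `split`, `max_common_induced_subgraph`) and the adjacency-set graph encoding are hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]          # undirected graph as adjacency sets
Class = Tuple[List[int], List[int]]  # (vertices of G, vertices of H) sharing one domain


def bound(classes: List[Class]) -> int:
    """Upper bound on further matches: each class can add at most min(|G part|, |H part|)."""
    return sum(min(len(gs), len(hs)) for gs, hs in classes)


def split(classes: List[Class], g: Graph, h: Graph, v: int, w: int) -> List[Class]:
    """Refine every class after mapping v -> w: neighbours of v may only pair
    with neighbours of w, and non-neighbours only with non-neighbours."""
    refined = []
    for gs, hs in classes:
        gs = [x for x in gs if x != v]
        hs = [y for y in hs if y != w]
        for adjacent in (True, False):
            g_part = [x for x in gs if (x in g[v]) == adjacent]
            h_part = [y for y in hs if (y in h[w]) == adjacent]
            if g_part and h_part:  # a class with an empty side can never contribute
                refined.append((g_part, h_part))
    return refined


def max_common_induced_subgraph(g: Graph, h: Graph) -> List[Tuple[int, int]]:
    best: List[Tuple[int, int]] = []

    def search(mapping: List[Tuple[int, int]], classes: List[Class]) -> None:
        nonlocal best
        if len(mapping) > len(best):
            best = list(mapping)
        if len(mapping) + bound(classes) <= len(best):
            return  # prune: this branch cannot beat the incumbent
        # Assumed heuristic: branch on the class with the smallest larger side.
        idx = min(range(len(classes)),
                  key=lambda i: max(len(classes[i][0]), len(classes[i][1])))
        gs, hs = classes[idx]
        v = gs[0]
        for w in hs:
            search(mapping + [(v, w)], split(classes, g, h, v, w))
        # Also consider leaving v unmatched.
        rest = [x for x in gs if x != v]
        remaining = classes[:idx] + ([(rest, hs)] if rest else []) + classes[idx + 1:]
        search(mapping, remaining)

    search([], [(sorted(g), sorted(h))])
    return best
```

Because a class stores a set of vertices from each graph rather than one domain per variable, the same `split` routine would work unchanged if the search branched on a vertex of `h` instead of `g`, which is the symmetry the abstract refers to as two-sided branching.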
Fast Search for Dynamic Multi-Relational Graphs
Acting on time-critical events by processing ever growing social media or
news streams is a major technical challenge. Many of these data sources can be
modeled as multi-relational graphs. Continuous queries or techniques to search
for rare events that typically arise in monitoring applications have been
studied extensively for relational databases. This work addresses the
question that naturally follows: how can we efficiently execute a
continuous query on a dynamic graph? This paper presents an exact subgraph
search algorithm that exploits the temporal characteristics of representative
queries for online news or social media monitoring. The algorithm is based on a
novel data structure called the Subgraph Join Tree (SJ-Tree) that leverages the
structural and semantic characteristics of the underlying multi-relational
graph. The paper concludes with extensive experimentation on several real-world
datasets that demonstrates the validity of this approach.
Comment: SIGMOD Workshop on Dynamic Networks Management and Mining (DyNetMM),
201
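As a rough illustration of joining partial matches as edges stream in, the sketch below handles a single two-edge path query. It is not the SJ-Tree from the paper. It assumes a drastic simplification with only two leaves, each holding matches for one query edge in a hash table keyed by the shared vertex, and it assumes the two query edges have distinct edge types. The class name `TwoLeafJoin` and its method are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TwoLeafJoin:
    """Continuous query for the path (a) -[left_type]-> (b) -[right_type]-> (c).

    Each leaf stores the edges seen so far for one query edge, indexed by the
    join vertex b, so a new edge only probes the sibling table.
    """

    def __init__(self, left_type: str, right_type: str) -> None:
        self.left_type = left_type
        self.right_type = right_type
        self.left: Dict[str, List[str]] = defaultdict(list)   # b -> sources a
        self.right: Dict[str, List[str]] = defaultdict(list)  # b -> targets c

    def add_edge(self, src: str, etype: str, dst: str) -> List[Tuple[str, str, str]]:
        """Insert one streamed edge and return the complete matches it creates."""
        found: List[Tuple[str, str, str]] = []
        if etype == self.left_type:
            self.left[dst].append(src)
            # Probe the sibling leaf for right-hand edges leaving the join vertex.
            found += [(src, dst, c) for c in self.right.get(dst, [])]
        if etype == self.right_type:
            self.right[src].append(dst)
            # Probe the sibling leaf for left-hand edges entering the join vertex.
            found += [(a, src, dst) for a in self.left.get(src, [])]
        return found
```

With a hypothetical stream of `posts` and `mentions` edges, `add_edge("u1", "posts", "a1")` returns no match, and a later `add_edge("a1", "mentions", "acme")` returns the single completed path. A larger query would need a tree of such joins, and the temporal and selectivity considerations the abstract mentions are not modelled here.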
A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs
Cyber security is one of the most significant technical challenges in current
times. Detecting adversarial activities and preventing theft of intellectual
property and customer data are high priorities for corporations and government
agencies around the world. Cyber defenders need to analyze massive-scale,
high-resolution network flows to identify, categorize, and mitigate attacks
involving networks spanning institutional and national boundaries. Many
cyber attacks can be described as subgraph patterns, with prominent examples
being insider infiltrations (path queries), denial of service (parallel paths)
and malicious spreads (tree queries). This motivates us to explore subgraph
matching on streaming graphs in a continuous setting. The novelty of our work
lies in using the subgraph distributional statistics collected from the
streaming graph to determine the query processing strategy. We introduce a
"Lazy Search" algorithm where the search strategy is decided on a
vertex-to-vertex basis depending on the likelihood of a match in the vertex
neighborhood. We also propose a metric named "Relative Selectivity" that is
used to select between different query processing strategies. Our experiments
on real online news and network traffic streams and a synthetic social
network benchmark demonstrate 10-100x speedups over selectivity-agnostic
approaches.
Comment: in 18th International Conference on Extending Database Technology
(EDBT) (2015
From data towards knowledge: Revealing the architecture of signaling systems by unifying knowledge mining and data mining of systematic perturbation data
Genetic and pharmacological perturbation experiments, such as deleting a gene
and monitoring gene expression responses, are powerful tools for studying
cellular signal transduction pathways. However, it remains a challenge to
automatically derive knowledge of a cellular signaling system at a conceptual
level from systematic perturbation-response data. In this study, we explored a
framework that unifies knowledge mining and data mining approaches towards this
goal. The framework consists of the following automated processes: 1) applying
an ontology-driven knowledge mining approach to identify functional modules
among the genes responding to a perturbation in order to reveal potential
signals affected by the perturbation; 2) applying a graph-based data mining
approach to search for perturbations that affect a common signal with respect
to a functional module; and 3) revealing the architecture of a signaling system
by organizing signaling units into a hierarchy based on their relationships.
Applying this framework to a compendium of yeast perturbation-response data, we
have successfully recovered many well-known signal transduction pathways; in
addition, our analysis has led to many hypotheses regarding the yeast signal
transduction system; finally, our analysis automatically organized perturbed
genes as a graph reflecting the architecture of the yeast signaling system.
Importantly, this framework transformed molecular findings from a gene level to
a conceptual level, which can readily be translated into computable knowledge
in the form of rules regarding the yeast signaling system, such as "if genes
involved in MAPK signaling are perturbed, genes involved in pheromone responses
will be differentially expressed".