44,921 research outputs found
On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree
Exact pattern matching in labeled graphs is the problem of searching paths of
a graph that spell the same string as the pattern . This
basic problem can be found at the heart of more complex operations on variation
graphs in computational biology, of query operations in graph databases, and of
analysis operations in heterogeneous networks, where the nodes of some paths
must match a sequence of labels or types. We describe a simple conditional
lower bound that, for any constant , an -time or an -time algorithm for exact pattern
matching on graphs, with node labels and patterns drawn from a binary alphabet,
cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is
false. The result holds even if restricted to undirected graphs of maximum
degree three or directed acyclic graphs of maximum sum of indegree and
outdegree three. Although a conditional lower bound of this kind can be somehow
derived from previous results (Backurs and Indyk, FOCS'16), we give a direct
reduction from SETH for dissemination purposes, as the result might interest
researchers from several areas, such as computational biology, graph database,
and graph mining, as mentioned before. Indeed, as approximate pattern matching
on graphs can be solved in time, exact and approximate matching are
thus equally hard (quadratic time) on graphs under the SETH assumption. In
comparison, the same problems restricted to strings have linear time vs
quadratic time solutions, respectively, where the latter ones have a matching
SETH lower bound on computing the edit distance of two strings (Backurs and
Indyk, STOC'15).Comment: Using Lemma 12 and Lemma 13 might to be enough to prove Lemma 14.
However, the proof of Lemma 14 is correct if you assume that the graph used
in the reduction is a DAG. Hence, since the problem is already quadratic for
a DAG and a binary alphabet, it has to be quadratic also for a general graph
and a binary alphabe
Capturing Topology in Graph Pattern Matching
Graph pattern matching is often defined in terms of subgraph isomorphism, an
NP-complete problem. To lower its complexity, various extensions of graph
simulation have been considered instead. These extensions allow pattern
matching to be conducted in cubic-time. However, they fall short of capturing
the topology of data graphs, i.e., graphs may have a structure drastically
different from pattern graphs they match, and the matches found are often too
large to understand and analyze. To rectify these problems, this paper proposes
a notion of strong simulation, a revision of graph simulation, for graph
pattern matching. (1) We identify a set of criteria for preserving the topology
of graphs matched. We show that strong simulation preserves the topology of
data graphs and finds a bounded number of matches. (2) We show that strong
simulation retains the same complexity as earlier extensions of simulation, by
providing a cubic-time algorithm for computing strong simulation. (3) We
present the locality property of strong simulation, which allows us to
effectively conduct pattern matching on distributed graphs. (4) We
experimentally verify the effectiveness and efficiency of these algorithms,
using real-life data and synthetic data.Comment: VLDB201
A path following algorithm for the graph matching problem
We propose a convex-concave programming approach for the labeled weighted
graph matching problem. The convex-concave programming formulation is obtained
by rewriting the weighted graph matching problem as a least-square problem on
the set of permutation matrices and relaxing it to two different optimization
problems: a quadratic convex and a quadratic concave optimization problem on
the set of doubly stochastic matrices. The concave relaxation has the same
global minimum as the initial graph matching problem, but the search for its
global minimum is also a hard combinatorial problem. We therefore construct an
approximation of the concave problem solution by following a solution path of a
convex-concave problem obtained by linear interpolation of the convex and
concave formulations, starting from the convex relaxation. This method allows
to easily integrate the information on graph label similarities into the
optimization problem, and therefore to perform labeled weighted graph matching.
The algorithm is compared with some of the best performing graph matching
methods on four datasets: simulated graphs, QAPLib, retina vessel images and
handwritten chinese characters. In all cases, the results are competitive with
the state-of-the-art.Comment: 23 pages, 13 figures,typo correction, new results in sections 4,5,
On the Hardness and Inapproximability of Recognizing Wheeler Graphs
In recent years several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these are used to index structures far more complex than a single string, as was originally done with the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an increasing effort to better understand under which conditions such an indexing scheme is possible. This has led to the introduction of Wheeler graphs [Gagie et al., Theor. Comput. Sci., 2017]. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT related structures can be represented as Wheeler graphs, and that Wheeler graphs can be indexed in a way which is space efficient. Hence, being able to recognize whether a given graph is a Wheeler graph, or being able to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing if a given graph is a Wheeler graph. We present:
- The problem of recognizing whether a given graph G=(V,E) is a Wheeler graph is NP-complete for any edge label alphabet of size sigma >= 2, even when G is a DAG. This holds even on a restricted, subset of graphs called d-NFA\u27s for d >= 5. This is in contrast to recent results demonstrating the problem can be solved in polynomial time for d-NFA\u27s where d <= 2. We also show the recognition problem can be solved in linear time for sigma =1;
- There exists an 2^{e log sigma + O(n + e)} time exact algorithm where n = |V| and e = |E|. This algorithm relies on graph isomorphism being computable in strictly sub-exponential time;
- We define an optimization variant of the problem called Wheeler Graph Violation, abbreviated WGV, where the aim is to remove the minimum number of edges in order to obtain a Wheeler graph. We show WGV is APX-hard, even when G is a DAG, implying there exists a constant C >= 1 for which there is no C-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all C >= 1, it is NP-hard to find a C-approximation;
- We define the Wheeler Subgraph problem, abbreviated WS, where the aim is to find the largest subgraph which is a Wheeler Graph (the dual of the WGV). In contrast to WGV, we prove that the WS problem is in APX for sigma=O(1);
The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial time solvable, raising the open question of which parameters determine this problem\u27s difficulty
A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs
Cyber security is one of the most significant technical challenges in current
times. Detecting adversarial activities, prevention of theft of intellectual
properties and customer data is a high priority for corporations and government
agencies around the world. Cyber defenders need to analyze massive-scale,
high-resolution network flows to identify, categorize, and mitigate attacks
involving networks spanning institutional and national boundaries. Many of the
cyber attacks can be described as subgraph patterns, with prominent examples
being insider infiltrations (path queries), denial of service (parallel paths)
and malicious spreads (tree queries). This motivates us to explore subgraph
matching on streaming graphs in a continuous setting. The novelty of our work
lies in using the subgraph distributional statistics collected from the
streaming graph to determine the query processing strategy. We introduce a
"Lazy Search" algorithm where the search strategy is decided on a
vertex-to-vertex basis depending on the likelihood of a match in the vertex
neighborhood. We also propose a metric named "Relative Selectivity" that is
used to select between different query processing strategies. Our experiments
performed on real online news, network traffic stream and a synthetic social
network benchmark demonstrate 10-100x speedups over selectivity agnostic
approaches.Comment: in 18th International Conference on Extending Database Technology
(EDBT) (2015
- …