Order preserving pattern matching on trees and DAGs
The order preserving pattern matching (OPPM) problem is, given a pattern
string and a text string, to find all substrings of the text which have the
same relative orders as the pattern. In this paper, we consider two variants of the
OPPM problem where a set of text strings is given as a tree or a DAG. We show
that the OPPM problem for a single pattern of length and a text tree
of size can be solved in time if the characters of are
drawn from an integer alphabet of polynomial size. The time complexity becomes
if the pattern is over a general ordered alphabet. We
then show that the OPPM problem for a single pattern and a text DAG is
NP-complete.
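To make the OPPM definition above concrete, here is a minimal brute-force sketch for the plain string case only. It is not the tree algorithm from the abstract; the function names `rank_signature` and `oppm_naive` are hypothetical, and the quadratic window-by-window comparison is chosen for clarity rather than speed.

```python
def rank_signature(s):
    """Replace each element by the rank of its value among the distinct values of s."""
    ranks = {v: i for i, v in enumerate(sorted(set(s)))}
    return tuple(ranks[v] for v in s)


def oppm_naive(pattern, text):
    """Return the start index of every window of text that is order-isomorphic to pattern.

    Two sequences are order-isomorphic when their rank signatures are equal.
    Each window is re-ranked from scratch, so this is a baseline, not an
    efficient algorithm.
    """
    m = len(pattern)
    target = rank_signature(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if rank_signature(text[i:i + m]) == target
    ]
```

For example, with the pattern `[1, 3, 2]` the windows `[10, 20, 15]` and `[15, 30, 25]` of the text `[10, 20, 15, 30, 25, 5]` have the same low, high, middle shape, so both start positions are reported.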
On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree
Exact pattern matching in labeled graphs is the problem of searching for paths of
a graph that spell the same string as the pattern. This
basic problem can be found at the heart of more complex operations on variation
graphs in computational biology, of query operations in graph databases, and of
analysis operations in heterogeneous networks, where the nodes of some paths
must match a sequence of labels or types. We describe a simple conditional
lower bound that, for any constant , an -time or an -time algorithm for exact pattern
matching on graphs, with node labels and patterns drawn from a binary alphabet,
cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is
false. The result holds even if restricted to undirected graphs of maximum
degree three or directed acyclic graphs of maximum sum of indegree and
outdegree three. Although a conditional lower bound of this kind can be somehow
derived from previous results (Backurs and Indyk, FOCS'16), we give a direct
reduction from SETH for dissemination purposes, as the result might interest
researchers from several areas, such as computational biology, graph database,
and graph mining, as mentioned before. Indeed, as approximate pattern matching
on graphs can be solved in time, exact and approximate matching are
thus equally hard (quadratic time) on graphs under the SETH assumption. In
comparison, the same problems restricted to strings have linear time vs
quadratic time solutions, respectively, where the latter ones have a matching
SETH lower bound on computing the edit distance of two strings (Backurs and
Indyk, STOC'15).
Comment: Using Lemma 12 and Lemma 13 might to be enough to prove Lemma 14.
However, the proof of Lemma 14 is correct if you assume that the graph used
in the reduction is a DAG. Hence, since the problem is already quadratic for
a DAG and a binary alphabet, it has to be quadratic also for a general graph
and a binary alphabet
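The quadratic behaviour discussed in this abstract can be illustrated with a simple frontier-propagation sketch. It assumes a node-labeled directed graph given as a dictionary `labels` and an edge list `edges`, both hypothetical input formats, and it treats a "path" as any walk that follows edges. It is only a straightforward baseline, not the reduction or any algorithm from the paper.

```python
def graph_match(labels, edges, pattern):
    """Return the sorted end nodes of all walks whose node labels spell pattern.

    labels:  dict mapping node -> single-character label (assumed input format)
    edges:   iterable of directed (u, v) pairs
    pattern: non-empty string over the same alphabet

    For each pattern character every edge leaving the current frontier is
    inspected at most once, so the work is proportional to
    len(pattern) * len(edges) in the worst case.
    """
    if not pattern:
        return []
    succ = {v: [] for v in labels}
    for u, v in edges:
        succ[u].append(v)

    # Nodes where a match of the first character can end.
    frontier = {v for v, c in labels.items() if c == pattern[0]}
    for ch in pattern[1:]:
        # Extend every partial match by one edge whose head carries the next character.
        frontier = {w for v in frontier for w in succ[v] if labels[w] == ch}
    return sorted(frontier)
```

On a binary-labeled graph with labels `{0: "0", 1: "1", 2: "0", 3: "1"}` and edges `0→1, 1→2, 2→3, 0→2`, the pattern `"010"` is spelled only by the walk `0, 1, 2`, so the function reports node `2`.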
Sparse Dynamic Programming on DAGs with Small Width
The minimum path cover problem asks us to find a minimum-cardinality set of paths that cover all the nodes of a directed acyclic graph (DAG). We study the case when the size k of a minimum path cover is small, that is, when the DAG has a small width. This case is motivated by applications in pan-genomics, where the genomic variation of a population is expressed as a DAG. We observe that classical alignment algorithms exploiting sparse dynamic programming can be extended to the sequence-against-DAG case by mimicking the algorithm for sequences on each path of a minimum path cover and handling an evaluation order anomaly with reachability queries. Namely, we introduce a general framework for DAG-extensions of sparse dynamic programming. This framework produces algorithms that are slower than their counterparts on sequences only by a factor k. We illustrate this on two classical problems extended to DAGs: longest increasing subsequence and longest common subsequence. For the former, we obtain an algorithm with running time O(k|E| log |V|). This matches the optimal solution to the classical problem variant when the input sequence is modeled as a path. We obtain an analogous result for the longest common subsequence problem. We then apply this technique to the co-linear chaining problem, which is a generalization of the above two problems. The algorithm for this problem turns out to be more involved, needing further ingredients, such as an FM-index tailored for large alphabets and a two-dimensional range search tree modified to support range maximum queries. We also study a general sequence-to-DAG alignment formulation that allows affine gap costs in the sequence.
The main ingredient of the proposed framework is a new algorithm for finding a minimum path cover of a DAG (V, E) in O(k|E| log |V|) time, improving all known time-bounds when k is small and the DAG is not too dense. In addition to boosting the sparse dynamic programming framework, an immediate consequence of this new minimum path cover algorithm is an improved space/time tradeoff for reachability queries in arbitrary directed graphs.
Peer reviewed
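As a point of reference for the "classical problem variant" mentioned above, the following is a minimal sketch of the standard O(n log n) longest increasing subsequence computation on a plain sequence, that is, the case where the DAG is a single path. It is the textbook sequence algorithm, not the DAG extension described in the abstract, and the function name `lis_length` is hypothetical.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq.

    tails[i] holds the smallest possible last element of an increasing
    subsequence of length i + 1 seen so far; it stays sorted, so each
    element is placed with one binary search.
    """
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)
```

Each of the n elements costs one binary search over at most n tails, which gives the O(n log n) bound on a sequence.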
Visual Similarity Perception of Directed Acyclic Graphs: A Study on Influencing Factors
While visual comparison of directed acyclic graphs (DAGs) is commonly
encountered in various disciplines (e.g., finance, biology), knowledge about
humans' perception of graph similarity is currently quite limited. By graph
similarity perception we mean how humans perceive commonalities and differences
in graphs and herewith come to a similarity judgment. As a step toward filling
this gap the study reported in this paper strives to identify factors which
influence the similarity perception of DAGs. In particular, we conducted a
card-sorting study employing a qualitative and quantitative analysis approach
to identify 1) groups of DAGs that are perceived as similar by the participants
and 2) the reasons behind their choice of groups. Our results suggest that
similarity is mainly influenced by the number of levels, the number of nodes on
a level, and the overall shape of the graph.
Comment: Graph Drawing 2017 - arXiv Version; Keywords: Graphs, Perception,
Similarity, Comparison, Visualization
Non-simplifying Graph Rewriting Termination
So far, a very large amount of work in Natural Language Processing (NLP) relies
on trees as the core mathematical structure to represent linguistic
information (e.g. in Chomsky's work). However, some linguistic phenomena do
not cope properly with trees. In a former paper, we showed the benefit of
encoding linguistic structures by graphs and of using graph rewriting rules to
compute on those structures. Justified by some linguistic considerations, graph
rewriting is characterized by two features: first, there is no node creation
along computations and second, there are non-local edge modifications. Under
these hypotheses, we show that uniform termination is undecidable and that
non-uniform termination is decidable. We describe two termination techniques
based on weights and we give complexity bounds on the derivation length for
these rewriting systems.
Comment: In Proceedings TERMGRAPH 2013, arXiv:1302.599
Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond
The problem of String Matching to Labeled Graphs (SMLG) asks to find all the paths in a labeled graph G = (V, E) whose spellings match that of an input string S ∈ Σ^m. SMLG can be solved in quadratic O(m|E|) time [Amir et al., JALG 2000], which was proven to be optimal by a recent lower bound conditioned on SETH [Equi et al., ICALP 2019]. The lower bound states that no strongly subquadratic time algorithm exists, even if restricted to directed acyclic graphs (DAGs).
In this work we present the first parameterized algorithms for SMLG on DAGs. Our parameters capture the topological structure of G. All our results are derived from a generalization of the Knuth-Morris-Pratt algorithm [Park and Kim, CPM 1995] optimized to work in time proportional to the number of prefix-incomparable matches.
To obtain the parameterization in the topological structure of G, we first study a special class of DAGs called funnels [Millani et al., JCO 2020] and generalize them to k-funnels and the class ST_k. We present several novel characterizations and algorithmic contributions on both funnels and their generalizations
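Since the results above build on the Knuth-Morris-Pratt algorithm, a minimal sketch of the classical string version may help as background. This is the standard KMP on two plain strings, not the graph generalization or the parameterized algorithm from the abstract; the names `prefix_function` and `kmp_search` are hypothetical.

```python
def prefix_function(p):
    """pi[i] = length of the longest proper prefix of p[:i + 1] that is also its suffix."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]          # fall back to the next shorter border
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(p, t):
    """Return the start index of every occurrence of p in t, in O(len(p) + len(t)) time."""
    if not p:
        return []
    pi = prefix_function(p)
    hits = []
    k = 0                          # number of pattern characters currently matched
    for i, c in enumerate(t):
        while k > 0 and c != p[k]:
            k = pi[k - 1]
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = pi[k - 1]          # keep the longest border to allow overlapping matches
    return hits
```

The border table lets the scan resume from the longest partial match instead of restarting, which is the property a graph generalization has to preserve when several paths share a prefix.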
Survey on Instruction Selection: An Extensive and Modern Literature Review
Instruction selection is one of three optimisation problems involved in the
code generator backend of a compiler. The instruction selector is responsible
for transforming an input program from its target-independent representation
into a target-specific form by making best use of the available machine
instructions. Hence instruction selection is a crucial part of efficient code
generation.
Despite on-going research since the late 1960s, the last comprehensive
survey of the field was written more than 30 years ago. As new approaches and
techniques have appeared since its publication, this brings forth a need for a
new, up-to-date review of the current body of literature. This report addresses
that need by performing an extensive review and categorisation of existing
research. The report therefore supersedes and extends the previous surveys, and
also attempts to identify where future research should be directed.
Comment: Major changes: - Merged simulation chapter with macro expansion
chapter - Addressed misunderstandings of several approaches - Completely
rewrote many parts of the chapters; strengthened the discussion of many
approaches - Revised the drawing of all trees and graphs to put the root at
the top instead of at the bottom - Added appendix for listing the approaches
in a table See doc for more inf
On Quasi-Interpretations, Blind Abstractions and Implicit Complexity
Quasi-interpretations are a technique to guarantee complexity bounds on
first-order functional programs: with termination orderings they give in
particular a sufficient condition for a program to be executable in polynomial
time, called here the P-criterion. We study properties of the programs
satisfying the P-criterion, in order to better understand its intensional
expressive power. Given a program on binary lists, its blind abstraction is the
nondeterministic program obtained by replacing lists by their lengths (natural
numbers). A program is blindly polynomial if its blind abstraction terminates
in polynomial time. We show that all programs satisfying a variant of the
P-criterion are in fact blindly polynomial. Then we give two extensions of the
P-criterion: one by relaxing the termination ordering condition, and the other
one (the bounded value property) giving a necessary and sufficient condition
for a program to be polynomial time executable, with memoisation.
Comment: 18 pages
PReaCH: A Fast Lightweight Reachability Index using Pruning and Contraction Hierarchies
We develop the data structure PReaCH (for Pruned Reachability Contraction
Hierarchies) which supports reachability queries in a directed graph, i.e., it
supports queries that ask whether two nodes in the graph are connected by a
directed path. PReaCH adapts the contraction hierarchy speedup techniques for
shortest path queries to the reachability setting. The resulting approach is
surprisingly simple and guarantees linear space and near-linear preprocessing
time. Orthogonally to that, we improve existing pruning techniques for the
search by gathering more information from a single DFS traversal of the graph.
PReaCH-indices significantly outperform previous data structures with
comparable preprocessing cost. Methods with faster queries need significantly
more preprocessing time, in particular for the most difficult instances
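To illustrate what a pruned reachability query looks like, here is a minimal baseline sketch. It is not PReaCH and uses no contraction hierarchy; it assumes the input is already a DAG with nodes numbered `0..n-1` (a general directed graph can first be condensed into its strongly connected components), and the names `topo_levels` and `reachable` are hypothetical. The only pruning rule is that an edge always leads to a strictly higher topological level.

```python
from collections import deque


def topo_levels(n, edges):
    """Build successor lists and longest-path levels for a DAG on nodes 0..n-1."""
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = [0] * n
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:                       # Kahn-style topological sweep
        u = queue.popleft()
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return succ, level


def reachable(succ, level, s, t):
    """Answer whether t is reachable from s using a DFS pruned by levels."""
    if s == t:
        return True
    if level[s] >= level[t]:
        return False                   # every edge strictly increases the level
    seen = {s}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if v == t:
                return True
            # Nodes at or above the target's level can never lead to the target.
            if v not in seen and level[v] < level[t]:
                seen.add(v)
                stack.append(v)
    return False
```

Many negative queries are answered by the level comparison alone without any traversal, which is the kind of cheap filter that more elaborate indices refine.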