727 research outputs found
The k-mismatch problem revisited
We revisit the complexity of one of the most basic problems in pattern
matching. In the k-mismatch problem we must compute the Hamming distance
between a pattern of length m and every m-length substring of a text of length
n, as long as that Hamming distance is at most k. Where the Hamming distance is
greater than k at some alignment of the pattern and text, we simply output
"No".
We study this problem in both the standard offline setting and also as a
streaming problem. In the streaming k-mismatch problem the text arrives one
symbol at a time and we must give an output before processing any future
symbols. Our main results are as follows:
1) Our first result is a deterministic time offline algorithm for k-mismatch on a text of length n. This is a
factor of k improvement over the fastest previous result of this form from SODA
2000 by Amihood Amir et al.
2) We then give a randomised and online algorithm which runs in the same time
complexity but requires only space in total.
3) Next we give a randomised -approximation algorithm for the
streaming k-mismatch problem which uses
space and runs in worst-case time per
arriving symbol.
4) Finally we combine our new results to derive a randomised
space algorithm for the streaming k-mismatch problem
which runs in worst-case time per
arriving symbol. This improves the best previous space complexity for streaming
k-mismatch from FOCS 2009 by Benny Porat and Ely Porat by a factor of k. We
also improve the time complexity of this previous result by an even greater
factor to match the fastest known offline algorithm (up to logarithmic
factors)
On Derivatives and Subpattern Orders of Countable Subshifts
We study the computational and structural aspects of countable
two-dimensional SFTs and other subshifts. Our main focus is on the topological
derivatives and subpattern posets of these objects, and our main results are
constructions of two-dimensional countable subshifts with interesting
properties. We present an SFT whose iterated derivatives are maximally complex
from the computational point of view, a sofic shift whose subpattern poset
contains an infinite descending chain, a family of SFTs whose finite subpattern
posets contain arbitrary finite posets, and a natural example of an SFT with
infinite Cantor-Bendixon rank.Comment: In Proceedings AUTOMATA&JAC 2012, arXiv:1208.249
Context unification is in PSPACE
Contexts are terms with one `hole', i.e. a place in which we can substitute
an argument. In context unification we are given an equation over terms with
variables representing contexts and ask about the satisfiability of this
equation. Context unification is a natural subvariant of second-order
unification, which is undecidable, and a generalization of word equations,
which are decidable, at the same time. It is the unique problem between those
two whose decidability is uncertain (for already almost two decades). In this
paper we show that the context unification is in PSPACE. The result holds under
a (usual) assumption that the first-order signature is finite.
This result is obtained by an extension of the recompression technique,
recently developed by the author and used in particular to obtain a new PSPACE
algorithm for satisfiability of word equations, to context unification. The
recompression is based on performing simple compression rules (replacing pairs
of neighbouring function symbols), which are (conceptually) applied on the
solution of the context equation and modifying the equation in a way so that
such compression steps can be in fact performed directly on the equation,
without the knowledge of the actual solution.Comment: 27 pages, submitted, small notation changes and small improvements
over the previous tex
Dictionary Matching with One Gap
The dictionary matching with gaps problem is to preprocess a dictionary
of gapped patterns over alphabet , where each
gapped pattern is a sequence of subpatterns separated by bounded
sequences of don't cares. Then, given a query text of length over
alphabet , the goal is to output all locations in in which a
pattern , , ends. There is a renewed current interest
in the gapped matching problem stemming from cyber security. In this paper we
solve the problem where all patterns in the dictionary have one gap with at
least and at most don't cares, where and are
given parameters. Specifically, we show that the dictionary matching with a
single gap problem can be solved in either time and
space, and query time , where is the number
of patterns found, or preprocessing time and space: , and query
time , where is the number of patterns found.
As far as we know, this is the best solution for this setting of the problem,
where many overlaps may exist in the dictionary.Comment: A preliminary version was published at CPM 201
On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree
Exact pattern matching in labeled graphs is the problem of searching paths of
a graph that spell the same string as the pattern . This
basic problem can be found at the heart of more complex operations on variation
graphs in computational biology, of query operations in graph databases, and of
analysis operations in heterogeneous networks, where the nodes of some paths
must match a sequence of labels or types. We describe a simple conditional
lower bound that, for any constant , an -time or an -time algorithm for exact pattern
matching on graphs, with node labels and patterns drawn from a binary alphabet,
cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is
false. The result holds even if restricted to undirected graphs of maximum
degree three or directed acyclic graphs of maximum sum of indegree and
outdegree three. Although a conditional lower bound of this kind can be somehow
derived from previous results (Backurs and Indyk, FOCS'16), we give a direct
reduction from SETH for dissemination purposes, as the result might interest
researchers from several areas, such as computational biology, graph database,
and graph mining, as mentioned before. Indeed, as approximate pattern matching
on graphs can be solved in time, exact and approximate matching are
thus equally hard (quadratic time) on graphs under the SETH assumption. In
comparison, the same problems restricted to strings have linear time vs
quadratic time solutions, respectively, where the latter ones have a matching
SETH lower bound on computing the edit distance of two strings (Backurs and
Indyk, STOC'15).Comment: Using Lemma 12 and Lemma 13 might to be enough to prove Lemma 14.
However, the proof of Lemma 14 is correct if you assume that the graph used
in the reduction is a DAG. Hence, since the problem is already quadratic for
a DAG and a binary alphabet, it has to be quadratic also for a general graph
and a binary alphabe
- …