The k-mismatch problem revisited
We revisit the complexity of one of the most basic problems in pattern
matching. In the k-mismatch problem we must compute the Hamming distance
between a pattern of length m and every m-length substring of a text of length
n, as long as that Hamming distance is at most k. Where the Hamming distance is
greater than k at some alignment of the pattern and text, we simply output
"No".
We study this problem in both the standard offline setting and also as a
streaming problem. In the streaming k-mismatch problem the text arrives one
symbol at a time and we must give an output before processing any future
symbols. Our main results are as follows:
1) Our first result is a deterministic time offline algorithm for k-mismatch on a text of length n. This is a
factor of k improvement over the fastest previous result of this form from SODA
2000 by Amihood Amir et al.
2) We then give a randomised and online algorithm which runs in the same time
complexity but requires only space in total.
3) Next we give a randomised -approximation algorithm for the
streaming k-mismatch problem which uses
space and runs in worst-case time per
arriving symbol.
4) Finally we combine our new results to derive a randomised
space algorithm for the streaming k-mismatch problem
which runs in worst-case time per
arriving symbol. This improves the best previous space complexity for streaming
k-mismatch from FOCS 2009 by Benny Porat and Ely Porat by a factor of k. We
also improve the time complexity of this previous result by an even greater
factor to match the fastest known offline algorithm (up to logarithmic
factors).
Approximating Approximate Pattern Matching
Given a text T of length n and a pattern P of length m, the
approximate pattern matching problem asks for computation of a particular
distance function between P and every m-substring of T. We
consider a multiplicative approximation variant of this
problem, for distance function. In this paper, we describe two
-approximate algorithms with a runtime of
for all (constant) non-negative values
of . For constant we show a deterministic
-approximation algorithm. Previously, such run time was known
only for the case of distance, by Gawrychowski and Uznański [ICALP
2018] and only with a randomized algorithm. For constant we
show a randomized algorithm for the , thereby providing a smooth
tradeoff between algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for
Hamming distance (case of ) and of Gawrychowski and Uznański for
distance.
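The quantity being approximated, one distance value per alignment of the pattern against the text, can be computed exactly by brute force. The following Python sketch shows only that baseline, not the approximation algorithms of the paper. The name sliding_distances and the two per-symbol costs (a mismatch indicator, which gives Hamming distance, and absolute difference as one further illustrative choice) are assumptions made for this example.

```python
from typing import Callable, List, Sequence


def sliding_distances(
    text: Sequence[int],
    pattern: Sequence[int],
    cost: Callable[[int, int], int],
) -> List[int]:
    """Exact distance between the pattern and every m-substring of the text.

    The distance at an alignment is the sum of per-symbol costs.
    Runs in O(n * m) time.
    """
    n, m = len(text), len(pattern)
    return [
        sum(cost(text[i + j], pattern[j]) for j in range(m))
        for i in range(n - m + 1)
    ]


def hamming_cost(a: int, b: int) -> int:
    # Cost 1 for a mismatch, 0 for a match: the sum is the Hamming distance.
    return int(a != b)


def abs_cost(a: int, b: int) -> int:
    # Absolute difference of the two symbols (illustrative choice).
    return abs(a - b)


if __name__ == "__main__":
    print(sliding_distances([1, 2, 3, 4], [2, 2], hamming_cost))
    print(sliding_distances([1, 2, 3, 4], [2, 2], abs_cost))
```

An approximation algorithm would return, for each alignment, a value within a multiplicative factor of these exact numbers.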
Pattern Matching in Multiple Streams
We investigate the problem of deterministic pattern matching in multiple
streams. In this model, one symbol arrives at a time and is associated with one
of s streaming texts. The task at each time step is to report if there is a new
match between a fixed pattern of length m and a newly updated stream. As is
usual in the streaming context, the goal is to use as little space as possible
while still reporting matches quickly. We give almost matching upper and lower
space bounds for three distinct pattern matching problems. For exact matching
we show that the problem can be solved in constant time per arriving symbol and
O(m+s) words of space. For the k-mismatch and k-difference problems we give
O(k) time solutions that require O(m+ks) words of space. In all three cases we
also give space lower bounds which show our methods are optimal up to a single
logarithmic factor. Finally we set out a number of open problems related to
this new model for pattern matching.
Comment: 13 pages, 1 figure
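To make the multiple-streams model concrete for exact matching, here is a minimal Python sketch. It is not the paper's algorithm: it swaps in the classic Knuth-Morris-Pratt automaton, keeping one shared failure table of m entries plus one integer state per stream, so it uses O(m+s) words but only amortised (not worst-case) constant time per symbol. The class name MultiStreamMatcher and its feed method are hypothetical names chosen for this example.

```python
from typing import List


def failure_table(pattern: str) -> List[int]:
    """KMP failure function: fail[i] is the length of the longest proper
    border of pattern[: i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class MultiStreamMatcher:
    """One shared pattern, s independent streams, one KMP state per stream."""

    def __init__(self, pattern: str, num_streams: int) -> None:
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = failure_table(pattern)   # O(m) words, shared
        self.state = [0] * num_streams       # O(s) words, one per stream

    def feed(self, stream_id: int, symbol: str) -> bool:
        """Append symbol to the given stream; return True if the stream
        now ends with an occurrence of the pattern."""
        q = self.state[stream_id]
        if q == len(self.pattern):
            # Previous symbol completed a match: fall back before extending.
            q = self.fail[q - 1]
        while q > 0 and symbol != self.pattern[q]:
            q = self.fail[q - 1]
        if symbol == self.pattern[q]:
            q += 1
        self.state[stream_id] = q
        return q == len(self.pattern)


if __name__ == "__main__":
    matcher = MultiStreamMatcher("aba", 2)
    for sid, sym in [(0, "a"), (1, "a"), (0, "b"), (1, "a"), (0, "a")]:
        print(sid, sym, matcher.feed(sid, sym))
```

Symbols for different streams may be interleaved arbitrarily; each call touches only the state of the stream it names.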
Longest Common Extensions in Sublinear Space
The longest common extension problem (LCE problem) is to construct a data
structure for an input string T of length n that supports LCE(i, j)
queries. Such a query returns the length of the longest common prefix of the
suffixes starting at positions i and j in T. This classic problem has a
well-known solution that uses O(n) space and O(1) query time. In this paper
we show that for any trade-off parameter , the problem can
be solved in space and query time. This
significantly improves the previously best known time-space trade-offs, and
almost matches the best known time-space product lower bound.
Comment: An extended abstract of this paper has been accepted to CPM 201
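As a point of reference for what an LCE query returns, the following Python sketch answers a query by direct character comparison. It is not the data structure from the paper: it stores nothing beyond the string itself and pays for that with query time proportional to the answer, up to the string length in the worst case. The function name lce and the 0-based positions are assumptions made for this example.

```python
def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes s[i:] and s[j:].

    Positions are 0-based. No preprocessing and no extra space; the query
    time is proportional to the returned length.
    """
    n = len(s)
    length = 0
    while i + length < n and j + length < n and s[i + length] == s[j + length]:
        length += 1
    return length


if __name__ == "__main__":
    s = "abracadabra"
    # Suffixes at 0 and 7 are "abracadabra" and "abra".
    print(lce(s, 0, 7))
```

A time-space trade-off sits between this scan and a fully preprocessed index with constant-time queries.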
Checking whether a word is Hamming-isometric in linear time
A finite word f is Hamming-isometric if for any two words u and v of the
same length avoiding f, u can be transformed into v by changing one by
one all the letters on which u differs from v, in such a way that all of
the new words obtained in this process also avoid f. Words which are not
Hamming-isometric have been characterized as words having a border with two
mismatches. We derive from this characterization a linear-time algorithm to
check whether a word is Hamming-isometric. It is based on pattern matching
algorithms with mismatches. Lee-isometric words over a four-letter alphabet
have been characterized as words having a border with two Lee-errors. We derive
from this characterization a linear-time algorithm to check whether a word over
an alphabet of size four is Lee-isometric.
Comment: A second algorithm for checking whether a word is Hamming-isometric
is added using the result given in reference [5
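The border characterization mentioned above can be tested by brute force. The Python sketch below reads "a border with two mismatches" as a proper prefix and the suffix of the same length that differ in exactly two positions; that reading is an assumption, as is the function name has_two_mismatch_border. The check takes quadratic time and is not the linear-time algorithm of the paper.

```python
def has_two_mismatch_border(word: str) -> bool:
    """Return True if some proper prefix of word and the suffix of the same
    length differ in exactly two positions (quadratic-time check)."""
    n = len(word)
    for length in range(1, n):
        prefix = word[:length]
        suffix = word[n - length:]
        mismatches = sum(1 for a, b in zip(prefix, suffix) if a != b)
        if mismatches == 2:
            return True
    return False


if __name__ == "__main__":
    for w in ["11", "110", "101", "1100"]:
        print(w, has_two_mismatch_border(w))
```

Under the stated characterization, a word for which this returns True would not be Hamming-isometric.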