1,920 research outputs found
Approximating Approximate Pattern Matching
Given a text of length and a pattern of length , the
approximate pattern matching problem asks for computation of a particular
\emph{distance} function between and every -substring of . We
consider a multiplicative approximation variant of this
problem, for distance function. In this paper, we describe two
-approximate algorithms with a runtime of
for all (constant) non-negative values
of . For constant we show a deterministic
-approximation algorithm. Previously, such run time was known
only for the case of distance, by Gawrychowski and Uzna\'nski [ICALP
2018] and only with a randomized algorithm. For constant we
show a randomized algorithm for the , thereby providing a smooth
tradeoff between algorithms of Kopelowitz and Porat [FOCS~2015, SOSA~2018] for
Hamming distance (case of ) and of Gawrychowski and Uzna\'nski for
distance
Pattern Matching in Multiple Streams
We investigate the problem of deterministic pattern matching in multiple
streams. In this model, one symbol arrives at a time and is associated with one
of s streaming texts. The task at each time step is to report if there is a new
match between a fixed pattern of length m and a newly updated stream. As is
usual in the streaming context, the goal is to use as little space as possible
while still reporting matches quickly. We give almost matching upper and lower
space bounds for three distinct pattern matching problems. For exact matching
we show that the problem can be solved in constant time per arriving symbol and
O(m+s) words of space. For the k-mismatch and k-difference problems we give
O(k) time solutions that require O(m+ks) words of space. In all three cases we
also give space lower bounds which show our methods are optimal up to a single
logarithmic factor. Finally we set out a number of open problems related to
this new model for pattern matching.Comment: 13 pages, 1 figur
Generalised Pattern Matching Revisited
In the problem of
[STOC'94, Muthukrishnan and Palem], we are given a text of length over
an alphabet , a pattern of length over an alphabet
, and a matching relationship ,
and must return all substrings of that match (reporting) or the number
of mismatches between each substring of of length and (counting).
In this work, we improve over all previously known algorithms for this problem
for various parameters describing the input instance:
* being the maximum number of characters that match a fixed
character,
* being the number of pairs of matching characters,
* being the total number of disjoint intervals of characters
that match the characters of the pattern .
At the heart of our new deterministic upper bounds for and
lies a faster construction of superimposed codes, which solves
an open problem posed in [FOCS'97, Indyk] and can be of independent interest.
To conclude, we demonstrate first lower bounds for . We start by
showing that any deterministic or Monte Carlo algorithm for must
use time, and then proceed to show higher lower bounds
for combinatorial algorithms. These bounds show that our algorithms are almost
optimal, unless a radically new approach is developed
Approximating Approximate Pattern Matching
Given a text T of length n and a pattern P of length m, the approximate pattern matching problem asks for computation of a particular distance function between P and every m-substring of T. We consider a (1 +/- epsilon) multiplicative approximation variant of this problem, for l_p distance function. In this paper, we describe two (1+epsilon)-approximate algorithms with a runtime of O~(n/epsilon) for all (constant) non-negative values of p. For constant p >= 1 we show a deterministic (1+epsilon)-approximation algorithm. Previously, such run time was known only for the case of l_1 distance, by Gawrychowski and Uznanski [ICALP 2018] and only with a randomized algorithm. For constant 0 <= p <= 1 we show a randomized algorithm for the l_p, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for Hamming distance (case of p=0) and of Gawrychowski and Uznanski for l_1 distance
Linear pattern matching on sparse suffix trees
Packing several characters into one computer word is a simple and natural way
to compress the representation of a string and to speed up its processing.
Exploiting this idea, we propose an index for a packed string, based on a {\em
sparse suffix tree} \cite{KU-96} with appropriately defined suffix links.
Assuming, under the standard unit-cost RAM model, that a word can store up to
characters ( the alphabet size), our index takes
space, i.e. the same space as the packed string itself.
The resulting pattern matching algorithm runs in time ,
where is the length of the pattern, is the actual number of characters
stored in a word and is the number of pattern occurrences
Data Structure Lower Bounds for Document Indexing Problems
We study data structure problems related to document indexing and pattern
matching queries and our main contribution is to show that the pointer machine
model of computation can be extremely useful in proving high and unconditional
lower bounds that cannot be obtained in any other known model of computation
with the current techniques. Often our lower bounds match the known space-query
time trade-off curve and in fact for all the problems considered, there is a
very good and reasonable match between the our lower bounds and the known upper
bounds, at least for some choice of input parameters. The problems that we
consider are set intersection queries (both the reporting variant and the
semi-group counting variant), indexing a set of documents for two-pattern
queries, or forbidden- pattern queries, or queries with wild-cards, and
indexing an input set of gapped-patterns (or two-patterns) to find those
matching a document given at the query time.Comment: Full version of the conference version that appeared at ICALP 2016,
25 page
- …