Breaking the Variance: Approximating the Hamming Distance in $\tilde{O}(1/\epsilon)$ Time Per Alignment
The algorithmic task of computing the Hamming distance between a given
pattern of length $m$ and each location in a text of length $n$ is one of the
most fundamental algorithmic tasks in string algorithms. Unfortunately, there
is evidence that for a text $T$ of size $n$ and a pattern $P$ of size $m$, one
cannot compute the exact Hamming distance for all locations in $T$ in time
which is less than $\tilde{O}(n\sqrt{m})$. However, Karloff~\cite{karloff} showed
that if one is willing to suffer a $1\pm\epsilon$ approximation, then it is
possible to solve the problem with high probability, in $\tilde{O}(n/\epsilon^2)$ time.
Due to related lower bounds for computing the Hamming distance of two strings
in the one-way communication complexity model, it is strongly believed that
obtaining an algorithm for solving the approximation version cannot be done
much faster as a function of $1/\epsilon$. We show here that this belief
is false by introducing a new $\tilde{O}(n/\epsilon)$ time algorithm
that succeeds with high probability.
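The following Python sketch makes the problem concrete. It computes the exact Hamming distance at every alignment naively in $O(nm)$ time. It also implements a simple random-projection estimator of the kind used for $1\pm\epsilon$ approximations: each character is mapped to a random bit, binary mismatches are counted, and the count is doubled. The doubling works because a mismatching pair of characters gets different bits under a uniformly random projection with probability 1/2. This is an illustrative sketch under those assumptions, not Karloff's exact algorithm, and the function names are hypothetical. A fast implementation would count binary mismatches for all alignments at once with convolutions (e.g., via FFT) instead of the direct loop used here.

```python
import random

def hamming_all_alignments(text, pattern):
    """Exact Hamming distance between pattern and text[i:i+m] for every i (O(nm))."""
    n, m = len(text), len(pattern)
    return [sum(text[i + j] != pattern[j] for j in range(m))
            for i in range(n - m + 1)]

def projected_estimate(text, pattern, trials, seed=0):
    """Average of 2 * (binary mismatches) over random maps phi: alphabet -> {0, 1}.

    A mismatching pair (a, b) gets phi(a) != phi(b) with probability 1/2 and a
    matching pair never does, so each trial is an unbiased estimate.
    """
    rng = random.Random(seed)
    alphabet = sorted(set(text) | set(pattern))
    n, m = len(text), len(pattern)
    totals = [0] * (n - m + 1)
    for _ in range(trials):
        phi = {c: rng.randrange(2) for c in alphabet}
        t_bits = [phi[c] for c in text]
        p_bits = [phi[c] for c in pattern]
        # Direct binary counting for clarity; convolutions could do all i at once.
        for i in range(n - m + 1):
            totals[i] += sum(t_bits[i + j] ^ p_bits[j] for j in range(m))
    return [2 * s / trials for s in totals]

print(hamming_all_alignments("abcab", "ab"))  # [0, 2, 2, 0]
```

Alignments with distance 0 always get an estimate of exactly 0, because identical windows project to identical bit strings. The other estimates concentrate around the true distance as `trials` grows.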
The main idea behind our algorithm, which is common in sparse recovery
problems, is to reduce the variance of a specific randomized experiment by
(approximately) separating heavy hitters from non-heavy hitters. However, while
known sparse recovery techniques work very well on vectors, they do not seem to
apply here, where we are dealing with mismatches between pairs of characters.
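A small experiment shows why variance is the obstacle. Fix one alignment and let $X$ be the number of mismatching positions whose two characters receive different bits under a uniformly random map $\phi:\Sigma\to\{0,1\}$. The indicators for distinct unordered character pairs are pairwise independent, so $\mathrm{Var}(2X)=\sum_{p} c_p^2$, where $c_p$ is the number of mismatching positions involving pair $p$. With the Hamming distance fixed at $h$, this ranges from $h$ (all pairs distinct) to $h^2$ (one pair repeated $h$ times). Frequent mismatching pairs ("heavy hitters") therefore dominate the error. The Python sketch below uses hypothetical helper names and checks the closed form by enumerating every projection. It illustrates the motivation only and is not the paper's sparse recovery method.

```python
import itertools
from collections import Counter

def mismatch_pairs(text, pattern, i):
    """Unordered character pairs {text[i+j], pattern[j]} at mismatching positions."""
    return [frozenset((text[i + j], c)) for j, c in enumerate(pattern)
            if text[i + j] != c]

def exact_variance(pairs):
    """Variance of the estimator 2*X over all 2**k maps phi: alphabet -> {0, 1},
    where X counts pairs whose two characters get different bits (brute force)."""
    alphabet = sorted(set().union(*pairs))
    values = []
    for bits in itertools.product((0, 1), repeat=len(alphabet)):
        phi = dict(zip(alphabet, bits))
        x = sum(1 for p in pairs if len({phi[c] for c in p}) == 2)
        values.append(2 * x)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_formula(pairs):
    """Closed form: sum over distinct pairs of (multiplicity)**2."""
    return sum(c * c for c in Counter(pairs).values())

heavy = mismatch_pairs("bbbb", "aaaa", 0)   # one pair {a, b} repeated 4 times
spread = mismatch_pairs("bcde", "aaaa", 0)  # four distinct pairs
print(exact_variance(heavy), variance_formula(heavy))    # 16.0 16
print(exact_variance(spread), variance_formula(spread))  # 4.0 4
```

Both alignments have Hamming distance 4, yet the repeated pair makes the estimator's variance four times larger. Removing such heavy pairs from the random experiment is what drives the variance down.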
We introduce two main algorithmic ingredients. The first is a new sparse
recovery method that applies to pair inputs (such as in our setting). The
second is a new construction of hash/projection functions, which allows one to
count the number of projections that induce mismatches between two characters
exponentially faster than brute force. We expect that these algorithmic
techniques will be of independent interest.
Comment: Appeared in FOCS 201
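As a toy illustration of counting mismatch-inducing projections without enumerating them, consider the hypothetical family $\phi_r(c) = \mathrm{parity}(r \wedge c)$ for $r = 0,\dots,R-1$, with characters encoded as non-negative integers. Projection $r$ separates characters $a$ and $b$ exactly when $r \wedge (a \oplus b)$ has an odd number of set bits. Grouping $r \in [0,R)$ by the highest bit at which $r$ falls below $R$ gives the count in $O(\log R)$ steps rather than $O(R)$. This family is an assumption chosen for the example and is not the construction from the paper.

```python
def parity(x: int) -> int:
    """1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1

def count_separating_brute(a: int, b: int, R: int) -> int:
    """Count r in [0, R) with phi_r(a) != phi_r(b) by trying every projection."""
    return sum(parity(r & a) != parity(r & b) for r in range(R))

def count_separating_fast(a: int, b: int, R: int) -> int:
    """Same count in O(log R) steps.

    phi_r(a) != phi_r(b)  <=>  parity(r & d) == 1 with d = a ^ b.
    Every r < R agrees with R above some bit where R has a 1 and r has a 0;
    group r by that bit and count each group in closed form.
    """
    d = a ^ b
    total = 0
    for bit in range(R.bit_length() - 1, -1, -1):
        if not (R >> bit) & 1:
            continue
        prefix = (R >> (bit + 1)) << (bit + 1)  # R's bits above `bit`; bit and below are 0
        low_mask = (1 << bit) - 1               # the free low bits of this group
        if d & low_mask:
            # Some free bit flips the parity, so exactly half of the 2**bit values are odd.
            total += 1 << (bit - 1)
        elif parity(prefix & d):
            # Parity is fixed by the prefix: all 2**bit values separate a and b.
            total += 1 << bit
    return total

print(count_separating_fast(3, 5, 8), count_separating_brute(3, 5, 8))  # 4 4
```

With 3-bit character codes and $R = 8$, any two distinct characters are separated by exactly half of the projections, and the fast count agrees with brute force for arbitrary $R$.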
Pattern Matching under Polynomial Transformation
We consider a class of pattern matching problems where a normalising
transformation is applied at every alignment. Normalised pattern matching plays
a key role in fields as diverse as image processing and musical information
processing, where application-specific transformations are often applied to the
input. By considering the class of polynomial transformations of the input, we
provide fast algorithms and the first lower bounds for both new and old
problems. Given a pattern of length m and a longer text of length n where both
are assumed to contain integer values only, we first show O(n log m) time
algorithms for pattern matching under linear transformations even when wildcard
symbols can occur in the input. We then show how to extend the technique to
polynomial transformations of arbitrary degree. Next we consider the problem of
finding the minimum Hamming distance under polynomial transformation. We show
that, for any epsilon>0, there cannot exist an O(n m^(1-epsilon)) time
algorithm for additive and linear transformations conditional on the hardness
of the classic 3SUM problem. Finally, we consider a version of the Hamming
distance problem under additive transformations with a bound k on the maximum
distance that need be reported. We give a deterministic O(nk log k) time
solution which we then improve by careful use of randomisation to O(n sqrt(k
log k) log n) time for sufficiently small k. Our randomised solution outputs
the correct answer at every position with high probability.
Comment: 27 page
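Two of the problem definitions above can be sketched naively in Python, each in O(nm) time.

`linear_match` reports the alignments where some real a, b satisfy t[i+j] = a*p[j] + b at every position where neither symbol is a wildcard. It does this with an exact Cauchy-Schwarz equality test on the sums of p, t, p^2, t^2 and p*t. Convolutions (e.g., FFT) can supply sliding sums of this kind for all alignments at once; here they are computed directly.

`min_hamming_additive` computes, for each alignment, the minimum Hamming distance over integer shifts b. It does this by taking the most common difference t[i+j] - p[j].

The names, the use of `None` as the wildcard, and the choice to allow a = 0 are assumptions of this sketch, not details taken from the paper.

```python
from collections import Counter

WILDCARD = None  # hypothetical wildcard marker for this sketch

def linear_match(text, pattern):
    """Alignments i where real a, b exist with text[i+j] == a*pattern[j] + b
    at every j where neither symbol is a wildcard (a = 0 is allowed here)."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        pairs = [(p, text[i + j]) for j, p in enumerate(pattern)
                 if p is not WILDCARD and text[i + j] is not WILDCARD]
        k = len(pairs)
        sp = sum(p for p, _ in pairs)
        st = sum(t for _, t in pairs)
        spp = sum(p * p for p, _ in pairs)
        stt = sum(t * t for _, t in pairs)
        spt = sum(p * t for p, t in pairs)
        var_p = k * spp - sp * sp  # k**2 times the variance of the p values
        var_t = k * stt - st * st
        cov = k * spt - sp * st
        # An exact linear fit exists iff Cauchy-Schwarz holds with equality;
        # if p is constant, t must be constant too. Integer arithmetic, no rounding.
        fits = (var_t == 0) if var_p == 0 else (cov * cov == var_p * var_t)
        if fits:
            hits.append(i)
    return hits

def min_hamming_additive(text, pattern):
    """For each alignment, min over integer shifts b of #{j : text[i+j] != pattern[j] + b}."""
    n, m = len(text), len(pattern)
    out = []
    for i in range(n - m + 1):
        # The best shift is the most frequent difference text[i+j] - pattern[j].
        diffs = Counter(text[i + j] - pattern[j] for j in range(m))
        out.append(m - max(diffs.values(), default=0))
    return out

print(linear_match([1, 3, 5, 7, 2, 2], [0, 1, 2]))    # [0, 1]
print(min_hamming_additive([5, 6, 9, 1], [1, 2, 3]))  # [1, 2]
```

Replacing the middle pattern value with `None` in the first call makes every alignment match, because any two points with distinct p values lie on some line.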