In the problem of Generalised Pattern Matching (GPM)
[STOC'94, Muthukrishnan and Palem], we are given a text T of length n over
an alphabet Σ_T, a pattern P of length m over an alphabet
Σ_P, and a matching relationship ⊆ Σ_T × Σ_P,
and must return all substrings of T that match P (reporting) or the number
of mismatches between each substring of T of length m and P (counting).
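
To make the two variants of the problem concrete, the following is a minimal brute-force sketch in Python. It is only the naive O(nm) baseline, not one of the algorithms of this work. The function names and the representation of the matching relationship as a set of (text character, pattern character) pairs are assumptions made for illustration.

```python
def gpm_mismatch_counts(text, pattern, matches):
    """Counting variant: number of mismatches for every length-m window of text.

    `matches` is assumed to be a set of (text_char, pattern_char) pairs
    describing which characters are allowed to match.
    """
    n, m = len(text), len(pattern)
    counts = []
    for i in range(n - m + 1):
        # A position j is a mismatch if the pair is not in the relation.
        mismatches = sum(
            (text[i + j], pattern[j]) not in matches for j in range(m)
        )
        counts.append(mismatches)
    return counts


def gpm_report(text, pattern, matches):
    """Reporting variant: start positions of windows with zero mismatches."""
    counts = gpm_mismatch_counts(text, pattern, matches)
    return [i for i, c in enumerate(counts) if c == 0]


# Hypothetical toy instance: 'a' matches 'x'; both 'b' and 'c' match 'y'.
relation = {("a", "x"), ("b", "y"), ("c", "y")}
print(gpm_mismatch_counts("abcab", "xy", relation))
print(gpm_report("abcab", "xy", relation))
```

Reporting is derived from counting here purely for brevity; a dedicated reporting routine could stop at the first mismatch in each window.
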
In this work, we improve over all previously known algorithms for this problem
for various parameters describing the input instance:
* D being the maximum number of characters that match a fixed
character,
* S being the number of pairs of matching characters,
* I being the total number of disjoint intervals of characters
that match the m characters of the pattern P.
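
As an illustration of the three parameters, the sketch below computes them for a small instance. It rests on assumptions that go beyond the text above: the matching relationship is given as a set of distinct (text character, pattern character) pairs, D is taken as the maximum over characters of either alphabet, and for I the text alphabet is assumed to consist of integers, so that an interval is a maximal run of consecutive integers. All function names are hypothetical.

```python
from collections import Counter


def parameter_d(matches):
    """Maximum number of characters matching any single fixed character.

    Assumption: the maximum is taken over characters of both alphabets.
    """
    text_degree = Counter(t for t, _ in matches)
    pattern_degree = Counter(p for _, p in matches)
    degrees = list(text_degree.values()) + list(pattern_degree.values())
    return max(degrees, default=0)


def parameter_s(matches):
    """Number of pairs of matching characters (size of the relation)."""
    return len(matches)


def parameter_i(pattern, matches):
    """Total number of disjoint intervals over all m pattern positions.

    Assumption: text characters are integers, and an interval is a
    maximal run of consecutive integers matching the pattern character.
    """
    total = 0
    for p in pattern:
        chars = sorted(t for t, q in matches if q == p)
        # A new interval starts wherever the run of consecutive integers breaks.
        total += sum(
            1 for k, c in enumerate(chars) if k == 0 or c != chars[k - 1] + 1
        )
    return total


# Hypothetical toy relation: 'x' matches {1, 2, 5}, 'y' matches {2}.
relation = {(1, "x"), (2, "x"), (5, "x"), (2, "y")}
print(parameter_d(relation), parameter_s(relation), parameter_i("xyx", relation))
```

In this toy instance the character 'x' contributes the two intervals [1, 2] and [5, 5] at each of its two occurrences in the pattern, and 'y' contributes one.
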
At the heart of our new deterministic upper bounds for D and
S lies a faster construction of superimposed codes, which solves
an open problem posed in [FOCS'97, Indyk] and may be of independent interest.
To conclude, we demonstrate the first lower bounds for GPM. We start by
showing that any deterministic or Monte Carlo algorithm for GPM must
use Ω(S) time, and then proceed to show stronger lower bounds
for combinatorial algorithms. These bounds show that our algorithms are almost
optimal, unless a radically new approach is developed.