Generalised Pattern Matching Revisited
In the problem of Generalised Pattern Matching (GPM) [STOC'94, Muthukrishnan and Palem], we are given a text T of length n over an alphabet Σ_T, a pattern P of length m over an alphabet Σ_P, and a matching relationship ⊆ Σ_T × Σ_P, and must return all substrings of T that match P (reporting) or the number of mismatches between each substring of T of length m and P (counting). In this work, we improve over all previously known algorithms for this problem:
- For 𝒟 being the maximum number of characters that match a fixed character, we show two new Monte Carlo algorithms, a reporting algorithm with time 𝒪(𝒟 n log n log m) and a (1-ε)-approximation counting algorithm with time 𝒪(ε^-1 𝒟 n log n log m). We then derive a (1-ε)-approximation deterministic counting algorithm for GPM with 𝒪(ε^-2 𝒟 n log⁶ n) time.
- For 𝒮 being the number of pairs of matching characters, we demonstrate Monte Carlo algorithms for reporting and (1-ε)-approximate counting with running time 𝒪(√𝒮 n log m √{log n}) and 𝒪(√{ε^-1 𝒮} n log m √{log n}), respectively, as well as a (1-ε)-approximation deterministic algorithm for the counting variant of GPM with 𝒪(ε^-1 √{𝒮} n log^{7/2} n) time.
- Finally, for ℐ being the total number of disjoint intervals of characters that match the m characters of the pattern P, we show that both the reporting and the counting variants of GPM can be solved exactly and deterministically in 𝒪(n√{ℐ log m} + n log n) time.
At the heart of our new deterministic upper bounds for 𝒟 and 𝒮 lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest.
To conclude, we demonstrate first lower bounds for GPM. We start by showing that any deterministic or Monte Carlo algorithm for GPM must use Ω(𝒮) time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.
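For orientation, the problem statement above can be made concrete with a naive quadratic baseline. This is only a sketch of the problem definition, not any of the algorithms from the abstract; the function names and the representation of the matching relationship as a Python set of (text character, pattern character) pairs are assumptions made for illustration.

```python
def gpm_mismatch_counts(text, pattern, matches):
    """Counting variant, naive O(n*m): for every length-m window of the text,
    count the positions whose (text char, pattern char) pair is not in the
    matching relationship `matches`."""
    n, m = len(text), len(pattern)
    counts = []
    for i in range(n - m + 1):
        mismatches = sum(
            1 for j in range(m) if (text[i + j], pattern[j]) not in matches
        )
        counts.append(mismatches)
    return counts


def gpm_report(text, pattern, matches):
    """Reporting variant: start positions of windows with zero mismatches."""
    counts = gpm_mismatch_counts(text, pattern, matches)
    return [i for i, c in enumerate(counts) if c == 0]


if __name__ == "__main__":
    # Hypothetical relation: 'a' and 'b' match 'x'; 'c' matches 'y'.
    relation = {("a", "x"), ("b", "x"), ("c", "y")}
    print(gpm_mismatch_counts("abcac", "xy", relation))
    print(gpm_report("abcac", "xy", relation))
```

The faster algorithms in the abstract are parameterised by properties of the relation itself (how many characters match a fixed character, how many matching pairs exist), which this baseline ignores entirely.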
Approximating Approximate Pattern Matching
Given a text T of length n and a pattern P of length m, the
approximate pattern matching problem asks for computation of a particular
distance function between P and every m-substring of T. We
consider a (1±ε) multiplicative approximation variant of this
problem, for the ℓ_p distance function. In this paper, we describe two
(1+ε)-approximate algorithms with a runtime of
Õ(n/ε) for all (constant) non-negative values
of p. For constant p ≥ 1 we show a deterministic
(1+ε)-approximation algorithm. Previously, such run time was known
only for the case of ℓ_1 distance, by Gawrychowski and Uznański [ICALP
2018] and only with a randomized algorithm. For constant 0 ≤ p ≤ 1 we
show a randomized algorithm for the ℓ_p distance, thereby providing a smooth
tradeoff between algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for
Hamming distance (case of p = 0) and of Gawrychowski and Uznański for
ℓ_1 distance.
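As a point of reference, the exact quantity being approximated can be computed naively in O(nm) time. The sketch below assumes the distance family is the ℓ_p distance over numeric sequences, with p = 0 treated as Hamming distance (the number of differing positions); the function name `lp_distances` is hypothetical and this is not the approximation algorithm from the abstract.

```python
def lp_distances(text, pattern, p):
    """Exact l_p distance between the pattern and every length-m window of
    the text, computed naively. p = 0 is treated as Hamming distance."""
    n, m = len(text), len(pattern)
    out = []
    for i in range(n - m + 1):
        if p == 0:
            # Hamming distance: count positions where the symbols differ.
            d = sum(1 for j in range(m) if text[i + j] != pattern[j])
        else:
            # (sum_j |t_j - p_j|^p)^(1/p)
            total = sum(abs(text[i + j] - pattern[j]) ** p for j in range(m))
            d = total ** (1.0 / p)
        out.append(d)
    return out


if __name__ == "__main__":
    print(lp_distances([1, 2, 4, 7], [1, 3], 0))
    print(lp_distances([1, 2, 4, 7], [1, 3], 1))
    print(lp_distances([1, 2, 4, 7], [1, 3], 2))
```

A (1+ε)-approximate algorithm would be allowed to return, for each window, any value within a multiplicative factor of the exact one computed here.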
MatchPy: A Pattern Matching Library
Pattern matching is a powerful tool for symbolic computations, based on the
well-defined theory of term rewriting systems. Application domains include
algebraic expressions, abstract syntax trees, and XML and JSON data.
Unfortunately, no lightweight implementation of pattern matching as general and
flexible as Mathematica exists for Python [Mathics, MacroPy, patterns, PyPatt].
Therefore, we created the open source module MatchPy which offers similar
pattern matching functionality in Python using a novel algorithm which finds
matches for large pattern sets more efficiently by exploiting similarities
between patterns.
Comment: arXiv admin note: substantial text overlap with arXiv:1710.0007
Efficient Online Timed Pattern Matching by Automata-Based Skipping
The timed pattern matching problem is an actively studied topic because of
its relevance in monitoring of real-time systems. There one is given a log
and a specification (a timed word and a timed automaton, respectively,
in this paper), and one wishes to return the set of intervals for which the
log, when restricted to the interval, satisfies the specification.
In our previous work we presented an efficient timed pattern
matching algorithm: it adopts a skipping mechanism inspired by the classic
Boyer-Moore (BM) string matching algorithm. In this work we tackle the problem
of online timed pattern matching, towards embedded applications where it is
vital to process a vast amount of incoming data in a timely manner.
Specifically, we start with the Franek-Jennings-Smyth (FJS) string matching
algorithm, a recent variant of the BM algorithm, and extend it to timed
pattern matching. Our experiments indicate the efficiency of our FJS-type
algorithm in online and offline timed pattern matching.
A Boyer-Moore Type Algorithm for Timed Pattern Matching
The timed pattern matching problem was formulated by Ulus et al. and has been
actively studied since, with its evident application in monitoring real-time
systems. The problem takes as input a timed word/signal and a timed pattern
(specified either by a timed regular expression or by a timed automaton); and
it returns the set of those intervals for which the given timed word, when
restricted to the interval, matches the given pattern. We contribute a
Boyer-Moore type optimization in timed pattern matching, relying on the classic
Boyer-Moore string matching algorithm and its extension to (untimed) pattern
matching by Watson and Watson. We assess its effect through experiments; for
some problem instances our Boyer-Moore type optimization achieves a twofold
speed-up, indicating its potential in real-world monitoring tasks where data
sets tend to be massive.
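The skipping idea behind Boyer-Moore type algorithms can be illustrated on plain (untimed) strings. The sketch below uses the Horspool simplification, which keeps only a bad-character shift table; it is a stand-in chosen for brevity, not the timed algorithm from the abstract, and the function name is hypothetical.

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool search on ordinary strings: after each window
    comparison, skip ahead based on the text character aligned with the
    last pattern position."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # For each character in pattern[:-1], the distance from its last
    # occurrence to the end of the pattern; absent characters shift by m.
    shift = {c: m - 1 - j for j, c in enumerate(pattern[:-1])}
    hits = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        i += shift.get(text[i + m - 1], m)
    return hits


if __name__ == "__main__":
    print(horspool_search("abracadabra", "abra"))
```

The speed-up comes from windows that are never examined: when the character under the last pattern position does not occur in the pattern, the search jumps a full pattern length.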
