949,870 research outputs found

Generalised Pattern Matching Revisited

Author: Dudek Bartłomiej
Gawrychowski Paweł
Starikovskaya Tatiana
Publication venue: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Publication date: 01/01/2020

In the problem of Generalised Pattern Matching (GPM) [STOC'94, Muthukrishnan and Palem], we are given a text T of length n over an alphabet Σ_T, a pattern P of length m over an alphabet Σ_P, and a matching relationship ⊆ Σ_T × Σ_P, and must return all substrings of T that match P (reporting) or the number of mismatches between each substring of T of length m and P (counting). In this work, we improve over all previously known algorithms for this problem:

- For 𝒟 being the maximum number of characters that match a fixed character, we show two new Monte Carlo algorithms: a reporting algorithm with time 𝒪(𝒟 n log n log m) and a (1-ε)-approximation counting algorithm with time 𝒪(ε^-1 𝒟 n log n log m). We then derive a (1-ε)-approximation deterministic counting algorithm for GPM with 𝒪(ε^-2 𝒟 n log⁶ n) time.
- For 𝒮 being the number of pairs of matching characters, we demonstrate Monte Carlo algorithms for reporting and (1-ε)-approximate counting with running times 𝒪(√𝒮 n log m √{log n}) and 𝒪(√{ε^-1 𝒮} n log m √{log n}), respectively, as well as a (1-ε)-approximation deterministic algorithm for the counting variant of GPM with 𝒪(ε^-1 √𝒮 n log^{7/2} n) time.
- Finally, for ℐ being the total number of disjoint intervals of characters that match the m characters of the pattern P, we show that both the reporting and the counting variants of GPM can be solved exactly and deterministically in 𝒪(n√{ℐ log m} + n log n) time.

At the heart of our new deterministic upper bounds for 𝒟 and 𝒮 lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate the first lower bounds for GPM. We start by showing that any deterministic or Monte Carlo algorithm for GPM must use Ω(𝒮) time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.

INRIA a CCSD electronic archive server

DROPS Dagstuhl Research Online Publication Server

HAL: Hyper Article en Ligne
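To make the reporting and counting variants of GPM concrete, here is a brute-force Python sketch of the problem definition itself, running in O(nm) time; it is not any of the algorithms in the abstract. It assumes the matching relationship is stored as a dictionary from each pattern character to the set of text characters it matches, and it also computes 𝒟 and 𝒮 for that relation, reading "a fixed character" as a character of either alphabet (an assumption). The names `gpm_naive` and `relation_parameters` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: relation[p] = set of text characters that
# match pattern character p (the matching relationship keyed by its Σ_P side).
Relation = Dict[str, Set[str]]


def gpm_naive(text: str, pattern: str, relation: Relation) -> Tuple[List[int], List[int]]:
    """Brute-force GPM: return (occurrences, mismatch counts) in O(nm) time."""
    n, m = len(text), len(pattern)
    counts: List[int] = []
    for i in range(n - m + 1):
        # Counting variant: positions j where T[i+j] does not match P[j].
        counts.append(sum(1 for j in range(m)
                          if text[i + j] not in relation.get(pattern[j], set())))
    # Reporting variant: windows with zero mismatches are occurrences of P.
    occurrences = [i for i, c in enumerate(counts) if c == 0]
    return occurrences, counts


def relation_parameters(relation: Relation) -> Tuple[int, int]:
    """Return (D, S): max matches of one fixed character, and number of pairs."""
    pairs = {(t, p) for p, ts in relation.items() for t in ts}
    per_pattern_char = Counter(p for _, p in pairs)
    per_text_char = Counter(t for t, _ in pairs)
    d = max([*per_pattern_char.values(), *per_text_char.values()], default=0)
    return d, len(pairs)


if __name__ == "__main__":
    rel = {"x": {"a", "b"}, "y": {"c"}}
    print(gpm_naive("acbcab", "xy", rel))  # ([0, 2], [0, 2, 0, 2, 1])
    print(relation_parameters(rel))        # (2, 3)
```

A baseline like this is only useful for checking the faster algorithms on small inputs; its cost grows with n·m regardless of 𝒟, 𝒮 or ℐ.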

Approximating Approximate Pattern Matching

Author: Studený Jan
Uznański Przemysław
Publication date: 01/01/2019

Given a text T of length n and a pattern P of length m, the approximate pattern matching problem asks for the computation of a particular distance function between P and every m-substring of T. We consider a (1±ε) multiplicative approximation variant of this problem, for the ℓ_p distance function. In this paper, we describe two (1+ε)-approximate algorithms with a runtime of Õ(n/ε) for all (constant) non-negative values of p. For constant p ≥ 1 we show a deterministic (1+ε)-approximation algorithm. Previously, such a running time was known only for the case of ℓ_1 distance, by Gawrychowski and Uznański [ICALP 2018], and only with a randomized algorithm. For constant 0 ≤ p ≤ 1 we show a randomized algorithm for the ℓ_p distance, thereby providing a smooth tradeoff between the algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for Hamming distance (the case of p = 0) and of Gawrychowski and Uznański for ℓ_1 distance.

arXiv.org e-Print Archive

Repository for Publications and Research Data
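A (1±ε) approximation is easiest to judge against the exact values it approximates. The Python sketch below computes, by brute force in O(nm) time, the exact quantity for every m-substring of T; it is a reference baseline, not the Õ(n/ε) algorithms of the paper. Treating p = 0 as Hamming distance follows the abstract; returning the sum of p-th powers rather than its 1/p-th root, and the name `lp_profile`, are assumptions for illustration.

```python
from typing import List, Sequence


def lp_profile(text: Sequence[float], pattern: Sequence[float], p: float) -> List[float]:
    """Exact sum_j |T[i+j] - P[j]|^p for every alignment i, in O(nm) time.

    Assumed convention: for p == 0 each term is 1 if the values differ and 0
    otherwise (Hamming distance); for p > 0 the result is the p-th power of
    the l_p distance, so apply ** (1 / p) to recover the distance itself.
    """
    if p < 0:
        raise ValueError("p must be non-negative")
    n, m = len(text), len(pattern)
    profile: List[float] = []
    for i in range(n - m + 1):
        total = 0.0
        for j in range(m):
            diff = abs(text[i + j] - pattern[j])
            total += (1.0 if diff != 0 else 0.0) if p == 0 else diff ** p
        profile.append(total)
    return profile


if __name__ == "__main__":
    t, q = [1, 2, 3, 4], [1, 3]
    print(lp_profile(t, q, 0))  # [1.0, 1.0, 2.0]  (Hamming)
    print(lp_profile(t, q, 1))  # [1.0, 1.0, 3.0]  (l_1)
    print(lp_profile(t, q, 2))  # [1.0, 1.0, 5.0]  (squared l_2)
```

An approximate algorithm's output for alignment i can then be checked against `lp_profile(...)[i]` on small random inputs.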

MatchPy: A Pattern Matching Library

Author: Barthels Henrik
Bientinesi Paolo
Krebber Manuel
Publication venue: 'SciPy'
Publication date: 01/01/2017

Pattern matching is a powerful tool for symbolic computations, based on the well-defined theory of term rewriting systems. Application domains include algebraic expressions, abstract syntax trees, and XML and JSON data. Unfortunately, no lightweight implementation of pattern matching as general and flexible as Mathematica exists for Python [Mathics, MacroPy, patterns, PyPatt]. Therefore, we created the open-source module MatchPy, which offers similar pattern matching functionality in Python using a novel algorithm that finds matches for large pattern sets more efficiently by exploiting similarities between patterns.

Comment: arXiv admin note: substantial text overlap with arXiv:1710.0007

arXiv.org e-Print Archive

Crossref
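The basic operation behind term-rewriting pattern matching is finding a substitution that makes a pattern term equal to a subject term. The Python sketch below shows plain syntactic matching of one pattern at a time; it is not MatchPy's API and does not attempt the more general matching or the pattern-set algorithm described above. The term encoding (nested tuples, variables as strings starting with "_") and the names `match_term` and `is_variable` are hypothetical.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical term encoding: a symbol is a str, an operation application is a
# tuple (head, arg1, ..., argk), and a variable is a str starting with "_".
Term = Union[str, Tuple]
Substitution = Dict[str, Term]


def is_variable(term: Term) -> bool:
    return isinstance(term, str) and term.startswith("_")


def match_term(pattern: Term, subject: Term,
               subst: Optional[Substitution] = None) -> Optional[Substitution]:
    """Return a substitution making pattern equal to subject, or None."""
    subst = {} if subst is None else subst
    if is_variable(pattern):
        if pattern in subst:
            # Non-linear pattern: a repeated variable must bind the same subterm.
            return subst if subst[pattern] == subject else None
        return {**subst, pattern: subject}
    if isinstance(pattern, tuple):
        # Heads and arities must agree before arguments are matched left to right.
        if (not isinstance(subject, tuple) or len(pattern) != len(subject)
                or pattern[0] != subject[0]):
            return None
        for p_arg, s_arg in zip(pattern[1:], subject[1:]):
            subst = match_term(p_arg, s_arg, subst)
            if subst is None:
                return None
        return subst
    # Constant symbol: must be identical.
    return subst if pattern == subject else None


if __name__ == "__main__":
    print(match_term(("f", "_x", "b"), ("f", "a", "b")))  # {'_x': 'a'}
    print(match_term(("f", "_x", "_x"), ("f", "a", "b")))  # None
```

Matching many patterns this way means one call per pattern; sharing work across similar patterns is exactly what a pattern-set algorithm aims to avoid repeating.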

Efficient Online Timed Pattern Matching by Automata-Based Skipping

Author: A Kane
BW Watson
D Ničković
D Sunday
D Ulus
DE Knuth
DL Dill
DR Kini
E Asarin
F Franek
F Herbreteau
G Behrmann
H-M Ho
M Waga
O Maler
R Alur
RS Boyer
S Chen
S Faro
T Ferrère
T Reinbacher
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/06/2017

The timed pattern matching problem is an actively studied topic because of its relevance in the monitoring of real-time systems. There one is given a log w and a specification 𝒜 (given by a timed word and a timed automaton in this paper), and one wishes to return the set of intervals for which the log w, when restricted to the interval, satisfies the specification 𝒜. In our previous work we presented an efficient timed pattern matching algorithm: it adopts a skipping mechanism inspired by the classic Boyer–Moore (BM) string matching algorithm. In this work we tackle the problem of online timed pattern matching, towards embedded applications where it is vital to process a vast amount of incoming data in a timely manner. Specifically, we start with the Franek–Jennings–Smyth (FJS) string matching algorithm, a recent variant of the BM algorithm, and extend it to timed pattern matching. Our experiments indicate the efficiency of our FJS-type algorithm in online and offline timed pattern matching.

arXiv.org e-Print Archive

Crossref
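The skipping idea can be seen on ordinary, untimed strings. FJS combines Sunday's quick-search shift with a Knuth–Morris–Pratt style fallback; the Python sketch below implements only the Sunday shift, so it is neither FJS itself nor the authors' timed algorithm. After each window is checked, the text character just past the window decides how far the window may jump. The name `sunday_search` is hypothetical.

```python
from typing import Dict, List


def sunday_search(text: str, pattern: str) -> List[int]:
    """Report all (possibly overlapping) occurrences using Sunday's shift."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))
    # shift[c] = m - (last index of c in pattern); later indices overwrite
    # earlier ones. Characters absent from the pattern allow a jump of m + 1.
    shift: Dict[str, int] = {c: m - i for i, c in enumerate(pattern)}
    occurrences: List[int] = []
    s = 0
    while s <= n - m:
        if text[s:s + m] == pattern:
            occurrences.append(s)
        if s + m >= n:
            break  # no character after the window, so no further window fits
        s += shift.get(text[s + m], m + 1)
    return occurrences


if __name__ == "__main__":
    print(sunday_search("abracadabra", "abra"))  # [0, 7]
```

On "abracadabra" the character 'c' after the first window does not occur in "abra", so the window jumps by m + 1 = 5 positions at once.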

A Boyer-Moore Type Algorithm for Timed Pattern Matching

Author: Akazaki Takumi
Hasuo Ichiro
Waga Masaki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The timed pattern matching problem was formulated by Ulus et al. and has been actively studied since, with its evident application in monitoring real-time systems. The problem takes as input a timed word/signal and a timed pattern (specified either by a timed regular expression or by a timed automaton), and it returns the set of those intervals for which the given timed word, when restricted to the interval, matches the given pattern. We contribute a Boyer–Moore type optimization in timed pattern matching, relying on the classic Boyer–Moore string matching algorithm and its extension to (untimed) pattern matching by Watson and Watson. We assess its effect through experiments; for some problem instances our Boyer–Moore type optimization achieves a twofold speed-up, indicating its potential in real-world monitoring tasks where data sets tend to be massive.

arXiv.org e-Print Archive

Crossref
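For the untimed starting point, the bad-character idea at the core of Boyer–Moore can be sketched with Horspool's simplification: each window is compared right to left, and the text character aligned with the pattern's last position determines the shift. This Python sketch covers plain string matching only; it is not the timed optimization of the paper or the Watson–Watson extension, and `horspool_search` is a hypothetical name.

```python
from typing import Dict, List


def horspool_search(text: str, pattern: str) -> List[int]:
    """Report all occurrences using the Boyer-Moore-Horspool bad-character shift."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))
    # Shift by the distance from the last occurrence of c in pattern[:-1] to the
    # pattern's end; characters not in pattern[:-1] allow a full shift of m.
    shift: Dict[str, int] = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    occurrences: List[int] = []
    s = 0
    while s <= n - m:
        j = m - 1
        # Compare right to left, as Boyer-Moore does.
        while j >= 0 and text[s + j] == pattern[j]:
            j -= 1
        if j < 0:
            occurrences.append(s)
        # The text character under the pattern's last position decides the shift.
        s += shift.get(text[s + m - 1], m)
    return occurrences


if __name__ == "__main__":
    print(horspool_search("abracadabra", "abra"))  # [0, 7]
```

Every shift is at least 1, so the loop always terminates; longer patterns over larger alphabets tend to allow longer jumps.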