
Fast and Compact Regular Expression Matching

Author: Bille Philip
Farach-Colton Martin
Publication date: 01/01/2008

We study four problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem using either an improved tabulation technique of an existing algorithm or by combining known algorithms in a new way.
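For orientation, the textbook dynamic program for unit-cost string edit distance is sketched below. It is the classic Wagner-Fischer recurrence, the kind of O(mn)-time baseline that tabulation techniques speed up, and not the paper's improved algorithm; the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) via the classic DP.

    Runs in O(len(a) * len(b)) time and keeps only one table row,
    so extra space is O(len(b)).
    """
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)  # distance from a[:i] to "" is i deletions
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```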

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

The IT University of Copenhagen's Repository

Regular Expression Matching and Operational Semantics

Author: Alan P. Sexton
Andrew Appel
Asiri Rathnayake
Hayo Thielecke
John C. Reynolds
Josh Berdine
Ken Thompson
M.A. Reniers
Matthias Felleisen
P. Sobocinski
Peter J. Landin
Robin Milner
Stanley Tzeng
Publication venue: 'Open Publishing Association'
Publication date: 01/08/2011

Many programming languages and tools, ranging from grep to the Java String library, contain regular expression matchers. Rather than first translating a regular expression into a deterministic finite automaton, such implementations typically match the regular expression on the fly. Thus they can be seen as virtual machines interpreting the regular expression much as if it were a program with some non-deterministic constructs such as the Kleene star. We formalize this implementation technique for regular expression matching using operational semantics. Specifically, we derive a series of abstract machines, moving from the abstract definition of matching to increasingly realistic machines. First a continuation is added to the operational semantics to describe what remains to be matched after the current expression. Next, we represent the expression as a data structure using pointers, which enables redundant searches to be eliminated via testing for pointer equality. From there, we arrive both at Thompson's lockstep construction and a machine that performs some operations in parallel, suitable for implementation on a large number of cores, such as a GPU. We formalize the parallel machine using process algebra and report some preliminary experiments with an implementation on a graphics processor using CUDA. Comment: In Proceedings SOS 2011, arXiv:1108.279
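To make "Thompson's lockstep construction" concrete, here is a minimal sketch of a lockstep matcher, assuming a hypothetical tuple encoding of the regular expression AST. The expression is compiled to a small instruction list (`char`, `split`, `jmp`, `match`), and all threads advance one input character at a time; a `seen` set of program counters drops duplicate threads, loosely analogous to the pointer-equality test the abstract mentions. The names `compile_re`, `add_thread` and `lockstep_match` are illustrative, not taken from the paper.

```python
# Hypothetical AST encoding: ('chr', c) | ('cat', l, r) | ('alt', l, r) | ('star', s)

def compile_re(node, prog):
    """Append Thompson-style instructions for `node` to `prog`."""
    kind = node[0]
    if kind == "chr":
        prog.append(("char", node[1]))
    elif kind == "cat":
        compile_re(node[1], prog)
        compile_re(node[2], prog)
    elif kind == "alt":
        split = len(prog)
        prog.append(None)  # placeholder, patched once both branches exist
        compile_re(node[1], prog)
        jmp = len(prog)
        prog.append(None)
        prog[split] = ("split", split + 1, len(prog))
        compile_re(node[2], prog)
        prog[jmp] = ("jmp", len(prog))
    elif kind == "star":
        split = len(prog)
        prog.append(None)
        compile_re(node[1], prog)
        prog.append(("jmp", split))  # loop back to try another iteration
        prog[split] = ("split", split + 1, len(prog))
    else:
        raise ValueError(kind)


def add_thread(prog, pc, threads, seen):
    """Follow jmp/split edges; a pc already in `seen` is a duplicate thread."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, op[1], threads, seen)
    elif op[0] == "split":
        add_thread(prog, op[1], threads, seen)
        add_thread(prog, op[2], threads, seen)
    else:  # 'char' or 'match': wait for the next input step
        threads.append(pc)


def lockstep_match(node, s):
    """Return True iff the whole string `s` matches `node`."""
    prog = []
    compile_re(node, prog)
    prog.append(("match",))
    clist = []
    add_thread(prog, 0, clist, set())
    for c in s:
        nlist, seen = [], set()
        for pc in clist:
            op = prog[pc]
            if op[0] == "char" and op[1] == c:
                add_thread(prog, pc + 1, nlist, seen)
        clist = nlist
    return any(prog[pc][0] == "match" for pc in clist)


R = ("cat", ("chr", "a"), ("star", ("alt", ("chr", "b"), ("chr", "c"))))  # a(b|c)*
print(lockstep_match(R, "abcb"))  # True
```

Because each program counter is kept at most once per step, the work per input character is bounded by the program length, which gives the familiar O(mn) bound for this style of simulation.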

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

From Regular Expression Matching to Parsing

Author: Bille Philip
Gørtz Inge Li
Publication date: 29/01/2019

Given a regular expression $R$ and a string $Q$, the regular expression parsing problem is to determine if $Q$ matches $R$ and if so, determine how it matches, e.g., by a mapping of the characters of $Q$ to the characters in $R$. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., for extracting IP addresses from internet traffic analysis or extracting subparts of genomes from genetic databases. We present a new general technique for efficiently converting a large class of algorithms that determine if a string $Q$ matches regular expression $R$ into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear space solutions for regular expression parsing.
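The difference between matching and parsing can be illustrated with a deliberately naive sketch: a memoized recursion over substrings that returns, for each character of Q, the position in R that matched it (or `None` if Q does not match). This takes roughly cubic time and is not the linear-space technique of the paper; the AST encoding, where each character leaf carries its position index in R, and the name `parse_positions` are hypothetical.

```python
from functools import lru_cache
from typing import Optional, Tuple

# Hypothetical AST: ('chr', c, pos) | ('cat', l, r) | ('alt', l, r) | ('star', s)


def parse_positions(r: tuple, q: str) -> Optional[Tuple[int, ...]]:
    """Map each character of q to the position in r that matches it, or None."""

    @lru_cache(maxsize=None)
    def go(node, i, j):
        # Returns a tuple of positions of length j - i if q[i:j] matches node.
        kind = node[0]
        if kind == "chr":
            _, c, pos = node
            return (pos,) if j == i + 1 and q[i] == c else None
        if kind == "cat":
            _, left, right = node
            for k in range(i, j + 1):  # try every split point
                a = go(left, i, k)
                if a is not None:
                    b = go(right, k, j)
                    if b is not None:
                        return a + b
            return None
        if kind == "alt":
            _, left, right = node
            a = go(left, i, j)
            return a if a is not None else go(right, i, j)
        if kind == "star":
            _, sub = node
            if i == j:
                return ()  # zero iterations
            for k in range(i + 1, j + 1):  # first iteration must consume input
                a = go(sub, i, k)
                if a is not None:
                    b = go(node, k, j)
                    if b is not None:
                        return a + b
            return None
        raise ValueError(kind)

    return go(r, 0, len(q))


# a(b|c)* with positions a=0, b=1, c=2
R = ("cat", ("chr", "a", 0), ("star", ("alt", ("chr", "b", 1), ("chr", "c", 2))))
print(parse_positions(R, "abcb"))  # (0, 1, 2, 1)
```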

arXiv.org e-Print Archive

Online Research Database In Technology

Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching

Author: Berglund Martin
Drewes Frank
van der Merwe Brink
Publication venue: 'Open Publishing Association'
Publication date: 01/05/2014

We develop a formal perspective on how regular expression matching works in Java, a popular representative of the category of regex-directed matching engines. In particular, we define an automata model which captures all the aspects needed to study such matching engines in a formal way. Based on this, we propose two types of static analysis, which take a regular expression and tell whether there exists a family of strings which makes Java-style matching run in exponential time.Comment: In Proceedings AFL 2014, arXiv:1405.527
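The exponential behaviour the abstract refers to can be reproduced with a toy backtracking matcher written in continuation-passing style, which is the general strategy of regex-directed engines. This sketch is not Java's `java.util.regex` implementation nor the paper's automata model; the AST encoding and the name `backtrack_match` are hypothetical. On the pattern `(a|a)*b` and the input `a` repeated n times, every way of assigning each `a` to one of the two branches is tried before failing, so the step count roughly doubles with each extra `a`.

```python
# Hypothetical AST: ('chr', c) | ('cat', l, r) | ('alt', l, r) | ('star', s)


def backtrack_match(node, s):
    """Greedy backtracking match of the whole string; returns (matched, steps)."""
    steps = 0

    def m(node, i, k):
        # Try to match `node` at position i, then call continuation k on the end.
        nonlocal steps
        steps += 1
        kind = node[0]
        if kind == "chr":
            return i < len(s) and s[i] == node[1] and k(i + 1)
        if kind == "cat":
            return m(node[1], i, lambda j: m(node[2], j, k))
        if kind == "alt":
            return m(node[1], i, k) or m(node[2], i, k)  # left branch first
        if kind == "star":
            # Greedy: try one more (input-consuming) iteration, else stop here.
            return m(node[1], i, lambda j: j > i and m(node, j, k)) or k(i)
        raise ValueError(kind)

    matched = m(node, 0, lambda j: j == len(s))
    return bool(matched), steps


R = ("cat", ("star", ("alt", ("chr", "a"), ("chr", "a"))), ("chr", "b"))  # (a|a)*b
for n in range(1, 6):
    print(n, backtrack_match(R, "a" * n))
```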

arXiv.org e-Print Archive

Directory of Open Access Journals

Sparse Regular Expression Matching

Author: Bille Philip
Gørtz Inge Li
Publication date: 10/07/2019

We present the first algorithm for regular expression matching that can take advantage of sparsity in the input instance. Our main result is a new algorithm that solves regular expression matching in $O\left(\Delta \log \log \frac{nm}{\Delta} + n + m\right)$ time, where $m$ is the number of positions in the regular expression, $n$ is the length of the string, and $\Delta$ is the density of the instance, defined as the total number of active states in a simulation of the position automaton. This measure is a lower bound on the total number of active states in simulations of all classic polynomial sized finite automata. Our bound improves the best known bounds for regular expression matching by almost a linear factor in the density of the problem. The key component in the result is a novel linear space representation of the position automaton that supports state-set transition computation in near-linear time in the size of the input and output state sets.
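The density $\Delta$ can be made concrete with a straightforward simulation of the position (Glushkov) automaton that sums the sizes of the active state sets. The sketch below uses the standard nullable/first/last/follow construction, not the paper's linear-space representation. It assumes a hypothetical AST where each character leaf carries its position index, and it counts the active set after each input character (excluding the initial state), which is one reading of the definition; `glushkov` and `match_with_density` are illustrative names.

```python
# Hypothetical AST: ('chr', c, pos) | ('cat', l, r) | ('alt', l, r) | ('star', s)


def glushkov(node, follow):
    """Return (nullable, first, last) for node and fill follow[p] for positions p."""
    kind = node[0]
    if kind == "chr":
        p = node[2]
        follow.setdefault(p, set())
        return False, {p}, {p}
    if kind in ("cat", "alt"):
        n1, f1, l1 = glushkov(node[1], follow)
        n2, f2, l2 = glushkov(node[2], follow)
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:          # a position ending the left part
            follow[p] |= f2   # may be followed by one starting the right part
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f, l = glushkov(node[1], follow)
        for p in l:
            follow[p] |= f    # loop back to the start of the body
        return True, f, l
    raise ValueError(kind)


def collect_chars(node, chars):
    """Record which character each position reads."""
    if node[0] == "chr":
        chars[node[2]] = node[1]
    else:
        for child in node[1:]:
            collect_chars(child, chars)


def match_with_density(node, s):
    """Simulate the position automaton; return (matched, sum of active-set sizes)."""
    follow, chars = {}, {}
    nullable, first, last = glushkov(node, follow)
    collect_chars(node, chars)
    active, density = None, 0  # None stands for "only the initial state"
    for c in s:
        if active is None:
            candidates = first
        else:
            candidates = set().union(*(follow[p] for p in active))
        active = {p for p in candidates if chars[p] == c}
        density += len(active)
    matched = nullable if active is None else bool(active & last)
    return matched, density


R = ("cat", ("chr", "a", 0), ("star", ("alt", ("chr", "b", 1), ("chr", "c", 2))))
print(match_with_density(R, "abcb"))  # (True, 4)
```

In this example only one position is active per step, so $\Delta = 4$ even though the automaton has three positions; sparse instances of this kind are what the bound above exploits.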

arXiv.org e-Print Archive

Real-time Regular Expression Matching

Author: Bernadotte Alexandra
Publication date: 20/08/2023

This paper is devoted to finite state automata, regular expression matching, pattern recognition, and the exponential blow-up problem, in which the size of an automaton grows exponentially with the length of the regular expression. This paper presents a theoretical and hardware solution to the exponential blow-up problem for some complicated classes of regular languages, which have caused severe limitations in the operation of Network Intrusion Detection Systems. The article supports the solution with theorems on correctness and complexity. Comment: 17 pages, 11 figures
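The exponential blow-up can be observed with the standard subset construction. The sketch below is a textbook example, not the paper's solution: it hand-codes the (k+1)-state NFA for `(a|b)*a(a|b)^{k-1}` ("the k-th symbol from the end is `a`") and counts the DFA states reachable by subset construction, which comes to 2^k. The NFA encoding and the names `nfa_step` and `count_dfa_states` are hypothetical.

```python
def nfa_step(states, c, k):
    """One NFA step. State 0 loops on a/b and guesses 'a' -> 1; i -> i+1 for i < k."""
    nxt = set()
    for q in states:
        if q == 0:
            nxt.add(0)
            if c == "a":
                nxt.add(1)
        elif q < k:
            nxt.add(q + 1)  # state k is accepting and has no successors
    return frozenset(nxt)


def count_dfa_states(k):
    """Number of subsets reachable from {0} via subset construction."""
    start = frozenset({0})
    seen, stack = {start}, [start]
    while stack:
        subset = stack.pop()
        for c in "ab":
            nxt = nfa_step(subset, c, k)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


for k in range(1, 6):
    print(k, count_dfa_states(k))
```

The NFA grows linearly in k, while the reachable subsets are exactly {0} together with an arbitrary subset of {1, ..., k}.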

arXiv.org e-Print Archive

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Author: A. Amir
E.W. Myers
G. Navarro
G.M. Landau
J. Kärkkäinen
J. Ziv
K. Thompson
M. Dietzfelbinger
M. Farach
P. Sellers
R. Cole
T.A. Welch
V. Mäkinen
Publication date: 01/01/2007

We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck.
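For readers unfamiliar with the compression schemes involved, a minimal LZ78 sketch follows. It shows the phrase structure that algorithms on Ziv-Lempel compressed texts operate over: each phrase is a previously seen phrase (referenced by index) extended by one character. This is plain LZ78 only, chosen as an assumption for illustration (the abstract does not say which variants are covered); `lz78_compress` and `lz78_decompress` are hypothetical names.

```python
def lz78_compress(text):
    """Split text into LZ78 phrases, emitted as (index of prefix phrase, next char)."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    phrases = []
    w = ""
    for c in text:
        if w + c in dictionary:
            w += c  # keep extending the longest known phrase
        else:
            phrases.append((dictionary[w], c))
            dictionary[w + c] = len(dictionary)
            w = ""
    if w:
        # Leftover w is a known phrase; every prefix of it is also in the dictionary.
        phrases.append((dictionary[w[:-1]], w[-1]))
    return phrases


def lz78_decompress(phrases):
    """Rebuild the text by replaying the phrase dictionary."""
    entries = [""]
    out = []
    for idx, c in phrases:
        s = entries[idx] + c
        entries.append(s)
        out.append(s)
    return "".join(out)


print(lz78_compress("abababa"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```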

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Southern Denmark Research Output

Online Research Database In Technology

Which Regular Expression Patterns are Hard to Match?

Author: Backurs Arturs
Indyk Piotr
Publication date: 26/09/2016

Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. A classic algorithm for these problems constructs and simulates a non-deterministic finite automaton corresponding to the expression, resulting in an $O(mn)$ running time (where $m$ is the length of the pattern and $n$ is the length of the text). This running time can be improved slightly (by a polylogarithmic factor), but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, the word break problem, etc. In this paper, we show that the complexity of regular expression matching can be characterized based on its depth (when interpreted as a formula). Our results hold for expressions involving concatenation, OR, Kleene star and Kleene plus. For regular expressions of depth two (involving any combination of the above operators), we show the following dichotomy: matching and membership testing can be solved in near-linear time, except for "concatenations of stars", which cannot be solved in strongly sub-quadratic time assuming the Strong Exponential Time Hypothesis (SETH). For regular expressions of depth three the picture is more complex. Nevertheless, we show that all problems can either be solved in strongly sub-quadratic time, or cannot be solved in strongly sub-quadratic time assuming SETH. An intriguing special case of membership testing involves regular expressions of the form "a star of an OR of concatenations", e.g., $[a|ab|bc]^*$. This corresponds to the so-called word break problem, for which a dynamic programming algorithm with a runtime of (roughly) $O(n\sqrt{m})$ is known. We show that the latter bound is not tight and improve the runtime to $O(nm^{0.44\ldots})$.
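The word break problem asks whether a string splits into dictionary words, which is exactly membership in a star of an OR of those words, such as $[a|ab|bc]^*$. A straightforward dynamic programming sketch follows. It relies on the observation that a dictionary of total length m has at most about sqrt(2m) distinct word lengths, which is where a bound of roughly $O(n\sqrt{m})$ comes from. Slicing and hashing a substring in Python costs time proportional to its length, so this sketch does not literally achieve that bound, and it is not the paper's improved algorithm; `word_break` is a hypothetical name.

```python
def word_break(s, words):
    """True iff s is a concatenation of zero or more words from `words`."""
    by_len = {}
    for w in words:
        by_len.setdefault(len(w), set()).add(w)
    lengths = sorted(by_len)  # at most about sqrt(2 * total length) distinct lengths
    ok = [False] * (len(s) + 1)
    ok[0] = True  # the empty prefix is always decomposable
    for i in range(1, len(s) + 1):
        for length in lengths:
            if length > i:
                break
            # s[:i] splits if s[:i-length] does and the last piece is a word.
            if ok[i - length] and s[i - length:i] in by_len[length]:
                ok[i] = True
                break
    return ok[len(s)]


print(word_break("abbc", ["a", "ab", "bc"]))  # True ("ab" + "bc")
```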

arXiv.org e-Print Archive

DSpace@MIT

Crossref