Regular Expression Matching and Operational Semantics

Author: Alan P. Sexton
Andrew Appel
Asiri Rathnayake
Hayo Thielecke
John C. Reynolds
Josh Berdine
Ken Thompson
M.A. Reniers
Matthias Felleisen
P. Sobocinski
Peter J. Landin
Robin Milner
Stanley Tzeng
Publication venue: 'Open Publishing Association'
Publication date: 01/08/2011

Many programming languages and tools, ranging from grep to the Java String library, contain regular expression matchers. Rather than first translating a regular expression into a deterministic finite automaton, such implementations typically match the regular expression on the fly. Thus they can be seen as virtual machines interpreting the regular expression much as if it were a program with some non-deterministic constructs such as the Kleene star. We formalize this implementation technique for regular expression matching using operational semantics. Specifically, we derive a series of abstract machines, moving from the abstract definition of matching to increasingly realistic machines. First a continuation is added to the operational semantics to describe what remains to be matched after the current expression. Next, we represent the expression as a data structure using pointers, which enables redundant searches to be eliminated via testing for pointer equality. From there, we arrive both at Thompson's lockstep construction and a machine that performs some operations in parallel, suitable for implementation on a large number of cores, such as a GPU. We formalize the parallel machine using process algebra and report some preliminary experiments with an implementation on a graphics processor using CUDA. Comment: In Proceedings SOS 2011, arXiv:1108.279

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals
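
As a rough illustration of the continuation idea in the abstract above, the following Python sketch matches a regular expression on the fly while passing along a continuation that says what remains to be matched. The AST classes (`Char`, `Seq`, `Alt`, `Star`, `Eps`) and the functions `match` and `accepts` are hypothetical names chosen for this sketch and are not taken from the paper. The matcher is a plain backtracking one, not the paper's pointer-based or lockstep machine.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Eps:
    """Matches the empty string."""


@dataclass(frozen=True)
class Char:
    c: str


@dataclass(frozen=True)
class Seq:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    body: "Regex"


Regex = Union[Eps, Char, Seq, Alt, Star]


def match(e: Regex, s: str, i: int, k: Callable[[int], bool]) -> bool:
    """Match e against s from position i; k(j) decides whether the rest matches from j."""
    if isinstance(e, Eps):
        return k(i)
    if isinstance(e, Char):
        return i < len(s) and s[i] == e.c and k(i + 1)
    if isinstance(e, Seq):
        # The continuation of the left part is "match the right part, then k".
        return match(e.left, s, i, lambda j: match(e.right, s, j, k))
    if isinstance(e, Alt):
        # Non-deterministic choice, resolved here by backtracking.
        return match(e.left, s, i, k) or match(e.right, s, i, k)
    if isinstance(e, Star):
        # Either leave the loop, or match the body once and re-enter the loop.
        # The j > i guard (an assumption of this sketch) rejects iterations
        # that consume no input, so the recursion always terminates.
        return k(i) or match(e.body, s, i, lambda j: j > i and match(e, s, j, k))
    raise TypeError(f"unknown regex node: {e!r}")


def accepts(e: Regex, s: str) -> bool:
    """True if e matches the whole of s."""
    return match(e, s, 0, lambda j: j == len(s))
```

For example, `accepts(Seq(Star(Alt(Char("a"), Char("b"))), Char("c")), "abac")` checks the string `abac` against `(a|b)*c`. The `j > i` guard is only a simple stand-in for loop detection, and a backtracking matcher of this kind can take exponential time on some inputs.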

Regular Expression Search on Compressed Text

Author: Ganty Pierre
Valero Pedro
Publication date: 01/01/2019

We present an algorithm for searching regular expression matches in compressed text. The algorithm reports the number of matching lines in the uncompressed text in time linear in the size of its compressed version. We define efficient data structures that yield nearly optimal complexity bounds and provide a sequential implementation, zearch, that requires up to 25% less time than the state of the art. Comment: 10 pages, published in Data Compression Conference (DCC'19

arXiv.org e-Print Archive

Crossref

Archivo Digital UPM

Simultaneous Finite Automata: An Efficient Data-Parallel Model for Regular Expression Matching

Author: Matsuzaki Kiminori
Sassa Masataka
Sin'ya Ryoma
Publication date: 01/01/2013

Automata play important roles in a wide area of computing, and the growth of multicores calls for their efficient parallel implementation. Though it is known in theory that we can perform the computation of a finite automaton in parallel by simulating transitions, its implementation has a large overhead due to the simulation. In this paper we propose a new automaton called simultaneous finite automaton (SFA) for efficient parallel computation of an automaton. The key idea is to extend an automaton so that it involves the simulation of transitions. Since an SFA itself has a good property of parallelism, we can easily develop a parallel implementation without overheads. We have implemented a regular expression matcher based on SFA, and it has achieved a speedup of over 10 times on an environment with dual hexa-core CPUs in a typical case. Comment: This paper has been accepted at the following conference: 2013 International Conference on Parallel Processing (ICPP-2013), October 1-4, 2013, Ecole Normale Supérieure de Lyon, Lyon, Franc

arXiv.org e-Print Archive

Crossref

Kochi University of Technology Academic Resource Repository
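
The "simulating transitions" approach that the abstract above calls known in theory can be sketched as follows: each chunk of the input is run from every possible start state, which yields a state-to-state mapping, and the mappings are then composed in order. This is a minimal Python sketch of that baseline idea only, not of the SFA construction itself. The example automaton (`DELTA`, `START`, `ACCEPTING`) and the function names are assumptions made for illustration.

```python
from functools import reduce

# Hypothetical DFA over {a, b}: accepts strings with an even number of 'a'.
DELTA = {0: {"a": 1, "b": 0}, 1: {"a": 0, "b": 1}}
START = 0
ACCEPTING = {0}


def chunk_mapping(chunk):
    """Run the chunk from every state; result m satisfies m[q] = state reached from q."""
    m = list(range(len(DELTA)))
    for ch in chunk:
        m = [DELTA[q][ch] for q in m]
    return tuple(m)


def compose(f, g):
    """Mapping for 'process f's chunk, then g's chunk'."""
    return tuple(g[q] for q in f)


def run_chunked(text, n_chunks):
    """Decide acceptance by combining per-chunk mappings."""
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # Each chunk_mapping call is independent of the others, so this map
    # could be distributed across cores; it is run sequentially here.
    mappings = map(chunk_mapping, chunks)
    identity = tuple(range(len(DELTA)))
    total = reduce(compose, mappings, identity)
    return total[START] in ACCEPTING
```

Because `compose` is associative, the reduction can be done as a parallel tree. The cost is that every chunk is simulated from all states rather than one, which is the overhead the abstract refers to.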

Subsequence Automata with Default Transitions

Author: Bille Philip
Gørtz Inge Li
Skjoldjensen Frederik Rye
Publication date: 01/01/2016

Let $S$ be a string of length $n$ with characters from an alphabet of size $\sigma$. The subsequence automaton of $S$ (often called the directed acyclic subsequence graph) is the minimal deterministic finite automaton accepting all subsequences of $S$. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $O(n\sigma)$ and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with default transitions, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the delay, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter $k$, $1 < k \leq \sigma$, we present a subsequence automaton with default transitions of size $O(nk\log_{k}\sigma)$ and delay $O(\log_k \sigma)$. Hence, with $k = 2$ we obtain an automaton of size $O(n \log \sigma)$ and delay $O(\log \sigma)$. On the other extreme, with $k = \sigma$, we obtain an automaton of size $O(n \sigma)$ and delay $O(1)$, thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest. Comment: Corrected typo

arXiv.org e-Print Archive

Online Research Database In Technology
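
For orientation, the straightforward subsequence automaton mentioned in the abstract above (the one without default transitions) can be sketched in Python as a table of next occurrences. The function names `build_subsequence_automaton` and `is_subsequence` are hypothetical, and this sketch does not attempt the paper's default-transition construction.

```python
def build_subsequence_automaton(s):
    """Return nxt, where nxt[i][c] is the state reached from state i on character c.

    State i means "the first i characters of s have been passed". A transition
    on c jumps just past the first occurrence of c at index >= i. Every state
    is accepting; a missing entry means the input is rejected.
    """
    n = len(s)
    nxt = [dict() for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])  # inherit transitions from the next state
        nxt[i][s[i]] = i + 1       # the nearest occurrence of s[i] is at index i
    return nxt


def is_subsequence(nxt, p):
    """Run p through the automaton; True if p is a subsequence of the indexed string."""
    state = 0
    for c in p:
        if c not in nxt[state]:
            return False
        state = nxt[state][c]
    return True
```

Each of the $n + 1$ states stores at most one transition per alphabet symbol, which is where the $O(n\sigma)$ size comes from. Default transitions, as described in the abstract, aim to avoid copying the whole table into every state.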

Strengthening measurements from the edges: application-level packet loss rate estimation

Author: Carlson R.
Dischinger M.
Juan Carlos De Martin
Michela Meo
Simone Basso
Sánchez M. A.
Publication venue: ACM New York, NY, USA
Publication date: 01/01/2013

Network users know much less than ISPs, Internet exchanges and content providers about what happens inside the network. Consequently, users can neither easily detect network neutrality violations nor readily exercise their market power by knowledgeably switching ISPs. This paper contributes to the ongoing efforts to empower users by proposing two models to estimate, via application-level measurements, a key network indicator: the packet loss rate (PLR) experienced by FTP-like TCP downloads. Controlled, testbed, and large-scale experiments show that the Inverse Mathis model is simpler and more consistent across the whole PLR range, but less accurate than the more advanced Likely Rexmit model for landline connections and moderate PL

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Fast and Compact Regular Expression Matching

Author: Bille Philip
Farach-Colton Martin
Publication date: 01/01/2008

We study four problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem, either by using an improved tabulation technique of an existing algorithm or by combining known algorithms in a new way.

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

The IT University of Copenhagen's Repository

From Finite Automata to Regular Expressions and Back--A Summary on Descriptional Complexity

Author: Gruber Hermann
Holzer Markus
Publication venue: 'Open Publishing Association'
Publication date: 01/05/2014

The equivalence of finite automata and regular expressions dates back to the seminal paper of Kleene on events in nerve nets and finite automata from 1956. In the present paper we tour a fragment of the literature and summarize results on upper and lower bounds on the conversion of finite automata to regular expressions and vice versa. We also briefly recall the known bounds for the removal of spontaneous transitions (epsilon-transitions) on non-epsilon-free nondeterministic devices. Moreover, we report on recent results on the average case descriptional complexity bounds for the conversion of regular expressions to finite automata and brand new developments on the state elimination algorithm that converts finite automata to regular expressions. Comment: In Proceedings AFL 2014, arXiv:1405.527

arXiv.org e-Print Archive

Directory of Open Access Journals