3,501 research outputs found
Regular Expression Matching and Operational Semantics
Many programming languages and tools, ranging from grep to the Java String
library, contain regular expression matchers. Rather than first translating a
regular expression into a deterministic finite automaton, such implementations
typically match the regular expression on the fly. Thus they can be seen as
virtual machines interpreting the regular expression much as if it were a
program with some non-deterministic constructs such as the Kleene star. We
formalize this implementation technique for regular expression matching using
operational semantics. Specifically, we derive a series of abstract machines,
moving from the abstract definition of matching to increasingly realistic
machines. First a continuation is added to the operational semantics to
describe what remains to be matched after the current expression. Next, we
represent the expression as a data structure using pointers, which enables
redundant searches to be eliminated via testing for pointer equality. From
there, we arrive both at Thompson's lockstep construction and a machine that
performs some operations in parallel, suitable for implementation on a large
number of cores, such as a GPU. We formalize the parallel machine using process
algebra and report some preliminary experiments with an implementation on a
graphics processor using CUDA.Comment: In Proceedings SOS 2011, arXiv:1108.279
Regular Expression Search on Compressed Text
We present an algorithm for searching regular expression matches in
compressed text. The algorithm reports the number of matching lines in the
uncompressed text in time linear in the size of its compressed version. We
define efficient data structures that yield nearly optimal complexity bounds
and provide a sequential implementation --zearch-- that requires up to 25% less
time than the state of the art.Comment: 10 pages, published in Data Compression Conference (DCC'19
Simultaneous Finite Automata: An Efficient Data-Parallel Model for Regular Expression Matching
Automata play important roles in wide area of computing and the growth of
multicores calls for their efficient parallel implementation. Though it is
known in theory that we can perform the computation of a finite automaton in
parallel by simulating transitions, its implementation has a large overhead due
to the simulation. In this paper we propose a new automaton called simultaneous
finite automaton (SFA) for efficient parallel computation of an automaton. The
key idea is to extend an automaton so that it involves the simulation of
transitions. Since an SFA itself has a good property of parallelism, we can
develop easily a parallel implementation without overheads. We have implemented
a regular expression matcher based on SFA, and it has achieved over 10-times
speedups on an environment with dual hexa-core CPUs in a typical case.Comment: This paper has been accepted at the following conference: 2013
International Conference on Parallel Processing (ICPP- 2013), October 1-4,
2013 Ecole Normale Suprieure de Lyon, Lyon, Franc
Subsequence Automata with Default Transitions
Let be a string of length with characters from an alphabet of size
. The \emph{subsequence automaton} of (often called the
\emph{directed acyclic subsequence graph}) is the minimal deterministic finite
automaton accepting all subsequences of . A straightforward construction
shows that the size (number of states and transitions) of the subsequence
automaton is and that this bound is asymptotically optimal.
In this paper, we consider subsequence automata with \emph{default
transitions}, that is, special transitions to be taken only if none of the
regular transitions match the current character, and which do not consume the
current character. We show that with default transitions, much smaller
subsequence automata are possible, and provide a full trade-off between the
size of the automaton and the \emph{delay}, i.e., the maximum number of
consecutive default transitions followed before consuming a character.
Specifically, given any integer parameter , , we
present a subsequence automaton with default transitions of size
and delay . Hence, with we
obtain an automaton of size and delay . On
the other extreme, with , we obtain an automaton of size and delay , thus matching the bound for the standard subsequence
automaton construction. Finally, we generalize the result to multiple strings.
The key component of our result is a novel hierarchical automata construction
of independent interest.Comment: Corrected typo
Strengthening measurements from the edges: application-level packet loss rate estimation
Network users know much less than ISPs, Internet exchanges and content providers about what happens inside the network. Consequently users cannot either easily detect network neutrality violations or readily exercise their market power by knowledgeably switching ISPs. This paper contributes to the ongoing efforts to empower users by proposing two models to estimate -- via application-level measurements -- a key network indicator, i.e., the packet loss rate (PLR) experienced by FTP-like TCP downloads. Controlled, testbed, and large-scale experiments show that the Inverse Mathis model is simpler and more consistent across the whole PLR range, but less accurate than the more advanced Likely Rexmit model for landline connections and moderate PL
Fast and Compact Regular Expression Matching
We study 4 problems in string matching, namely, regular expression matching,
approximate regular expression matching, string edit distance, and subsequence
indexing, on a standard word RAM model of computation that allows
logarithmic-sized words to be manipulated in constant time. We show how to
improve the space and/or remove a dependency on the alphabet size for each
problem using either an improved tabulation technique of an existing algorithm
or by combining known algorithms in a new way
From Finite Automata to Regular Expressions and Back--A Summary on Descriptional Complexity
The equivalence of finite automata and regular expressions dates back to the
seminal paper of Kleene on events in nerve nets and finite automata from 1956.
In the present paper we tour a fragment of the literature and summarize results
on upper and lower bounds on the conversion of finite automata to regular
expressions and vice versa. We also briefly recall the known bounds for the
removal of spontaneous transitions (epsilon-transitions) on non-epsilon-free
nondeterministic devices. Moreover, we report on recent results on the average
case descriptional complexity bounds for the conversion of regular expressions
to finite automata and brand new developments on the state elimination
algorithm that converts finite automata to regular expressions.Comment: In Proceedings AFL 2014, arXiv:1405.527
- …