12,559 research outputs found
Real-time Regular Expression Matching
This paper is devoted to finite state automata, regular expression matching,
pattern recognition, and the exponential blow-up problem, which is the growing
complexity of automata exponentially depending on regular expression length.
This paper presents a theoretical and hardware solution to the exponential
blow-up problem for some complicated classes of regular languages, which caused
severe limitations in Network Intrusion Detection Systems work. The article
supports the solution with theorems on correctness and complexity.Comment: 17 pages, 11 figure
Linear-time Minimization of Wheeler DFAs
Wheeler DFAs (WDFAs) are a sub-class of finite-state automata which is playing an important role in the emerging field of compressed data structures: as opposed to general automata, WDFAs can be stored in just log s + O(1) bits per edge, s being the alphabet's size, and support optimal-time pattern matching queries on the substring closure of the language they recognize. An important step to achieve further compression is minimization. When the input A is a general deterministic finite-state automaton (DFA), the state-of-the-art is represented by the classic Hopcroft's algorithm, which runs in O(vertical bar A vertical bar log vertical bar A vertical bar) time. This algorithm stands at the core of the only existing minimization algorithm for Wheeler DFAs, which inherits its complexity. In this work, we show that the minimum WDFA equivalent to a given input WDFA can be computed in linear O(vertical bar A vertical bar) time. When run on de Bruijn WDFAs built from real DNA datasets, an implementation of our algorithm reduces the number of nodes from 14% to 51% at a speed of more than 1 million nodes per second.Peer reviewe
Simultaneous Finite Automata: An Efficient Data-Parallel Model for Regular Expression Matching
Automata play important roles in wide area of computing and the growth of
multicores calls for their efficient parallel implementation. Though it is
known in theory that we can perform the computation of a finite automaton in
parallel by simulating transitions, its implementation has a large overhead due
to the simulation. In this paper we propose a new automaton called simultaneous
finite automaton (SFA) for efficient parallel computation of an automaton. The
key idea is to extend an automaton so that it involves the simulation of
transitions. Since an SFA itself has a good property of parallelism, we can
develop easily a parallel implementation without overheads. We have implemented
a regular expression matcher based on SFA, and it has achieved over 10-times
speedups on an environment with dual hexa-core CPUs in a typical case.Comment: This paper has been accepted at the following conference: 2013
International Conference on Parallel Processing (ICPP- 2013), October 1-4,
2013 Ecole Normale Suprieure de Lyon, Lyon, Franc
Deterministic Automata for Unordered Trees
Automata for unordered unranked trees are relevant for defining schemas and
queries for data trees in Json or Xml format. While the existing notions are
well-investigated concerning expressiveness, they all lack a proper notion of
determinism, which makes it difficult to distinguish subclasses of automata for
which problems such as inclusion, equivalence, and minimization can be solved
efficiently. In this paper, we propose and investigate different notions of
"horizontal determinism", starting from automata for unranked trees in which
the horizontal evaluation is performed by finite state automata. We show that a
restriction to confluent horizontal evaluation leads to polynomial-time
emptiness and universality, but still suffers from coNP-completeness of the
emptiness of binary intersections. Finally, efficient algorithms can be
obtained by imposing an order of horizontal evaluation globally for all
automata in the class. Depending on the choice of the order, we obtain
different classes of automata, each of which has the same expressiveness as
CMso.Comment: In Proceedings GandALF 2014, arXiv:1408.556
Efficient Multistriding of Large Non-deterministic Finite State Automata for Deep Packet Inspection
Multistride automata speed up input matching because each multistriding transformation halves the size of the input string, leading to a potential 2x speedup. However, up to now little effort has been spent in optimizing the building process of multistride automata, with the result that current algorithms cannot be applied to real-life, large automata such as the ones used in commercial IDSs, because the time and the memory space needed to create the new automaton quickly becomes unfeasible. In this paper, new algorithms for efficient building of multistride NFAs for packet inspection are presented, explaining how these new techniques can outperform the previous algorithms in terms of required time and memory usag
Efficient Online Timed Pattern Matching by Automata-Based Skipping
The timed pattern matching problem is an actively studied topic because of
its relevance in monitoring of real-time systems. There one is given a log
and a specification (given by a timed word and a timed automaton
in this paper), and one wishes to return the set of intervals for which the log
, when restricted to the interval, satisfies the specification
. In our previous work we presented an efficient timed pattern
matching algorithm: it adopts a skipping mechanism inspired by the classic
Boyer--Moore (BM) string matching algorithm. In this work we tackle the problem
of online timed pattern matching, towards embedded applications where it is
vital to process a vast amount of incoming data in a timely manner.
Specifically, we start with the Franek-Jennings-Smyth (FJS) string matching
algorithm---a recent variant of the BM algorithm---and extend it to timed
pattern matching. Our experiments indicate the efficiency of our FJS-type
algorithm in online and offline timed pattern matching
Finite Countermodel Based Verification for Program Transformation (A Case Study)
Both automatic program verification and program transformation are based on
program analysis. In the past decade a number of approaches using various
automatic general-purpose program transformation techniques (partial deduction,
specialization, supercompilation) for verification of unreachability properties
of computing systems were introduced and demonstrated. On the other hand, the
semantics based unfold-fold program transformation methods pose themselves
diverse kinds of reachability tasks and try to solve them, aiming at improving
the semantics tree of the program being transformed. That means some
general-purpose verification methods may be used for strengthening program
transformation techniques. This paper considers the question how finite
countermodels for safety verification method might be used in Turchin's
supercompilation method. We extract a number of supercompilation sub-algorithms
trying to solve reachability problems and demonstrate use of an external
countermodel finder for solving some of the problems.Comment: In Proceedings VPT 2015, arXiv:1512.0221
- …