Faster Compression of Deterministic Finite Automata
Deterministic finite automata (DFAs) are a classic tool for high-throughput
matching of regular expressions, both in theory and practice.
Due to their high space consumption, extensive research has been devoted to
compressed representations of DFAs that still support efficient pattern
matching queries.
Kumar~et~al.~[SIGCOMM 2006] introduced the \emph{delayed deterministic finite
automaton} (\ddfa{}) which exploits the large redundancy between inter-state
transitions in the automaton.
They showed it to obtain up to two orders of magnitude compression of
real-world DFAs, and their work formed the basis of numerous subsequent
results.
Their algorithm, as well as later algorithms based on their idea, has an
inherent quadratic-time bottleneck, as they consider every pair of states to
compute the optimal compression.
In this work we present a simple, general framework based on
locality-sensitive hashing for speeding up these algorithms to achieve
sub-quadratic construction times for \ddfa{}s.
We apply the framework to speed up several algorithms to near-linear time,
and experimentally evaluate their performance on real-world regular expression
sets extracted from modern intrusion detection systems.
We find an order of magnitude improvement in compression times, with either
little or no loss of compression, or even significantly better compression in
some cases.
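
To illustrate the core idea, the following is a minimal sketch and not the algorithm from the paper. It assumes a DFA given as a dense transition table (one row of target states per state) and uses bit sampling, a standard locality-sensitive hash family for Hamming distance, to bucket states whose rows agree on a few sampled symbols. Only states that collide in some bucket are compared, rather than every pair. The names `similarity`, `candidate_pairs`, `compress_row`, and the toy `table` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def similarity(row_a, row_b):
    """Count the symbols on which two states move to the same target."""
    return sum(1 for a, b in zip(row_a, row_b) if a == b)


def candidate_pairs(table, positions_per_hash):
    """Return state pairs that collide under at least one sampled hash.

    Each tuple in positions_per_hash acts as one hash function: two states
    collide if their rows agree on every sampled symbol position.
    (Assumption: positions are supplied by the caller; in practice they
    would be drawn at random.)
    """
    pairs = set()
    for positions in positions_per_hash:
        buckets = defaultdict(list)
        for state, row in enumerate(table):
            key = tuple(row[p] for p in positions)
            buckets[key].append(state)
        for states in buckets.values():
            pairs.update(combinations(states, 2))
    return pairs


def compress_row(row, default_row):
    """Keep only the transitions that differ from the default state's row."""
    return {
        symbol: target
        for symbol, (target, default_target) in enumerate(zip(row, default_row))
        if target != default_target
    }


# Toy DFA: 4 states over a 4-symbol alphabet (hypothetical data).
table = [
    [1, 2, 0, 3],  # state 0
    [1, 2, 0, 0],  # state 1
    [3, 3, 3, 3],  # state 2
    [1, 2, 1, 3],  # state 3
]

# Two hash functions, each sampling two symbol positions.
pairs = candidate_pairs(table, [(0, 1), (2, 3)])

# Score only the candidate pairs instead of all pairs of states.
scores = {pair: similarity(table[pair[0]], table[pair[1]]) for pair in pairs}

# State 1 with a default transition to state 0 stores one explicit transition.
explicit = compress_row(table[1], table[0])
```

In this toy table, state 2 shares no transitions with the others and never collides with them, so it is never compared against them. A real construction would still have to select default transitions so that they form a forest with bounded path lengths, which this sketch omits.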