Efficient Matching with Memoization for Regexes with Look-around and Atomic Grouping (Extended Version)
Regular expression (regex) matching is fundamental in many applications,
especially in web services. However, matching by backtracking -- preferred by
most real-world implementations for its practical performance and backward
compatibility -- can suffer from so-called catastrophic backtracking, which
makes the number of backtracking steps super-linear in the input length and leads to the well-known ReDoS
vulnerability. Inspired by a recent algorithm by Davis et al. that runs in
linear time for (non-extended) regexes, we study efficient backtracking
matching for regexes with two common extensions, namely look-around and atomic
grouping. We present linear-time backtracking matching algorithms for these
extended regexes. Their efficiency relies on memoization, much like the one by
Davis et al.; we also strive for smaller memoization tables by carefully
trimming their range. Our experiments -- we used some real-world regexes with
the aforementioned extensions -- confirm the performance advantage of our
algorithms.

Comment: To appear in ESOP 202
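
To make the role of memoization concrete, here is a minimal sketch for plain (non-extended) regexes only. It is not the algorithm of this paper and does not handle look-around or atomic grouping. It assumes the regex has already been compiled to an NFA without epsilon transitions; the function name `match`, the dictionary layout of `transitions`, and the example NFA for `(a|a)*b` are all hypothetical choices made for illustration. The idea shown is that a backtracking matcher that records every (state, position) pair it has entered never explores the same pair twice, so the number of steps is bounded by the number of states times the input length.

```python
def match(transitions, start, accept, text, memoize=True):
    """Backtracking full match over an NFA; returns (matched, steps).

    transitions: dict mapping a state to a list of (symbol, target) pairs,
                 tried in order (hypothetical layout, for illustration only).
    """
    visited = set()  # (state, position) pairs that have already been entered
    steps = 0        # number of (state, position) pairs actually explored

    def run(state, pos):
        nonlocal steps
        if memoize:
            # A pair seen before can only have failed (or is still being
            # explored further up the call stack), so skip it.
            if (state, pos) in visited:
                return False
            visited.add((state, pos))
        steps += 1
        if pos == len(text):
            return state in accept
        for symbol, target in transitions.get(state, []):
            if symbol == text[pos] and run(target, pos + 1):
                return True
        return False

    return run(start, 0), steps


# Hand-built NFA for (a|a)*b: state 0 loops on 'a' in two ways, 'b' accepts.
NFA = {0: [("a", 0), ("a", 0), ("b", 1)]}
ACCEPT = {1}

if __name__ == "__main__":
    attack = "a" * 10  # no trailing 'b', so every path fails
    print(match(NFA, 0, ACCEPT, attack, memoize=False))
    print(match(NFA, 0, ACCEPT, attack, memoize=True))
    print(match(NFA, 0, ACCEPT, "aaab"))
```

On the failing input, the run without memoization explores a number of pairs that doubles with each additional `a`, while the memoized run explores each (state, position) pair at most once. The table here covers every pair; the abstract's point about trimming the memoization range concerns making such a table smaller, which this sketch does not attempt.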