Fast and Compact Regular Expression Matching
We study 4 problems in string matching, namely, regular expression matching,
approximate regular expression matching, string edit distance, and subsequence
indexing, on a standard word RAM model of computation that allows
logarithmic-sized words to be manipulated in constant time. We show how to
improve the space and/or remove a dependency on the alphabet size for each
problem, either by improving the tabulation technique of an existing algorithm
or by combining known algorithms in a new way.
Optimising Unicode Regular Expression Evaluation with Previews
The jsre regular expression library was designed to provide fast matching of complex expressions over large input streams using user-selectable character encodings. An established design approach was used: a simulated non-deterministic automaton (NFA) implemented as a virtual machine, avoiding exponential cost functions in either space or time. A deterministic automaton (DFA) was chosen as a general dispatching mechanism for Unicode character classes, and this also provided the opportunity to use compact DFAs in various optimization strategies. The result was the development of a regular expression Preview, which summarizes all the matches possible from a given point in a regular expression in a form that can be implemented as a compact DFA and used to further improve the performance of the standard NFA simulation algorithm. This paper formally defines a preview, then describes and evaluates several optimizations using this construct. They provide significant speed improvements from fast scanning of anchor positions, from avoiding retesting of repeated strings in unanchored searches, and from efficient searching of multiple alternate expressions, which in the case of keyword searching has a time complexity that is logarithmic in the number of words to be searched.
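To make the "simulated NFA" idea concrete, here is a minimal sketch of set-of-states NFA simulation, the general technique that avoids exponential backtracking. It is not jsre's virtual machine or its Preview construct: the hand-built automaton for the pattern `a(b|c)*d`, and the names `NFA`, `eps_closure`, and `nfa_fullmatch`, are assumptions made purely for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical hand-built NFA for the pattern a(b|c)*d (illustration only).
# States are integers; EPS marks an epsilon (empty-input) transition.
EPS = ""
Transitions = Dict[Tuple[int, str], Set[int]]

NFA: Transitions = {
    (0, "a"): {1},
    (1, EPS): {2, 4},   # enter the (b|c)* loop or skip straight to 'd'
    (2, "b"): {3},
    (2, "c"): {3},
    (3, EPS): {2, 4},   # repeat the loop or leave it
    (4, "d"): {5},
}
START, ACCEPT = 0, 5


def eps_closure(states: Set[int], nfa: Transitions) -> Set[int]:
    """Return all states reachable from `states` via epsilon transitions."""
    stack = list(states)
    closure = set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_fullmatch(text: str, nfa: Transitions = NFA) -> bool:
    """Advance a set of live states one character at a time.

    The work per character is bounded by the size of the NFA, so the
    total cost is O(len(text) * |NFA|) with no backtracking.
    """
    current = eps_closure({START}, nfa)
    for ch in text:
        nxt: Set[int] = set()
        for s in current:
            nxt |= nfa.get((s, ch), set())
        current = eps_closure(nxt, nfa)
        if not current:  # no live states left: the match cannot succeed
            return False
    return ACCEPT in current
```

Because every live state is advanced in lockstep, no input character is examined more than once per state, which is where the non-exponential time bound comes from.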
Fast Searching in Packed Strings
Given strings $P$ and $Q$, the (exact) string matching problem is to find all
positions of substrings in $Q$ matching $P$. The classical Knuth-Morris-Pratt
algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear
time, which is optimal if we can only read one character at a time. However,
most strings are stored in a computer in a packed representation with several
characters in a single word, giving us the opportunity to read multiple
characters simultaneously. In this paper we study the worst-case complexity of
string matching on strings given in packed representation. Let $m \leq n$ be
the lengths of $P$ and $Q$, respectively, and let $\sigma$ denote the size of the
alphabet. On a standard unit-cost word-RAM with logarithmic word size we
present an algorithm using time $O\left(\frac{n}{\log_\sigma n} + m +
\mathrm{occ}\right)$. Here $\mathrm{occ}$ is the number of occurrences of $P$ in $Q$. For
$m = o(n)$ this improves the $O(n)$ bound of the Knuth-Morris-Pratt algorithm.
Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal since any
algorithm must spend at least $\Omega\left(\frac{(n+m)\log
\sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to
read the input and report all occurrences. The result is obtained by a novel
automaton construction based on the Knuth-Morris-Pratt algorithm combined with
a new compact representation of subautomata allowing an optimal
tabulation-based simulation.
Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 200
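For reference, the character-at-a-time baseline that the abstract builds on can be sketched as follows. This is the classical Knuth-Morris-Pratt algorithm, not the packed-string algorithm of the paper; the function names `failure_table` and `kmp_search` are illustrative choices.

```python
from typing import List


def failure_table(pattern: str) -> List[int]:
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(pattern: str, text: str) -> List[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    hits: List[int] = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]          # fall back to the next shorter border
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)   # full match ending at position i
            k = fail[k - 1]          # keep going to find overlapping matches
    return hits
```

Each text character is read exactly once, which gives the linear bound; the packed setting asks how much better one can do when a machine word holds several characters.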
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problem for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck.
Semiparametric estimation of shifts on compact Lie groups for image registration
In this paper we focus on estimating the deformations that may exist between similar images in the presence of additive noise when a reference template is unknown. The deformations are modeled as parameters lying in a finite dimensional compact Lie group. A general matching criterion based on the Fourier transform and its well known shift property on compact Lie groups is introduced. M-estimation and semiparametric theory are then used to study the consistency and asymptotic normality of the resulting estimators. As Lie groups are typically nonlinear spaces, our tools rely on statistical estimation for parameters lying in a manifold and take into account the geometrical aspects of the problem. Some simulations are used to illustrate the usefulness of our approach and applications to various areas in image processing are discussed.