A Randomized Algorithm for Approximate String Matching

Abstract

We give a randomized algorithm that runs in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text and the number of matches is counted at each position. A direct application is approximate string matching. The algorithm uses convolution to compute an estimator of the scores and can be viewed as a randomization of an algorithm by Fischer and Paterson. The variance of our estimator is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumption is made about the probabilistic characteristics of the input or about the size of the alphabet. The solution extends to string matching with classes, class complements, "never match" and "always match" symbols, to the weighted case, and to higher dimensions.

Keywords: convolution, FFT, approximate string matching, randomized..
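To make the notion of a score vector and a convolution-based estimator concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's construction: each symbol is mapped to an independently chosen random root of unity, the text and pattern are correlated by FFT, and the real parts are averaged over several trials. The function names, the `trials` and `roots` parameters, and the use of one full-length FFT (which gives O(N log N) per trial rather than the O(N log M) bound stated above) are all choices made for this example.

```python
import cmath
import random


def exact_scores(text, pattern):
    """Score vector by definition: number of matches at each alignment, O(N*M)."""
    n, m = len(text), len(pattern)
    return [sum(text[i + j] == pattern[j] for j in range(m))
            for i in range(n - m + 1)]


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def correlate(x, y):
    """Return c[i] = sum_j x[i+j] * conj(y[j]) for i = 0 .. len(x)-len(y)."""
    n, m = len(x), len(y)
    size = 1
    while size < n + m:
        size *= 2
    fx = fft(list(x) + [0j] * (size - n))
    # Reversing and conjugating y turns the convolution into a correlation.
    yr = [v.conjugate() for v in reversed(y)]
    fy = fft(yr + [0j] * (size - m))
    prod = fft([a * b for a, b in zip(fx, fy)], invert=True)
    # Divide by size to normalize the inverse transform.
    return [prod[i + m - 1] / size for i in range(n - m + 1)]


def estimate_scores(text, pattern, trials=64, roots=4, seed=None):
    """Randomized estimate of the score vector (hypothetical parameters).

    Assumption of this sketch: every symbol gets an independent, uniformly
    random `roots`-th root of unity. A matching pair then contributes
    exactly 1, and a mismatching pair contributes 0 in expectation.
    """
    rng = random.Random(seed)
    alphabet = sorted(set(text) | set(pattern))
    n, m = len(text), len(pattern)
    total = [0.0] * (n - m + 1)
    for _ in range(trials):
        phase = {s: cmath.exp(2j * cmath.pi * rng.randrange(roots) / roots)
                 for s in alphabet}
        c = correlate([phase[s] for s in text], [phase[s] for s in pattern])
        for i, v in enumerate(c):
            total[i] += v.real
    return [t / trials for t in total]


if __name__ == "__main__":
    print(exact_scores("abcabd", "abd"))   # [2, 0, 0, 3]
    print(estimate_scores("abcabd", "abd", trials=200, seed=0))
```

In this sketch an exact occurrence is estimated without error, because every aligned pair contributes exactly 1 in every trial; the noise comes only from mismatching pairs. That is in line with the abstract's remark that the variance is small for scores close to M.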


CiteSeerX

Last time updated on 22/10/2014

This paper was published in CiteSeerX.
