11 research outputs found
Efficient exact pattern-matching in proteomic sequences
This paper proposes a novel algorithm for complete exact pattern-matching that focuses on the specificities of protein sequences (an alphabet of 20 symbols) but is also highly efficient for larger alphabets. The searching strategy uses large search windows, allowing multiple alignments per iteration. A new filtering heuristic, named the compatibility rule, contributed decisively to the efficiency improvement. The new algorithm’s performance is, on average, superior to that of its best-rated competitors.
Alternative algorithms for bit-parallel string matching
Abstract. We consider bit-parallel algorithms of Boyer-Moore type for exact string matching. We introduce a two-way modification of the BNDM algorithm. If the text character aligned with the end of the pattern is a mismatch, we continue by examining text characters after the alignment. Besides this two-way variation, we present a simplified version of BNDM without prefix search and an algorithm scheme for long patterns. We also study a different bit-parallel algorithm, which keeps the history of examined characters in a bit-vector and where shifting is based on this bit-vector. We report experiments where we compared the new algorithms with existing ones. The simplified BNDM is the most promising of the new algorithms in practice.
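For orientation, the baseline that the abstract modifies can be sketched as follows. This is a minimal sketch of the standard BNDM scan (Navarro & Raffinot), not the two-way or simplified variants proposed in the paper. The function name `bndm_search` is hypothetical, and Python's unbounded integers stand in for the single machine word the algorithm normally assumes.

```python
def bndm_search(text, pattern):
    """Return the start indices of all exact occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    full = (1 << m) - 1          # m one-bits: every pattern position still alive
    top = 1 << (m - 1)           # bit that signals "a pattern prefix was recognised"
    # masks[c] has bit (m - 1 - i) set iff pattern[i] == c
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))

    hits = []
    pos = 0
    while pos <= n - m:
        j = m                    # number of window characters not yet read
        last = m                 # shift to apply after this window
        state = full
        while state != 0:
            # Read the window right to left, keeping alive only the pattern
            # factors that match the suffix read so far.
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & top:
                if j > 0:
                    last = j     # a pattern prefix ends the window: remember it
                else:
                    hits.append(pos)
                    break
            state = (state << 1) & full
        pos += last
    return hits
```

For example, `bndm_search("abracadabra", "abra")` reports the occurrences at positions 0 and 7. The window containing `d` is abandoned after a single character read, which is the source of the sublinear average behaviour of Boyer-Moore-type scanning.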
A Simple Fast Hybrid Pattern-Matching Algorithm
The Knuth-Morris-Pratt (KMP) pattern-matching algorithm guarantees both independence from alphabet size and worst-case execution time linear in the pattern length; on the other hand, the Boyer-Moore (BM) algorithm provides near-optimal average-case and best-case behaviour, as well as executing very fast in practice. We describe a simple algorithm that employs the main ideas of KMP and BM (with a little help from Sunday) in an effort to combine these desirable features. Experiments indicate that in practice the new algorithm is among the fastest exact pattern-matching algorithms discovered to date, perhaps dominant for alphabet size 8 or more.
Exact Analysis of Horspool’s and Sunday’s Pattern Matching Algorithms with Probabilistic Arithmetic Automata
Abstract. We define deterministic arithmetic automata (DAAs) and connect them to a framework called probabilistic arithmetic automata (PAAs) [9]. We use DAAs and PAAs to compute the entire exact probability distribution (in contrast to, e.g., asymptotic expectation and variance) of the number X^p_ℓ of text characters accessed by the Horspool or Sunday pattern matching algorithms when matching a fixed pattern p against a random text of length ℓ. The random text model can be quite general, from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). We develop several alternative constructions with different state spaces of the automata, leading to alternative time and space complexities for the computations. To our knowledge, this is the first time that suffix-based pattern matching algorithms are analyzed exactly. We present (perhaps surprising) exemplary results on short patterns and moderate text lengths. Our results easily generalize to any search-window based pattern matching algorithm.
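The quantity studied in this abstract, the number of text characters accessed by Horspool's algorithm for a fixed pattern, can be illustrated with a small sketch. It does not implement the DAA/PAA construction: it counts character comparisons in a plain Horspool scan and then obtains the exact distribution under a uniform text model by brute-force enumeration, which is feasible only for tiny alphabets and lengths. The names `horspool_count` and `access_distribution` are hypothetical, and the counting convention (one access per comparison, with the shift lookup reusing the last window character already read) is an assumption that may differ from the paper's.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def horspool_count(text, pattern):
    """Return (match positions, number of text character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    # Shift for each character: distance from its rightmost occurrence in
    # pattern[:-1] to the last pattern position; absent characters shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, accesses, pos = [], 0, 0
    while pos <= n - m:
        i = m - 1
        while i >= 0:                      # compare right to left
            accesses += 1
            if text[pos + i] != pattern[i]:
                break
            i -= 1
        else:
            hits.append(pos)               # loop finished without a mismatch
        pos += shift.get(text[pos + m - 1], m)
    return hits, accesses


def access_distribution(pattern, alphabet, length):
    """Exact distribution of the access count over all uniform random texts.

    Assumes the symbols in `alphabet` are distinct. Brute force: enumerates
    len(alphabet) ** length texts.
    """
    counts = Counter()
    for chars in product(alphabet, repeat=length):
        _, accesses = horspool_count("".join(chars), pattern)
        counts[accesses] += 1
    total = len(alphabet) ** length
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}
```

For instance, `access_distribution("ab", "ab", 2)` enumerates the four texts of length 2 over a binary alphabet and finds that one access and two accesses are equally likely. The automaton-based approach in the abstract is aimed at exactly the cases where this enumeration becomes infeasible, and at non-uniform text models.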
Practical and Optimal String Matching
Abstract. We develop a new exact bit-parallel string matching algorithm, based on the Shift-Or algorithm (Baeza-Yates & Gonnet, 1992). Assuming that the pattern representation fits into a single computer word, this algorithm has optimal O(n log_σ m / m) average running time, as well as optimal O(n) worst case running time, where n, m and σ are the sizes of the text, the pattern, and the alphabet, respectively. We also study several implementation details. The experimental results show that our algorithm is the fastest in most of the cases where it can be applied, displacing even the long-standing BNDM (Navarro & Raffinot, 2000) family of algorithms. Finally, we show how to adapt our techniques for the Shift-Add algorithm (Baeza-Yates & Gonnet, 1992), obtaining optimal time for searching under Hamming distance.
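As background, the classical Shift-Or scan on which this abstract builds can be sketched as follows. This is a minimal sketch of the original Baeza-Yates and Gonnet scheme, which reads every text character and therefore runs in O(n) when the pattern fits in one word; it is not the average-optimal algorithm described in the abstract. The function name `shift_or_search` is hypothetical, and Python integers are used in place of a fixed-width machine word.

```python
def shift_or_search(text, pattern):
    """Return the start indices of all exact occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # masks[c] has a 0 at bit i iff pattern[i] == c (0 means "match").
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, all_ones) & ~(1 << i)

    # Bit i of state is 0 iff pattern[0..i] matches the text ending at the
    # current position. Initially no prefix is matched.
    state = all_ones
    match_bit = 1 << (m - 1)
    hits = []
    for j, c in enumerate(text):
        # The shift brings a 0 into bit 0 (the empty prefix always matches);
        # the OR discards every prefix that the new character fails to extend.
        state = ((state << 1) | masks.get(c, all_ones)) & all_ones
        if state & match_bit == 0:
            hits.append(j - m + 1)
    return hits
```

Calling `shift_or_search("abracadabra", "abra")` returns the occurrences at positions 0 and 7. Each text character costs one shift, one OR, and one test regardless of the pattern contents, which is why the method depends on the pattern fitting in a machine word.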