Bit-parallel search algorithms for long patterns

Author: A. Hume
A.C.-C. Yao
G. Navarro
G. Zhang
H. Peltola
J. Tarhio
K. Fredriksson
L. He
M. Crochemore
M.O. Külekci
R.N. Horspool
T. Lecroq
Publication date: 01/01/2010

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Principles and Implementation of Deductive Parsing

Author: Pereira Fernando C. N.
Schabes Yves
Shieber Stuart M.
Publication date: 01/01/1994

We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars. Comment: 69 pages, includes full Prolog code
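
The "parsing as deduction" idea in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the paper's Prolog and is not the authors' engine. It assumes a grammar in Chomsky normal form, items of the form (nonterminal, i, j), and hypothetical names (`cky_deduce`, `lexical`, `binary`) chosen for illustration. Axioms come from lexical rules, and a single inference rule combines two adjacent items.

```python
from collections import deque

def cky_deduce(words, lexical, binary, start="S"):
    """Agenda-driven deduction over items (nonterminal, i, j).

    lexical: dict mapping a word to the nonterminals that rewrite to it.
    binary:  dict mapping a pair (B, C) to the nonterminals A with A -> B C.
    Both layouts are assumptions made for this sketch.
    """
    chart = set()
    agenda = deque()
    # Axioms: a rule A -> w licenses the item (A, i, i + 1) for word w at i.
    for i, w in enumerate(words):
        for a in lexical.get(w, ()):
            agenda.append((a, i, i + 1))
    while agenda:
        item = agenda.popleft()
        if item in chart:
            continue  # already proved; nothing new to derive
        chart.add(item)
        x, i, j = item
        # Inference rule: combine the new item with adjacent chart items.
        for (y, k, l) in list(chart):
            if l == i:  # y spans k..i and x spans i..j
                for a in binary.get((y, x), ()):
                    agenda.append((a, k, j))
            if k == j:  # x spans i..j and y spans j..l
                for a in binary.get((x, y), ()):
                    agenda.append((a, i, l))
    # Goal item: the start symbol spanning the whole input.
    return (start, 0, len(words)) in chart

lexical = {"she": ["NP"], "eats": ["V"], "fish": ["NP"]}
binary = {("V", "NP"): ["VP"], ("NP", "VP"): ["S"]}
print(cky_deduce(["she", "eats", "fish"], lexical, binary))
```

Swapping in a different set of axioms and inference rules while keeping the same agenda loop is the sense in which one engine can serve several parsing algorithms.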

arXiv.org e-Print Archive

CiteSeerX

Improved algorithms for string searching problems

Author: Salmela Leena
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2009

We present improved, practically efficient algorithms for several string searching problems, where we search for a short string called the pattern in a longer string called the text. We are mainly interested in the online problem, where the text is not preprocessed, but we also present a light indexing approach to speed up exact searching of a single pattern. The new algorithms can be applied, for example, to many problems in bioinformatics and to other content scanning and filtering problems. In addition to exact string matching, we develop algorithms for several other variations of the string matching problem. We study algorithms for approximate string matching, where a limited number of errors is allowed in the occurrences of the pattern, and parameterized string matching, where a substring of the text matches the pattern if the characters of the substring can be renamed in such a way that the renamed substring matches the pattern exactly. We also consider searching multiple patterns simultaneously and searching weighted patterns, where the weight of a character at a given position reflects the probability of that character occurring at that position. Many of the new algorithms use the backward matching principle, where the characters of the text that are aligned with the pattern are read backward, i.e. from right to left. Another common characteristic of the new algorithms is the use of q-grams, i.e. q consecutive characters are handled as a single character. Many of the new algorithms are bit-parallel, i.e. they pack several variables into a single computer word and update all these variables with a single instruction. We show that the q-gram backward string matching algorithms that solve the exact, approximate, or multiple string matching problems are optimal on average. We also show that the q-gram backward string matching algorithm for the parameterized string matching problem is sublinear on average for a class of moderately repetitive patterns.
All the presented algorithms are also shown to be fast in practice when compared to earlier algorithms. We also propose an alphabet sampling technique to speed up exact string matching. We choose a subset of the alphabet and select the corresponding subsequence of the text. String matching is then performed on this reduced subsequence and the found matches are verified in the original text. We show how to choose the sampled alphabet optimally and show that the technique speeds up string matching especially for moderate to long patterns.
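
The alphabet sampling idea described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `sampled_search` is hypothetical, the sampled alphabet is passed in by hand rather than chosen optimally, and the original position of every sampled character is stored, which a space-conscious index would likely avoid.

```python
def sampled_search(text, pattern, sampled):
    """Return start indices of pattern in text, filtering via a sampled alphabet."""
    # Keep only the sampled characters, remembering their original positions.
    positions = [i for i, c in enumerate(text) if c in sampled]
    reduced_text = "".join(text[i] for i in positions)
    reduced_pat = "".join(c for c in pattern if c in sampled)
    if not reduced_pat:
        # The pattern has no sampled character, so the filter cannot help;
        # fall back to a plain scan of the original text.
        return [i for i in range(len(text) - len(pattern) + 1)
                if text.startswith(pattern, i)]
    # Offset of the first sampled character inside the pattern.
    first = next(i for i, c in enumerate(pattern) if c in sampled)
    hits = []
    s = reduced_text.find(reduced_pat)
    while s != -1:
        # Map the match in the reduced text back to a candidate start.
        start = positions[s] - first
        # Verify the candidate against the original text.
        if start >= 0 and text.startswith(pattern, start):
            hits.append(start)
        s = reduced_text.find(reduced_pat, s + 1)
    return hits

print(sampled_search("abracadabra", "abra", {"b", "r"}))  # prints [0, 7]
```

The search itself runs over the shorter reduced text, and the verification step discards candidates that agree only on the sampled characters.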

Efficient exact pattern-matching in proteomic sequences

Author: B. Smyth
D.E. Knuth
D.M. Sunday
F. Franek
G. Navarro
H. Peltola
M. Crochemore
P.D. Michailidis
R.A. Baeza-Yates
R.M. Karp
R.N. Horspool
R.S. Boyer
T. Lecroq
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

This paper proposes a novel algorithm for complete exact pattern-matching that focuses on the specific properties of protein sequences (an alphabet of 20 symbols) but is also highly efficient for larger alphabets. The searching strategy uses large search windows, allowing multiple alignments per iteration. A new filtering heuristic, named the compatibility rule, contributed decisively to the efficiency improvement. The new algorithm's performance is, on average, superior to that of its best-rated competitors.

CiteSeerX

Biblioteca Digital do IPB