
120 research outputs found

A taxonomy of sublinear multiple keyword pattern matching algorithms

Author: Watson B.W.
Zwaan G.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1995

This article presents a taxonomy of sublinear keyword pattern matching algorithms related to the Boyer-Moore algorithm [3] and the Commentz-Walter algorithm [5, 6]. The taxonomy includes, amongst others, the multiple keyword generalization of the single keyword Boyer-Moore algorithm and an algorithm by Fan and Su [9, 10]. The corresponding precomputation algorithms are presented as well. The taxonomy is based on the idea of ordering algorithms according to their essential problem and algorithm details, and deriving all algorithms from a common starting point by successively adding these details in a correctness preserving way. This way of presentation not only provides a complete correctness argument of each algorithm, but also makes very clear what algorithms have in common (the details of their nearest common ancestor) and where they differ (the details added after their nearest common ancestor). Introduction of the notion of safe shift distances proves to be essential in the derivation and classification of the algorithms. Moreover, the article provides a common derivation for and a uniform presentation of the precomputation algorithms, not yet found in the literature.

Repository TU/e

Elsevier - Publisher Connector

Pure OAI Repository

Weak factor automata : the failure of failure factor oracles?

Author: Cleophas L.G.W.A. (Loek)
Kourie Derrick G.
Watson Bruce William
Publication venue: Computer Society of South Africa
Publication date: 01/08/2014

In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automaton (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 - 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO. http://www.journals.co.za/ej/ejour_comp.html am201

Directory of Open Access Journals

Stellenbosch University SUNScholar Repository

UPSpace at the University of Pretoria

Faster subsequence recognition in compressed strings

Author: A Tiskin
BW Watson
CER Alves
G Myers
G Navarro
G Ziv
J Kärkkäinen
JL Bentley
M Crochemore
P Cégielski
TA Welch
W Rytter
WJ Masek
Publication venue
Publication date: 18/01/2008

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel-Ziv compression. For an SLP-compressed text of length $\bar m$, and an uncompressed pattern of length $n$, Cégielski et al. gave an algorithm for local subsequence recognition running in time $O(\bar m n^2 \log n)$. We improve the running time to $O(\bar m n^{1.5})$. Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time $O(\bar m n^{1.5})$; the same problem with a compressed pattern is known to be NP-hard.

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Taxonomies of regular tree algorithms

Author: Cleophas L.G.W.A.
Hemerik C.
Publication venue: Czech Technical University in Prague - Central Library
Publication date: 01/01/2009

Algorithms for acceptance, pattern matching and parsing of regular trees and the tree automata used in these algorithms have many applications, including instruction selection in compilers, implementation of term rewriting systems, and model checking. Many such tree algorithms and constructions for such tree automata appear in the literature, but some deficiencies existed, including: inaccessibility of theory and algorithms; difficulty of comparing algorithms due to variations in presentation style and level of formality; and lack of reference to the theory in many publications. An algorithm taxonomy is an effective means of bringing order to such a field. We report on two taxonomies of regular tree algorithms that we have constructed to deal with the deficiencies. The complete work has been presented in the PhD thesis of the first author.

Repository TU/e

Pure OAI Repository

Multi-user publishing in the Web : DReSS, a Document Repository Service Station

Author: Aerts A.T.M.
De Bra P.M.E.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1996

Many WWW servers contain information written by several authors. These authors either need an account on the server machine, and special permissions to create information in the server space, or else the Webmaster needs to put the information in that space or allow the server to point to the author's own space. We present DReSS, a system to enable authors to deposit (and update) documents on a WWW server, using standard WWW features only. Authors do not need login permission on the server machine, ftp upload access, or even electronic mail. As the documents live in the WWW server space there is no need for the server to be able to access documents outside its space. Thus, our system will work on even the most securely shielded servers (running in a chroot environment). DReSS consists of a set of CGI-scripts and two small auxiliary programs running on the client machine. It can be used with any (HTML-2.0-capable) WWW browser, and with any WWW server. DReSS does not use special features ..

CiteSeerX

Repository TU/e

Pure OAI Repository