18,437 research outputs found
A new problem in string searching
We describe a substring search problem that arises in group presentation
simplification processes. We suggest a two-level searching model: skip and
match levels. We give two timestamp algorithms which skip searching parts of
the text where there are no matches at all and prove their correctness. At the
match level, we consider Harrison signature, Karp-Rabin fingerprint, Bloom
filter and automata based matching algorithms and present experimental
performance figures.Comment: To appear in Proceedings Fifth Annual International Symposium on
Algorithms and Computation (ISAAC'94), Lecture Notes in Computer Scienc
Highly Scalable Algorithms for Robust String Barcoding
String barcoding is a recently introduced technique for genomic-based
identification of microorganisms. In this paper we describe the engineering of
highly scalable algorithms for robust string barcoding. Our methods enable
distinguisher selection based on whole genomic sequences of hundreds of
microorganisms of up to bacterial size on a well-equipped workstation, and can
be easily parallelized to further extend the applicability range to thousands
of bacterial size genomes. Experimental results on both randomly generated and
NCBI genomic data show that whole-genome based selection results in a number of
distinguishers nearly matching the information theoretic lower bounds for the
problem
- …