11 research outputs found

Duel and sweep algorithm for order-preserving pattern matching

Author: A Amir
D Gusfield
DE Knuth
J Kim
M Crochemore
M Kubica
MM Hasan
R Cole
RN Horspool
RS Boyer
S Cho
S Faro
T Chhabra
U Vishkin
Publication date: 26/05/2017

Given a text $T$ and a pattern $P$ over alphabet $\Sigma$, the classic exact matching problem searches for all occurrences of pattern $P$ in text $T$. Unlike the exact matching problem, order-preserving pattern matching (OPPM) considers the relative order of elements rather than their actual values. In this paper, we propose an efficient algorithm for the OPPM problem using the "duel-and-sweep" paradigm. Our algorithm runs in $O(n + m\log m)$ time in general and $O(n + m)$ time under the assumption that the characters in a string can be sorted in linear time with respect to the string size. We also perform experiments and show that our algorithm is faster than the KMP-based algorithm. Last, we introduce two-dimensional order-preserving pattern matching and give a duel-and-sweep algorithm that runs in $O(n^2)$ time for the dueling stage and $O(n^2 m)$ time for the sweeping stage, with $O(m^3)$ preprocessing time.

Comment: 13 pages, 5 figures
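To make the notion of an order-preserving match concrete, the following is a minimal sketch of a naive checker. It is not the duel-and-sweep algorithm from the abstract; it simply compares rank signatures of every window and takes roughly O(n m log m) time. The function names `rank_signature` and `naive_oppm` are illustrative assumptions, not names from the paper.

```python
def rank_signature(seq):
    """Map each element to the rank of its value among the distinct values of seq."""
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return [ranks[v] for v in seq]


def naive_oppm(text, pattern):
    """Return all start positions where a window of text has the same
    relative order as pattern (naive baseline, not duel-and-sweep)."""
    m = len(pattern)
    target = rank_signature(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if rank_signature(text[i:i + m]) == target
    ]


if __name__ == "__main__":
    # The pattern (1, 3, 2) means "low, high, middle".
    print(naive_oppm([10, 22, 15, 30, 20, 18, 27], [1, 3, 2]))
```

Two windows match when their rank signatures are equal, so the actual values in the text never need to equal those in the pattern.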

arXiv.org e-Print Archive

Crossref

Optimal Parallel Algorithms for Periods, Palindromes and Squares (Preliminary Version)

Author: Apostolico Alberto
Breslauer Dany
Galil Zvi
Publication venue: 'Purdue University (bepress)'
Publication date: 26/11/1991

Purdue E-Pubs

Efficient String Matching on Coded Texts

Author: Breslauer Dany
Publication venue: 'Aarhus University Library'
Publication date: 14/12/1994

The so-called "four Russians technique" is often used to speed up algorithms by encoding several data items in a single memory cell. Given a sequence of n symbols over a constant size alphabet, one can encode the sequence into O(n / lambda) memory cells in O(log(lambda)) time using n / log(lambda) processors. This paper presents an efficient CRCW-PRAM string-matching algorithm for coded texts that takes O(log log(m/lambda)) time making only O(n / lambda) operations, an improvement by a factor of lambda = O(log n) on the number of operations used in previous algorithms. Using this string-matching algorithm one can test if a string is square-free and find all palindromes in a string in O(log log n) time using n / log log n processors.
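The idea of encoding several symbols in one memory cell can be sketched sequentially as follows. This is only an illustration of the packing step, not the parallel CRCW-PRAM encoding described in the abstract; the function name `pack_symbols` and its parameters are assumptions made for the example.

```python
def pack_symbols(symbols, bits_per_symbol, symbols_per_cell):
    """Pack small non-negative integers into cells holding
    symbols_per_cell symbols each (sequential sketch)."""
    limit = 1 << bits_per_symbol
    cells = []
    for start in range(0, len(symbols), symbols_per_cell):
        cell = 0
        for s in symbols[start:start + symbols_per_cell]:
            if not 0 <= s < limit:
                raise ValueError("symbol does not fit in bits_per_symbol bits")
            # Shift earlier symbols left and append the new one in the low bits.
            cell = (cell << bits_per_symbol) | s
        cells.append(cell)
    return cells


if __name__ == "__main__":
    # Eight symbols over a 4-letter alphabet (2 bits each), 4 symbols per cell.
    print(pack_symbols([0, 1, 2, 3, 3, 2, 1, 0], 2, 4))
```

Once packed, one integer comparison between two cells compares several symbols at once, which is where the factor-of-lambda saving in the number of operations comes from. A final cell that holds fewer symbols than the others would need its length tracked separately, which this sketch does not do.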

Tidsskrift.dk (Det Kongelige Bibliotek)

Towards optimal packed string matching

Author: Aho
AMD
Apostolico
Arlazarov
Baeza-Yates
Belazzougui
Ben-Kiki
Ben-Nissan
Bille
Boyer
Breslauer
Brodnik
Cole
Commentz-Walter
Crochemore
Czumaj
Césari
Dany Breslauer
Daykin
Duval
Faro
Fich
Fine
Fischer
Fredriksson
Furst
Galil
Goldberg
Gusfield
Gąsieniec
Iliopoulos
Intel
Knuth
Leszek Gąsieniec
Lothaire
Muthukrishnan
Navarro
Oren Ben-Kiki
Oren Weimann
Philip Bille
Roberto Grossi
Rytter
Tarhio
Vishkin
Yao
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

Dedicated to Professor Gad M. Landau, on the occasion of his 60th birthday.

Keywords: String matching, Word-RAM, Packed strings

In the packed string matching problem, it is assumed that each machine word can accommodate up to α characters, thus an n-character string occupies n/α memory words. The main word-size string-matching instruction wssm is available in contemporary commodity processors. The other word-size maximum-suffix instruction wslm is only required during the pattern pre-processing. Benchmarks show that our solution can be efficiently implemented, unlike some prior theoretical packed string matching work. (b) We also consider the complexity of the packed string matching problem in the classical word-RAM model in the absence of the specialized micro-level instructions wssm and wslm. We propose micro-level algorithms for the theoretically efficient emulation using parallel algorithms techniques to emulate wssm and using the Four-Russians technique to emulate wslm. Surprisingly, our bit-parallel emulation of wssm also leads to a new simplified parallel random access machine string-matching algorithm. As a byproduct to facilitate our results we develop a new algorithm for finding the leftmost (most significant) 1 bits in consecutive non-overlapping blocks of uniform size inside a word. This latter problem is not known to be reducible to finding the rightmost 1, which can be easily solved, since we do not know how to reverse the bits of a word in O(1) time.
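The contrast between the rightmost and leftmost 1 bit can be illustrated with a short sketch. The rightmost 1 is isolated by a standard two's-complement trick; the per-block leftmost 1 is shown here only as a naive reference loop that pins down the input/output behaviour, not the paper's word-level algorithm. The function names and the block layout (block 0 in the low-order bits) are assumptions for this example.

```python
def rightmost_one(x):
    """Isolate the least significant 1 bit of x (returns 0 when x == 0)."""
    return x & -x


def leftmost_one_per_block(word, block_bits, num_blocks):
    """Naive reference: keep only the most significant 1 bit of each
    block_bits-wide block of word. Loops over blocks, so it is not
    a constant-time word operation."""
    mask = (1 << block_bits) - 1
    result = 0
    for b in range(num_blocks):
        shift = b * block_bits
        block = (word >> shift) & mask
        if block:
            top = 1 << (block.bit_length() - 1)
            result |= top << shift
    return result


if __name__ == "__main__":
    print(bin(rightmost_one(0b101100)))
    print(bin(leftmost_one_per_block(0b0110_0001_1011, 4, 3)))
```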

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Online Research Database In Technology

Fast Algorithms for Pattern Matching Problems (パターン照合問題に対する高速なアルゴリズム)

Author: Diptarama Hendrian
Publication date: 27/03/2018

Tohoku University; 篠原歩 (Ayumi Shinohara); 課 (course-based doctorate)

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

Compressed and Practical Data Structures for Strings

Author: Christiansen Anders Roy
Publication venue: DTU Compute
Publication date: 01/01/2018

Online Research Database In Technology

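Duel-and-Sweep Algorithms for String Pattern Matching Problems under Substring-Consistent Equivalence Relations (部分文字列一貫同値関係の下での文字列パターン照合問題のためのduel-and-sweepアルゴリズム)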

Author: Jargalsaikhan Davaajav
Publication date: 25/03/2022

Tohoku University; 篠原歩 (Ayumi Shinohara); 課 (course-based doctorate)

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

Efficient Algorithms for a Mesh-Connected Computer with Additional Global Bandwidth

Author: An Yujie

This thesis shows that adding global bandwidth to a mesh-connected computer can greatly improve its performance. The goal of this project is to design algorithms for mesh-connected computers augmented with limited global bandwidth, so that we can further enhance our understanding of the parallel/serial nature of the problems on evolving parallel architectures. We do this by first solving several problems associated with fundamental data movement, then summarizing ways to resolve the different situations one may observe in data movement in parallel computing. This can help us to understand whether a problem is easily parallelizable on different parallel models. We give efficient algorithms to solve several fundamental problems, including sorting, counting, fast Fourier transform, finding a minimum spanning tree, and finding a convex hull. We show that adding a small amount of global bandwidth makes a practical design that combines aspects of mesh and fully connected models to achieve the benefits of each. Most of the algorithms are optimal. For future work, we believe that algorithms with peak-power constraints can make our model well adapted to recent architectures in high performance computing.

PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/150001/1/anyujie_1.pd

Deep Blue Documents at the University of Michigan

Algorithms for string matching with applications in molecular biology

Author: Holloway James Lee
Publication venue: 'Oregon State University'

As the volume of genetic sequence data increases due to improved sequencing techniques and increased interest, the computational tools available to analyze the data are becoming inadequate. This thesis seeks to improve a few of the computational methods available to access and analyze data in the genetic sequence databases. The first two results are parallel algorithms based on previously known sequential algorithms. The third result is a new approach, based on assumptions that we believe make sense in the biological context of the problem, to approximating an NP-complete problem. The final result is a fundamentally new approach to approximate string matching using the divide-and-conquer paradigm instead of the dynamic programming approach that has been used almost exclusively in the past. Dynamic programming algorithms to measure the distance between sequences have been known since at least 1972. Recently there has been interest in developing parallel algorithms to measure the distance between two sequences. We have developed an optimal parallel algorithm to find the edit distance, a metric frequently used to measure distance, between two sequences. It is often interesting to find the substrings of length k that appear most frequently in a given string. We give a simple sequential algorithm to solve this problem and an efficient parallel version of the algorithm. The parallel algorithm uses an efficient novel parallel bucket sort. When sequencing a large segment of DNA, the original DNA sequence is reconstructed using the results of sequencing fragments, which may or may not contain errors, of many copies of the original DNA. New algorithms are given to solve the problem of reconstructing the original DNA sequence with and without errors introduced into the fragments. A program based on this algorithm is used to reconstruct the human beta globin region (HUMHBB) when given a set of 300 to 500 mers drawn randomly from the HUMHBB region.
Approximate string matching is used in a biological context to model the steps of evolution. While such evolution may proceed base by base using the change, insert, or delete operators, there is also evidence that whole genes may be moved or inverted. We introduce a new problem, the string to string rearrangement problem, that allows movement and inversion of substrings. We give a divide-and-conquer algorithm for finding a rearrangement of one string within another.
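For readers unfamiliar with the edit distance mentioned in this abstract, the following is a minimal sketch of the textbook sequential dynamic programming computation using the change, insert, and delete operators with unit costs. It is not the thesis's parallel algorithm; the function name `edit_distance` is an assumption for this example.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between sequences a and b
    (textbook sequential dynamic programming, two rows of memory)."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            change = prev[j - 1] + (0 if ca == cb else 1)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(change, delete, insert))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
    print(edit_distance("kitten", "sitting"))
```

The table has (len(a) + 1) by (len(b) + 1) entries, so this runs in time proportional to the product of the two lengths.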

ScholarsArchive@OSU