Search CORE

27 research outputs found

Fast Algorithms for Pattern Matching Problems

Author: Diptarama Hendrian
Publication date: 27/03/2018

Tohoku University / 篠原歩 (Ayumi Shinohara) / 課

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

A taxonomy of keyword pattern matching algorithms

Author: Watson B.W.
Zwaan G.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1992

Repository TU/e

Pure OAI Repository

Matching and Compression of Strings with Automata and Word Packing

Author: Skjoldjensen Frederik Rye
Publication venue: DTU Compute
Publication date: 01/01/2017

Online Research Database In Technology

DC: a highly efficient and flexible exact pattern-matching algorithm

Author: Carvalho Paulo
Deusdado Sérgio
Publication venue: Centro de Ciências e Tecnologias da Computação - Universidade do Minho
Publication date: 01/01/2009

Aware of the need for faster and more flexible searching algorithms in fields such as web searching and bioinformatics, we propose DC, a high-performance algorithm for exact pattern matching. Emphasizing the analysis of pattern peculiarities in the pre-processing phase, the algorithm encompasses a novel search logic based on the examination of multiple alignments within a larger window, selectively tested after a powerful heuristic called the compatibility rule is verified. The new algorithm's performance is, on average, above that of its best-rated competitors when testing different data types and using a complete suite of pattern extensions and compositions. The flexibility is remarkable, and the efficiency is more relevant in quaternary or greater alphabets. Keywords: exact pattern-match, searching algorithms

Biblioteca Digital do IPB

Practical algorithms for biological sequence analysis: methods and applications

Author: Retha Ahmad
Publication date: 01/06/2019

King's Research Portal

Space efficient algorithms for string processing

Author: Dhaliwal J
Publication venue: RMIT University
Publication date: 01/01/2013

The suffix array (SA), which is an array containing the suffixes of a string sorted into lexicographical order, was introduced in the late eighties as a space efficient alternative to the suffix tree. It has since emerged as a useful data structure in string processing problems such as pattern matching, pattern discovery, and data compression. The SA is often coupled with the longest-common-prefix (LCP) array, which contains the lengths of the longest common prefixes between consecutive suffixes in the SA. When enhanced with the LCP array, the SA can provide efficient solutions to the above applications, including a problem called pattern mining. To date, all the mining algorithms lie at either extreme of the efficiency spectrum: they are either fast and use enormous amounts of space, or they are compact and orders of magnitude slower. We present a mining algorithm that achieves the best of both these extremes, having runtime comparable to the fastest published algorithms while using less space than the most space efficient.

In all these applications, the construction of the SA (also known as suffix sorting) is one of the main computational bottlenecks. Most papers describing the SA assume the SA fits in RAM, limiting their applications. The fastest algorithms in this large memory suffix sorting category use powerful pointer copying heuristics to expedite suffix sorting. Several space efficient algorithms have emerged in the last five years, where the trend is to use as little RAM as possible. They do so by finding a clever way to trade runtime, or by using slow compressed data structures, or by using external memory (disk), or some combination of these techniques. In this thesis, we focus on improving the runtime of a space efficient algorithm due to Kärkkäinen by adapting the heuristics from large memory suffix sorting to a semi-external setting. Also, pointer copying has been heavily used to speed up the construction of the SA, but not the LCP array. We also discuss our attempts to combine the pointer copying heuristics with an efficient LCP construction algorithm due to Kärkkäinen, Manzini and Puglisi.

The Burrows-Wheeler transform (BWT) was discovered independently of the SA, but it is now known that the two data structures are deeply linked. The BWT is central to practical compression tools such as szip and bzip2. Many papers have been published on constructing the BWT either in RAM or in external memory, but few on inverting the BWT to obtain the original string, and in fact none in external memory. For larger datasets, the existing traditional approaches cannot be used to invert the BWT. In such cases, we have to use disk. We close the gap between theory and practice by examining the problem of inverting the BWT efficiently on disk. We provide a practical implementation of the only theoretical proposal for the problem by Ferragina, Gagie and Manzini. We also provide new, faster solutions to the problem based on simple scanning and compression techniques.
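To make the definitions in this abstract concrete, here is a minimal in-memory sketch of the suffix array, the LCP array, and the BWT with its inversion. It is not the thesis's semi-external or on-disk method: the function names are hypothetical, the suffix array is built by naive sorting, the LCP array uses the well-known linear-time scan of Kasai et al., and the input is assumed to end with a unique sentinel `$` that sorts before every other character.

```python
def suffix_array(s):
    # Naive construction: sort suffix start positions by the suffix text.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # lcp[r] = length of the longest common prefix of the suffixes
    # at sa[r - 1] and sa[r]; lcp[0] is 0 by convention.
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


def bwt_from_sa(s, sa):
    # The BWT lists the character preceding each suffix in sorted order
    # (s[-1] wraps around to the sentinel's predecessor for position 0).
    return "".join(s[i - 1] for i in sa)


def inverse_bwt(b, sentinel="$"):
    # A stable sort of the BWT positions yields the last-to-first mapping.
    order = sorted(range(len(b)), key=lambda i: b[i])
    lf = [0] * len(b)
    for k, i in enumerate(order):
        lf[i] = k
    out = []
    i = 0  # row 0 is the rotation that starts with the sentinel
    for _ in range(len(b) - 1):
        out.append(b[i])
        i = lf[i]
    return "".join(reversed(out)) + sentinel


text = "banana$"
sa = suffix_array(text)
lcp = lcp_array(text, sa)
bwt = bwt_from_sa(text, sa)
```

For `banana$` the BWT is derived directly from the suffix array, which is one small illustration of the link between the two structures that the abstract mentions. Everything here holds the whole string in RAM, which is exactly the assumption the thesis sets out to relax.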

RMIT Research Repository

Towards optimal packed string matching

Author: Oren Ben-Kiki
Philip Bille
Dany Breslauer
Leszek Gąsieniec
Roberto Grossi
Oren Weimann
Publication venue: Elsevier BV
Publication date: 01/01/2014

Dedicated to Professor Gad M. Landau, on the occasion of his 60th birthday. Keywords: String matching; Word-RAM; Packed strings. In the packed string matching problem, it is assumed that each machine word can accommodate up to α characters, thus an n-character string occupies n/α memory words. The main word-size string-matching instruction wssm is available in contemporary commodity processors. The other word-size maximum-suffix instruction wslm is only required during the pattern pre-processing. Benchmarks show that our solution can be efficiently implemented, unlike some prior theoretical packed string matching work. (b) We also consider the complexity of the packed string matching problem in the classical word-RAM model in the absence of the specialized micro-level instructions wssm and wslm. We propose micro-level algorithms for the theoretically efficient emulation, using parallel algorithm techniques to emulate wssm and the Four-Russians technique to emulate wslm. Surprisingly, our bit-parallel emulation of wssm also leads to a new simplified parallel random access machine string-matching algorithm. As a byproduct, to facilitate our results, we develop a new algorithm for finding the leftmost (most significant) 1 bits in consecutive non-overlapping blocks of uniform size inside a word. This latter problem is not known to be reducible to finding the rightmost 1, which can be easily solved, since we do not know how to reverse the bits of a word in O(1) time.
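As a rough illustration of the packed representation described in this abstract, the sketch below packs α = 8 one-byte characters into each 64-bit word, so an n-character string takes ⌈n/α⌉ words. It also contrasts the easy rightmost-1 trick with a leftmost-1 computation. The names are hypothetical, the zero-byte padding of the last word is an assumption, and `int.bit_length` is only a stand-in for the paper's constant-time per-block algorithm, which is not reproduced here.

```python
ALPHA = 8  # assumed: 8-bit characters packed into a 64-bit word


def pack(data, alpha=ALPHA):
    # Pack `alpha` bytes per word; the last word is padded with zero bytes.
    words = []
    for start in range(0, len(data), alpha):
        chunk = data[start:start + alpha].ljust(alpha, b"\0")
        words.append(int.from_bytes(chunk, "big"))
    return words


def packed_equal(a, b):
    # Compare two packed strings one word (alpha characters) at a time.
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def rightmost_one(x):
    # Two's-complement trick: isolates the least significant set bit.
    return x & -x


def leftmost_one(x):
    # Isolates the most significant set bit via Python's int.bit_length.
    return 1 << (x.bit_length() - 1) if x else 0


words = pack(b"abracadabra")
```

Here the 11-character string occupies two words, and an equality test touches two words rather than eleven characters, which is the kind of factor-α saving the packed model aims for.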

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Online Research Database In Technology

Improved Periodicity Mining in Time Series Databases

Author: Uppalapati Nithin
Publication venue: The Research Repository @ WVU
Publication date: 01/01/2015

Time series data represents information about real world phenomena, and periodicity mining explores the interesting periodic behavior that is inherent in the data. Periodicity mining has numerous applications, such as weather forecasting, stock market prediction and analysis, and pattern recognition. Recently, the suffix tree, a powerful data structure that efficiently solves many string-related problems, has been used to gather information about repeated substrings in the text and then perform periodicity mining. However, periodicity mining deals with large amounts of data, which makes it difficult to perform mining in main memory due to the space constraints of the suffix tree. Thus, we first propose the use of the Compressed Suffix Tree (CST) for space efficient periodicity mining in very large datasets. Given the time-space trade-off that comes with any practical usage of the CST, we provide a comprehensive empirical analysis of the practical usage of CSTs and traditional suffix trees for periodicity mining. Noise is an inherent part of practical time series data, and it is important to mine periods in spite of the noise. This leads to the problem of approximate periodicity mining. Existing algorithms have dealt with the noise introduced between the occurrences of the periodic pattern, but not the noise introduced in the structure of the pattern itself. We present a taxonomy for approximate periodicity and then propose an algorithm that performs periodicity mining in the presence of noise introduced simultaneously in both the structure of the pattern and between the periodic occurrences of the pattern.
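A small sketch may help show what "periodicity in the presence of noise" means for a discretized series. This is not the suffix-tree or CST-based algorithm from the thesis: it is a brute-force check, with a hypothetical function name, that measures how often a symbol recurs at a fixed period and offset. A confidence below 1.0 corresponds to noise between occurrences.

```python
def periodicity_confidence(series, symbol, period, offset):
    # Fraction of the positions offset, offset + period, ... that hold `symbol`.
    positions = range(offset, len(series), period)
    if len(positions) == 0:
        return 0.0
    hits = sum(1 for i in positions if series[i] == symbol)
    return hits / len(positions)


series = "abcabcabdabc"  # one 'c' has been replaced by 'd' (noise)
conf_a = periodicity_confidence(series, "a", 3, 0)
conf_c = periodicity_confidence(series, "c", 3, 2)
```

In this example `a` is perfectly periodic with period 3, while `c` appears at only three of its four expected positions, so a miner would accept it only under a confidence threshold of 0.75 or lower.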

The Research Repository @ WVU (West Virginia University)