25 research outputs found

Approximate string matching with reduced alphabet

Author: B. Ďurian
E. Ukkonen
J. Kärkkäinen
J. Tarhio
K. Fredriksson
L. Salmela
M. Fontaine
M.R. Garey
P. Jokinen
R. Baeza-Yates
R. Muth
R. Zhu
R.M. Karp
R.N. Horspool
R.S. Boyer
T. Berry
T. Lecroq
V. Mäkinen
V.L. Arlazarov
W.J. Masek
Z. Liu
Publication venue: Heidelberg, Berlin, Springer Verlag,
Publication date: 01/01/2010
Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

A compact representation of nondeterministic (suffix) automata for the bit-parallel approach

Author: Cantone Domenico
Faro Simone
Giaquinta Emanuele
Publication venue: Elsevier Inc.
Publication date: 30/04/2012
We present a novel technique, suitable for bit-parallelism, for representing both the nondeterministic automaton and the nondeterministic suffix automaton of a given string more compactly. Our approach is based on a particular factorization of strings which, on average, allows automata state configurations for strings of length greater than w to be packed into a machine word of w bits. We adapted the Shift-And and BNDM algorithms to use our encoding and compared them with the original algorithms. Experimental results show that the new variants are generally faster for long patterns.
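For orientation, the following is a minimal sketch of the classic Shift-And algorithm that the abstract takes as its baseline. It is not the compact encoding proposed in the paper. The function name `shift_and_search` is hypothetical, and Python's unbounded integers stand in for the machine word, so the usual limit of pattern length at most w does not show up here.

```python
def shift_and_search(pattern: str, text: str) -> list[int]:
    """Return the start positions of all exact occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit of the final automaton state
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, ch in enumerate(text):
        # Advance every active state by one character and restart at state 0.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


print(shift_and_search("abra", "abracadabra"))  # prints [0, 7]
```

Each text character costs one shift, one OR, and one AND regardless of how many automaton states are active, which is the property that a more compact state encoding extends to patterns longer than one machine word.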

Elsevier - Publisher Connector

Revisiting Multiple Pattern Matching

Author: Fredriksson Kimmo
Grabowski Szymon
Susik Robert
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/12/2019
We consider the classical exact multiple string matching problem. The proposed solution combines several ideas: using q-grams instead of single characters, pattern superimposition, bit-parallelism, and alphabet size reduction. We discuss the pros and cons of various alternatives in order to find the best possible combination of techniques. The main contributions of this paper are different alphabet mapping methods that reduce memory requirements and make it possible to use larger q-grams. The experimental results show that the presented algorithm is competitive in most practical cases. One of the tests also shows that tailoring our scheme to search over byte-encoded text yields speedups compared with searching over plain text.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Lightweight Fingerprints for Fast Approximate Keyword Matching Using Bitwise Operations

Author: Cisłak Aleksander
Grabowski Szymon
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 31/05/2019
We aim to speed up approximate keyword matching with the use of a lightweight, fixed-size block of data for each string, called a fingerprint. Fingerprints work in a similar way to hash values; however, they can also be used for matching with errors. They store information about symbol occurrences using individual bits, and they can be compared against each other with a constant number of bitwise operations. In this way, certain strings can be deduced to be at least within the distance k from each other (using Hamming or Levenshtein distance) without performing an explicit verification. We show experimentally that for a preprocessed collection of strings, fingerprints can provide substantial speedups for k = 1, namely over 2.5 times for the Hamming distance and over 30 times for the Levenshtein distance. Tests were conducted on synthetic and real-world English and URL data.
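One way to read the fingerprint idea is as a cheap filter that runs before an exact distance computation. The sketch below is an assumption-laden illustration, not the authors' fingerprint layout: it uses one occurrence bit per lowercase letter a to z, handles only the Hamming distance, and all function names are hypothetical. It relies on the fact that a single substitution can change at most two occurrence bits, so two strings whose fingerprints differ in more than 2k bits cannot be within Hamming distance k.

```python
def occurrence_fingerprint(word: str) -> int:
    """Hypothetical 26-bit fingerprint: bit i is set when letter chr(ord('a') + i) occurs."""
    fp = 0
    for ch in word:
        if "a" <= ch <= "z":  # assumption: other symbols are simply ignored
            fp |= 1 << (ord(ch) - ord("a"))
    return fp


def may_match_hamming(fp_a: int, fp_b: int, k: int) -> bool:
    """Return False only when the pair is provably further apart than k substitutions."""
    differing_bits = bin(fp_a ^ fp_b).count("1")  # popcount of the XOR
    return differing_bits <= 2 * k


def hamming(a: str, b: str) -> int:
    """Explicit verification; defined only for strings of equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def search(words: list[str], query: str, k: int) -> list[str]:
    """Filter candidates by fingerprint, then verify the survivors explicitly."""
    query_fp = occurrence_fingerprint(query)
    result = []
    for word in words:
        if len(word) != len(query):
            continue
        # A real index would store fingerprints at preprocessing time.
        if not may_match_hamming(occurrence_fingerprint(word), query_fp, k):
            continue  # rejected without computing the distance
        if hamming(word, query) <= k:
            result.append(word)
    return result


print(search(["cart", "card", "dust", "care"], "cars", 1))
```

The filter never rejects a true match, so it changes only how often the explicit verification runs, not the result.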

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Parallel Processing of Multiple Pattern Matching Algorithms for Biological Sequences: Methods and Performance Results

Author: Charalampos S. Kouzinopoulos
Konstantinos G. Margaritis
Panagiotis D. Michailidis
Publication venue: IntechOpen
Publication date: 12/09/2011
IntechOpen

Crossref

COMPARATIVE ANALYSIS OF BIT-PARALLEL STRING PATTERN MATCHING ALGORITHMS FOR BIOLOGICAL SEQUENCES

Author: Mathias Fonkam
Muhammad Yusuf Muhammad
Rao Narasimha Vajjhala
Salu George Thandekatu
Sandip Rakshit
Publication venue: Regional Association for Security and crisis management, Belgrade, Serbia
Publication date: 01/04/2023
Bit parallelism is the inherent parallelism of bit operations such as AND and OR inside a computer word. It plays a major role in string pattern matching and applies well to the analysis of biological data. Recently developed bit-parallel string matching algorithms help improve the efficiency of other string pattern matching algorithms. This paper discusses how some of these bit-parallel string matching algorithms work and how they apply to biological sequences. It also shows, with examples, how bit-parallelism can be used efficiently to address various matching problems in bioinformatics when analyzing biological sequences such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein. It can also serve as a useful guide for researchers looking for an appropriate method to use on biological sequences.

Directory of Open Access Journals