25 research outputs found

    A compact representation of nondeterministic (suffix) automata for the bit-parallel approach

    We present a novel technique, suitable for bit-parallelism, for representing both the nondeterministic automaton and the nondeterministic suffix automaton of a given string in a more compact way. Our approach is based on a particular factorization of strings which, on average, makes it possible to pack into a machine word of w bits the automaton state configurations for strings of length greater than w. We adapted the Shift-And and BNDM algorithms to use our encoding and compared them with the original algorithms. Experimental results show that the new variants are generally faster for long patterns.
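
    For orientation, the classical Shift-And algorithm that the abstract takes as its baseline can be sketched as follows. This is the textbook version, not the compact encoding proposed in the paper; the function name `shift_and` is chosen here for illustration, and Python's unbounded integers stand in for the w-bit machine word.

    ```python
    def shift_and(pattern, text):
        """Return the start positions of all occurrences of pattern in text."""
        m = len(pattern)
        if m == 0:
            return []
        # masks[c] has bit i set when pattern[i] == c
        masks = {}
        for i, ch in enumerate(pattern):
            masks[ch] = masks.get(ch, 0) | (1 << i)
        accept = 1 << (m - 1)  # bit marking a fully matched pattern
        state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
        hits = []
        for j, ch in enumerate(text):
            # Advance every active state by one position, restart at state 0,
            # and keep only the states consistent with the current character.
            state = ((state << 1) | 1) & masks.get(ch, 0)
            if state & accept:
                hits.append(j - m + 1)
        return hits
    ```

    In a fixed-width implementation the state fits in one word only when the pattern length is at most w, which is the limitation the compact representation above is meant to relax.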

    Revisiting Multiple Pattern Matching

    We consider the classical exact multiple string matching problem. The proposed solution is based on a combination of a few ideas: using q-grams instead of single characters, pattern superimposition, bit-parallelism and alphabet size reduction. We discuss the pros and cons of various alternatives in order to achieve the best possible combination of techniques. The main contributions of this paper are different alphabet mapping methods that reduce memory requirements and allow the use of larger q-grams. The experimental results show that the presented algorithm is competitive in most practical cases. One of the tests also shows that tailoring our scheme to search over a byte-encoded text results in speedups in comparison to searching over plain text.

    Lightweight Fingerprints for Fast Approximate Keyword Matching Using Bitwise Operations

    We aim to speed up approximate keyword matching with the use of a lightweight, fixed-size block of data for each string, called a fingerprint. Fingerprints work in a similar way to hash values; however, they can also be used for matching with errors. They store information regarding symbol occurrences using individual bits, and they can be compared against each other with a constant number of bitwise operations. In this way, certain strings can be deduced to be at least within the distance k from each other (using Hamming or Levenshtein distance) without performing an explicit verification. We show experimentally that for a preprocessed collection of strings, fingerprints can provide substantial speedups for k = 1, namely over 2.5 times for the Hamming distance and over 30 times for the Levenshtein distance. Tests were conducted on synthetic and real-world English and URL data.
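
    As a rough illustration of the idea, a simplified occurrence fingerprint might look like the sketch below. It is an assumption made for this example, not the scheme evaluated in the paper: one bit per lowercase letter a-z, with all other symbols ignored, and the names `fingerprint` and `may_match` are hypothetical. The filter relies on the fact that a single substitution, insertion or deletion can change at most two bits of such a fingerprint, so more than 2k differing bits rules out a distance of at most k.

    ```python
    def fingerprint(s):
        """Occurrence fingerprint: bit i is set if the i-th letter of a-z occurs in s."""
        fp = 0
        for ch in s:
            idx = ord(ch) - ord("a")
            if 0 <= idx < 26:  # symbols outside a-z are ignored in this sketch
                fp |= 1 << idx
        return fp


    def may_match(fp_a, fp_b, k):
        """Return False only when the two strings are certainly more than k edits apart."""
        differing_bits = bin(fp_a ^ fp_b).count("1")
        return differing_bits <= 2 * k


    # Pairs that pass the filter still need explicit verification.
    print(may_match(fingerprint("night"), fingerprint("light"), 1))
    print(may_match(fingerprint("abc"), fingerprint("xyz"), 1))
    ```

    The comparison costs one XOR and one population count, which is why rejecting a candidate this way is much cheaper than computing the distance itself.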

    COMPARATIVE ANALYSIS OF BIT-PARALLEL STRING PATTERN MATCHING ALGORITHMS FOR BIOLOGICAL SEQUENCES

    The inherent parallelism of bit operations such as AND/OR inside a computer word is known as bit-parallelism. It plays a major role in string pattern matching and is well suited to the analysis of biological data. Recently developed bit-parallel string matching approaches help to improve the efficiency of other string pattern matching algorithms. This paper discusses the working of some of these bit-parallel string matching algorithms and their application to biological sequences. It also shows, with examples, how bit-parallelism can be used efficiently to address various matching problems in bioinformatics when analyzing biological sequences such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. It can also serve as a useful guide for researchers looking for an appropriate method to use on biological sequences.