14 research outputs found

    New algorithms for exact and approximate text matching

    Get PDF
    Praca przedstawia główne wyniki z tematyki algorytmów tekstowych otrzymane w Katedrze Informatyki Stosowanej w latach 2004-2009. Algorytmy te dotyczą wybranych rozmaitych problemów wyszukiwania dokładnego i przybliżonego, również w intensywnie w ostatnich latach badanym scenariuszu z wykorzystaniem kompresji.This work presents main results in the domain of text algorithms obtained in Computer Engineering Dept. in the years 2004-2009. The algorithms concern various exact and approximate string matching problems, also in the recently actively developed scenario involving compression

    DC: a highly efficient and flexible exact pattern-matching algorithm

    Get PDF
    ware of the need for faster and flexible searching algorithms in fields such as web searching or bioinformatics, we propose DC - a high-performance algorithm for exact pattern matching. Emphasizing the analysis of pattern peculiarities in the pre-processing phase, the algorithm en- compasses a novel search logic based on the examination of multiple alignments within a larger window, selectively tested after a powerful heuristic called compatibility rule is verified. The new algorithm’s performance is, on average, above its best-rated competitors when testing different data types and using a complete suite of pattern extensions and compositions. The flexibility is remarkable and the efficiency is more relevant in quaternary or greater alphabets. Keywords: exact pattern-match, searching algorithms

    Order-Preserving Pattern Matching Indeterminate Strings

    Get PDF
    Given an indeterminate string pattern p and an indeterminate string text t, the problem of order-preserving pattern matching with character uncertainties (muOPPM) is to find all substrings of t that satisfy one of the possible orderings defined by p. When the text and pattern are determinate strings, we are in the presence of the well-studied exact order-preserving pattern matching (OPPM) problem with diverse applications on time series analysis. Despite its relevance, the exact OPPM problem suffers from two major drawbacks: 1) the inability to deal with indetermination in the text, thus preventing the analysis of noisy time series; and 2) the inability to deal with indetermination in the pattern, thus imposing the strict satisfaction of the orders among all pattern positions. In this paper, we provide the first polynomial algorithms to answer the muOPPM problem when: 1) indetermination is observed on the pattern or text; and 2) indetermination is observed on both the pattern and the text and given by uncertainties between pairs of characters. First, given two strings with the same length m and O(r) uncertain characters per string position, we show that the muOPPM problem can be solved in O(mr lg r) time when one string is indeterminate and r in N^+ and in O(m^2) time when both strings are indeterminate and r=2. Second, given an indeterminate text string of length n, we show that muOPPM can be efficiently solved in polynomial time and linear space

    Parallel String Matching

    Get PDF
    We explore the benefits of parallelizing 7 state-of-the-art string matching algorithms. Using SIMD and multi-threading techniques we achieve a significant performance improvement of up to 43.3x over reference implementations and a speedup of up to 16.7x over the string matching program grep. We evaluate our implementations on the smart-corpora and the full human genome data set. We show scalability over number of threads and impact of pattern length

    Enhancing computer-aided plagiarism detection

    Get PDF
    corecore