2,369 research outputs found

    Designing optimal- and fast-on-average pattern matching algorithms

    Given a pattern w and a text t, the speed of a pattern matching algorithm over t with regard to w is the ratio of the length of t to the number of text accesses performed to search for w in t. We first propose a general method for computing the limit of the expected speed of pattern matching algorithms, with regard to w, over iid texts. Next, we show how to determine the greatest speed that can be achieved among a large class of algorithms, together with an algorithm running at this speed. Since the complexity of this determination makes it impossible to deal with patterns of length greater than 4, we propose a polynomial heuristic. Finally, our approaches are compared with 9 pre-existing pattern matching algorithms from both a theoretical and a practical point of view, i.e. both in terms of limit expected speed on iid texts and in terms of observed average speed on real data. In all cases, the pre-existing algorithms are outperformed.
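
    The notion of speed defined in this abstract (text length divided by the number of text accesses) can be illustrated with a small sketch. The code below is not the paper's method: it only instruments two textbook algorithms, the naive scan and Horspool's algorithm, so that every read of a text character is counted. The function names `naive_accesses`, `horspool_accesses` and `speed` are hypothetical, as are the alphabet and text length used in the demo.

```python
import random


def naive_accesses(w, t):
    """Naive left-to-right scan. Returns (match positions, text accesses)."""
    m, n = len(w), len(t)
    hits, accesses = [], 0
    for pos in range(n - m + 1):
        j = 0
        while j < m:
            accesses += 1              # one read of t[pos + j]
            if t[pos + j] != w[j]:
                break
            j += 1
        if j == m:
            hits.append(pos)
    return hits, accesses


def horspool_accesses(w, t):
    """Horspool's algorithm. Returns (match positions, text accesses)."""
    m, n = len(w), len(t)
    # Shift for character c: distance from its last occurrence in w[:-1]
    # to the end of the pattern; characters not in w[:-1] shift by m.
    shift = {c: m - 1 - j for j, c in enumerate(w[:-1])}
    hits, accesses = [], 0
    pos = 0
    while pos + m <= n:
        accesses += 1                  # read the last character of the window
        last = t[pos + m - 1]
        if last == w[m - 1]:
            j = m - 2
            while j >= 0:              # compare the rest right to left
                accesses += 1
                if t[pos + j] != w[j]:
                    break
                j -= 1
            if j < 0:
                hits.append(pos)
        pos += shift.get(last, m)
    return hits, accesses


def speed(text_length, accesses):
    """Speed as defined in the abstract: text length / text accesses."""
    return text_length / accesses


if __name__ == "__main__":
    rng = random.Random(0)             # fixed seed, illustrative only
    text = "".join(rng.choice("acgt") for _ in range(100_000))  # iid text
    pattern = "acgt"
    for name, algo in (("naive", naive_accesses), ("horspool", horspool_accesses)):
        hits, accesses = algo(pattern, text)
        print(name, len(hits), round(speed(len(text), accesses), 3))
```

    Because the naive scan reads at least one character per window, its speed cannot exceed n / (n - m + 1), which is close to 1 for long texts. An algorithm that skips text characters, as Horspool's does, can exceed 1.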

    Pattern Matching Algorithms

    The aim of this bachelor thesis is the implementation of a library for approximate pattern matching. The library will allow searching a text for a user-specified pattern with a specified maximum number of errors, based on deterministic and nondeterministic finite automata. The user will be able to choose between Hamming distance and Levenshtein distance. The first part describes the use of finite automata for approximate pattern matching. The second part describes the implementation of the libraries. The third part focuses on experiments with the implemented libraries. The conclusion summarizes the advantages and disadvantages of this approach to approximate pattern matching. (460 - Katedra informatiky; imported 23/08/2017)
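
    The two distance options named in this abstract can be sketched as follows. This is not the thesis library and does not build explicit automata. The Hamming search is a plain sliding-window count. The Levenshtein search uses the column-wise dynamic programming commonly attributed to Sellers, which tracks the same information as a simulation of the Levenshtein NFA. The function names and return conventions are assumptions made for illustration.

```python
def hamming_matches(pattern, text, k):
    """Start positions where pattern matches text with at most k substitutions."""
    m = len(pattern)
    out = []
    for pos in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, text[pos:pos + m]):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break              # too many errors, give up on this window
        if mismatches <= k:
            out.append(pos)
    return out


def levenshtein_matches(pattern, text, k):
    """End positions (exclusive) of substrings within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any substring
    # of the text ending at the current position. col[0] is always 0
    # because a match may start anywhere in the text.
    col = list(range(m + 1))
    out = []
    for end, c in enumerate(text, start=1):
        prev_diag = col[0]
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,   # match or substitution
                         old + 1,            # extra text character
                         col[i - 1] + 1)     # missing pattern character
            prev_diag = old
        if col[m] <= k:
            out.append(end)
    return out
```

    Hamming distance only allows substitutions, so it reports fixed-length windows by start position. Levenshtein distance also allows insertions and deletions, so matched substrings vary in length and are reported here by end position.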

    String pattern matching algorithms: An empirical analysis


    A taxonomy of keyword pattern matching algorithms


    Analysis of two-dimensional approximate pattern matching algorithms

    We present a new and more rigorous analysis of the two algorithms for two-dimensional approximate pattern matching due to Kärkkäinen and Ukkonen. We also present modifications of these algorithms that use less space while keeping the same expected time.

    A taxonomy of sublinear multiple keyword pattern matching algorithms

    This article presents a taxonomy of sublinear keyword pattern matching algorithms related to the Boyer-Moore algorithm [3] and the Commentz-Walter algorithm [5, 6]. The taxonomy includes, amongst others, the multiple keyword generalization of the single keyword Boyer-Moore algorithm and an algorithm by Fan and Su [9, 10]. The corresponding precomputation algorithms are presented as well. The taxonomy is based on the idea of ordering algorithms according to their essential problem and algorithm details, and deriving all algorithms from a common starting point by successively adding these details in a correctness-preserving way. This way of presentation not only provides a complete correctness argument for each algorithm, but also makes very clear what the algorithms have in common (the details of their nearest common ancestor) and where they differ (the details added after their nearest common ancestor). Introducing the notion of safe shift distances proves to be essential in the derivation and classification of the algorithms. Moreover, the article provides a common derivation for, and a uniform presentation of, the precomputation algorithms, not yet found in the literature.
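
    The idea of a safe shift distance (a shift that cannot skip over any occurrence of any keyword) can be illustrated with a minimal sketch. The code below is a Horspool-style multiple keyword search, not one of the algorithms derived in the article and not the Commentz-Walter algorithm. The names `build_shift` and `search` are hypothetical, and the per-window verification is deliberately simplistic.

```python
def build_shift(keywords):
    """Precompute a safe shift per character, using the shortest keyword length."""
    lmin = min(len(k) for k in keywords)
    shift = {}
    for k in keywords:
        # Only the first lmin - 1 characters of each keyword constrain the shift.
        for j in range(lmin - 1):
            d = lmin - 1 - j
            if d < shift.get(k[j], lmin):
                shift[k[j]] = d        # keep the smallest, hence safe, distance
    return lmin, shift


def search(keywords, text):
    """Return (position, keyword) pairs for all occurrences of all keywords."""
    lmin, shift = build_shift(keywords)
    hits = []
    pos = 0
    while pos + lmin <= len(text):
        for k in keywords:             # verify every keyword at this position
            if text.startswith(k, pos):
                hits.append((pos, k))
        # Shift according to the last character of the length-lmin window.
        pos += shift.get(text[pos + lmin - 1], lmin)
    return hits
```

    The shift is safe because, if some keyword started d positions to the right with d smaller than the chosen shift, the window's last character would appear in that keyword at index lmin - 1 - d, and the precomputation would have recorded a shift of at most d for it.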