11,336 research outputs found

    Duel and sweep algorithm for order-preserving pattern matching

    Full text link
    Given a text TT and a pattern PP over alphabet Σ\Sigma, the classic exact matching problem searches for all occurrences of pattern PP in text TT. Unlike exact matching problem, order-preserving pattern matching (OPPM) considers the relative order of elements, rather than their real values. In this paper, we propose an efficient algorithm for OPPM problem using the "duel-and-sweep" paradigm. Our algorithm runs in O(n+mlogm)O(n + m\log m) time in general and O(n+m)O(n + m) time under an assumption that the characters in a string can be sorted in linear time with respect to the string size. We also perform experiments and show that our algorithm is faster that KMP-based algorithm. Last, we introduce the two-dimensional order preserved pattern matching and give a duel and sweep algorithm that runs in O(n2)O(n^2) time for duel stage and O(n2m)O(n^2 m) time for sweeping time with O(m3)O(m^3) preprocessing time.Comment: 13 pages, 5 figure

    Average-Case Optimal Approximate Circular String Matching

    Full text link
    Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that are at a distance at most k from x or from any of its rotations. In this article, we present a new algorithm for approximate circular string matching under the edit distance model with optimal average-case search time O(n(k + log m)/m). Optimal average-case search time can also be achieved by the algorithms for multiple approximate string matching (Fredriksson and Navarro, 2004) using x and its rotations as the set of multiple patterns. Here we reduce the preprocessing time and space requirements compared to that approach

    Fast Exact String Pattern-matching Algorithms Adapted to the Characteristics of the Medical Language

    Get PDF
    Objective: The authors consider the problem of exact string pattern matching using algorithms that do not require any preprocessing. To choose the most appropriate algorithm, distinctive features of the medical language must be taken into account. The characteristics of medical language are emphasized in this regard, the best algorithm of those reviewed is proposed, and detailed evaluations of time complexity for processing medical texts are provided. Design: The authors first illustrate and discuss the techniques of various string pattern-matching algorithms. Next, the source code and the behavior of representative exact string pattern-matching algorithms are presented in a comprehensive manner to promote their implementation. Detailed explanations of the use of various techniques to improve performance are given. Measurements: Real-time measures of time complexity with English medical texts are presented. They lead to results distinct from those found in the computer science literature, which are typically computed with normally distributed texts. Results: The Boyer-Moore-Horspool algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested. Conclusion: The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorith

    Implementation and comparison of Berry-Ravindran and Zhu- Takaoka exact string matching algorithms in Indonesian-Batak Toba dictionary

    Get PDF
    Indonesia has a variety of local languages, which is the Batak Toba language. This time, there are still some Batak Toba people who do not know speak Batak Toba language fluently. Nowadays, desktop based dictionary is one of reference that very efficiently used to learn a language and also to increase vocabulary. In making the dictionary application, string matching can be implemented for word-searching process. String matching have some algorithm, which is Berry – Ravindran algorithm and Zhu-Takaoka algorithm and will be implemented on the dictionary application. Zhu-Takaoka algorithm and Berry – Ravindran algorithm have two phases, which are the preprocessing phase and the searching phase. Preprocessing phase is a process to make the shifting values according to in pattern that input by user. To know the shifting value with Zhu-Takaoka algorithm, it’s need Zhu-Takaoka Bad Character (Ztbc) and Boyer-Moore Good Suffix (Bmgs). Then, Ztbc will be compared to Bmgs to get the maximum value of them that will be set as shifting value. While Berry-Ravindran algorithm, to know the shifting value is needed Berry-Ravindran Bad Character, which the two characters right of the text at the position m + 1 and m+ 2, is needed to determine the shifting value, where m is length of the pattern

    Pattern Matching in Multiple Streams

    Full text link
    We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally we set out a number of open problems related to this new model for pattern matching.Comment: 13 pages, 1 figur

    New Variants of Pattern Matching with Constants and Variables

    Full text link
    Given a text and a pattern over two types of symbols called constants and variables, the parameterized pattern matching problem is to find all occurrences of substrings of the text that the pattern matches by substituting a variable in the text for each variable in the pattern, where the substitution should be injective. The function matching problem is a variant of it that lifts the injection constraint. In this paper, we discuss variants of those problems, where one can substitute a constant or a variable for each variable of the pattern. We give two kinds of algorithms for both problems, a convolution-based method and an extended KMP-based method, and analyze their complexity.Comment: 15 pages, 2 figure
    corecore