    The Boyer-Moore-Galil String Searching Strategies Revisited

    On the Average-Case Running Time of the Boyer-Moore Algorithm

    The Boyer-Moore algorithm (BM) is a fast, compact algorithm for finding all occurrences of a pattern string in a text string. Previous papers have addressed the worst-case running time of BM, which occurs rarely in practice. In this paper, we derive an approximation to Φ(BM), the average number of character probes made by BM. Let M = pattern length, N = text string length, α = the alphabet size, q = 1/α, and q̄ = 1 − q. By modeling BM as a probabilistic finite automaton, we show that Φ(BM) h when M < α and that Φ(BM) N q(l + g V) when M > α. An immediate consequence is that Φ(BM) is O(N / log_α M) as M → ∞. The above formulas match well with measured data.
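The quantity analyzed in the abstract above, the number of text-character probes, can be measured empirically. The sketch below is only an illustration: it uses the simpler Horspool variant (bad-character shift only) rather than full Boyer-Moore, and the function name `horspool_probes` is hypothetical, not taken from the paper.

```python
def horspool_probes(pattern, text):
    """Return (match positions, number of text-character probes).

    Horspool simplification of Boyer-Moore: only a bad-character
    shift table is used. This is an assumption made for brevity,
    not the exact algorithm analyzed in the paper.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    # For each character in pattern[:-1], the distance from its
    # rightmost occurrence to the last pattern position.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches, probes = [], 0
    pos = 0  # text index currently aligned with pattern[0]
    while pos <= n - m:
        j = m - 1
        # Compare right to left, counting every text character examined.
        while j >= 0:
            probes += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift according to the text character under the last pattern position.
        pos += shift.get(text[pos + m - 1], m)
    return matches, probes
```

For example, `horspool_probes("abc", "xxabcxxabc")` returns `([2, 7], 8)`: both occurrences are found with 8 probes on a 10-character text. Dividing the probe count by the text length for random texts over alphabets of different sizes gives a rough empirical counterpart to Φ(BM)/N.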

    Searching for Fixed-Length Patterns

    We present an algorithm, RQ, for finding all occurrences of a fixed-length pattern, p_1, p_2, ..., p_P, in a text string, where each p_i can match an arbitrary set of characters. Our algorithm is optimal in that it examines the minimum average number of text characters, which is not necessarily the same as being optimal in running time. This paper answers the question of optimal string searching put forth in [KMP77]. Let a = the alphabet size, P = the length of the string matched by the pattern, T = the length of the text, W = the word size in bits of the underlying machine, and Φ(RQ) = the average number of text characters examined by RQ. We derive an asymptotic approximation for Φ(RQ) when P < a. We also show that Φ(RQ) < (4 log_a P / 3)(T/P) when P > a. In the worst case, RQ examines T characters. Our algorithm requires space O(||II|| |P/W|). In addition, our method of analysis is applicable to other algorithms modeled by a finite automaton. We present an efficient implementation of our algorithm when P < W. In practice, compared to the Boyer-Moore algorithm, RQ requires slightly more space, accepts a more general range of patterns, and runs in comparable time.

    Playing with patterns, searching for strings

    On the Comparison Complexity of the String Prefix-Matching Problem

    In this paper we study the exact comparison complexity of the string prefix-matching problem in the deterministic sequential comparison model with equality tests. We derive almost tight lower and upper bounds on the number of symbol comparisons required in the worst case by on-line prefix-matching algorithms for any fixed pattern and variable text. Unlike previous results on the comparison complexity of string-matching and prefix-matching algorithms, our bounds are almost tight for any particular pattern. We also consider the special case where the pattern and the text are the same string. This problem, which we call the string self-prefix problem, is similar to the pattern preprocessing step of the Knuth-Morris-Pratt string-matching algorithm that is used in several comparison-efficient string-matching and prefix-matching algorithms, including in our new algorithm. We obtain roughly tight lower and upper bounds on the number of symbol comparisons required in the worst case by on-line self-prefix algorithms. Our algorithms can be implemented in linear time and space in the standard uniform-cost random-access-machine model.
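The abstract above compares the self-prefix problem to the Knuth-Morris-Pratt preprocessing step. As a point of reference, here is a minimal sketch of that textbook preprocessing (the failure function), instrumented to count symbol equality tests. It is not the paper's algorithm and does not attain the paper's bounds; the function name `failure_function` is an assumption chosen for illustration.

```python
def failure_function(s):
    """Textbook KMP preprocessing with a comparison counter.

    Returns (fail, comparisons), where fail[i] is the length of the
    longest proper prefix of s[:i+1] that is also a suffix of it, and
    comparisons is the number of symbol equality tests performed.
    """
    n = len(s)
    fail = [0] * n
    comparisons = 0
    k = 0  # length of the prefix currently matched
    for i in range(1, n):
        while True:
            comparisons += 1
            if s[i] == s[k]:
                k += 1          # extend the matched prefix
                break
            if k == 0:
                break           # no shorter prefix left to try
            k = fail[k - 1]     # fall back to the next shorter border
        fail[i] = k
    return fail, comparisons
```

For `"abacabab"` this returns `([0, 0, 1, 0, 1, 2, 3, 2], 9)`. The textbook procedure uses fewer than 2n comparisons on a string of length n, since every fallback undoes an earlier successful extension. The paper's contribution is a much tighter analysis of how many such comparisons are actually necessary.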

    The conjugacy problem in right-angled Artin groups and their subgroups

    29 pages, 7 figures. We prove that the conjugacy problem in right-angled Artin groups (RAAGs), as well as in a large and natural class of subgroups of RAAGs, can be solved in linear time. This class of subgroups contains, for instance, all graph braid groups (i.e., fundamental groups of configuration spaces of points in graphs) and many hyperbolic groups, and it coincides with the class of fundamental groups of "special cube complexes" studied independently by Haglund and Wise.