
    On the Average-Case Running Time of the Boyer-Moore Algorithm

    The Boyer-Moore algorithm (BM) is a fast, compact algorithm for finding all occurrences of a pattern string in a text string. Previous papers have addressed the worst-case running time of BM, which rarely occurs in practice. In this paper, we derive an approximation to $\Phi(\mathrm{BM})$, the average number of character probes made by BM. Let $M$ = pattern length, $N$ = text string length, $\alpha$ = alphabet size, $q = 1/\alpha$, and $\bar{q} = 1 - q$. By modeling BM as a probabilistic finite automaton, we derive approximations to $\Phi(\mathrm{BM})$, expressed in terms of $N$, $M$, $q$ and $\bar{q}$, for the two cases $M < \alpha$ and $M > \alpha$. An immediate consequence is that $\Phi(\mathrm{BM})$ is $O(N/\log_\alpha M)$ as $M \to \infty$. These formulas agree well with measured data.
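    To make "character probes" concrete, the sketch below counts the text characters read by a simplified Boyer-Moore variant on random text. It is an assumption-laden stand-in: it uses only Horspool's bad-character shift and omits BM's good-suffix rule, so its probe counts are not the paper's $\Phi(\mathrm{BM})$. The names `horspool_search` and `probes_per_char` are hypothetical.

```python
import random


def horspool_search(text: str, pattern: str) -> tuple[list[int], int]:
    """Find all occurrences of `pattern` in `text` using only Horspool's
    bad-character shift, and count every text character probed."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Shift for each character: distance from its last occurrence in
    # pattern[:-1] to the end of the pattern; absent characters shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches: list[int] = []
    probes, pos = 0, 0
    while pos <= n - m:
        j = m - 1
        # Compare right to left; each comparison probes one text character.
        while j >= 0:
            probes += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # The shift depends only on the text character under the last
        # pattern position (already probed above, so not counted again).
        pos += shift.get(text[pos + m - 1], m)
    return matches, probes


def probes_per_char(alphabet_size: int, m: int, n: int = 50_000, seed: int = 1) -> float:
    """Average probes per text character for a random text and pattern."""
    rng = random.Random(seed)
    alphabet = [chr(ord("a") + k) for k in range(alphabet_size)]
    text = "".join(rng.choice(alphabet) for _ in range(n))
    pattern = "".join(rng.choice(alphabet) for _ in range(m))
    _, probes = horspool_search(text, pattern)
    return probes / n


if __name__ == "__main__":
    for alpha in (4, 26):
        for m in (2, 4, 8, 16, 32):
            print(f"alpha={alpha:2d}  M={m:2d}  probes/N={probes_per_char(alpha, m):.3f}")
```

    Running the loop gives an empirical feel for how probes per text character change with $M$ and $\alpha$; it says nothing about BM's worst case.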

    Searching for Fixed-Length Patterns

    We present an algorithm, RQ, for finding all occurrences of a fixed-length pattern $p_1 p_2 \cdots p_P$ in a text string, where each $p_i$ can match an arbitrary set of characters. Our algorithm is optimal in that it examines the minimum average number of text characters, which is not necessarily the same as being optimal in running time. This paper answers the question of optimal string searching put forth in [KMP77]. Let $\alpha$ = the alphabet size, $P$ = the length of the string matched by the pattern, $T$ = the length of the text, $W$ = the word size in bits of the underlying machine, and $\Phi(\mathrm{RQ})$ = the average number of text characters examined by RQ. We derive an asymptotic approximation for $\Phi(\mathrm{RQ})$ when $P < \alpha$. We also show that $\Phi(\mathrm{RQ}) < (4 \log_\alpha P / 3)(T/P)$ when $P > \alpha$. In the worst case, RQ examines $T$ characters. Our algorithm requires $O(\|\Pi\| \lceil P/W \rceil)$ space. In addition, our method of analysis is applicable to other algorithms modeled by a finite automaton. We present an efficient implementation of our algorithm when $P < W$. In practice, compared to the Boyer-Moore algorithm, RQ requires slightly more space, accepts a more general range of patterns, and runs in comparable time.
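    The abstract does not describe RQ's scanning strategy, so the sketch below substitutes a different, well-known technique: bit-parallel Shift-And over character classes. It only illustrates how a fixed-length pattern whose positions are character sets maps onto bit vectors (one $P$-bit mask per character, i.e. $\lceil P/W \rceil$ machine words each). Unlike RQ, it reads every text character. The name `shift_and_classes` is hypothetical.

```python
def shift_and_classes(text: str, classes: list[set[str]]) -> list[int]:
    """Return every start i such that text[i + k] is in classes[k] for all k
    (Shift-And over character classes; not the RQ algorithm itself)."""
    p = len(classes)
    if p == 0:
        raise ValueError("pattern must be non-empty")
    # masks[c] has bit k set iff character c may appear at pattern position k.
    masks: dict[str, int] = {}
    for k, allowed in enumerate(classes):
        for c in allowed:
            masks[c] = masks.get(c, 0) | (1 << k)
    accept = 1 << (p - 1)
    state = 0  # bit k set iff the last k+1 characters match classes[0..k]
    matches: list[int] = []
    for i, c in enumerate(text):
        # Extend all partial matches by c and start a new one at position i.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            matches.append(i - p + 1)
    return matches


if __name__ == "__main__":
    # Pattern [ab] c [abc], written as per-position character sets.
    print(shift_and_classes("acbbcaacc", [{"a", "b"}, {"c"}, {"a", "b", "c"}]))  # prints [0, 3, 6]
```

    Python integers are unbounded, so this sketch ignores the word-size limit; a machine-word implementation would split each mask into $\lceil P/W \rceil$ words.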

    Improved algorithms for string searching problems

    We present improved practically efficient algorithms for several string searching problems, where we search for a short string called the pattern in a longer string called the text. We are mainly interested in the online problem, where the text is not preprocessed, but we also present a light indexing approach to speed up exact searching of a single pattern. The new algorithms can be applied, e.g., to many problems in bioinformatics and to other content scanning and filtering problems. In addition to exact string matching, we develop algorithms for several other variations of the string matching problem. We study algorithms for approximate string matching, where a limited number of errors is allowed in the occurrences of the pattern, and parameterized string matching, where a substring of the text matches the pattern if the characters of the substring can be renamed in such a way that the renamed substring matches the pattern exactly. We also consider searching multiple patterns simultaneously and searching weighted patterns, where the weight of a character at a given position reflects the probability of that character occurring at that position. Many of the new algorithms use the backward matching principle, where the characters of the text that are aligned with the pattern are read backward, i.e., from right to left. Another common characteristic of the new algorithms is the use of q-grams, i.e., q consecutive characters are handled as a single character. Many of the new algorithms are bit-parallel, i.e., they pack several variables into a single computer word and update all these variables with a single instruction. We show that the q-gram backward string matching algorithms that solve the exact, approximate, or multiple string matching problems are optimal on average. We also show that the q-gram backward string matching algorithm for the parameterized string matching problem is sublinear on average for a class of moderately repetitive patterns.
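    Backward matching and bit parallelism can be combined as in BNDM (Backward Nondeterministic DAWG Matching, Navarro and Raffinot), used here as a swapped-in classic example: it reads each window right to left and keeps, in one integer, the set of pattern factors that still match. The algorithms described above additionally read q-grams, which this sketch does not. The name `bndm_search` is hypothetical.

```python
def bndm_search(text: str, pattern: str) -> tuple[list[int], int]:
    """BNDM: report all occurrences of `pattern` and count text characters read."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full, high = (1 << m) - 1, 1 << (m - 1)
    # masks[c]: bit (m-1-i) is set iff pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    matches: list[int] = []
    probes, pos = 0, 0
    while pos <= n - m:
        j, last, state = m, m, full
        # Read the window right to left; `state` tracks the pattern factors
        # that still equal the characters read so far.
        while state:
            state &= masks.get(text[pos + j - 1], 0)
            probes += 1
            j -= 1
            if state & high:
                # text[pos + j : pos + m] is a prefix of the pattern.
                if j > 0:
                    last = j  # shift so the pattern aligns with this prefix
                else:
                    matches.append(pos)
            state = (state << 1) & full
        pos += last
    return matches, probes


if __name__ == "__main__":
    print(bndm_search("abracadabra", "abra"))
```

    Each window stops as soon as no pattern factor survives, which is why backward algorithms can skip text characters on average.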
All the presented algorithms are also shown to be fast in practice when compared to earlier algorithms. We also propose an alphabet sampling technique to speed up exact string matching. We choose a subset of the alphabet and select the corresponding subsequence of the text. String matching is then performed on this reduced subsequence, and the matches found are verified in the original text. We show how to choose the sampled alphabet optimally and that the technique speeds up string matching, especially for moderate to long patterns.
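    The sampling-and-verification steps above can be sketched as follows, assuming the sampled alphabet is already given (how to choose it optimally is not shown). Every occurrence of the pattern maps to an occurrence of the pattern's sampled subsequence in the sampled text, so verifying those candidates finds all matches. In the light-indexing setting the sampled text would be built once; here it is rebuilt per call for simplicity. The name `sampled_search` is hypothetical.

```python
def sampled_search(text: str, pattern: str, sampled: set[str]) -> list[int]:
    """Exact matching via alphabet sampling: search the sampled subsequences,
    then verify each candidate in the original text."""
    m = len(pattern)
    # Offsets of sampled characters in the pattern; f is the first such offset.
    p_offsets = [i for i, c in enumerate(pattern) if c in sampled]
    if not p_offsets:
        # No sampled character in the pattern: fall back to a direct scan.
        return [j for j in range(len(text) - m + 1) if text.startswith(pattern, j)]
    f = p_offsets[0]
    p_sampled = "".join(pattern[i] for i in p_offsets)
    # Sampled text T' plus a map from T' indices back to text positions.
    t_positions = [j for j, c in enumerate(text) if c in sampled]
    t_sampled = "".join(text[j] for j in t_positions)
    matches: list[int] = []
    k = t_sampled.find(p_sampled)
    while k != -1:
        start = t_positions[k] - f
        # Verification in the original text rejects false candidates.
        if 0 <= start <= len(text) - m and text.startswith(pattern, start):
            matches.append(start)
        k = t_sampled.find(p_sampled, k + 1)
    return matches


if __name__ == "__main__":
    print(sampled_search("abracadabra", "abra", {"a"}))  # prints [0, 7]
```

    A rarer sampled alphabet shortens the searched subsequence but produces more candidates to verify; that trade-off is what the optimal choice balances.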

    Algorithms for Order-Preserving Matching

    String matching is a widely studied problem in computer science, and the field has seen many recent developments. One problem that has attracted attention lately is the order-preserving matching (OPM) problem. The task is to find all substrings of the text that have the same length and relative order as the pattern, where the relative order is the numerical order of the numbers in a string. The problem has applications in areas involving time series or other series of numbers; more specifically, it is useful when one is interested in the relative order of the pattern rather than in the pattern itself. For example, stock market analysts can use it to study price movements. In addition to the OPM problem, we also studied its approximate variant. In approximate order-preserving matching, we search for substrings of the text whose relative order is similar to that of the pattern, i.e., matches the relative order of the pattern with at most k mismatches. For applications of order-preserving matching, approximate search is more meaningful than exact search. We developed various advanced solutions for the problem and its variant, with special emphasis on practical efficiency. In particular, we introduced a simple solution for the OPM problem using filtration and showed experimentally that it was effective and faster than previous solutions. In addition, we combined Single Instruction Multiple Data (SIMD) instructions with filtration to develop solutions that were faster than our previous solution. Moreover, we proposed another efficient solution without filtration using SIMD. We also presented an offline solution based on the FM-index. Furthermore, we proposed practical solutions for the approximate order-preserving matching problem, one of which was the first solution for this problem that is sublinear on average.
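    Filtration for OPM can be illustrated with a simple binary "trend" filter, assumed here because the abstract does not specify the filter: encode each sequence by whether consecutive values rise, search the pattern's encoding exactly in the text's encoding, and verify each candidate window. Order-isomorphic windows always share the same encoding, so no match is lost. The SIMD and FM-index variants are not shown, and all function names are hypothetical.

```python
from collections.abc import Sequence


def _cmp(a: float, b: float) -> int:
    """-1, 0 or 1 according to how a compares with b."""
    return (a > b) - (a < b)


def order_isomorphic(x: Sequence[float], y: Sequence[float]) -> bool:
    """True iff every pair (i, j) compares the same way in x and in y."""
    if len(x) != len(y):
        return False
    return all(
        _cmp(x[i], x[j]) == _cmp(y[i], y[j])
        for i in range(len(x))
        for j in range(i + 1, len(x))
    )


def trend_bits(seq: Sequence[float]) -> str:
    """'1' where the sequence rises, '0' where it falls or stays level."""
    return "".join("1" if b > a else "0" for a, b in zip(seq, seq[1:]))


def opm_filter_search(text: Sequence[float], pattern: Sequence[float]) -> list[int]:
    """Start positions of text windows order-isomorphic to `pattern`."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    text_bits, pat_bits = trend_bits(text), trend_bits(pattern)
    matches: list[int] = []
    # Filtration: exact search on the trend strings gives a superset of matches.
    k = text_bits.find(pat_bits)
    while k != -1:
        # Verification: check the full relative order of the candidate window.
        if order_isomorphic(text[k:k + m], pattern):
            matches.append(k)
        k = text_bits.find(pat_bits, k + 1)
    return matches


if __name__ == "__main__":
    print(opm_filter_search([1, 3, 2, 5, 4, 6], [10, 30, 20]))  # prints [0, 2]
```

    The pairwise verification is quadratic in the pattern length, which keeps the sketch short; a practical implementation would verify more cheaply.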

    On the Expected Sublinearity of the Boyer-Moore Algorithm

    This paper analyzes the expected performance of a simplified version, $\mathrm{BM}_0$, of the Boyer-Moore string matching algorithm. A probabilistic automaton $A$ is set up that models the expected behavior of $\mathrm{BM}_0$ under the assumption that both text and pattern are generated by a source that emits independent, uncorrelated symbols with an arbitrary probability distribution. Formal developments then lead to the conclusion that $A$ takes expected sublinear time in a variety of situations. The sublinear behavior can be quantitatively predicted by simple formulae involving the pattern length $m$ and the alphabet's probabilistic properties. Finally, empirical evidence is provided that is in satisfactory agreement with the theory.
    Keywords: string searching, pattern matching, average-case analysis of algorithms.
    1 The problem. Let $\mathcal{A}$ be a finite alphabet, $|\mathcal{A}| =: n$, and suppose the strings $T = t_1 \ldots t_N$, $t_i \in \mathcal{A}$, $1 \le i \le N$ (the "text") and $S = s_1 \ldots s_m$, $s_i \in \mathcal{A}$, $1 \le i \le m \le N$ (the "pattern") are..
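    One ingredient of such "simple formulae" can be sketched under a stated assumption: take Horspool's bad-character rule as a stand-in for $\mathrm{BM}_0$ (the abstract does not give $\mathrm{BM}_0$'s exact shift rule). The shift then depends only on the text character under the last pattern position, which is a fresh i.i.d. symbol at every alignment, so the shifts are i.i.d. and, by a renewal argument, the number of alignments is roughly $N / E[\text{shift}]$. The helper names below are hypothetical.

```python
import random


def horspool_shift_table(pattern: str) -> dict[str, int]:
    """Shift per character; characters absent from pattern[:-1] shift by m."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    return {c: m - 1 - i for i, c in enumerate(pattern[:-1])}


def expected_shift(pattern: str, probs: dict[str, float]) -> float:
    """E[shift] when the character under the last pattern position is drawn
    from the symbol distribution `probs`."""
    m = len(pattern)
    table = horspool_shift_table(pattern)
    return sum(p * table.get(c, m) for c, p in probs.items())


def count_alignments(text: str, pattern: str) -> int:
    """Number of windows inspected; independent of comparison outcomes."""
    m = len(pattern)
    table = horspool_shift_table(pattern)
    pos, windows = 0, 0
    while pos <= len(text) - m:
        windows += 1
        pos += table.get(text[pos + m - 1], m)
    return windows


if __name__ == "__main__":
    # A non-uniform memoryless source, chosen only for illustration.
    probs = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.1}
    rng = random.Random(0)
    n = 200_000
    text = "".join(rng.choices(list(probs), weights=list(probs.values()), k=n))
    pattern = "abca"
    print("predicted N / E[shift]:", n / expected_shift(pattern, probs))
    print("simulated alignments:  ", count_alignments(text, pattern))
```

    Frequent symbols that occur near the end of the pattern shorten the expected shift, which is how the source distribution enters the sublinear behavior.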

    Resonance in swirling wakes and sloshing waves: non-normal and sublinear effects

    Like mechanical structures, stable flows can exhibit resonance when perturbed by an impulsive or harmonic forcing. Swirling wakes and sloshing waves belong to this class of flows and show a large energy response when excited close to their natural frequencies. Although these frequencies can be predicted by linear modal analysis, the full flow dynamics differs from the modal prediction because it results from the mutual cooperation of the natural modes (non-normal effects) and depends on the oscillation amplitude (nonlinear effects). In this thesis, the response of swirling wakes subjected to a harmonic forcing is studied numerically and theoretically. Direct numerical simulations show that a large variety of helical modes can be excited and amplified in trailing vortices when a harmonic inlet or volume forcing is imposed, with higher-wavenumber modes appearing at higher frequencies. The mode-selection mechanism is shown to be directly connected to the local stability properties of the flow, and is investigated both by a WKB approximation, in the framework of weakly non-parallel flows, and by the global resolvent approach. This analysis is then extended to turbulent swirling flows to investigate the physical origin of the meandering oscillations of the hub vortex observed in wind turbine wake experiments. We show that this low-frequency spectral component results from a convectively unstable single-helix structure that oscillates at a frequency equal to one third of the rotational frequency of the wind turbine rotor. Building on this result, an adjoint-based technique for the passive control of these helical instabilities is proposed. We then turn our attention to the transient decay of sloshing waves affected by viscous friction at the container's wall, which exhibits a sublinear dependence on the interface velocity, i.e., a power law with an exponent smaller than one.
This capillary effect is exacerbated in our experiment by placing a thin layer of foam on the liquid phase, which acts as a collection of air-liquid interfaces. In contrast to classical theory, we uncover a finite-time singularity in our system that arrests the sloshing oscillations in finite time, and we propose a minimal theoretical framework to capture this effect. Using first principles, we then study the effect of contact angle hysteresis on sloshing waves. We show asymptotically that, in contrast to viscous damping, where the wave motion decays exponentially, contact angle hysteresis acts as Coulomb solid friction, causing the damping rate induced by the motion of the liquid meniscus to increase at small amplitudes, consistent with the experimental observations.
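    Why sublinear friction can stop an oscillation in finite time can be seen in a toy model, which is an assumption here and not the thesis's framework: a single mode treated as an oscillator with power-law friction of exponent $n$, averaged over one cycle with $x \approx A\cos\omega t$ and $E = \tfrac12\omega^2 A^2$ (weak damping).

```latex
\begin{align*}
\ddot{x} + \omega^2 x &= -\kappa\,\operatorname{sgn}(\dot{x})\,|\dot{x}|^{n},
  \qquad \kappa > 0,\; 0 \le n \le 1, \\
\frac{dE}{dt} &= \dot{x}\,(\ddot{x} + \omega^2 x) = -\kappa\,|\dot{x}|^{n+1},
  \qquad E = \tfrac12 \dot{x}^2 + \tfrac12 \omega^2 x^2, \\
\omega^2 A\,\dot{A} &= -\kappa\, c_n\, (\omega A)^{n+1},
  \qquad c_n = \big\langle |\sin\theta|^{n+1} \big\rangle, \\
\dot{A} &= -\kappa\, c_n\, \omega^{n-1} A^{n}, \\
A(t)^{1-n} &= A_0^{1-n} - (1-n)\,\kappa\, c_n\, \omega^{n-1}\, t \qquad (n < 1), \\
t^{*} &= \frac{A_0^{1-n}}{(1-n)\,\kappa\, c_n\, \omega^{n-1}},
  \qquad -\frac{\dot{A}}{A} = \kappa\, c_n\, \omega^{n-1} A^{n-1}.
\end{align*}
```

    For $n = 1$ (linear viscous damping), $c_1 = 1/2$ and the amplitude decays exponentially as $e^{-\kappa t/2}$. For $n < 1$ the amplitude reaches zero at the finite time $t^{*}$, and the effective damping rate grows as $A$ shrinks; the Coulomb limit $n = 0$ gives $c_0 = 2/\pi$ and a linear amplitude decay at rate $2\kappa/(\pi\omega)$.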

    Programme radiation protection. Progress report. EUR 7169
