    Optimal-Hash Exact String Matching Algorithms

    String matching is the problem of finding all the occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing $q$-grams. The improvement consists of considering minimal values of $q$ such that each $q$-gram of the pattern has a unique hash value. The new algorithms are faster than the algorithms of the HASH family for short patterns on large alphabets.
    Comment: 14 pages
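    As a rough illustration of the idea, the sketch below implements a HASHq-style search in Python and picks the smallest $q$ for which the pattern's $q$-grams hash to pairwise distinct values. It is not the authors' implementation. The shift-and-add hash reduced modulo 256, the helper names (`qgram_hash`, `minimal_distinct_q`, `hash_q_search`), and the reading of "unique hash value" as "pairwise distinct hashes among the pattern's $q$-grams" are all assumptions.

```python
from typing import List

# Assumption: hash values are reduced modulo 256, as a small shift table.
TABLE_SIZE = 256


def qgram_hash(s: bytes, end: int, q: int) -> int:
    """Shift-and-add hash of the q-gram s[end-q+1 .. end], reduced mod TABLE_SIZE."""
    h = 0
    for k in range(end - q + 1, end + 1):
        h = ((h << 1) + s[k]) % TABLE_SIZE
    return h


def minimal_distinct_q(pattern: bytes) -> int:
    """Smallest q for which all q-grams of the pattern have pairwise distinct hashes.

    Hypothetical reading of the abstract. q = len(pattern) always qualifies,
    since the pattern then has only one q-gram.
    """
    m = len(pattern)
    for q in range(1, m + 1):
        hashes = [qgram_hash(pattern, i, q) for i in range(q - 1, m)]
        if len(set(hashes)) == len(hashes):
            return q
    return m


def hash_q_search(pattern: bytes, text: bytes) -> List[int]:
    """Return the start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return []
    q = minimal_distinct_q(pattern)
    # shift[h]: safe slide when the text q-gram ending at the window's last
    # position hashes to h. If no pattern q-gram hashes to h, m - q + 1 is safe.
    shift = [m - q + 1] * TABLE_SIZE
    for i in range(q - 1, m - 1):
        shift[qgram_hash(pattern, i, q)] = m - 1 - i  # rightmost q-gram wins
    last = qgram_hash(pattern, m - 1, q)
    # Slide used after a verification; with distinct hashes it equals m - q + 1.
    sh1 = shift[last]
    shift[last] = 0
    occurrences = []
    j = m - 1  # text index aligned with the last pattern character
    while j < n:
        s = shift[qgram_hash(text, j, q)]
        if s:
            j += s
            continue
        start = j - m + 1
        if text[start:j + 1] == pattern:
            occurrences.append(start)
        j += sh1
    return occurrences
```

    Under this reading, a verification can always be followed by a slide of $m - q + 1$ positions, because no other pattern $q$-gram shares the hash of the last one. For example, `hash_q_search(b"abra", b"abracadabra")` returns `[0, 7]`.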

    A new family and structure for Commentz-Walter-style multiple-keyword pattern matching algorithms

    In this paper, I present a new family of Commentz-Walter-style multiple-keyword string pattern matching algorithms. The algorithms share a common algorithmic skeleton, which is significantly optimized when compared to the original Commentz-Walter skeleton and subsequently derived improvements. The new skeleton is derived via correctness-preserving stepwise algorithmic improvements, in the Eindhoven style of programming.
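    The paper's optimized skeleton is not reproduced here. For orientation only, the following hypothetical Python sketch shows the general Commentz-Walter shape in a simplified Horspool-style form (sometimes called Set Horspool): a trie of reversed keywords is matched right to left against the text, and a bad-character shift, bounded by the shortest keyword length, moves the window. The names `TrieNode` and `set_horspool` are illustrative, and the shift rule is simpler than Commentz-Walter's own shift functions.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """Node of a trie built over the *reversed* keywords (hypothetical sketch)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.keyword: Optional[str] = None  # set when a reversed keyword ends here


def set_horspool(keywords: List[str], text: str) -> List[Tuple[int, str]]:
    """Return (start, keyword) for every occurrence of any non-empty keyword."""
    kws = [k for k in keywords if k]
    if not kws:
        return []
    lmin = min(len(k) for k in kws)

    # Trie of reversed keywords, so the text can be read right to left.
    root = TrieNode()
    for kw in kws:
        node = root
        for ch in reversed(kw):
            node = node.children.setdefault(ch, TrieNode())
        node.keyword = kw

    # Bad-character shift: smallest distance j (1 <= j < lmin) from the end of
    # some keyword at which character c occurs; lmin if there is none.
    shift: Dict[str, int] = {}
    for kw in kws:
        for j in range(1, lmin):
            c = kw[len(kw) - 1 - j]
            shift[c] = min(shift.get(c, lmin), j)

    occurrences: List[Tuple[int, str]] = []
    pos = lmin - 1  # text position aligned with the end of the keywords
    while pos < len(text):
        node, i = root, pos
        # Scan right to left while the text spells a path in the reversed trie.
        while i >= 0 and text[i] in node.children:
            node = node.children[text[i]]
            if node.keyword is not None:
                occurrences.append((i, node.keyword))
            i -= 1
        pos += shift.get(text[pos], lmin)
    return occurrences
```

    On `set_horspool(["he", "she", "his", "hers"], "ushers")` it reports `(2, "he")`, `(1, "she")` and `(2, "hers")` as (start, keyword) pairs.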

    Generalised Pattern Matching Revisited

    In the problem of Generalised Pattern Matching (GPM) [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $\Sigma_T$, a pattern $P$ of length $m$ over an alphabet $\Sigma_P$, and a matching relationship $\subseteq \Sigma_T \times \Sigma_P$, and must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each length-$m$ substring of $T$ and $P$ (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance:
    * $\mathcal{D}$, the maximum number of characters that match a fixed character;
    * $\mathcal{S}$, the number of pairs of matching characters;
    * $\mathcal{I}$, the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$.
    At the heart of our new deterministic upper bounds for $\mathcal{D}$ and $\mathcal{S}$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate the first lower bounds for GPM. We start by showing that any deterministic or Monte Carlo algorithm for GPM must use $\Omega(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.
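    To make the reporting and counting variants concrete, here is a naive $O(nm)$ Python baseline that follows the definition directly; it is not one of the paper's algorithms. Representing the matching relationship as a set of (text character, pattern character) pairs, and the names `gpm_count`, `gpm_report` and `WILDCARD`, are assumptions for illustration. With this representation, $\mathcal{S}$ is simply `len(match)`.

```python
from typing import List, Set, Tuple

# Assumed representation: pairs (text character, pattern character) that match.
Relation = Set[Tuple[str, str]]


def gpm_count(text: str, pattern: str, match: Relation) -> List[int]:
    """Counting version: mismatches between P and each length-m substring of T."""
    n, m = len(text), len(pattern)
    return [
        sum((text[i + j], pattern[j]) not in match for j in range(m))
        for i in range(n - m + 1)
    ]


def gpm_report(text: str, pattern: str, match: Relation) -> List[int]:
    """Reporting version: start positions of substrings of T that match P."""
    return [i for i, c in enumerate(gpm_count(text, pattern, match)) if c == 0]


# Hypothetical example relation: '?' in the pattern matches either text character.
WILDCARD: Relation = {("a", "a"), ("b", "b"), ("a", "?"), ("b", "?")}
```

    For example, `gpm_count("abab", "a?", WILDCARD)` returns `[0, 1, 0]` and `gpm_report("abab", "a?", WILDCARD)` returns `[0, 2]`; here $\mathcal{S} = 4$.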