1,146 research outputs found

    An adaptive hybrid pattern-matching algorithm on indeterminate strings

    Get PDF
    We describe a hybrid pattern-matching algorithm that works on both regular and indeterminate strings. This algorithm is inspired by the recently proposed hybrid algorithm FJS and its indeterminate successor. However, as discussed in this paper, because of the special properties of indeterminate strings, it is not straightforward to directly migrate FJS to an indeterminate version. Our new algorithm combines two fast pattern-matching algorithms, ShiftAnd and BMS (the Sunday variant of the Boyer-Moore algorithm), and is highly adaptive to the nature of the text being processed. It avoids using the border array, therefore avoids some of the cases that are awkward for indeterminate strings. Although not always the fastest in individual test cases, our new algorithm is superior in overall performance to its two component algorithms — perhaps a general advantage of hybrid algorithms

    An adaptive hybrid pattern-matching algorithm on indeterminate strings

    Get PDF
    We describe a hybrid pattern-matching algorithm that works on both regular and indeterminate strings. This algorithm is inspired by the recently proposed hybrid algorithm FJS [11] and its indeterminate successor [15]. However, as discussed in this paper, because of the special properties of indeterminate strings, it is not straightforward to directly migrate FJS to an indeterminate version. Our new algorithm combines two fast pattern-matching algorithms, Shift-And and BMS (the Sunday variant of the Boyer-Moore algorithm), and is highly adaptive to the nature of the text being processed. It avoids using the border array, therefore avoids some of the cases that are awkward for indeterminate strings. Although not always the fastest in individual test cases, our new algorithm is superior in overall performance to its two component algorithms — perhaps a general advantage of hybrid algorithms

    Linear Algorithm for Conservative Degenerate Pattern Matching

    Full text link
    A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a sequence of such symbols is a degenerate string. A degenerate string is said to be conservative if its number of non-solid symbols is upper-bounded by a fixed positive constant k. We consider here the matching problem of conservative degenerate strings and present the first linear-time algorithm that can find, for given degenerate strings P* and T* of total length n containing k non-solid symbols in total, the occurrences of P* in T* in O(nk) time

    Computing Covers Using Prefix Tables

    Get PDF
    An \emph{indeterminate string} x=x[1..n]x = x[1..n] on an alphabet Σ\Sigma is a sequence of nonempty subsets of Σ\Sigma; xx is said to be \emph{regular} if every subset is of size one. A proper substring uu of regular xx is said to be a \emph{cover} of xx iff for every i1..ni \in 1..n, an occurrence of uu in xx includes x[i]x[i]. The \emph{cover array} γ=γ[1..n]\gamma = \gamma[1..n] of xx is an integer array such that γ[i]\gamma[i] is the longest cover of x[1..i]x[1..i]. Fifteen years ago a complex, though nevertheless linear-time, algorithm was proposed to compute the cover array of regular xx based on prior computation of the border array of xx. In this paper we first describe a linear-time algorithm to compute the cover array of regular string xx based on the prefix table of xx. We then extend this result to indeterminate strings.Comment: 14 pages, 1 figur

    Covering Problems for Partial Words and for Indeterminate Strings

    Full text link
    We consider the problem of computing a shortest solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings where each non-solid symbol is a don't care symbol. We prove that indeterminate string covering problem and partial word covering problem are NP-complete for binary alphabet and show that both problems are fixed-parameter tractable with respect to kk, the number of non-solid symbols. For the indeterminate string covering problem we obtain a 2O(klogk)+nkO(1)2^{O(k \log k)} + n k^{O(1)}-time algorithm. For the partial word covering problem we obtain a 2O(klogk)+nkO(1)2^{O(\sqrt{k}\log k)} + nk^{O(1)}-time algorithm. We prove that, unless the Exponential Time Hypothesis is false, no 2o(k)nO(1)2^{o(\sqrt{k})} n^{O(1)}-time solution exists for either problem, which shows that our algorithm for this case is close to optimal. We also present an algorithm for both problems which is feasible in practice.Comment: full version (simplified and corrected); preliminary version appeared at ISAAC 2014; 14 pages, 4 figure

    Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform

    Get PDF
    International audienceA degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n, we present a new method based on the Burrows--Wheeler transform for searching for a degenerate pattern of length m in t running in O(mn) time on a constant size alphabet Σ. Furthermore, it is a hybrid pattern-matching technique that works on both regular and degenerate strings. A degenerate string is said to be conservative if its number of non-solid letters is upper-bounded by a fixed positive constant q; in this case we show that the search complexity time is O(qm2). Experimental results show that our method performs well in practice

    Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties

    Get PDF
    Strings with don\u27t care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still their worst-case running time is Theta(n^2). We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that nn longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in O(n * sqrt(n * log(n)) time after O(n * sqrt(n * log(n))-time preprocessing. We also present O(n * sqrt(n * log(n))-time algorithms for computing the prefix array and two types of border array of a partial word

    Indeterminate strings, prefix arrays & undirected graphs

    Get PDF
    An integer array y=y[1..n] is said to be feasible if and only if y[1]=n and, for every i∈2..n, i≤i+y[i]≤n+1. A string is said to be indeterminate if and only if at least one of its elements is a subset of cardinality greater than one of a given alphabet Σ; otherwise it is said to be regular. A feasible array y is said to be regular if and only if it is the prefix array of some regular string. We show using a graph model that every feasible array of integers is a prefix array of some (indeterminate or regular) string, and for regular strings corresponding to y, we use the model to provide a lower bound on the alphabet size. We show further that there is a 1–1 correspondence between labelled simple graphs and indeterminate strings, and we show how to determine the minimum alphabet size σ of an indeterminate string x based on its associated graph Gx. Thus, in this sense, indeterminate strings are a more natural object of combinatorial interest than the strings on elements of Σ that have traditionally been studied

    Enhanced covers of regular & indeterminate strings using prefix tables

    Get PDF
    A \itbf{cover} of a string x=x[1..n] is a proper substring u of x such that x can be constructed from possibly overlapping instances of u. A recent paper \cite{FIKPPST13} relaxes this definition --- an \itbf{enhanced cover} u of x is a border of x (that is, a proper prefix that is also a suffix) that covers a {\it maximum} number of positions in x (not necessarily all) --- and proposes efficient algorithms for the computation of enhanced covers. These algorithms depend on the prior computation of the \itbf{border array} β[1..n], where β[i] is the length of the longest border of x[1..i], 1≤i≤n. In this paper, we first show how to compute enhanced covers using instead the \itbf{prefix table}: an array π[1..n] such that π[i] is the length of the longest substring of x beginning at position i that matches a prefix of x. Unlike the border array, the prefix table is robust: its properties hold also for \itbf{indeterminate strings} --- that is, strings defined on {\it subsets} of the alphabet Σ rather than individual elements of Σ. Thus, our algorithms, in addition to being faster in practice and more space-efficient than those of \cite{FIKPPST13}, allow us to easily extend the computation of enhanced covers to indeterminate strings. Both for regular and indeterminate strings, our algorithms execute in expected linear time. Along the way we establish an important theoretical result: that the expected maximum length of any border of any prefix of a regular string x is approximately 1.64 for binary alphabets, less for larger one
    corecore