1,146 research outputs found
An adaptive hybrid pattern-matching algorithm on indeterminate strings
We describe a hybrid pattern-matching algorithm that works on both regular and indeterminate strings. This algorithm is inspired by the recently proposed hybrid algorithm FJS and its indeterminate successor. However, as discussed in this paper, because of the special properties of indeterminate strings, it is not straightforward to directly migrate FJS to an indeterminate version. Our new algorithm combines two fast pattern-matching algorithms, ShiftAnd and BMS (the Sunday variant of the Boyer-Moore algorithm), and is highly adaptive to the nature of the text being processed. It avoids using the border array, therefore avoids some of the cases that are awkward for indeterminate strings. Although not always the fastest in individual test cases, our new algorithm is superior in overall performance to its two component algorithms — perhaps a general advantage of hybrid algorithms
An adaptive hybrid pattern-matching algorithm on indeterminate strings
We describe a hybrid pattern-matching algorithm that works on both regular and indeterminate strings. This algorithm is inspired by the recently proposed hybrid algorithm FJS [11] and its indeterminate successor [15]. However, as discussed in this paper, because of the special properties of indeterminate strings, it is not straightforward to directly migrate FJS to an indeterminate version. Our new algorithm combines two fast pattern-matching algorithms, Shift-And and BMS (the Sunday variant of the Boyer-Moore algorithm), and is highly adaptive to the nature of the text being processed. It avoids using the border array, therefore avoids some of the cases that are awkward for indeterminate strings. Although not always the fastest in individual test cases, our new algorithm is superior in overall performance to its two component algorithms — perhaps a general advantage of hybrid algorithms
Linear Algorithm for Conservative Degenerate Pattern Matching
A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a
sequence of such symbols is a degenerate string. A degenerate string is said to
be conservative if its number of non-solid symbols is upper-bounded by a fixed
positive constant k. We consider here the matching problem of conservative
degenerate strings and present the first linear-time algorithm that can find,
for given degenerate strings P* and T* of total length n containing k non-solid
symbols in total, the occurrences of P* in T* in O(nk) time
Computing Covers Using Prefix Tables
An \emph{indeterminate string} on an alphabet is a
sequence of nonempty subsets of ; is said to be \emph{regular} if
every subset is of size one. A proper substring of regular is said to
be a \emph{cover} of iff for every , an occurrence of in
includes . The \emph{cover array} of is
an integer array such that is the longest cover of .
Fifteen years ago a complex, though nevertheless linear-time, algorithm was
proposed to compute the cover array of regular based on prior computation
of the border array of . In this paper we first describe a linear-time
algorithm to compute the cover array of regular string based on the prefix
table of . We then extend this result to indeterminate strings.Comment: 14 pages, 1 figur
Covering Problems for Partial Words and for Indeterminate Strings
We consider the problem of computing a shortest solid cover of an
indeterminate string. An indeterminate string may contain non-solid symbols,
each of which specifies a subset of the alphabet that could be present at the
corresponding position. We also consider covering partial words, which are a
special case of indeterminate strings where each non-solid symbol is a don't
care symbol. We prove that indeterminate string covering problem and partial
word covering problem are NP-complete for binary alphabet and show that both
problems are fixed-parameter tractable with respect to , the number of
non-solid symbols. For the indeterminate string covering problem we obtain a
-time algorithm. For the partial word covering
problem we obtain a -time algorithm. We
prove that, unless the Exponential Time Hypothesis is false, no
-time solution exists for either problem, which shows
that our algorithm for this case is close to optimal. We also present an
algorithm for both problems which is feasible in practice.Comment: full version (simplified and corrected); preliminary version appeared
at ISAAC 2014; 14 pages, 4 figure
Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform
International audienceA degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n, we present a new method based on the Burrows--Wheeler transform for searching for a degenerate pattern of length m in t running in O(mn) time on a constant size alphabet Σ. Furthermore, it is a hybrid pattern-matching technique that works on both regular and degenerate strings. A degenerate string is said to be conservative if its number of non-solid letters is upper-bounded by a fixed positive constant q; in this case we show that the search complexity time is O(qm2). Experimental results show that our method performs well in practice
Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties
Strings with don\u27t care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still their worst-case running time is Theta(n^2). We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in O(n * sqrt(n * log(n)) time after O(n * sqrt(n * log(n))-time preprocessing. We also present O(n * sqrt(n * log(n))-time algorithms for computing the prefix array and two types of border array of a partial word
Indeterminate strings, prefix arrays & undirected graphs
An integer array y=y[1..n] is said to be feasible if and only if y[1]=n and, for every i∈2..n, i≤i+y[i]≤n+1. A string is said to be indeterminate if and only if at least one of its elements is a subset of cardinality greater than one of a given alphabet Σ; otherwise it is said to be regular. A feasible array y is said to be regular if and only if it is the prefix array of some regular string. We show using a graph model that every feasible array of integers is a prefix array of some (indeterminate or regular) string, and for regular strings corresponding to y, we use the model to provide a lower bound on the alphabet size. We show further that there is a 1–1 correspondence between labelled simple graphs and indeterminate strings, and we show how to determine the minimum alphabet size σ of an indeterminate string x based on its associated graph Gx. Thus, in this sense, indeterminate strings are a more natural object of combinatorial interest than the strings on elements of Σ that have traditionally been studied
Enhanced covers of regular & indeterminate strings using prefix tables
A \itbf{cover} of a string x=x[1..n] is a proper substring u of x such that x can be constructed from possibly overlapping instances of u. A recent paper \cite{FIKPPST13} relaxes this definition --- an \itbf{enhanced cover} u of x is a border of x (that is, a proper prefix that is also a suffix) that covers a {\it maximum} number of positions in x (not necessarily all) --- and proposes efficient algorithms for the computation of enhanced covers. These algorithms depend on the prior computation of the \itbf{border array} β[1..n], where β[i] is the length of the longest border of x[1..i], 1≤i≤n. In this paper, we first show how to compute enhanced covers using instead the \itbf{prefix table}: an array π[1..n] such that π[i] is the length of the longest substring of x beginning at position i that matches a prefix of x. Unlike the border array, the prefix table is robust: its properties hold also for \itbf{indeterminate strings} --- that is, strings defined on {\it subsets} of the alphabet Σ rather than individual elements of Σ. Thus, our algorithms, in addition to being faster in practice and more space-efficient than those of \cite{FIKPPST13}, allow us to easily extend the computation of enhanced covers to indeterminate strings. Both for regular and indeterminate strings, our algorithms execute in expected linear time. Along the way we establish an important theoretical result: that the expected maximum length of any border of any prefix of a regular string x is approximately 1.64 for binary alphabets, less for larger one
- …