Covering Problems for Partial Words and for Indeterminate Strings
We consider the problem of computing a shortest solid cover of an
indeterminate string. An indeterminate string may contain non-solid symbols,
each of which specifies a subset of the alphabet that could be present at the
corresponding position. We also consider covering partial words, which are a
special case of indeterminate strings where each non-solid symbol is a don't
care symbol. We prove that the indeterminate string covering problem and the
partial word covering problem are NP-complete for a binary alphabet and show
that both problems are fixed-parameter tractable with respect to the number of
non-solid symbols. For the indeterminate string covering problem we obtain a
-time algorithm. For the partial word covering
problem we obtain a -time algorithm. We
prove that, unless the Exponential Time Hypothesis is false, no
-time solution exists for either problem, which shows
that our algorithm for this case is close to optimal. We also present an
algorithm for both problems which is feasible in practice.
Comment: full version (simplified and corrected); preliminary version appeared
at ISAAC 2014; 14 pages, 4 figures
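To make the covering notion concrete, here is a minimal Python sketch that only verifies whether a given solid string covers an indeterminate string. It is not the algorithm from the paper. It assumes the indeterminate string is modeled as a list of character sets, and that a solid string covers it when every position lies inside some matching occurrence, with overlapping occurrences not required to agree on a single letter. The name is_cover and the example data are illustrative.

```python
def is_cover(candidate, text):
    """Check whether the solid string `candidate` covers `text`.

    `text` is a list of sets of characters (one set per position).
    An occurrence at position i means candidate[j] is in text[i + j] for all j.
    """
    m, n = len(candidate), len(text)
    if m == 0 or m > n:
        return False
    covered_upto = 0  # positions [0, covered_upto) are covered so far
    for i in range(n - m + 1):
        if i > covered_upto:
            # Position covered_upto can no longer be reached by any occurrence.
            return False
        if all(candidate[j] in text[i + j] for j in range(m)):
            covered_upto = i + m
    return covered_upto == n


# Hypothetical example: a six-position string with two non-solid symbols.
text = [{"a"}, {"a", "b"}, {"a"}, {"b"}, {"a", "b"}, {"b"}]
print(is_cover("ab", text))   # True: occurrences at positions 0, 2 and 4
print(is_cover("aab", text))  # False: no occurrence starts at position 0
```

Finding a shortest cover would additionally require searching over candidate strings, which is where the hardness results above apply.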
Linear Algorithm for Conservative Degenerate Pattern Matching
A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a
sequence of such symbols is a degenerate string. A degenerate string is said to
be conservative if its number of non-solid symbols is upper-bounded by a fixed
positive constant k. We consider here the matching problem of conservative
degenerate strings and present the first linear-time algorithm that can find,
for given degenerate strings P* and T* of total length n containing k non-solid
symbols in total, the occurrences of P* in T* in O(nk) time.
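As an illustration of the matching problem only, the following Python sketch is a naive quadratic baseline, not the O(nk) algorithm of the paper. It assumes each degenerate symbol is represented as a set of characters and that two symbols match when their sets intersect. The helper names parse and occurrences are hypothetical.

```python
def parse(symbols):
    """Turn a list such as ["a", "bc"] into a list of character sets."""
    return [set(s) for s in symbols]


def occurrences(pattern, text):
    """Return all start positions where the pattern matches the text.

    A match at position i requires every pattern symbol to share at least
    one character with the aligned text symbol.
    """
    m, n = len(pattern), len(text)
    return [
        i
        for i in range(n - m + 1)
        if all(pattern[j] & text[i + j] for j in range(m))
    ]


# Hypothetical example: the text has two non-solid symbols, the pattern none.
text = parse(["a", "bc", "a", "b", "ac"])
pattern = parse(["a", "b"])
print(occurrences(pattern, text))  # [0, 2]
```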
Rank and Select on Degenerate Strings
A 'degenerate string' is a sequence of subsets of some alphabet; it
represents any string obtainable by selecting one character from each set from
left to right. Recently, Alanko et al. generalized the rank-select problem to
degenerate strings, where given a character c and position i the goal is to
find either the i-th set containing c or the number of occurrences of c in
the first i sets [SEA 2023]. The problem has applications to pangenomics; in
another work by Alanko et al. they use it as the basis for a compact
representation of 'de Bruijn Graphs' that supports fast membership queries.
In this paper we revisit the rank-select problem on degenerate strings,
introducing a new, natural parameter and reanalyzing existing reductions to
rank-select on regular strings. Plugging in standard data structures, the time
bounds for queries are improved exponentially while essentially matching, or
improving, the space bounds. Furthermore, we provide a lower bound on space
that shows that the reductions lead to succinct data structures in a wide range
of cases. Finally, we provide implementations; our most compact structure
matches the space of the most compact structure of Alanko et al. while
answering queries twice as fast. We also provide an implementation using modern
vector processing features; it uses less than one percent more space than the
most compact structure of Alanko et al. while supporting queries four to seven
times faster, and has competitive query time with all the remaining structures.
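The two queries can be illustrated with a naive linear-scan sketch in Python. This is only a reference definition under stated assumptions, not any of the data structures discussed in the paper: the degenerate string is a list of sets, rank counts over a prefix, and select returns a 0-based index. The function names rank and select and the sample data are illustrative.

```python
def rank(sets, i, c):
    """Number of the first i sets that contain character c."""
    return sum(1 for s in sets[:i] if c in s)


def select(sets, i, c):
    """0-based index of the i-th set (i >= 1) containing c.

    Returns None when fewer than i sets contain c.
    """
    seen = 0
    for idx, s in enumerate(sets):
        if c in s:
            seen += 1
            if seen == i:
                return idx
    return None


# Hypothetical degenerate string over the DNA alphabet.
sets = [{"A", "C"}, {"G"}, {"A"}, {"C", "T"}, {"A", "T"}]
print(rank(sets, 3, "A"))    # 2
print(select(sets, 3, "A"))  # 4
```

Both functions take time linear in the number of sets per query; the point of the structures above is to answer the same queries much faster in compact space.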