5 research outputs found

    Covering Problems for Partial Words and for Indeterminate Strings

    We consider the problem of computing a shortest solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings where each non-solid symbol is a don't care symbol. We prove that the indeterminate string covering problem and the partial word covering problem are NP-complete for a binary alphabet and show that both problems are fixed-parameter tractable with respect to $k$, the number of non-solid symbols. For the indeterminate string covering problem we obtain a $2^{O(k \log k)} + n k^{O(1)}$-time algorithm. For the partial word covering problem we obtain a $2^{O(\sqrt{k} \log k)} + n k^{O(1)}$-time algorithm. We prove that, unless the Exponential Time Hypothesis is false, no $2^{o(\sqrt{k})} n^{O(1)}$-time solution exists for either problem, which shows that our algorithm for this case is close to optimal. We also present an algorithm for both problems which is feasible in practice. Comment: full version (simplified and corrected); preliminary version appeared at ISAAC 2014; 14 pages, 4 figures.
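
To make the covering problem concrete, the following is a minimal brute-force sketch, not the fixed-parameter algorithm from the abstract. It assumes an indeterminate string is modelled as a list of Python sets (a solid symbol is a singleton set, a don't care symbol is the whole alphabet) and that a solid string occurs at a position when each of its characters belongs to the corresponding set. The function names `occurs_at`, `is_cover`, and `shortest_solid_cover` are hypothetical. The search is exponential in the worst case, which is consistent with the NP-completeness result.

```python
from itertools import product

def occurs_at(text, s, i):
    """True if solid string s matches the sets text[i : i + len(s)]."""
    return all(ch in text[i + j] for j, ch in enumerate(s))

def is_cover(text, s):
    """True if the occurrences of s jointly cover every position of text."""
    n, m = len(text), len(s)
    if m == 0 or m > n:
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    for i in range(n - m + 1):
        if i > covered_up_to:
            return False  # no occurrence can cover position covered_up_to
        if occurs_at(text, s, i):
            covered_up_to = i + m
    return covered_up_to == n

def shortest_solid_cover(text):
    """Brute force (assumption: illustrative only, exponential time).

    A cover must occur at position 0, so it suffices to try every solid
    string consistent with the first m sets, for increasing m.
    """
    for m in range(1, len(text) + 1):
        for cand in product(*(sorted(text[j]) for j in range(m))):
            s = "".join(cand)
            if is_cover(text, s):
                return s
    return None

# Example: the third position is non-solid and may be 'a' or 'b'.
text = [{"a"}, {"b"}, {"a", "b"}, {"b"}, {"a"}, {"b"}]
print(shortest_solid_cover(text))
```

In this example the non-solid third position is read as `a` by the occurrence starting there; each occurrence is checked independently, which is an assumption of this sketch.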

    Linear Algorithm for Conservative Degenerate Pattern Matching

    A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a sequence of such symbols is a degenerate string. A degenerate string is said to be conservative if its number of non-solid symbols is upper-bounded by a fixed positive constant k. We consider here the matching problem of conservative degenerate strings and present the first linear-time algorithm that can find, for given degenerate strings P* and T* of total length n containing k non-solid symbols in total, the occurrences of P* in T* in O(nk) time.
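
As a baseline for the matching problem described above, here is a naive quadratic-time sketch, not the O(nk) algorithm from the abstract. It assumes degenerate strings are lists of non-empty Python sets and that two degenerate symbols match when their sets share at least one letter. The name `degenerate_occurrences` is hypothetical.

```python
def degenerate_occurrences(pattern, text):
    """Return all start positions where pattern matches text.

    Assumption: two degenerate symbols match if their sets intersect.
    Runs in O(len(pattern) * len(text)) set intersections.
    """
    m, n = len(pattern), len(text)
    return [
        i
        for i in range(n - m + 1)
        if all(pattern[j] & text[i + j] for j in range(m))
    ]

# Example: the pattern's second symbol is non-solid ('c' or 'g').
pattern = [{"a"}, {"c", "g"}]
text = [{"a"}, {"c"}, {"a", "t"}, {"g"}, {"t"}]
print(degenerate_occurrences(pattern, text))
```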

    Rank and Select on Degenerate Strings

    A 'degenerate string' is a sequence of subsets of some alphabet; it represents any string obtainable by selecting one character from each set from left to right. Recently, Alanko et al. generalized the rank-select problem to degenerate strings, where given a character $c$ and position $i$ the goal is to find either the $i$th set containing $c$ or the number of occurrences of $c$ in the first $i$ sets [SEA 2023]. The problem has applications to pangenomics; in another work by Alanko et al. they use it as the basis for a compact representation of 'de Bruijn Graphs' that supports fast membership queries. In this paper we revisit the rank-select problem on degenerate strings, introducing a new, natural parameter and reanalyzing existing reductions to rank-select on regular strings. Plugging in standard data structures, the time bounds for queries are improved exponentially while essentially matching, or improving, the space bounds. Furthermore, we provide a lower bound on space that shows that the reductions lead to succinct data structures in a wide range of cases. Finally, we provide implementations; our most compact structure matches the space of the most compact structure of Alanko et al. while answering queries twice as fast. We also provide an implementation using modern vector processing features; it uses less than one percent more space than the most compact structure of Alanko et al. while supporting queries four to seven times faster, and has competitive query time with all the remaining structures.
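
The two queries defined above can be illustrated with a naive linear-scan sketch. This only pins down the semantics of the queries; it is not one of the succinct data structures discussed in the abstract. It assumes a degenerate string is a list of Python sets, that rank counts sets among the first i, and that select uses a 1-based i and returns a 0-based index. The names `subset_rank` and `subset_select` are hypothetical.

```python
def subset_rank(sets, c, i):
    """Number of sets among the first i that contain character c."""
    return sum(1 for s in sets[:i] if c in s)

def subset_select(sets, c, i):
    """0-based index of the i-th (1-based) set containing c, or None."""
    seen = 0
    for idx, s in enumerate(sets):
        if c in s:
            seen += 1
            if seen == i:
                return idx
    return None  # fewer than i sets contain c

# Example over the DNA alphabet; empty sets are allowed in this model.
sets = [{"A", "C"}, {"C"}, set(), {"A", "G"}, {"C", "T"}]
print(subset_rank(sets, "C", 2), subset_select(sets, "C", 3))
```

Both functions scan the whole input, so each query costs time linear in the number of sets; the structures in the abstract answer the same queries far faster by reducing to rank-select on regular strings.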