
    On finding minimal length superstrings

    A superstring of a set of strings {s1, …, sn} is a string s containing each si, 1 ⩽ i ⩽ n, as a substring. The superstring problem is: Given a set S of strings and a positive integer K, does S have a superstring of length K? The superstring problem has applications to data storage; specifically, data compression. We consider the complexity of the superstring problem. NP-completeness results dealing with sets of strings over both finite and infinite alphabets are presented. Also, for a restricted version of the superstring problem, a linear time algorithm is given.
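
The definitions in this abstract can be made concrete with a small sketch. The function names below are hypothetical and not taken from the paper, and the search is exhaustive, so it is only usable on tiny inputs (consistent with the NP-completeness results mentioned above). The decision helper reads "a superstring of length K" as "of length at most K", which is an assumption about the problem statement.

```python
from itertools import permutations


def is_superstring(candidate, strings):
    """True if every string occurs in candidate as a contiguous substring."""
    return all(s in candidate for s in strings)


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring_bruteforce(strings):
    """Exhaustive search over merge orders; exponential, tiny inputs only."""
    # Drop duplicates and strings already contained in another string.
    uniq = list(dict.fromkeys(strings))
    core = [s for s in uniq if not any(s != t and s in t for t in uniq)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        merged = order[0]
        for nxt in order[1:]:
            # Append only the part of nxt not covered by the overlap.
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


def has_superstring_within(strings, K):
    """Decision version, assuming 'length K' means 'length at most K'."""
    return len(shortest_superstring_bruteforce(strings)) <= K
```

For example, for the set {"abc", "bcd", "cde"} the search should return a superstring of length 5, so the decision helper answers yes for K = 5 and no for K = 4.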

    Combined super-/substring and super-/subsequence problems

    Super-/substring problems and super-/subsequence problems are well-known problems in stringology that have applications in a variety of areas, such as manufacturing systems design and molecular biology. Here we investigate the complexity of a new type of such problem that forms a combination of a super-/substring and a super-/subsequence problem. Moreover, we introduce different types of minimal superstring and maximal substring problems. In particular, we consider the following problems: given a set L of strings and a string S, (i) find a minimal superstring (or maximal substring) of L that is also a supersequence (or a subsequence) of S, (ii) find a minimal supersequence (or maximal subsequence) of L that is also a superstring (or a substring) of S. In addition, some non-super-/non-substring and non-super-/non-subsequence variants are studied. We obtain several NP-hardness or even MAX SNP-hardness results and also identify types of "weak minimal" superstrings and "weak maximal" substrings for which (i) is polynomial-time solvable.
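
To illustrate how the two constraints in problem (i) combine, here is a minimal feasibility checker. It only tests whether a given candidate is a superstring of L and a supersequence of S; it does not search for a minimal one, and the function names are hypothetical rather than taken from the paper.

```python
def is_subsequence(s, t):
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    # 'ch in it' advances the iterator, so order in t is respected.
    return all(ch in it for ch in s)


def is_superstring_of_set(candidate, strings):
    """True if every string occurs in candidate as a contiguous substring."""
    return all(s in candidate for s in strings)


def feasible_for_variant_i(candidate, L, S):
    """Candidate must be a superstring of L and a supersequence of S."""
    return is_superstring_of_set(candidate, L) and is_subsequence(S, candidate)
```

With L = {"ab", "bc"} and S = "ca", the string "abc" is a superstring of L but not a supersequence of S, whereas "abca" satisfies both conditions.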

    On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals

    We study a variation of the classical Shortest Common Superstring (SCS) problem in which a shortest superstring of a finite set of strings S is sought containing as a factor every string of S or its reversal. We call this problem Shortest Common Superstring with Reversals (SCS-R). This problem was introduced by Jiang et al., who designed a greedy-like algorithm with length approximation ratio 4. In this paper, we show that a natural adaptation of the classical greedy algorithm for SCS has (optimal) compression ratio 1/2, i.e., the sum of the overlaps in the output string is at least half the sum of the overlaps in an optimal solution. We also provide a linear-time implementation of our algorithm. Comment: Published in Information Processing Letters.
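
One plausible reading of "a natural adaptation of the classical greedy algorithm" is sketched below: at each step, merge the two strings with the largest overlap, where each string may be used as is or reversed. This is a naive polynomial-time illustration under that assumption, not the linear-time implementation from the paper, and all names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def contains_up_to_reversal(text, s):
    """True if text contains s or the reversal of s as a factor."""
    return s in text or s[::-1] in text


def greedy_scs_with_reversals(strings):
    """Greedy merging where either orientation of each string is allowed."""
    pool = []
    # Longest first, so strings covered by a longer one (up to reversal)
    # are dropped; the secondary key makes the order deterministic.
    for s in sorted(set(strings), key=lambda x: (-len(x), x)):
        if not any(contains_up_to_reversal(t, s) for t in pool):
            pool.append(s)
    while len(pool) > 1:
        best = (-1, None, None, None)
        for i in range(len(pool)):
            for j in range(len(pool)):
                if i == j:
                    continue
                for a in (pool[i], pool[i][::-1]):
                    for b in (pool[j], pool[j][::-1]):
                        k = overlap(a, b)
                        if k > best[0]:
                            best = (k, i, j, a + b[k:])
        _, i, j, merged = best
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j)]
        pool.append(merged)
    return pool[0] if pool else ""
```

For the input {"abc", "dcb"}, no overlap exists without reversals, so a plain superstring needs 6 characters; allowing reversals, "dcb" can be read as "bcd", and the sketch produces a string of length 4.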

    Collapsing Superstring Conjecture

    In the Shortest Common Superstring (SCS) problem, one is given a collection of strings, and needs to find a shortest string containing each of them as a substring. SCS admits a (2 + 11/23)-approximation in polynomial time (Mucha, SODA'13). While this algorithm and its analysis are technically involved, the 30-year-old Greedy Conjecture claims that the trivial and efficient Greedy Algorithm gives a 2-approximation for SCS. We develop a graph-theoretic framework for studying approximation algorithms for SCS. The framework is reminiscent of the classical 2-approximation for Traveling Salesman: take two copies of an optimal solution, apply a trivial edge-collapsing procedure, and get an approximate solution. In this framework, we observe two surprising properties of SCS solutions, and we conjecture that they hold for all input instances. The first conjecture, which we call the Collapsing Superstring conjecture, claims that there is an elementary way to transform any solution repeated twice into the same graph G. This conjecture would give an elementary 2-approximate algorithm for SCS. The second conjecture claims that not only is the resulting graph G the same for all solutions, but that G can be computed by an elementary greedy procedure called the Greedy Hierarchical Algorithm. While the second conjecture clearly implies the first one, perhaps surprisingly we prove their equivalence. We support these equivalent conjectures by giving a proof for the special case where all input strings have length at most 3 (which until recently had been the only case where the Greedy Conjecture was proven). We also tested our conjectures on millions of instances of SCS. We prove that the standard Greedy Conjecture implies the Greedy Hierarchical Conjecture, while the latter is sufficient for an efficient greedy 2-approximation of SCS.
    Apart from its (conjectured) good approximation ratio, the Greedy Hierarchical Algorithm provably finds a 3.5-approximation, and finds exact solutions for the special cases where we know polynomial time (not greedy) exact algorithms: (1) when the input strings form a spectrum of a string, and (2) when all input strings have length at most 2.
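
The Greedy Algorithm referred to by the Greedy Conjecture is commonly described as: repeatedly merge the two strings with the largest overlap until one string remains. The sketch below follows that description with a naive pairwise search. It is not the Greedy Hierarchical Algorithm from the paper, whose details are not given in this abstract, and the function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Classical greedy heuristic for SCS (naive, polynomial-time sketch)."""
    # Remove duplicates and strings contained in another input string.
    uniq = list(dict.fromkeys(strings))
    pool = [s for s in uniq if not any(s != t and s in t for t in uniq)]
    while len(pool) > 1:
        best_k, best_i, best_j = -1, None, None
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)
    return pool[0] if pool else ""
```

A small instance on which greedy is not optimal is {"cabab", "baba", "ababc"}: greedy first merges "cabab" and "ababc" (overlap 4) and then cannot overlap "baba" at all, giving length 10, while "cabababc" is a superstring of length 8. The ratio 10/8 is well within the conjectured factor of 2.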