On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals
We study a variation of the classical Shortest Common Superstring (SCS)
problem, in which we seek a shortest superstring of a finite set of strings
that contains, as a factor, every string of the set or its reversal. We call this
problem Shortest Common Superstring with Reversals (SCS-R). This problem has
been introduced by Jiang et al., who designed a greedy-like algorithm with
length approximation ratio . In this paper, we show that a natural
adaptation of the classical greedy algorithm for SCS has (optimal) compression
ratio 1/2, i.e., the sum of the overlaps in the output string is at least
half the sum of the overlaps in an optimal solution. We also provide a
linear-time implementation of our algorithm.
Comment: Published in Information Processing Letters
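The greedy idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' linear-time implementation: it is a naive polynomial-time version, and the function names (overlap, greedy_scs_r), the tie-breaking order, and the preprocessing step are assumptions made for illustration only. At each step it merges the two strings, each taken either as given or reversed, that have the largest suffix-prefix overlap.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_scs_r(strings):
    """Naive greedy sketch for SCS with reversals (illustrative only)."""
    # Assumed preprocessing: treat a string and its reversal as one item.
    items = []
    for s in strings:
        if s not in items and s[::-1] not in items:
            items.append(s)
    # Drop strings that already occur, directly or reversed, in another one.
    items = [s for s in items
             if not any(s != t and (s in t or s[::-1] in t) for t in items)]
    if not items:
        return ""

    while len(items) > 1:
        # best = (overlap length, index i, index j, oriented a, oriented b)
        best = (-1, 0, 1, items[0], items[1])
        for i in range(len(items)):
            for j in range(len(items)):
                if i == j:
                    continue
                # Try both orientations of each of the two strings.
                for a in (items[i], items[i][::-1]):
                    for b in (items[j], items[j][::-1]):
                        k = overlap(a, b)
                        if k > best[0]:
                            best = (k, i, j, a, b)
        k, i, j, a, b = best
        merged = a + b[k:]  # glue b onto a, sharing the k overlapping letters
        items = [s for idx, s in enumerate(items) if idx not in (i, j)]
        items.append(merged)
    return items[0]


if __name__ == "__main__":
    print(greedy_scs_r(["abc", "dcb"]))
```

In this toy input, "dcb" has no overlap with "abc" as written, but its reversal "bcd" does, so allowing reversals lets the two strings share letters. The sum of the overlaps chosen along the way is the quantity that the compression ratio in the abstract compares against an optimal solution.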
A note on the shortest common superstring of NGS reads
The Shortest Superstring Problem (SSP) consists, for a set of strings S =
{s_1,...,s_n}, in finding a minimum length string that contains all s_i, 1 <= i <=
n, as substrings. This problem has been proved to be NP-complete and APX-hard.
Guaranteed approximation algorithms have been proposed, the current best ratio
being 2+11/23, which has been achieved following a long and difficult quest.
However, SSP is widely used in practice on next generation sequencing (NGS)
data, which plays an increasingly important role in sequencing. In this note,
we show that the SSP approximation ratio can be improved on NGS reads by
assuming specific characteristics of NGS data that are experimentally verified
on a very large sampling set.
On improving the approximation ratio of the r-shortest common superstring problem
The Shortest Common Superstring problem (SCS) consists, for a set of strings
S = {s_1,...,s_n}, in finding a minimum length string that contains all s_i,
1<= i <= n, as substrings. While a 2+11/30 approximation ratio algorithm has
recently been published, the general objective is now to break the conceptual
lower bound barrier of 2. This paper is a step forward in this direction. Here we
focus on a restricted version of the SCS problem, namely the r-SCS problem,
which requires all input strings to be of the same length, r. Golovnev et al.
proved an approximation ratio which is better than the general one for r<= 6.
Here we extend their approach and improve their approximation ratio, which is
now better than the general one for r<= 7, and less than or equal to 2 up to r
= 6.