Abstract

Various versions of the shortest common superstring problem play important roles in data compression and DNA sequencing. Only recently, the open problem of how to approximate a shortest superstring given a set of strings was solved in (Blum, 1991; Li, 1990). Blum (1991) shows that several greedy algorithms produce a superstring of length O(n), where n is the optimal length. However, a major problem remains open: can we still linearly approximate a superstring in polynomial time when the superstring is required to be consistent with some given negative strings, i.e., it must not contain any negative string? The best previous algorithm, Group-Merge given in (Jiang and Li, 1993; Li, 1990), produces a consistent superstring of length Θ(n log n). The negative strings make the problem much more difficult and, as we will show, a greedy-style algorithm cannot achieve linear approximation for this problem.

We present polynomial-time approximation algorithms that produce consistent superstrings of length O(n) for two important special cases: (a) when no negative strings contain positive strings as substrings; (b) when there is only a constant number of negative strings. The algorithms are obtained by making essential use of the Hungarian algorithm, which can find an optimal cycle cover on weighted graphs.

The other main objective of this paper is to analyze the performance of some greedy-style algorithms for this problem. Due to their time efficiency and simplicity, greedy algorithms are of practical importance. We introduce a new analysis showing that when no negative strings contain positive strings, a greedy algorithm achieves length O(n^{4/3}), and O(n) if the number of negative examples is further bounded by some constant.
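
To make the setting concrete, the following is a minimal sketch of a greedy-style merge under negative-string constraints. It is an illustration only and is not claimed to be any of the algorithms analyzed in the paper; the function names (`overlap`, `consistent`, `greedy_consistent_merge`) are hypothetical. It assumes that no positive string is a substring of another and that no negative string contains a positive string. It repeatedly merges the pair with the largest overlap whose merged string contains no negative string, and it returns the resulting pieces rather than concatenating them, since a plain concatenation could create a negative string across a boundary.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def consistent(s, negatives):
    """True if s contains none of the negative strings."""
    return not any(neg in s for neg in negatives)


def greedy_consistent_merge(positives, negatives):
    """Greedy sketch: merge the pair with the largest overlap whose merge
    stays consistent; stop when no consistent overlapping merge remains."""
    strings = list(dict.fromkeys(positives))  # drop duplicates, keep order
    while True:
        best = None  # (overlap length, i, j, merged string)
        for i, j in permutations(range(len(strings)), 2):
            k = overlap(strings[i], strings[j])
            if k == 0:
                continue
            merged = strings[i] + strings[j][k:]
            if consistent(merged, negatives) and (best is None or k > best[0]):
                best = (k, i, j, merged)
        if best is None:
            return strings
        _, i, j, merged = best
        strings = [s for idx, s in enumerate(strings) if idx not in (i, j)]
        strings.append(merged)


if __name__ == "__main__":
    print(greedy_consistent_merge(["abc", "bcd", "cde"], []))
    print(greedy_consistent_merge(["abc", "bcd", "cde"], ["abcd"]))
```

With no negative strings the three inputs collapse into one superstring; with the negative string `abcd`, the merges that would produce `abcd` are rejected and two pieces remain.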