Search CORE

18 research outputs found

Approximation algorithms for the shortest common superstring problem

Author: Turner Jonathan S.
Publication venue: Published by Elsevier Inc.
Publication date: 31/10/1989

Abstract: The object of the shortest common superstring problem (SCS) is to find the shortest possible string that contains every string in a given set as a substring. As the problem is NP-complete, approximation algorithms are of interest. The value of an approximate solution to SCS is normally taken to be its length, and we seek algorithms that make the length as small as possible. A different measure is given by the sum of the overlaps between consecutive strings in a candidate solution. When considering this measure, the object is to find solutions that make it as large as possible. These two measures offer different ways of viewing the problem. While the two viewpoints are equivalent with respect to optimal solutions, they differ with respect to approximate solutions. We describe several approximation algorithms that produce solutions that are always within a factor of two of optimum with respect to the overlap measure. We also describe an efficient implementation of one of these, using McCreight's compact suffix tree construction algorithm. The worst-case running time is O(m log n) for small alphabets, where m is the sum of the lengths of all the strings in the set and n is the number of strings. For large alphabets, the algorithm can be implemented in O(m log m) time by using Sleator and Tarjan's lexicographic splay tree data structure.
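To make the two measures concrete, here is a minimal Python sketch of the greedy merge heuristic, a well-known approach of the kind the abstract discusses: repeatedly merge the pair of strings with the largest overlap. It is not presented as Turner's exact algorithm, and it computes overlaps naively in quadratic time rather than with the suffix-tree implementation the abstract describes; the function names `overlap`, `remove_substrings`, and `greedy_superstring` are hypothetical. Because merging a and b with overlap k produces a string of length len(a) + len(b) - k, the final length equals the total input length minus the sum of the overlaps used, which is why the length and overlap measures agree on optimal solutions.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def remove_substrings(strings: list[str]) -> list[str]:
    """Drop duplicates and strings contained in another string; any superstring covers them anyway."""
    unique = list(dict.fromkeys(strings))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_superstring(strings: list[str]) -> str:
    """Hypothetical greedy-merge sketch: repeatedly merge the pair with the largest overlap."""
    frags = remove_substrings(strings)
    if not frags:
        return ""
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merging a and b with overlap k yields a string of length len(a) + len(b) - k.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


if __name__ == "__main__":
    parts = ["cde", "abc", "eab"]
    result = greedy_superstring(parts)
    # Total overlap = total input length - superstring length.
    print(result, sum(map(len, parts)) - len(result))  # prints: cdeabc 3
```

For this input the sketch first merges `eab` and `abc` (overlap 2) into `eabc`, then merges `cde` with it (overlap 1), giving `cdeabc`: 9 input characters minus a total overlap of 3.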

Elsevier - Publisher Connector

Approximation Algorithms for the Shortest Common Superstring Problem

Author: Turner Jonathan S.
Publication venue: Washington University Open Scholarship
Publication date: 01/01/1986

Washington University St. Louis: Open Scholarship


Algorithms for constructing a consensus sequence

Author: Cull Paul
Holloway Jim
Publication venue: Oregon State University. Department of Computer Science

Biological and physical limitations require that DNA be sequenced in fragments. There are several approaches to obtaining appropriately sized fragments of DNA to sequence. The method of sequencing that we are interested in is loosely referred to as shotgun sequencing. Many copies of the genomic DNA to be sequenced are cleaved by one or more restriction endonucleases, resulting in a multiset, S, of DNA fragments that are not ordered. DNA fragments are essentially selected at random from this multiset and sequenced. A consensus sequence is constructed by joining together fragments that overlap. (One hopes that the consensus sequence is very close to the original sequence.) Since errors occur when reading the sequences, the overlaps must be approximate, not exact. This process of reassembly is similar to the NP-complete shortest common superstring problem [GMS80]. To simplify the problem we make the following assumptions:
• An integer k can be supplied that defines the minimum acceptable overlap between two sequences.
• There is a unique alignment of the sequence fragments such that all suffix/prefix overlaps are of length k or greater.
• All suffix/prefix overlaps are exact (log inexact) matches.
We define the string consensus problem and give three algorithms to solve it. We then define the log inexact string consensus problem and give three algorithms to solve it. We believe that the log inexact string consensus problem is closer to the problem of constructing a consensus sequence from shotgun data that biochemists are trying to solve than the problems addressed by previous approximation algorithms for the shortest common superstring problem.
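The exact-match case under these assumptions can be sketched in Python as a simple chaining procedure: find the one fragment that no other fragment overlaps by at least k, then repeatedly append the unused fragment with the largest exact suffix/prefix overlap. This is an illustrative heuristic, not one of the three algorithms of Cull and Holloway, and it does not handle log inexact matches. The names `overlap` and `consensus` are hypothetical, and the sketch adds an assumption of its own: that each fragment's true successor is the unused fragment with the largest overlap. That is reasonable for error-free data without repeats but is not guaranteed in general.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that exactly matches a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def consensus(fragments: list[str], k: int) -> str:
    """Hypothetical sketch: chain fragments whose exact suffix/prefix overlaps are >= k."""
    # Duplicates and fragments contained in other fragments add nothing to the layout.
    frags = [s for s in dict.fromkeys(fragments)
             if not any(s != t and s in t for t in fragments)]
    if not frags:
        return ""
    # The left-most fragment is the only one that no other fragment overlaps by >= k.
    starts = [b for b in frags if all(overlap(a, b) < k for a in frags if a != b)]
    if len(starts) != 1:
        raise ValueError("expected exactly one left-most fragment")
    current = result = starts[0]
    unused = set(frags) - {current}
    while unused:
        # Assumption: the true successor is the unused fragment with the largest overlap.
        # Sorting first makes tie-breaking deterministic (max keeps the first maximum).
        scored = [(overlap(current, b), b) for b in sorted(unused)]
        ov, nxt = max(scored, key=lambda pair: pair[0])
        if ov < k:
            raise ValueError("no remaining fragment overlaps the consensus by at least k")
        result += nxt[ov:]
        unused.remove(nxt)
        current = nxt
    return result


if __name__ == "__main__":
    reads = ["TACGTTAG", "ATGGCGTA", "GCGTACGT"]  # shuffled fragments
    print(consensus(reads, k=4))  # prints: ATGGCGTACGTTAG
```

With k = 4, the three shuffled reads above are chained through two exact overlaps of length 5. A pair such as `AAAA` and `CCCC` has no overlap of length k or greater, so both look like left-most fragments and the sketch raises `ValueError` instead of guessing an order.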

ScholarsArchive@OSU

Practical lower and upper bounds for the Shortest Linear Superstring

Author: Cazaux B.
Juhel Samuel
Rivals Eric
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2018

Peer reviewed

INRIA a CCSD electronic archive server

HAL Descartes

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

Hal-Diderot

Parallel and sequential approximation of shortest superstrings

Author: J-S. Turner
J. Gallant
J. Tarhio
M. R. Garey
V. Chvatal
Publication venue: Springer Science and Business Media LLC

Crossref

Linear approximation of shortest superstrings

Author: Blum A.
Jiang T. (Tao)
Li M. (Ming)
Tromp J.T. (John)
Yannakakis M.
Publication venue: ACM
Publication date: 01/01/1994

CWI's Institutional Repository

Computational Molecular Biology

Author: Lenhof H.
Mutzel P.
Vingron M.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/1996

Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 1960s and early 1970s largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 1970s and 1980s, while Computer Science became involved with the new biological problems in the late 1980s. Computational problems have gained further importance in molecular biology through the various genome projects, which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, such as most of the literature on the protein folding problem, databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography

MPG.PuRe