Genomic Scaffold Filling Revisited

Fan, Chenglin; Jiang, Haitao; Yang, Boting; Zhong, Farong; Zhu, Binhai; Zhu, Daming

Authors: Chenglin Fan, Haitao Jiang, Boting Yang, Farong Zhong, Binhai Zhu, Daming Zhu
Publication date: 1 January 2016
Publisher: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

The genomic scaffold filling problem has attracted a lot of attention recently. The problem is to fill an incomplete sequence (scaffold) I into I', with respect to a complete reference genome G, such that the number of adjacencies between G and I' is maximized. The problem is NP-complete and APX-hard, and admits a 1.2-approximation. However, taking a single sequence I as input is not quite practical and does not fit most real datasets, where a scaffold is more often given as a list of contigs. In this paper, we revisit the genomic scaffold filling problem by considering this important case in two variants: (1) a scaffold S is given, the missing genes X = c(G) - c(S) can only be inserted in between the contigs, and the objective is to maximize the number of adjacencies between G and the filled S'; and (2) a scaffold S is given, a subset of the missing genes X' ⊂ X = c(G) - c(S) can only be inserted in between the contigs, and the objective is still to maximize the number of adjacencies between G and the filled S''. For problem (1), we present a simple NP-completeness proof, then a factor-2 greedy approximation algorithm, and finally show that the problem is FPT when each gene appears at most d times in G. For problem (2), we prove that the problem is W[1]-hard and then present a factor-2 FPT-approximation for the case when each gene appears at most d times in G.
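
To make the objective concrete, the following is a minimal Python sketch of how a candidate filling could be scored. It is an illustration only, not the paper's algorithm. It assumes that adjacencies are unordered pairs of consecutive genes, counted as a multiset, and that sequence ends contribute nothing; the abstract does not spell out these details. The function names and the toy genome are hypothetical.

```python
from collections import Counter

def adjacencies(seq):
    """Multiset of unordered pairs of consecutive genes (assumed definition)."""
    return Counter(tuple(sorted(pair)) for pair in zip(seq, seq[1:]))

def common_adjacencies(g, filled):
    """Size of the multiset intersection of the two adjacency multisets."""
    return sum((adjacencies(g) & adjacencies(filled)).values())

def missing_genes(g, contigs):
    """X = c(G) - c(S), computed as a multiset difference of gene contents."""
    present = Counter(gene for contig in contigs for gene in contig)
    return Counter(g) - present

# Toy instance (hypothetical): reference genome G and a scaffold S of three contigs.
G = list("abcdefg")
S = [list("ab"), list("de"), list("g")]
X = missing_genes(G, S)  # the genes c and f are missing, once each

# Two candidate fillings; genes are inserted only between contigs.
good = S[0] + ["c"] + S[1] + ["f"] + S[2]  # a b c d e f g
bad = S[0] + ["f"] + S[1] + ["c"] + S[2]   # a b f d e c g

good_score = common_adjacencies(G, good)
bad_score = common_adjacencies(G, bad)
```

Both fillings use the same missing genes and keep every contig intact, yet they share different numbers of adjacencies with G. The problem is to choose the insertion that maximizes this count.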

Available Versions

Dagstuhl Research Online Publication Server

oai:drops-oai.dagstuhl.de:6079

Last time updated on 17/11/2016