11 research outputs found

    On the Complexity of the Single Individual SNP Haplotyping Problem

    We present several new results pertaining to haplotyping. These results concern the combinatorial problem of reconstructing haplotypes from incomplete and/or imperfectly sequenced haplotype fragments. We consider the complexity of the problems Minimum Error Correction (MEC) and Longest Haplotype Reconstruction (LHR) for different restrictions on the input data. Specifically, we look at the gapless case, where every row of the input corresponds to a gapless haplotype fragment, and the 1-gap case, where at most one gap per fragment is allowed. We prove that MEC is APX-hard in the 1-gap case and still NP-hard in the gapless case. In addition, we question earlier claims that MEC is NP-hard even when the input matrix is restricted to being completely binary. Concerning LHR, we show that this problem is NP-hard and APX-hard in the 1-gap case (and thus also in the general case), but is polynomial-time solvable in the gapless case.
    Comment: 26 pages. Related to the WABI2005 paper, "On the Complexity of Several Haplotyping Problems", but with more/different results. This paper has just been submitted to the IEEE/ACM Transactions on Computational Biology and Bioinformatics and we are awaiting a decision on acceptance. It differs from the mid-August version of this paper because here we prove that 1-gap LHR is APX-hard. (In the earlier version of the paper we could prove only that it was NP-hard.)
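    As a rough illustration of the objects this abstract refers to, the sketch below assumes the usual formulation of MEC: rows are fragments over 0, 1 and a hole symbol, and the goal is to flip as few entries as possible so that every fragment agrees with one of two haplotypes. The abstract itself does not spell this definition out, and the names `HOLE`, `count_gaps`, `mismatches` and `mec_brute_force` are hypothetical. The search is exponential in the number of SNP columns and is only meant for toy inputs.

```python
import re
from itertools import product

HOLE = "-"  # assumed symbol for a position the fragment does not cover


def count_gaps(fragment):
    """Number of maximal runs of holes strictly inside the fragment."""
    core = fragment.strip(HOLE)
    return len(re.findall(re.escape(HOLE) + "+", core))


def mismatches(fragment, haplotype):
    """Entries of the fragment that would have to be flipped to match."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != HOLE and f != h)


def mec_brute_force(fragments):
    """Try every pair of haplotypes; return (cost, h1, h2) with minimum cost."""
    m = len(fragments[0])
    candidates = ["".join(bits) for bits in product("01", repeat=m)]
    best = None
    for i, h1 in enumerate(candidates):
        for h2 in candidates[i:]:
            # Each fragment is charged against whichever haplotype fits better.
            cost = sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)
            if best is None or cost < best[0]:
                best = (cost, h1, h2)
    return best


fragments = ["01-", "011", "10-", "-00", "110"]
cost, h1, h2 = mec_brute_force(fragments)
```

    In this toy input the fragments `011`, `10-` and `110` pairwise disagree, so no split into two consistent groups exists without at least one correction. All five fragments are gapless in the sense used above, since their holes appear only at the ends.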

    Theory and Algorithms for the Haplotype Assembly Problem


    Linear Time Parameterized Algorithms via Skew-Symmetric Multicuts

    A skew-symmetric graph $(D=(V,A),\sigma)$ is a directed graph $D$ with an involution $\sigma$ on the set of vertices and arcs. In this paper, we introduce a separation problem, $d$-Skew-Symmetric Multicut, where we are given a skew-symmetric graph $D$, a family $\mathcal{T}$ of $d$-sized subsets of vertices and an integer $k$. The objective is to decide if there is a set $X \subseteq A$ of $k$ arcs such that every set $J$ in the family has a vertex $v$ such that $v$ and $\sigma(v)$ are in different connected components of $D' = (V, A \setminus (X \cup \sigma(X)))$. In this paper, we give an algorithm for this problem which runs in time $O((4d)^{k}(m+n+\ell))$, where $m$ is the number of arcs in the graph, $n$ the number of vertices and $\ell$ the length of the family given in the input. Using our algorithm, we show that Almost 2-SAT has an algorithm with running time $O(4^k k^4 \ell)$ and we obtain algorithms for Odd Cycle Transversal and Edge Bipartization which run in time $O(4^k k^4 (m+n))$ and $O(4^k k^5 (m+n))$ respectively. This resolves an open problem posed by Reed, Smith and Vetta [Operations Research Letters, 2003] and improves upon the earlier almost linear time algorithm of Kawarabayashi and Reed [SODA, 2010]. We also show that Deletion q-Horn Backdoor Set Detection is a special case of 3-Skew-Symmetric Multicut, giving us an algorithm for Deletion q-Horn Backdoor Set Detection which runs in time $O(12^k k^5 \ell)$. This gives the first fixed-parameter tractable algorithm for this problem, answering a question posed in a paper by a superset of the authors [STACS, 2013]. Using this result, we get an algorithm for Satisfiability which runs in time $O(12^k k^5 \ell)$, where $k$ is the size of the smallest q-Horn deletion backdoor set and $\ell$ is the length of the input formula.

    A new workflow of fetal DNA prediction from cell-free DNA in maternal plasma

    Prediction of fetal DNA allows diagnosing known/passed mutations before the child’s birth. The public health significance of such early testing is that it can reassure parents who have negative results and offers timely information for those with abnormal results. My dissertation work presents a new approach to reconstructing fetal DNA from maternal plasma. The method works because plasma from pregnant women, which contains “cell-free DNA”, has been noted to contain fetal DNA as well as maternal DNA. I developed and tested a workflow that implements my suggested approach. The workflow was broken into several parts, each fully documented in this dissertation. Each step is supported by an explanation of the logic driving it. The approach works by examining data sets generated by short-read sequencing (also known as next-generation sequencing) and calling variation (single nucleotide polymorphisms, or SNPs) within those samples vis-à-vis a reference sequence. I developed and introduced a series of quality control criteria applied to SNPs to improve overall prediction. A novel single individual haplotyping method was developed and applied to haplotype the parental samples. The obtained parental haplotypes were incorporated into the workflow and, along with parental genotypes, were used to find transmitted haplotypes in the maternal plasma. The predicted haplotypes were then aligned to each other to obtain phased SNPs. For evaluation, I compared fetal SNPs predicted by my method against control fetal SNPs (from sequencing of fetal DNA). Overall prediction power is discussed. Possible improvements that should affect the overall prediction are also described.

    Practical Algorithms and Fixed-Parameter Tractability for the Single Individual SNP Haplotyping Problem

    Single nucleotide polymorphisms (SNPs) are the most frequent form of human genetic variation, of foremost importance for a variety of applications including medical diagnostics, phylogenies and drug design. The complete SNPs sequence information from each of the two copies of a given chromosome in a diploid genome is called a haplotype. The Haplotyping Problem for a single individual is as follows: Given a set of fragments from one individual’s DNA, find a maximally consistent pair of SNPs haplotypes (one per chromosome copy) by removing data “errors” related to sequencing errors, repeats, and paralogous recruitment. Two versions of the problem, i.e. the Minimum Fragment Removal (MFR) and the Minimum SNP Removal (MSR), are considered. The Haplotyping Problem was introduced in [8], where it was proved that both MSR and MFR are polynomially solvable when each fragment covers a set of consecutive SNPs (i.e., it is a gapless fragment), and NP-hard in general. The original algorithms of [8] are of theoretical interest, but by no means practical. In fact, one relies on finding the maximum stable set in a perfect graph, and the other is a reduction to a network flow problem. Furthermore, the reduction does not work when there are fragments completely included in others, and neither algorithm can be generalized to deal with a bounded total number of holes in the data. In this paper, we give the first practical algorithms for the Haplotyping Problem, based on Dynamic Programming. Our algorithms do not require the fragments to not include each other, and are polynomial for each constant k bounding the total number of holes in the data. For m SNPs and n fragments, we give an O(mn^{2k+2}) algorithm for the MSR problem, and an O(2^{2k} m^2 n+2^{3k} m^3) algorithm for the MFR problem, when each fragment has at most k holes. In particular, we obtain an O(mn^2) algorithm for MSR and an O(m^2 n+m^3) algorithm for MFR on gapless fragments. Finally, we prove that both MFR and MSR are APX-hard in general.
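    To make the MFR objective concrete, here is a small sketch under an assumed, commonly used formulation: two fragments conflict when they disagree at a SNP both cover, and a set of fragments is consistent with two haplotypes exactly when its conflict graph is bipartite. This is an exhaustive search for toy inputs, not the dynamic programming algorithm the abstract describes, and the names `HOLE`, `conflict`, `is_bipartite` and `mfr_brute_force` are hypothetical.

```python
from itertools import combinations

HOLE = "-"  # assumed symbol for a SNP the fragment does not cover


def conflict(a, b):
    """True if the fragments disagree at some SNP that both of them cover."""
    return any(x != HOLE and y != HOLE and x != y for x, y in zip(a, b))


def is_bipartite(fragments):
    """Two-colour the conflict graph; each colour stands for one haplotype."""
    side = {}
    for start in range(len(fragments)):
        if start in side:
            continue
        side[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(len(fragments)):
                if v == u or not conflict(fragments[u], fragments[v]):
                    continue
                if v not in side:
                    side[v] = 1 - side[u]
                    stack.append(v)
                elif side[v] == side[u]:
                    return False  # odd cycle of conflicts
    return True


def mfr_brute_force(fragments):
    """Smallest number of fragments to remove, with the removed indices."""
    n = len(fragments)
    for k in range(n + 1):
        for removed in combinations(range(n), k):
            kept = [f for i, f in enumerate(fragments) if i not in removed]
            if is_bipartite(kept):
                return k, removed


fragments = ["01-", "011", "10-", "-00", "110"]
k, removed = mfr_brute_force(fragments)
```

    The search tries removal sets in order of increasing size, so the first feasible set it finds is a minimum one. Its running time grows exponentially with the number of fragments, which is the kind of cost the polynomial bounds quoted above avoid for fragments with a bounded number of holes.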

    An efficient parallel algorithm for haplotype inference based on rule based approach and consensus methods.
