On the Complexity of the Single Individual SNP Haplotyping Problem
We present several new results pertaining to haplotyping. These results
concern the combinatorial problem of reconstructing haplotypes from incomplete
and/or imperfectly sequenced haplotype fragments. We consider the complexity of
the problems Minimum Error Correction (MEC) and Longest Haplotype
Reconstruction (LHR) for different restrictions on the input data.
Specifically, we look at the gapless case, where every row of the input
corresponds to a gapless haplotype-fragment, and the 1-gap case, where at most
one gap per fragment is allowed. We prove that MEC is APX-hard in the 1-gap
case and still NP-hard in the gapless case. In addition, we question earlier
claims that MEC is NP-hard even when the input matrix is restricted to being
completely binary. Concerning LHR, we show that this problem is NP-hard and
APX-hard in the 1-gap case (and thus also in the general case), but is
polynomial-time solvable in the gapless case.
Comment: 26 pages. Related to the WABI 2005 paper, "On the Complexity of
Several Haplotyping Problems", but with more/different results. This paper
has just been submitted to the IEEE/ACM Transactions on Computational Biology
and Bioinformatics and we are awaiting a decision on acceptance. It differs
from the mid-August version of this paper because here we prove that 1-gap
LHR is APX-hard. (In the earlier version of the paper we could prove only
that it was NP-hard.)
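The abstract above does not spell out the MEC objective, so the following is only a minimal sketch under the usual formulation: given a pair of candidate haplotypes, each fragment is charged the number of covered positions at which it disagrees with the closer of the two, and MEC asks for the pair minimising the total. The names `fragment_cost`, `mec_score`, and `brute_force_mec`, the `-` gap symbol, and the toy fragments are assumptions made for illustration. The exhaustive search is exponential and is not an algorithm from the paper.

```python
from itertools import product

GAP = "-"  # assumed symbol for a position the fragment does not cover


def fragment_cost(fragment, haplotype):
    """Count covered positions where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != GAP and f != h)


def mec_score(fragments, h1, h2):
    """Total corrections when every fragment is assigned to its closer haplotype."""
    return sum(min(fragment_cost(f, h1), fragment_cost(f, h2)) for f in fragments)


def brute_force_mec(fragments):
    """Try every haplotype pair; only feasible for a handful of SNP positions."""
    m = len(fragments[0])
    best = None
    for h1 in product("01", repeat=m):
        for h2 in product("01", repeat=m):
            score = mec_score(fragments, h1, h2)
            if best is None or score < best[0]:
                best = (score, "".join(h1), "".join(h2))
    return best


# Toy gapless instance: rows are fragments, columns are SNP positions.
fragments = ["010-", "0101", "-010", "1010", "1110"]
print(mec_score(fragments, "0101", "1010"))  # 1
print(brute_force_mec(fragments)[0])         # 1
```

In this toy instance three distinct fragments cover every position, so no pair of haplotypes can explain all rows without at least one correction.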
Linear Time Parameterized Algorithms via Skew-Symmetric Multicuts
A skew-symmetric graph is a directed graph with an
involution on the set of vertices and arcs. In this paper, we
introduce a separation problem, -Skew-Symmetric Multicut, where we are given
a skew-symmetric graph , a family of of -sized subsets of
vertices and an integer . The objective is to decide if there is a set
of arcs such that every set in the family has a vertex
such that and are in different connected components of
. In this paper, we give an algorithm for
this problem which runs in time , where is the
number of arcs in the graph, the number of vertices and the length
of the family given in the input.
Using our algorithm, we show that Almost 2-SAT has an algorithm with running
time and we obtain algorithms for Odd Cycle Transversal
and Edge Bipartization which run in time and
respectively. This resolves an open problem posed by Reed,
Smith and Vetta [Operations Research Letters, 2003] and improves upon the
earlier almost linear time algorithm of Kawarabayashi and Reed [SODA, 2010].
We also show that Deletion q-Horn Backdoor Set Detection is a special case of
3-Skew-Symmetric Multicut, giving us an algorithm for Deletion q-Horn Backdoor
Set Detection which runs in time . This gives the first
fixed-parameter tractable algorithm for this problem answering a question posed
in a paper by a superset of the authors [STACS, 2013]. Using this result, we
get an algorithm for Satisfiability which runs in time where
is the size of the smallest q-Horn deletion backdoor set, with being
the length of the input formula.
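The separation condition in the abstract above can be illustrated with a small checker. This is a sketch under stated assumptions, not the paper's algorithm: it assumes "connected components" means strongly connected components, that the involution maps an arc (u, v) to (sigma(v), sigma(u)), and that vertices are nonzero integers with sigma(v) = -v. The function names `reachable` and `separates` are hypothetical.

```python
from collections import deque


def reachable(adj, start):
    """Set of vertices reachable from start along directed arcs (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def separates(arcs, sigma, family, removed):
    """Check that every set in the family has a vertex split from its image."""
    # Remove the chosen arcs together with their images under the involution.
    # Assumption: the image of arc (u, v) is (sigma(v), sigma(u)).
    gone = set(removed) | {(sigma(v), sigma(u)) for u, v in removed}
    adj = {}
    for u, v in arcs:
        if (u, v) not in gone:
            adj.setdefault(u, []).append(v)

    def same_component(v):
        # v and sigma(v) share a strongly connected component exactly when
        # each can reach the other.
        w = sigma(v)
        return w in reachable(adj, v) and v in reachable(adj, w)

    return all(any(not same_component(v) for v in group) for group in family)


def negate(v):
    """Assumed involution on vertices: a literal and its negation."""
    return -v


# Toy instance: vertices 1 and -1 lie on a directed 2-cycle.
arcs = [(-1, 1), (1, -1)]
family = [{1}]
print(separates(arcs, negate, family, removed=[]))         # False
print(separates(arcs, negate, family, removed=[(-1, 1)]))  # True
```

The toy graph is the standard 2-SAT implication graph of the two unit clauses x and not-x, which is one way to picture why a problem such as Almost 2-SAT fits this framework.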
A new workflow of fetal DNA prediction from cell-free DNA in maternal plasma
Prediction of fetal DNA allows known or inherited mutations to be diagnosed before the child’s birth. The public health significance of such early testing is that it can reassure parents who have negative results and offers timely information for those with abnormal results.
My dissertation presents a new approach to reconstructing fetal DNA from maternal plasma. The method works because plasma from pregnant women, which contains “cell-free DNA”, has been noted to contain fetal DNA as well as maternal DNA. I developed and tested a workflow that implements this approach. The workflow is broken into several parts, each fully documented in this dissertation, and each step is accompanied by an explanation of the logic driving it. The approach examines data sets generated by short-read sequencing (also known as next-generation sequencing), calling variants (single nucleotide polymorphisms, or SNPs) within those samples against a reference sequence. I developed and introduced a series of quality control criteria applied to SNPs to improve overall prediction. A novel single individual haplotyping method was developed and applied to haplotype the parental samples. The resulting parental haplotypes were incorporated into the workflow and, along with parental genotypes, were used to find transmitted haplotypes in the maternal plasma. The predicted haplotypes were then aligned to each other to obtain phased SNPs. For evaluation, I compared fetal SNPs predicted by my method against control fetal SNPs (from sequencing of fetal DNA). Overall prediction power is discussed, and possible improvements that should affect the overall prediction are also described.
Practical Algorithms and Fixed-Parameter Tractability for the Single Individual SNP Haplotyping Problem
Single nucleotide polymorphisms (SNPs) are the most frequent form of human genetic variation and are of foremost importance for a variety of applications, including medical diagnostics, phylogenies and drug design.
The complete SNPs sequence information from each of the two copies of a given chromosome in a diploid genome is called a haplotype. The Haplotyping Problem for a single individual is as follows: Given a set of fragments from one individual’s DNA, find a maximally consistent pair of SNPs haplotypes (one per chromosome copy) by removing data “errors” related to sequencing errors, repeats, and paralogous recruitment. Two versions of the problem, i.e. the Minimum Fragment Removal (MFR) and the Minimum SNP Removal (MSR), are considered.
The Haplotyping Problem was introduced in [8], where it was proved that both MSR and MFR are polynomially solvable when each fragment covers a set of consecutive SNPs (i.e., it is a gapless fragment), and NP-hard in general. The original algorithms of [8] are of theoretical interest, but by no means practical. In fact, one relies on finding the maximum stable set in a perfect graph, and the other is a reduction to a network flow problem. Furthermore, the reduction does not work when some fragments are completely included in others, and neither algorithm can be generalized to deal with a bounded total number of holes in the data. In this paper, we give the first practical algorithms for the Haplotyping Problem, based on Dynamic Programming. Our algorithms do not require that no fragment be included in another, and are polynomial for each constant k bounding the total number of holes in the data. For m SNPs and n fragments, we give an O(mn^{2k+2}) algorithm for the MSR problem, and an O(2^{2k} m^2 n+2^{3k} m^3) algorithm for the MFR problem, when each fragment has at most k holes. In particular, we obtain an O(mn^2) algorithm for MSR and an O(m^2 n+m^3) algorithm for MFR on gapless fragments.
Finally, we prove that both MFR and MSR are APX-hard in general.
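To make the MFR objective concrete, the sketch below treats two fragments as conflicting when they disagree at some SNP that both cover, and looks for a smallest set of fragments whose removal lets the rest be split into two conflict-free groups (one per haplotype). The conflict definition, the `-` hole symbol, and the names `conflict`, `two_colorable`, and `brute_force_mfr` are assumptions made for illustration. The search is exhaustive and exponential; it is not the dynamic programming algorithm described in the abstract.

```python
from itertools import combinations

HOLE = "-"  # assumed symbol for an SNP the fragment does not cover


def conflict(f, g):
    """True if the two fragments disagree at some SNP covered by both."""
    return any(a != HOLE and b != HOLE and a != b for a, b in zip(f, g))


def two_colorable(fragments):
    """True if the fragments split into two groups with no internal conflict."""
    color = {}
    for start in range(len(fragments)):
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(len(fragments)):
                if v == u or not conflict(fragments[u], fragments[v]):
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    stack.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle of conflicts
    return True


def brute_force_mfr(fragments):
    """Indices of a smallest fragment set whose removal restores consistency."""
    n = len(fragments)
    for size in range(n + 1):
        for removed in combinations(range(n), size):
            kept = [f for i, f in enumerate(fragments) if i not in removed]
            if two_colorable(kept):
                return list(removed)


# Toy gapless instance over four SNPs.
fragments = ["00--", "-01-", "11--", "-10-", "01--"]
print(two_colorable(fragments))    # False
print(brute_force_mfr(fragments))  # [2]
```

Here fragments 0, 2 and 4 conflict pairwise, so at least one fragment must go; removing fragment 2 leaves the groups {0, 1} and {3, 4}.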