2 research outputs found

    Graph algorithms for the haplotyping problem

    Evidence from investigations of genetic differences among human beings shows that genetic diseases are often the result of genetic mutations. The most common form of these mutations is the single nucleotide polymorphism (SNP). A complete map of all SNPs in the human genome will be extremely valuable for studying the relationships between specific haplotypes and specific genetic diseases. Some recent discoveries show that the DNA sequence of human beings can be partitioned into long blocks in which genetic recombination has been rare. Inferring both haplotypes from chromosome sequences is therefore a biologically meaningful research topic, and one that raises compound mathematical and computational problems.
    We are interested in the algorithmic implications of inferring haplotypes from long blocks of DNA that have not undergone recombination in populations. This assumption justifies a model of haplotype evolution: the haplotypes in a population evolve along a coalescent which, under the standard population-genetic assumption of infinite sites, is a perfect phylogeny when viewed as a rooted tree. The Perfect Phylogeny Haplotyping (PPH) Problem was introduced by Daniel Gusfield in 2002. A nearly linear-time solution to the PPH problem is available, running in O(nm α(nm)) time, where α is the extremely slowly growing inverse Ackermann function. However, it is very complex and difficult to implement. So far, even the best practical solution to the PPH problem has a worst-case running time of O(nm^2). D. Gusfield conjectured that a linear-time (O(nm)) solution to the PPH problem should be possible.
    We resolve Gusfield's conjecture by introducing a linear-time algorithm for the PPH problem. Different kinds of posets for haplotype matrices and genotype matrices are designed, and the relationships between them are studied. 
Since redundant calculations can be avoided by exploiting the transitivity of the partial order in a poset, we design a linear-time (O(nm)) algorithm for the PPH problem that provides all possible solutions for a given input. The algorithm is fully implemented, and simulation shows that it is much faster than previous methods.
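
    To make the PPH problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the linear-time poset algorithm described in the abstract, and it runs in exponential time. It assumes the common encoding in which a genotype site is 0 or 1 when homozygous and 2 when heterozygous, and it checks for a perfect phylogeny with the four-gamete test, which applies when the root is not fixed. The function names are hypothetical.

```python
from itertools import combinations, product


def resolutions(genotype):
    """Yield each unordered haplotype pair that explains one genotype.

    Assumed encoding: 0 or 1 = homozygous site, 2 = heterozygous site.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    if not het:
        h = tuple(genotype)
        yield h, h
        return
    # Pin the first heterozygous site to 0 in h1 so that each
    # unordered pair is generated exactly once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for site, b in zip(het, (0,) + bits):
            h1[site] = b
            h2[site] = 1 - b
        yield tuple(h1), tuple(h2)


def four_gamete_ok(haplotypes):
    """True if no pair of sites shows all four gametes 00, 01, 10, 11."""
    haps = list(haplotypes)
    if not haps:
        return True
    m = len(haps[0])
    for i, j in combinations(range(m), 2):
        if len({(h[i], h[j]) for h in haps}) == 4:
            return False
    return True


def pph_solutions(genotypes):
    """Brute force: try every combination of resolutions and keep those
    whose haplotypes pass the four-gamete test."""
    solutions = []
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        haps = {h for pair in choice for h in pair}
        if four_gamete_ok(haps):
            solutions.append(list(choice))
    return solutions
```

    For example, `pph_solutions([(2, 2), (0, 1), (1, 0)])` keeps only the resolution of the first genotype into (0, 1) and (1, 0), because the alternative pair (0, 0) and (1, 1) would produce all four gametes together with the two homozygous individuals.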

    Fast perfect phylogeny haplotype inference

    We address the problem of reconstructing haplotypes in a population, given a sample of genotypes and assumptions about the underlying population. The problem is of major interest in genetics because haplotypes are more informative than genotypes when searching for trait genes, but they are difficult to obtain directly by sequencing. After showing that simple resolution-based inference can be terribly wrong in some natural types of population, we propose a different combinatorial approach that exploits intersections of sampled genotypes (considered as sets of candidate haplotypes). For populations with a perfect phylogeny we obtain an inference algorithm that is both sound and efficient. With high probability it yields the complete set of haplotypes showing up in the sample, for a sample size close to the trivial lower bound. The perfect phylogeny assumption is often justified, but we also believe that the ideas can be extended to populations obeying relaxed structural assumptions. The ideas are quite different from those of other existing practical algorithms for the problem.
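
    The phrase "genotypes considered as sets of candidate haplotypes" can be illustrated with a small Python sketch. This is only one possible reading of the abstract, not the authors' algorithm, whose details are not given here. It assumes the encoding 0 or 1 for homozygous sites and 2 for heterozygous sites, and represents a genotype's candidate set as a pattern in which `None` means either allele is possible. All function names are hypothetical.

```python
def candidates(genotype):
    """Candidate haplotypes of a genotype as a pattern: homozygous sites
    are fixed, heterozygous sites (2) are free and shown as None."""
    return tuple(None if g == 2 else g for g in genotype)


def intersect(p, q):
    """Intersect two candidate patterns; return None if no haplotype
    is consistent with both."""
    out = []
    for a, b in zip(p, q):
        if a is None:
            out.append(b)
        elif b is None or a == b:
            out.append(a)
        else:
            return None  # fixed sites disagree, so the intersection is empty
    return tuple(out)


def shared_haplotype(g1, g2):
    """Return the single haplotype consistent with both genotypes,
    or None if the intersection is empty or not a single haplotype."""
    common = intersect(candidates(g1), candidates(g2))
    if common is not None and None not in common:
        return common
    return None


def complement(genotype, h):
    """Given one haplotype h of a genotype, return its partner haplotype."""
    return tuple(1 - x if g == 2 else g for g, x in zip(genotype, h))
```

    For instance, the genotypes (2, 0, 2) and (1, 0, 1) have exactly one candidate haplotype in common, (1, 0, 1), and the partner of that haplotype in the first genotype is then (0, 0, 0).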