1,353 research outputs found

    Efficient Haplotype Inference with Pseudo-Boolean Optimization

    No full text
    Abstract. Haplotype inference from genotype data is a key computational problem in bioinformatics, since retrieving directly haplotype information from DNA samples is not feasible using existing technology. One of the methods for solving this problem uses the pure parsimony criterion, an approach known as Haplotype Inference by Pure Parsimony (HIPP). Initial work in this area was based on a number of different Integer Linear Programming (ILP) models and branch and bound algorithms. Recent work has shown that the utilization of a Boolean Satisfiability (SAT) formulation and state of the art SAT solvers represents the most efficient approach for solving the HIPP problem. Motivated by the promising results obtained using SAT techniques, this paper investigates the utilization of modern Pseudo-Boolean Optimization (PBO) algorithms for solving the HIPP problem. The paper starts by applying PBO to existing ILP models. The results are promising, and motivate the development of a new PBO model (RPoly) for the HIPP problem, which has a compact representation and eliminates key symmetries. Experimental results indicate that RPoly outperforms the SAT-based approach on most problem instances, being, in general, significantly more efficient

    A Preprocessing Procedure for Haplotype Inference by Pure Parsimony

    Get PDF
    Haplotype data is especially important in the study of complex diseases since it contains more information than genotype data. However, obtaining haplotype data is technically difficult and expensive. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, the haplotype inference by pure parsimony approach (HIPP), casts the problem as an optimization problem and as such has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes. It iterates searching and deleting haplotypes that are not helpful in order to find the optimal solution. This preprocess can be coupled with any of the current solvers for the HIPP that need to preprocess the genotype data. In order to test it, we have used two state-of-the-art solvers, RTIP and GAHAP, and simulated and real HapMap data. Due to the computational time and memory reduction caused by our preprocess, problem instances that were previously unaffordable can be now efficiently solved

    Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Haplotype inference based on unphased SNP markers is an important task in population genetics. Although there are different approaches to the inference of haplotypes in diploid species, the existing software is not suitable for inferring haplotypes from unphased SNP data in polyploid species, such as the cultivated potato (<it>Solanum tuberosum</it>). Potato species are tetraploid and highly heterozygous.</p> <p>Results</p> <p>Here we present the software SATlotyper which is able to handle polyploid and polyallelic data. SATlo-typer uses the Boolean satisfiability problem to formulate Haplotype Inference by Pure Parsimony. The software excludes existing haplotype inferences, thus allowing for calculation of alternative inferences. As it is not known which of the multiple haplotype inferences are best supported by the given unphased data set, we use a bootstrapping procedure that allows for scoring of alternative inferences. Finally, by means of the bootstrapping scores, it is possible to optimise the phased genotypes belonging to a given haplotype inference. The program is evaluated with simulated and experimental SNP data generated for heterozygous tetraploid populations of potato. We show that, instead of taking the first haplotype inference reported by the program, we can significantly improve the quality of the final result by applying additional methods that include scoring of the alternative haplotype inferences and genotype optimisation. For a sub-population of nineteen individuals, the predicted results computed by SATlotyper were directly compared with results obtained by experimental haplotype inference via sequencing of cloned amplicons. Prediction and experiment gave similar results regarding the inferred haplotypes and phased genotypes.</p> <p>Conclusion</p> <p>Our results suggest that Haplotype Inference by Pure Parsimony can be solved efficiently by the SAT approach, even for data sets of unphased SNP from heterozygous polyploids. SATlotyper is freeware and is distributed as a Java JAR file. The software can be downloaded from the webpage of the GABI Primary Database at <url>http://www.gabipd.org/projects/satlotyper/</url>. The application of SATlotyper will provide haplotype information, which can be used in haplotype association mapping studies of polyploid plants.</p

    Boosting Haplotype Inference with Local Search

    No full text
    Abstract. A very challenging problem in the genetics domain is to infer haplotypes from genotypes. This process is expected to identify genes affecting health, disease and response to drugs. One of the approaches to haplotype inference aims to minimise the number of different haplotypes used, and is known as haplotype inference by pure parsimony (HIPP). The HIPP problem is computationally difficult, being NP-hard. Recently, a SAT-based method (SHIPs) has been proposed to solve the HIPP problem. This method iteratively considers an increasing number of haplotypes, starting from an initial lower bound. Hence, one important aspect of SHIPs is the lower bounding procedure, which reduces the number of iterations of the basic algorithm, and also indirectly simplifies the resulting SAT model. This paper describes the use of local search to improve existing lower bounding procedures. The new lower bounding procedure is guaranteed to be as tight as the existing procedures. In practice the new procedure is in most cases considerably tighter, allowing significant improvement of performance on challenging problem instances.

    Pure Parsimony Xor Haplotyping

    Full text link
    The haplotype resolution from xor-genotype data has been recently formulated as a new model for genetic studies. The xor-genotype data is a cheaply obtainable type of data distinguishing heterozygous from homozygous sites without identifying the homozygous alleles. In this paper we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We exhibit exact solutions of the problem by providing polynomial time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results are based on some interesting combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial time k-approximation, where k is the maximum number of xor-genotypes containing a given SNP. Finally, we propose a heuristic and produce an experimental analysis showing that it scales to real-world large instances taken from the HapMap project

    A Column Generation Approach for Pure Parsimony Haplotyping

    Get PDF

    Origin and genetic diversity of diploid parthenogenetic Artemia in Eurasia

    Get PDF
    There is wide interest in understanding how genetic diversity is generated and maintained in parthenogenetic lineages, as it will help clarify the debate of the evolution and maintenance of sexual reproduction. There are three mechanisms that can be responsible for the generation of genetic diversity of parthenogenetic lineages: contagious parthenogenesis, repeated hybridization and microorganism infections (e.g. Wolbachia). Brine shrimps of the genus Artemia (Crustacea, Branchiopoda, Anostraca) are a good model system to investigate evolutionary transitions between reproductive systems as they include sexual species and lineages of obligate parthenogenetic populations of different ploidy level, which often co-occur. Diploid parthenogenetic lineages produce occasional fully functional rare males, interspecific hybridization is known to occur, but the mechanisms of origin of asexual lineages are not completely understood. Here we sequenced and analysed fragments of one mitochondrial and two nuclear genes from an extensive set of populations of diploid parthenogenetic Artemia and sexual species from Central and East Asia to investigate the evolutionary origin of diploid parthenogenetic Artemia, and geographic origin of the parental taxa. Our results indicate that there are at least two, possibly three independent and recent maternal origins of parthenogenetic lineages, related to A. urmiana and Artemia sp. from Kazakhstan, but that the nuclear genes are very closely related in all the sexual species and parthenogegetic lineages except for A. sinica, who presumable took no part on the origin of diploid parthenogenetic strains. Our data cannot rule out either hybridization between any of the very closely related Asiatic sexual species or rare events of contagious parthenogenesis via rare males as the contributing mechanisms to the generation of genetic diversity in diploid parthenogenetic Artemia lineages

    G-lattices for an Unrooted Perfect Phylogeny

    Get PDF
    We look at the Pure Parsimony problem and the Perfect Phylogeny Haplotyping problem. From the Pure Parsimony problem we consider structures of genotypes called g-lattices. These structures either provide solutions or give bounds to the pure parsimony problem. In particular, we investigate which of these structures supports an unrooted perfect phylogeny, a condition that adds biological interpretation. By understanding which g-lattices support an unrooted perfect phylogeny, we connect two of the standard biological inference rules used to recreate how genetic diversity propagates across generations
    • …
    corecore