102 research outputs found

    Efficient Haplotype Inference with Pseudo-Boolean Optimization

    No full text
    Abstract. Haplotype inference from genotype data is a key computational problem in bioinformatics, since retrieving directly haplotype information from DNA samples is not feasible using existing technology. One of the methods for solving this problem uses the pure parsimony criterion, an approach known as Haplotype Inference by Pure Parsimony (HIPP). Initial work in this area was based on a number of different Integer Linear Programming (ILP) models and branch and bound algorithms. Recent work has shown that the utilization of a Boolean Satisfiability (SAT) formulation and state of the art SAT solvers represents the most efficient approach for solving the HIPP problem. Motivated by the promising results obtained using SAT techniques, this paper investigates the utilization of modern Pseudo-Boolean Optimization (PBO) algorithms for solving the HIPP problem. The paper starts by applying PBO to existing ILP models. The results are promising, and motivate the development of a new PBO model (RPoly) for the HIPP problem, which has a compact representation and eliminates key symmetries. Experimental results indicate that RPoly outperforms the SAT-based approach on most problem instances, being, in general, significantly more efficient

    TRIO LOGIC REGRESSION - DETECTION OF SNP - SNP INTERACTIONS IN CASE-PARENT TRIOS

    Get PDF
    Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the linkage disequilibrium (LD) structure in the genotype data, and accommodates missing genotypes via haplotype-based imputation. We also derive an efficient algorithm to simulate case-parent trios where genetic risk is determined via epistatic interactions

    Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Haplotype inference based on unphased SNP markers is an important task in population genetics. Although there are different approaches to the inference of haplotypes in diploid species, the existing software is not suitable for inferring haplotypes from unphased SNP data in polyploid species, such as the cultivated potato (<it>Solanum tuberosum</it>). Potato species are tetraploid and highly heterozygous.</p> <p>Results</p> <p>Here we present the software SATlotyper which is able to handle polyploid and polyallelic data. SATlo-typer uses the Boolean satisfiability problem to formulate Haplotype Inference by Pure Parsimony. The software excludes existing haplotype inferences, thus allowing for calculation of alternative inferences. As it is not known which of the multiple haplotype inferences are best supported by the given unphased data set, we use a bootstrapping procedure that allows for scoring of alternative inferences. Finally, by means of the bootstrapping scores, it is possible to optimise the phased genotypes belonging to a given haplotype inference. The program is evaluated with simulated and experimental SNP data generated for heterozygous tetraploid populations of potato. We show that, instead of taking the first haplotype inference reported by the program, we can significantly improve the quality of the final result by applying additional methods that include scoring of the alternative haplotype inferences and genotype optimisation. For a sub-population of nineteen individuals, the predicted results computed by SATlotyper were directly compared with results obtained by experimental haplotype inference via sequencing of cloned amplicons. Prediction and experiment gave similar results regarding the inferred haplotypes and phased genotypes.</p> <p>Conclusion</p> <p>Our results suggest that Haplotype Inference by Pure Parsimony can be solved efficiently by the SAT approach, even for data sets of unphased SNP from heterozygous polyploids. SATlotyper is freeware and is distributed as a Java JAR file. The software can be downloaded from the webpage of the GABI Primary Database at <url>http://www.gabipd.org/projects/satlotyper/</url>. The application of SATlotyper will provide haplotype information, which can be used in haplotype association mapping studies of polyploid plants.</p

    Incremental Cardinality Constraints for MaxSAT

    Full text link
    Maximum Satisfiability (MaxSAT) is an optimization variant of the Boolean Satisfiability (SAT) problem. In general, MaxSAT algorithms perform a succession of SAT solver calls to reach an optimum solution making extensive use of cardinality constraints. Many of these algorithms are non-incremental in nature, i.e. at each iteration the formula is rebuilt and no knowledge is reused from one iteration to another. In this paper, we exploit the knowledge acquired across iterations using novel schemes to use cardinality constraints in an incremental fashion. We integrate these schemes with several MaxSAT algorithms. Our experimental results show a significant performance boost for these algo- rithms as compared to their non-incremental counterparts. These results suggest that incremental cardinality constraints could be beneficial for other constraint solving domains.Comment: 18 pages, 4 figures, 1 table. Final version published in Principles and Practice of Constraint Programming (CP) 201

    Statistical physics methods in computational biology

    Get PDF
    The interest of statistical physics for combinatorial optimization is not new, it suffices to think of a famous tool as simulated annealing. Recently, it has also resorted to statistical inference to address some "hard" optimization problems, developing a new class of message passing algorithms. Three applications to computational biology are presented in this thesis, namely: 1) Boolean networks, a model for gene regulatory networks; 2) haplotype inference, to study the genetic information present in a population; 3) clustering, a general machine learning tool

    A Preprocessing Procedure for Haplotype Inference by Pure Parsimony

    Get PDF
    Haplotype data is especially important in the study of complex diseases since it contains more information than genotype data. However, obtaining haplotype data is technically difficult and expensive. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, the haplotype inference by pure parsimony approach (HIPP), casts the problem as an optimization problem and as such has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes. It iterates searching and deleting haplotypes that are not helpful in order to find the optimal solution. This preprocess can be coupled with any of the current solvers for the HIPP that need to preprocess the genotype data. In order to test it, we have used two state-of-the-art solvers, RTIP and GAHAP, and simulated and real HapMap data. Due to the computational time and memory reduction caused by our preprocess, problem instances that were previously unaffordable can be now efficiently solved

    Pseudo-Booleanilainen optimisaatio käyttäen implisiittisiä osumisjoukkoja

    Get PDF
    There are many computationally difficult problems where the task is to find a solution with the lowest cost possible that fulfills a given set of constraints. Such problems are often NP-hard and are encountered in a variety of real-world problem domains, including planning and scheduling. NP-hard problems are often solved using a declarative approach by encoding the problem into a declarative constraint language and solving the encoding using a generic algorithm for that language. In this thesis we focus on pseudo-Boolean optimization (PBO), a special class of integer programs (IP) that only contain variables that admit the values 0 and 1. We propose a novel approach to PBO that is based on the implicit hitting set (IHS) paradigm, which uses two separate components. An IP solver is used to find an optimal solution under an incomplete set of constraints. A pseudo-Boolean satisfiability solver is used to either validate the feasibility of the solution or to extract more constraints to the integer program. The IHS-based PBO algorithm iteratively invokes the two algorithms until an optimal solution to a given PBO instance is found. In this thesis we lay out the IHS-based PBO solving approach in detail. We implement the algorithm as the PBO-IHS solver by making use of recent advances in reasoning techniques for pseudo-Boolean constraints. Through extensive empirical evaluation we show that our PBO-IHS solver outperforms other available specialized PBO solvers and has complementary performance compared to classical integer programming techniques
    • …
    corecore