1 research outputs found
Haplotype Inference on Pedigrees with Recombinations, Errors, and Missing Genotypes via SAT solvers
The Minimum-Recombinant Haplotype Configuration problem (MRHC) has been
highly successful in providing a sound combinatorial formulation for the
important problem of genotype phasing on pedigrees. Despite several algorithmic
advances and refinements that led to some efficient algorithms, its
applicability to real datasets has been limited by the absence of some
important characteristics of these data in its formulation, such as mutations,
genotyping errors, and missing data.
In this work, we propose the Haplotype Configuration with Recombinations and
Errors problem (HCRE), which generalizes the original MRHC formulation by
incorporating the two most common characteristics of real data: errors and
missing genotypes (including untyped individuals). Although HCRE is
computationally hard, we propose an exact algorithm for the problem based on a
reduction to the well-known Satisfiability problem. Our reduction exploits
recent progresses in the constraint programming literature and, combined with
the use of state-of-the-art SAT solvers, provides a practical solution for the
HCRE problem. Biological soundness of the phasing model and effectiveness (on
both accuracy and performance) of the algorithm are experimentally demonstrated
under several simulated scenarios and on a real dairy cattle population.Comment: 14 pages, 1 figure, 4 tables, the associated software reHCstar is
available at http://www.algolab.eu/reHCsta