
A Class Representative Model for Pure Parsimony Haplotyping under Uncertain Data

By Daniele Catanzaro, Martine Labbé and Luciano Porretta


The Pure Parsimony Haplotyping (PPH) problem is an NP-hard combinatorial optimization problem that consists of finding the minimum number of haplotypes necessary to explain a given set of genotypes. PPH has attracted increasing attention in recent years because of its importance in the analysis of fine-scale genetic data. Its applications range from mapping complex disease genes to inferring population histories, and include drug design, functional genomics and pharmacogenetics. In this article we investigate, for the first time, a recent version of PPH called the Pure Parsimony Haplotype problem under Uncertain Data (PPH-UD). This version arises mainly when the input genotypes are not accurate, i.e., when some single nucleotide polymorphisms are missing or affected by errors. We propose an exact approach to solving PPH-UD based on an extended version of the class representative model for PPH of Catanzaro et al. [1], currently the state-of-the-art integer programming model for PPH. The model is efficient, accurate, compact, polynomial-sized, easy to implement, solvable with any mixed integer programming solver, and usable in all cases in which the parsimony criterion is well suited for haplotype estimation.
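
To make the phrase "haplotypes necessary to explain a given set of genotypes" concrete, the following minimal sketch enumerates solutions of plain PPH on a toy instance. It is not the class representative integer programming model and it does not handle the missing or erroneous sites of PPH-UD. The encoding is an assumption taken from common usage in the haplotyping literature, not from the abstract: a haplotype is a 0/1 tuple, and a genotype site is 0 or 1 when homozygous and 2 when heterozygous. The function names are hypothetical.

```python
from itertools import combinations, product


def explains(h1, h2, g):
    """Return True if the haplotype pair (h1, h2) resolves genotype g.

    Assumed encoding: a site of g equal to 0 or 1 is homozygous, so both
    haplotypes must carry that value; a site equal to 2 is heterozygous,
    so the two haplotypes must differ there.
    """
    for a, b, s in zip(h1, h2, g):
        if s == 2:
            if a == b:
                return False
        elif a != s or b != s:
            return False
    return True


def candidate_haplotypes(genotypes):
    """Collect every haplotype compatible with at least one genotype."""
    candidates = set()
    for g in genotypes:
        # A heterozygous site may be 0 or 1; a homozygous site is fixed.
        options = [(0, 1) if s == 2 else (s,) for s in g]
        candidates.update(product(*options))
    return sorted(candidates)


def min_haplotypes(genotypes):
    """Brute-force a smallest haplotype set explaining all genotypes.

    Exponential in the number of candidates; intended for toy inputs only.
    """
    candidates = candidate_haplotypes(genotypes)
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if all(
                any(explains(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return list(subset)
    return []


# Toy instance with three genotypes over two sites.
toy_genotypes = [(2, 2), (0, 2), (1, 1)]
solution = min_haplotypes(toy_genotypes)
print(len(solution), solution)
```

The exhaustive search is only meant to illustrate the objective. Because PPH is NP-hard, instances of realistic size call for an integer programming formulation of the kind the abstract describes.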

Topics: Research Article
Publisher: Public Library of Science
Provided by: PubMed Central


Suggested articles


  1. (2009). A class representative model for pure parsimony haplotyping.
  2. (2006). A comparison of phasing algorithms for trios and unrelated individuals.
  3. (2001). A frameshift mutation in NOD2 associated with susceptibility to Crohn’s disease.
  4. (2005). A haplotype map of the human genome.
  5. (1984). A polymorphic locus near the human insulin gene is associated with insulin-dependent diabetes mellitus.
  6. (2006). A polynomial solution to a special case of the parsimony haplotyping problem.
  7. (1998). A Pro12Ala substitution in PPARγ2 associated with decreased receptor activity, lower body mass index and improved insulin sensitivity.
  8. (2001). An apolipoprotein influencing triglycerides in humans and mice revealed by comparative sequencing.
  9. (2010). An integer programming model for HLA association studies: A case study for psoriasis and severe alopecia areata. Human Immunology In
  10. (2002). and susceptibility to schizophrenia.
  11. (1996). Apolipoprotein E and Alzheimer’s disease. Annual Reviews -
  12. (2001). Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn’s disease.
  13. (2002). Association of the ADAM33 gene with asthma and bronchial hyperresponsiveness.
  14. (2003). Combinatorial problems arising in SNP and haplotype analysis. In:
  15. (2002). Functional SNPs in the lymphotoxin-α gene that are associated with susceptibility to myocardial infarction.
  16. (1990). Gene genealogies and the coalescent process. Oxford Survey of Evolutionary Biology 7: 1–44.
  17. (2001). Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease.
  18. (2003). Haplotype inference by maximum parsimony.
  19. (2003). Haplotype inference by pure parsimony. In:
  20. (2003). Haplotype information and linkage disequilibrium mapping for single nucleotide polymorphisms.
  21. (2004). Haplotyping populations by pure parsimony: Complexity of exact and approximate algorithms.
  22. (2001). Inference of haplotypes from samples of diploid populations: Complexity and algorithms.
  23. (2006). Integer programming approaches to haplotype inference by pure parsimony.
  24. (1991). Low nucleotide diversity in man.
  25. (2006). Models and algorithms for haplotyping problem.
  26. (1997). Resistance to activated protein C caused by the factor V R506Q mutation is a common risk factor for venous thrombosis.
  27. (2001). Statistical estimation and pedigree analysis of CCR2–CCR5 haplotypes.
  28. (2000). The common PPARγ Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes.
  29. (1996). The CTLA-4 gene region of chromosome 2q33 is linked to, and associated with, type I diabetes.
  30. (2003). The gene encoding phosphodiesterase 4d confers risk of ischemic stroke.
  31. (2009). The pure parsimony haplotyping problem: Overview and computational advances.
  32. (1990). Worldwide differences in the incidence of type I diabetes are associated with amino acid variation at position 57 of the HLA-DQ β chain.