
Comparison of the accuracy of methods of computational haplotype inference using a large empirical dataset

By Ronald M Adkins


BACKGROUND: Analyses of genetic data at the level of haplotypes provide increased accuracy and power to infer genotype-phenotype correlations and the evolutionary history of a locus. However, empirical determination of haplotypes is expensive and laborious. Several methods of inferring haplotypes from unphased genotypic data have therefore been proposed, but it is unclear how accurate each method is or which methods are superior. The accuracy of some of the leading methods of computational haplotype inference (PL-EM, Phase, SNPHAP, Haplotyper) is compared using a large set of 308 empirically determined haplotypes based on 15 SNPs, among which 36 haplotypes were observed to occur. This study presents several advantages over many previous comparisons of haplotype inference methods: a large number of subjects, a number of known haplotypes much smaller than the number of chromosomes surveyed, a range of linkage disequilibrium values, the presence of rare SNP alleles, and considerable dispersion in haplotype frequencies.

RESULTS: In contrast to some previous comparisons of haplotype inference methods, there was very little difference in the accuracy of the various methods in terms of either assignment of haplotypes to individuals or estimation of haplotype frequencies. Although none of the methods inferred all of the known haplotypes, the assignment of haplotypes to subjects was about 90% correct for individuals heterozygous for up to three SNPs and about 80% correct for individuals heterozygous for up to five sites. All of the methods identified every haplotype with a frequency above 1%, and none assigned a frequency above 1% to an incorrect haplotype.

CONCLUSIONS: All of the methods of haplotype inference have high accuracy, and one can have confidence in inferences made by any one of them. The ability to identify even rare (≥ 1%) haplotypes is reassuring for efforts to identify haplotypes that contribute to disease in a significant proportion of a population. Assignment of haplotypes is relatively accurate among subjects heterozygous for up to 5 sites, and this might be the largest number of SNPs for which one should define haplotype blocks or have confidence in haplotype assignments.
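
As an illustration of why assignment accuracy would be expected to fall as the number of heterozygous sites grows, the sketch below enumerates the haplotype pairs consistent with one unphased genotype. It is not taken from the article or from any of the programs compared. The function name and the 0/1/2 genotype coding (copies of the alternate allele at each SNP) are assumptions made for this example. A genotype heterozygous at k sites is compatible with 2^(k-1) distinct haplotype pairs, so five heterozygous sites already leave 16 candidate phasings to choose among.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0, 1, or 2 giving the count of the alternate allele
    at each SNP (an assumed coding, chosen for this illustration only).
    Returns a sorted list of (haplotype, haplotype) string pairs.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed on both chromosomes.
    base = ['1' if g == 2 else '0' for g in genotype]
    pairs = set()
    # Try every allele assignment at the heterozygous sites for one chromosome;
    # the other chromosome carries the complementary allele at each such site.
    for bits in product('01', repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, bits):
            h1[site] = allele
            h2[site] = '1' if allele == '0' else '0'
        # Sort within the pair so (a, b) and (b, a) count once.
        pairs.add(tuple(sorted((''.join(h1), ''.join(h2)))))
    return sorted(pairs)

# Number of candidate phasings for k heterozygous sites, k = 0..5.
counts = [len(compatible_haplotype_pairs([1] * k + [0])) for k in range(6)]
```

Here `counts` evaluates to `[1, 1, 2, 4, 8, 16]`. Inference programs such as those compared in the study choose among these candidates using population-level information; this sketch only shows the size of the search space for a single individual.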

Topics: Research Article
Publisher: BioMed Central
Year: 2004
DOI identifier: 10.1186/1471-2156-5-22
Provided by: PubMed Central

