Haplotype Variety Analysis of Human Populations: an Application to HapMap Data

Abstract

We investigate the haplotype variety of distinct human populations. As a natural measure of haplotype variety we use the total number of haplotypes (TNH), that is, the number of haplotypes with nonzero frequencies estimated from the data at hand for each selection of multiple loci. For the analysis of real human populations, we use the haplotype data of the Denver Chinese, Tuscan Italians, Luhya Kenyans, and Gujarati Indians from release III of the HapMap database. We show that the TNH statistic is biased in small-sample settings such as HapMap, and we implement a nested simulation study to estimate and remove this bias. We also perform a preliminary analysis of the means and variances of the population allele frequencies in the four populations. Lastly, we fit a generalized linear model to detect and quantify differences in the haplotype structures of these populations. Our results show that all populations have significantly different adjusted average TNH values. These findings extend previous results based on alternative statistical approaches and demonstrate pronounced differences in the haplotype variety of the analyzed populations even after controlling for haplotype span, all allele frequencies, and their two-way interactions.
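As an illustration of the statistic, and of why it is biased downward in small samples, the sketch below assumes phased haplotypes are available as equal-length allele strings. It counts TNH as the number of distinct allele patterns at a chosen set of loci, and uses a simple Monte Carlo resampling from a hypothetical population frequency table to show that a sample of n haplotypes tends to miss rare haplotypes. The function names `tnh`, `expected_sample_tnh`, and `exact_sample_tnh` and the example frequencies are hypothetical; this is not the authors' nested simulation procedure, only a minimal demonstration of the small-sample effect it is designed to correct.

```python
import random

def tnh(haplotypes, loci):
    """Total number of haplotypes: distinct allele patterns observed at the chosen loci."""
    return len({tuple(h[i] for i in loci) for h in haplotypes})

def expected_sample_tnh(freqs, n, reps=2000, seed=0):
    """Monte Carlo mean TNH when n haplotypes are drawn from a population
    whose haplotype frequencies are given by `freqs` (hypothetical input)."""
    rng = random.Random(seed)
    types = list(freqs)
    weights = list(freqs.values())
    total = 0
    for _ in range(reps):
        sample = rng.choices(types, weights=weights, k=n)
        total += len(set(sample))  # haplotypes seen at least once in this sample
    return total / reps

def exact_sample_tnh(freqs, n):
    """Closed-form expectation: each haplotype is observed with probability 1 - (1 - p)^n."""
    return sum(1 - (1 - p) ** n for p in freqs.values())

# Hypothetical example: 4 phased haplotypes over 4 biallelic loci.
haplotypes = ["0101", "0111", "0101", "1100"]
print(tnh(haplotypes, [0, 1]))        # 2
print(tnh(haplotypes, [0, 1, 2, 3]))  # 3

# Hypothetical population with 5 haplotypes, two of them rare.
freqs = {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.04, "E": 0.01}
mc = expected_sample_tnh(freqs, n=10)
exact = exact_sample_tnh(freqs, n=10)
print(len(freqs), round(exact, 2), round(mc, 2))  # sample TNH falls well below 5
```

In this toy setting the expected sample TNH for n = 10 is about 3.2, while the population TNH is 5, so the raw statistic understates haplotype variety; any bias correction has to account for this dependence on sample size and on the frequency spectrum.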
