Genomic patterns of selection and differentiation in African populations and implications for mapping disease association

Abstract

The main objective of this thesis is to gain a better understanding of genomic patterns of natural selection and population differentiation in Africa, where there is great genetic diversity, and of the implications for genetic mapping of complex diseases. I began by studying two neighbouring villages in eastern Sudan that are of different ethnicity, Hausa and Masalit, and that appear to have different susceptibility to malaria and visceral leishmaniasis (VL). Specifically, I investigated patterns of linkage disequilibrium (LD) and haplotypic signals of positive selection in the 5q31 genomic region which contains immune genes that have been implicated in susceptibility to malaria and VL. In my first analysis, by genotyping 34 single nucleotide polymorphisms (SNPs) in the 5q31 region, I did not find signals of selection or population differentiation between the Hausa and Masalit using available statistical methods. I conceived the idea that patterns of LD might provide a more sensitive test of population differentiation, and I developed an approach for this using permutation analysis. This method revealed differentiation between the Hausa, the Masalit and other African ethnic groups. To better understand signals of selection, I next studied a region of the genome associated with a known malaria resistance factor, the haemoglobin S (HbS) variant of the HBB gene. By genotyping 26 SNPs in the region of the HBB gene, I observed a haplotype that extended in excess of 1 Mb, despite being at high frequency and spanning several recombinational hotspots. This long haplotype carried the HbS allele but, importantly, it could be readily detected without typing the HbS variant. Building on this observation, I designed a new method to screen the whole genome for long haplotypes that might be signals of selection, and developed a software programme to implement this method. 
I validated this method using haplotypic data for the Yoruba generated by the HapMap project, complemented by additional SNP data that I generated on HapMap cell lines, and found that the HbS allele resides on a haplotype that extends to 1.2 Mb and is at strikingly high frequency compared to other haplotypes of similar length on the same chromosome. Next I applied this method to a large family-based association study of severe malaria in The Gambia, and identified several novel genomic regions with unusually long haplotypes of high frequency. These included a number of regions that may be associated with resistance to severe malaria, and which merit further investigation.
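
As an illustration only, the general idea of testing population differentiation through patterns of LD with a permutation analysis might be sketched as below. This is not the implementation developed in the thesis, whose details are not given in this abstract. The statistic (summed absolute difference in pairwise r^2), the function names, and the toy haplotypes are all assumptions chosen for the sketch.

```python
import random
from itertools import combinations


def r_squared(haps, i, j):
    """r^2 between SNPs i and j across phased haplotypes coded as 0/1 tuples."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n
    p_j = sum(h[j] for h in haps) / n
    if p_i in (0.0, 1.0) or p_j in (0.0, 1.0):
        return 0.0  # monomorphic SNP: LD is undefined, treated as 0 (assumption)
    p_ij = sum(h[i] * h[j] for h in haps) / n
    d = p_ij - p_i * p_j
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


def ld_difference(haps_a, haps_b):
    """Hypothetical statistic: summed |r^2_A - r^2_B| over all SNP pairs."""
    n_snps = len(haps_a[0])
    return sum(
        abs(r_squared(haps_a, i, j) - r_squared(haps_b, i, j))
        for i, j in combinations(range(n_snps), 2)
    )


def permutation_test(haps_a, haps_b, n_perm=1000, seed=0):
    """Shuffle population labels to build a null distribution of the statistic."""
    rng = random.Random(seed)
    observed = ld_difference(haps_a, haps_b)
    pooled = list(haps_a) + list(haps_b)
    n_a = len(haps_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ld_difference(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (invented): population A has two SNPs in complete LD,
# population B has the same allele frequencies but no LD.
pop_a = [(0, 0)] * 10 + [(1, 1)] * 10
pop_b = [(0, 0)] * 5 + [(0, 1)] * 5 + [(1, 0)] * 5 + [(1, 1)] * 5

observed, p_value = permutation_test(pop_a, pop_b)
print(f"observed LD difference = {observed:.3f}, permutation p = {p_value:.4f}")
```

In this toy case the two populations have identical allele frequencies at both SNPs, so a frequency-based comparison would see no difference between them, while the LD-based statistic separates them clearly. That is the sense in which LD patterns might offer a more sensitive test.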
