163 research outputs found

    Identifying Selected Regions from Heterozygosity and Divergence Using a Light-Coverage Genomic Dataset from Two Human Populations

    When a selective sweep occurs in the chromosomal region around a target gene in two populations that have recently separated, it produces three dramatic genomic consequences: 1) decreased multi-locus heterozygosity in the region; 2) elevated or diminished genetic divergence (FST) of multiple polymorphic variants adjacent to the selected locus between the divergent populations, due to the alternative fixation of alleles; and 3) a consequent regional increase in the variance of FST (S²FST) for the same clustered variants, due to the increased alternative fixation of alleles in the loci surrounding the selection target. In the first part of our study, to search for potential targets of directional selection, we developed and validated a resampling-based computational approach; we then scanned an array of 31 different-sized moving windows of SNP variants (5–65 SNPs) across the human genome in a set of European and African American population samples with 183,997 SNP loci after correcting for the recombination rate variation. The analysis revealed 180 regions of recent selection with very strong evidence in either population or both. In the second part of our study, we compared the newly discovered putative regions to those sites previously postulated in the literature, using methods based on inspecting patterns of linkage disequilibrium, population divergence and other methodologies. The newly found regions were cross-validated with those found in nine other studies that have searched for selection signals. Our study was replicated especially well in those regions confirmed by three or more studies. These validated regions were independently verified, using a combination of different methods and different databases in other studies, and should include fewer false positives.
The main strength of our analysis method compared to others is that it does not require dense genotyping and therefore can be used with data from population-based genome SNP scans from smaller studies of humans or other species.
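
The three window statistics named in this abstract (heterozygosity, FST, and the variance of FST) can be illustrated with a minimal sketch. It assumes biallelic SNPs, known allele frequencies in two equally weighted populations, and Wright's FST computed as (HT - HS) / HT. The function names `per_snp_fst` and `window_stats` are hypothetical, and the sketch does not reproduce the authors' resampling procedure or their recombination-rate correction.

```python
import numpy as np

def per_snp_fst(p1, p2):
    """Wright's FST per biallelic SNP from allele frequencies in two
    equally weighted populations (an illustrative choice of estimator)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                      # total heterozygosity
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    with np.errstate(divide="ignore", invalid="ignore"):
        # Monomorphic sites (h_t == 0) carry no divergence signal; set FST to 0.
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)

def window_stats(p1, p2, window):
    """Slide a fixed-size window of SNPs along the chromosome and return, per
    window: mean heterozygosity in each population, mean FST, variance of FST."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    fst = per_snp_fst(p1, p2)
    het1 = 2.0 * p1 * (1.0 - p1)
    het2 = 2.0 * p2 * (1.0 - p2)
    rows = []
    for start in range(len(fst) - window + 1):
        sl = slice(start, start + window)
        rows.append((
            het1[sl].mean(),
            het2[sl].mean(),
            fst[sl].mean(),
            fst[sl].var(ddof=1),   # sample variance of FST within the window
        ))
    return rows
```

Calling `window_stats` once per window size from 5 to 65 SNPs would mimic the multi-scale scan described above; candidate regions would be windows with unusually low heterozygosity together with unusually high FST variance.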

    Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets

    Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (Me) for the adjustment of multiple testing, but current methods of calculation for Me are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate Me. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the Me, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ~10⁻⁷ as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds ~5 × 10⁻⁸ for current or merged commercial genotyping arrays, ~10⁻⁸ for all common SNPs in the 1000 Genomes Project dataset and ~5 × 10⁻⁸ for the common SNPs only within genes.
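
The link between an effective number of independent tests and a significance threshold can be made concrete with a short sketch. It assumes a Bonferroni-style correction (threshold = alpha / Me); the abstract does not state the exact formula used by GEC, and the function names and Me values below are hypothetical illustrations, not figures reported in the paper.

```python
def significance_threshold(alpha, m_effective):
    """Bonferroni-style per-test threshold for a family-wise error rate alpha,
    given an effective number of independent tests (assumed correction)."""
    return alpha / m_effective

def implied_m_effective(alpha, threshold):
    """Effective number of tests implied by a given per-test threshold."""
    return alpha / threshold

alpha = 0.05
# Hypothetical Me of one million independent tests.
print(significance_threshold(alpha, 1_000_000))
# Back-calculated Me implied by the thresholds quoted in the abstract.
for threshold in (1e-7, 5e-8, 1e-8):
    print(threshold, round(implied_m_effective(alpha, threshold)))
```

Under this assumption, a threshold of ~5 × 10⁻⁸ at a genome-wide error rate of 0.05 corresponds to roughly one million independent tests, and ~10⁻⁸ to roughly five million.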

    Fine-scale detection of population-specific linkage disequilibrium using haplotype entropy in the human genome

    Background: The creation of a coherent genomic map of recent selection is one of the greatest challenges towards a better understanding of human evolution and the identification of functional genetic variants. Several methods have been proposed to detect linkage disequilibrium (LD), which is indicative of natural selection, from genome-wide profiles of common genetic variations but are designed for large regions. Results: To find population-specific LD within small regions, we have devised an entropy-based method that utilizes differences in haplotype frequency between populations. The method has the advantages of incorporating multilocus association, conciliation with low allele frequencies, and independence from allele polarity, which are ideal for short haplotype analysis. The comparison of HapMap SNP data from African and Caucasian populations with a median resolution size of ~23 kb gave us novel candidates as well as known selection targets. Enrichment analysis for the yielded genes showed associations with diverse diseases such as cardiovascular, immunological, neurological, and skeletal and muscular diseases. A possible scenario for a selective force is discussed. In addition, we have developed a web interface (ENIGMA, available at http://gibk21.bse.kyutech.ac.jp/ENIGMA/index.html), which allows researchers to query their regions of interest for population-specific LD. Conclusion: The haplotype entropy method is powerful for detecting population-specific LD embedded in short regions and should contribute to further studies aiming to decipher the evolutionary histories of modern humans.
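
As a rough illustration of the idea of comparing haplotype diversity between populations, the sketch below computes the Shannon entropy of haplotype frequencies within a short window for each population and takes the difference. This is an assumption made for illustration: the abstract does not give the paper's actual statistic, and the function names and toy haplotypes are hypothetical.

```python
from collections import Counter
from math import log2

def haplotype_entropy(haplotypes):
    """Shannon entropy (in bits) of the haplotype frequency distribution
    observed in one population for one window of SNPs."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def entropy_difference(pop_a, pop_b):
    """Entropy of population A minus entropy of population B for the same
    window. A strongly negative value means A carries far fewer distinct
    haplotypes, as expected under strong LD specific to population A."""
    return haplotype_entropy(pop_a) - haplotype_entropy(pop_b)

# Toy two-SNP window: population A is dominated by a single haplotype,
# population B carries all four haplotypes at equal frequency.
pop_a = ["AC", "AC", "AC", "AC"]
pop_b = ["AC", "AG", "TC", "TG"]
print(entropy_difference(pop_a, pop_b))
```

Because entropy depends only on the frequency distribution and not on which allele is labeled ancestral or derived, a measure of this kind is insensitive to allele polarity, one of the properties the abstract highlights.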

    Genome-Wide Association Studies of the PR Interval in African Americans

    The PR interval on the electrocardiogram reflects atrial and atrioventricular nodal conduction time. The PR interval is heritable, provides important information about arrhythmia risk, and has been suggested to differ among human races. Genome-wide association (GWA) studies have identified common genetic determinants of the PR interval in individuals of European and Asian ancestry, but there is a general paucity of GWA studies in individuals of African ancestry. We performed GWA studies in African American individuals from four cohorts (n = 6,247) to identify genetic variants associated with PR interval duration. Genotyping was performed using the Affymetrix 6.0 microarray. Imputation was performed for 2.8 million single nucleotide polymorphisms (SNPs) using combined YRI and CEU HapMap phase II panels. We observed a strong signal (rs3922844) within the gene encoding the cardiac sodium channel (SCN5A) with genome-wide significant association (p < 2.5×10⁻⁸) in two of the four cohorts and in the meta-analysis. The signal explained 2% of PR interval variability in African Americans (beta = 5.1 msec per minor allele, 95% CI = 4.1–6.1, p = 3×10⁻²³). This SNP was also associated with PR interval (beta = 2.4 msec per minor allele, 95% CI = 1.8–3.0, p = 3×10⁻¹⁶) in individuals of European ancestry (n = 14,042), but with a smaller effect size (p for heterogeneity <0.001) and variability explained (0.5%). Further meta-analysis of the four cohorts identified genome-wide significant associations with SNPs in SCN10A (rs6798015), MEIS1 (rs10865355), and TBX5 (rs7312625) that were highly correlated with SNPs identified in European and Asian GWA studies. African ancestry was associated with increased PR duration (13.3 msec, p = 0.009) in one but not the other three cohorts.
Our findings demonstrate the relevance of common variants to African Americans at four loci previously associated with PR interval in European and Asian samples and identify an association signal at one of these loci that is more strongly associated with PR interval in African Americans than in Europeans.

    Microarray-Based Maps of Copy-Number Variant Regions in European and Sub-Saharan Populations

    The genetic basis of phenotypic variation can be partially explained by the presence of copy-number variations (CNVs). Currently available methods for CNV assessment include high-density single-nucleotide polymorphism (SNP) microarrays that have become an indispensable tool in genome-wide association studies (GWAS). However, insufficient concordance rates between different CNV assessment methods call for cautious interpretation of results from CNV-based genetic association studies. Here we provide a cross-population, microarray-based map of copy-number variant regions (CNVRs) to enable reliable interpretation of CNV association findings. We used the Affymetrix Genome-Wide Human SNP Array 6.0 to scan the genomes of 1167 individuals from two ethnically distinct populations (Europe, N = 717; Rwanda, N = 450). Three different CNV-finding algorithms were tested and compared for sensitivity, specificity, and feasibility. Two algorithms were subsequently used to construct CNVR maps, which were also validated by processing subsamples with additional microarray platforms (Illumina 1M-Duo BeadChip, Nimblegen 385K aCGH array) and by comparing our data with publicly available information. Both algorithms detected a total of 42669 CNVs, 74% of which clustered in 385 CNVRs of a cross-population map. These CNVRs overlap with 862 annotated genes and account for approximately 3.3% of the haploid human genome

    Parallel Adaptive Divergence among Geographically Diverse Human Populations

    Few genetic differences between human populations conform to the classic model of positive selection, in which a newly arisen mutation rapidly approaches fixation in one lineage, suggesting that adaptation more commonly occurs via moderate changes in standing variation at many loci. Detecting and characterizing this type of complex selection requires integrating individually ambiguous signatures across genomically and geographically extensive data. Here, we develop a novel approach to test the hypothesis that selection has favored modest divergence at particular loci multiple times in independent human populations. We find an excess of SNPs showing non-neutral parallel divergence, enriched for genic and nonsynonymous polymorphisms in genes encompassing diverse and often disease related functions. Repeated parallel evolution in the same direction suggests common selective pressures in disparate habitats. We test our method with extensive coalescent simulations and show that it is robust to a wide range of demographic events. Our results demonstrate phylogenetically orthogonal patterns of local adaptation caused by subtle shifts at many widespread polymorphisms that likely underlie substantial phenotypic diversity

    Patterns of Ancestry, Signatures of Natural Selection, and Genetic Association with Stature in Western African Pygmies

    African Pygmy groups show a distinctive pattern of phenotypic variation, including short stature, which is thought to reflect past adaptation to a tropical environment. Here, we analyze Illumina 1M SNP array data in three Western Pygmy populations from Cameroon and three neighboring Bantu-speaking agricultural populations with whom they have admixed. We infer genome-wide ancestry, scan for signals of positive selection, and perform targeted genetic association with measured height variation. We identify multiple regions throughout the genome that may have played a role in adaptive evolution, many of which contain loci with roles in growth hormone, insulin, and insulin-like growth factor signaling pathways, as well as immunity and neuroendocrine signaling involved in reproduction and metabolism. The most striking results are found on chromosome 3, which harbors a cluster of selection and association signals between approximately 45 and 60 Mb. This region also includes the positional candidate genes DOCK3, which is known to be associated with height variation in Europeans, and CISH, a negative regulator of cytokine signaling known to inhibit growth hormone-stimulated STAT5 signaling. Finally, pathway analysis for genes near the strongest signals of association with height indicates enrichment for loci involved in insulin and insulin-like growth factor signaling

    Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples

    Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100–300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and ‘unrelated’ population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large ‘unrelated’ population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies.

    Polymorphisms in genes of interleukin 12 and its receptors and their association with protection against severe malarial anaemia in children in western Kenya

    Background: Malarial anaemia is characterized by destruction of malaria infected red blood cells and suppression of erythropoiesis. Interleukin 12 (IL12) significantly boosts erythropoietic responses in murine models of malarial anaemia and decreased IL12 levels are associated with severe malarial anaemia (SMA) in children. Based on the biological relevance of IL12 in malarial anaemia, the relationship between genetic polymorphisms of IL12 and its receptors and SMA was examined. Methods: Fifty-five tagging single nucleotide polymorphisms covering genes encoding two IL12 subunits, IL12A and IL12B, and its receptors, IL12RB1 and IL12RB2, were examined in a cohort of 913 children residing in the Asembo Bay region of western Kenya. Results: An increasing copy number of the minor variant (C) in IL12A (rs2243140) was significantly associated with a decreased risk of SMA (P = 0.006; risk ratio, 0.52 for carrying one copy of allele C and 0.28 for two copies). Individuals possessing two copies of a rare variant (C) in IL12RB1 (rs429774) also appeared to be strongly protected against SMA (P = 0.00005; risk ratio, 0.18). In addition, children homozygous for another rare allele (T) in IL12A (rs22431348) had a reduced risk of severe anaemia (SA) (P = 0.004; risk ratio, 0.69) and of severe anaemia with any parasitaemia (SAP) (P = 0.004; risk ratio, 0.66). In contrast, the AG genotype for another variant in IL12RB1 (rs383483) was associated with susceptibility to high-density parasitaemia (HDP) (P = 0.003; risk ratio, 1.21). Conclusions: This study has shown strong associations between polymorphisms in the genes of IL12A and IL12RB1 and protection from SMA in Kenyan children, suggesting that human genetic variants of IL12 related genes may significantly contribute to the development of anaemia in malaria patients.

    Impact of Selection and Demography on the Diffusion of Lactase Persistence

    BACKGROUND: The lactase enzyme allows lactose digestion in fresh milk. Its activity strongly decreases after the weaning phase in most humans, but persists at a high frequency in Europe and some nomadic populations. Two hypotheses are usually proposed to explain the particular distribution of the lactase persistence phenotype. The gene-culture coevolution hypothesis supposes a nutritional advantage of lactose digestion in pastoral populations. The calcium assimilation hypothesis suggests that carriers of the lactase persistence allele(s) (LCT*P) are favoured in high-latitude regions, where sunshine is insufficient to allow accurate vitamin-D synthesis. In this work, we test the validity of these two hypotheses on a large worldwide dataset of lactase persistence frequencies by using several complementary approaches. METHODOLOGY: We first analyse the distribution of lactase persistence in various continents in relation to geographic variation, pastoralism levels, and the genetic patterns observed for other independent polymorphisms. Then we use computer simulations and a large database of archaeological dates for the introduction of domestication to explore the evolution of these frequencies in Europe according to different demographic scenarios and selection intensities. CONCLUSIONS: Our results show that gene-culture coevolution is a likely hypothesis in Africa as high LCT*P frequencies are preferentially found in pastoral populations. In Europe, we show that population history played an important role in the diffusion of lactase persistence over the continent. Moreover, selection pressure on lactase persistence has been very high in the North-western part of the continent, by contrast to the South-eastern part where genetic drift alone can explain the observed frequencies. 
This increase in selection pressure with latitude is highly compatible with the calcium assimilation hypothesis, while the gene-culture coevolution hypothesis cannot be ruled out if a positively selected lactase gene was carried at the front of the expansion wave during the Neolithic transition in Europe.