    Detecting and characterizing genomic signatures of positive selection in global populations.

    Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. Whether advantageous mutations have arisen in the human genome through single or multiple mutation events remains unanswered, except for a handful of genes such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection that complements existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), by locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This presents an opportunity to systematically interrogate, across the whole human genome, whether a selection signal shared across different populations is the consequence of a single mutation process followed by gene flow between populations, or of convergent evolution due to multiple independent mutation events either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is a rare occurrence and that the majority of shared signals stem from the same evolutionary event.
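
The abstract builds on extended haplotype homozygosity (EHH), the quantity underlying iHS and XP-EHH. As a rough illustration only, and not the authors' long-range-haplotype method, the sketch below computes the textbook EHH statistic: the probability that two haplotypes drawn at random from those carrying a core allele are identical over the interval from the core site to a chosen marker. The function name `ehh`, the string encoding of haplotypes, and the toy data are assumptions made for this example.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, upto):
    """Textbook EHH over the sites between `core` and `upto` (both inclusive).

    haplotypes: equal-length strings of '0'/'1' alleles, all carrying the
                same allele at index `core` (hypothetical toy encoding).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("EHH needs at least two haplotypes")
    lo, hi = sorted((core, upto))
    # Group haplotypes that are identical across the whole interval.
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    # Fraction of haplotype pairs that fall in the same group.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Four toy haplotypes carrying allele '1' at the core site (index 2).
haps = ["00110", "00110", "01110", "00111"]
decay = [ehh(haps, 2, site) for site in range(len(haps[0]))]
```

In this toy data `decay` is 1 at the core site and falls as the interval widens and haplotypes start to differ. Scores such as iHS integrate this decay curve for each allele at a site; that step is omitted here.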

    Principal component analysis (PCA) of 1,224 samples from 16 global populations.

    <p>PCA of 1,224 samples from SSIP, SSMP and 14 populations from Phase 1 of the 1 KGP, color-coded by continents (panel A). An analysis of admixture was also performed on the 16 populations with ADMIXTURE, where the number of distinct populations (<i>K</i>) was allowed to vary between 2 and 8 (panel B). The black window highlights the position of the SSIP samples on the admixture plot.</p>

    Principal component analysis (PCA) of SSIP samples with 132 South Asians.

    <p>PCA of 36 SSIP samples with 132 South Asian samples from 25 well-defined Indian groups described by Reich and colleagues <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004377#pgen.1004377-Reich3" target="_blank">[44]</a>, using 202,600 SNPs that were present in both databases (panel A). Five groups corresponding to Great Andamanese, Onge, Nyshi, Aonaga and Siddi were subsequently removed, leaving 104 samples from 20 Indian groups to be analyzed in a second PCA, where the samples were colored first according to their group memberships (panel B), and second according to latitude of origin, as North or South Indians (panel C, see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004377#pgen.1004377.s018" target="_blank">Table S2</a> for the classification of North and South Indians). The color assignments in panels A and B are represented by the color legend on the bottom left of the figure.</p>

    Size distribution and novelty of variants in SSIP.

    <p>Autosomal variants identified in the 36 SSIP samples, which included single nucleotide polymorphisms (SNPs), small insertions/deletions (indels) from 2 bp to 50 bp, and large deletions from 51 bp to 1 Mb. The SSIP SNPs and indels are defined as novel if they are not present in SSMP and dbSNP137, whereas dbSNP132 was used for defining the novelty of the 1 KGP SNPs and indels. The novelty of large deletions in SSIP and 1 KGP is defined with respect to SSMP and DGV release 2013-07-23.</p>

    Assessing intra-population diversity between the samples.

    <p>The extent of SNP sharing between every pair of samples in a population can be measured with a distance measure <i>D</i> that is scaled between 0 and 1 (vertical axis), where a higher value indicates a greater extent of heterogeneity in SNP content (or a lower degree of SNP sharing) between two samples. All possible pairwise measurements of <i>D</i> in each population are represented in a boxplot, where the ends of the whiskers indicate the minimum and maximum distances between specific pairs of samples in that population, the edges of the box indicate the 1<sup>st</sup> and 3<sup>rd</sup> quartiles, and the horizontal line in the box represents the median pairwise distance. The groups are colored with respect to the four continents (Americas – maroon; Africans – red; Asians – green; Europeans – blue). Each label on the horizontal axis indicates the continent label, population label, number of samples and total number of sample pairs of the population.</p>
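
The caption does not define <i>D</i>, so the following is only a sketch of the boxplot summary it describes, under the assumption that the distance is a Jaccard distance between the sets of variant sites carried by two samples (0 when the sets are identical, 1 when they share nothing). The names `jaccard_distance` and `pairwise_summary`, the toy samples, and the choice of the "inclusive" quartile method are all hypothetical.

```python
from itertools import combinations
from statistics import quantiles


def jaccard_distance(a, b):
    """Assumed stand-in for D: 1 - |A ∩ B| / |A ∪ B|, always in [0, 1]."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_summary(samples):
    """Whisker ends, quartiles and median over all sample pairs of one population."""
    dists = [jaccard_distance(a, b)
             for a, b in combinations(samples.values(), 2)]
    # quantiles() needs at least two distances, i.e. at least three samples.
    q1, median, q3 = quantiles(dists, n=4, method="inclusive")
    return {"pairs": len(dists), "min": min(dists), "q1": q1,
            "median": median, "q3": q3, "max": max(dists)}


# Toy population: each sample is the set of variant positions it carries.
population = {"s1": {1, 2, 3, 4}, "s2": {1, 2, 3, 5}, "s3": {1, 2, 6, 7}}
summary = pairwise_summary(population)
```

For a population of n samples the summary covers n(n-1)/2 pairs, which matches the "total number of sample pairs" shown on the horizontal axis labels.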