
    Reaching the Holy Grail of Biogeography - from Genome to Home Village

    The search for a method that uses biological information to predict a person's place of origin has occupied scientists for millennia. Over the past four decades, scientists have applied genetic data to this question with limited success. Biogeographical algorithms using next-generation sequencing data achieved an accuracy of 700 km in Europe but were inaccurate elsewhere. Here we develop the Geographic Population Structure (GPS) algorithm to improve this accuracy and demonstrate it on three datasets of 40,000–130,000 SNPs. GPS placed 83% of individuals worldwide in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their own villages and most of the remainder within 50 km of their villages. The accuracy and power of GPS in inferring the biogeography of individuals worldwide down to their country or, in some cases, village of origin underscore the promise of admixture-based methods for biogeography and have ramifications for genetic ancestry, population genetics, history, and health. Specific applications for the pharmaceutical industry will be discussed.
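The abstract does not describe how GPS turns genetic data into coordinates, so the sketch below is a toy stand-in and not the GPS algorithm. It places an individual at the inverse-distance-weighted mean of the coordinates of the reference populations whose admixture proportions are closest, then measures placement error in kilometers with the haversine formula, the kind of distance behind figures such as "700 km" and "within 50 km". The reference panel, the population names, K = 3 gene pools, and the functions `place` and `haversine_km` are all hypothetical.

```python
import math

# Hypothetical reference panel (invented for illustration): admixture proportions
# over K = 3 gene pools and the (latitude, longitude) of each population.
REFERENCES = {
    "pop_a": ([0.80, 0.15, 0.05], (40.0, 9.0)),
    "pop_b": ([0.60, 0.30, 0.10], (45.0, 12.0)),
    "pop_c": ([0.20, 0.70, 0.10], (52.0, 5.0)),
}


def euclidean(u, v):
    """Euclidean distance between two admixture vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def place(admixture, references=REFERENCES, k=2, eps=1e-9):
    """Toy placement (not GPS): inverse-distance-weighted mean of the
    coordinates of the k references closest in admixture space.
    Averaging raw lat/lon is only reasonable over small regions."""
    nearest = sorted(references.values(),
                     key=lambda ref: euclidean(admixture, ref[0]))[:k]
    weights = [1.0 / (euclidean(admixture, mix) + eps) for mix, _ in nearest]
    total = sum(weights)
    lat = sum(w * coords[0] for w, (_, coords) in zip(weights, nearest)) / total
    lon = sum(w * coords[1] for w, (_, coords) in zip(weights, nearest)) / total
    return lat, lon


# Usage: place a hypothetical individual and measure the error against a
# hypothetical true origin (here, pop_a's coordinates).
predicted = place([0.75, 0.18, 0.07])
error_km = haversine_km(predicted, (40.0, 9.0))
```

Inverse-distance weighting is used only because it is the simplest way to interpolate between references; a real method would need a calibrated mapping from genetic to geographic distance.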

    An illustration of a hierarchical F-statistics analysis using eight populations.

    Samples are organized in a three-level structure of individuals, intra-continental populations, and continental populations. The relationships between the six fixation indices are depicted at the top left and follow the formulation of Eq. S1. Below are the F-statistics, calculated separately for autosomes, male X chromosomes, and female X chromosomes. The indices measuring the genetic variation between continental populations (F_CT), between intra-continental populations (F_SC), and between individuals within intra-continental populations (F_IS) are shown in bold.
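Eq. S1 itself is not reproduced here. Assuming it follows the standard hierarchical definitions, each of the six indices compares the heterozygosity at two of the four levels (individuals I, intra-continental populations S, continental populations C, and the total T), and because the heterozygosity ratios telescope, the indices combine multiplicatively, for example 1 − F_IT = (1 − F_IS)(1 − F_SC)(1 − F_CT). The heterozygosity values and the function name `fixation_indices` below are hypothetical.

```python
from itertools import combinations

# Hypothetical mean heterozygosities per level, from the individual level (I)
# through intra-continental (S) and continental (C) populations to the total (T).
H = {"I": 0.28, "S": 0.30, "C": 0.31, "T": 0.34}
LEVELS = ["I", "S", "C", "T"]


def fixation_indices(h):
    """F_XY = 1 - H_X / H_Y for each lower level X and higher level Y (six pairs)."""
    return {f"F_{x}{y}": 1.0 - h[x] / h[y] for x, y in combinations(LEVELS, 2)}


F = fixation_indices(H)

# The heterozygosity ratios telescope, so the indices combine multiplicatively.
lhs = 1 - F["F_IT"]
rhs = (1 - F["F_IS"]) * (1 - F["F_SC"]) * (1 - F["F_CT"])
assert abs(lhs - rhs) < 1e-12
assert abs((1 - F["F_ST"]) - (1 - F["F_SC"]) * (1 - F["F_CT"])) < 1e-12
```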

    Correlating MAF with F_ST.

    The mean F_ST, plotted for all MAF groups with MAF > 0.05 (dots), that is, excluding the rarest SNPs, allows the correlation between the two variables to be expressed as a single linear equation (Eq. 4).

    Map of the Old World.

    The geographical regions of origin are shown for the eight populations used in this study. Intra-continental populations from the same continent are shown in the same color.

    Comparing the coefficient of variation for high- and low-F_ST SNPs.

    Frequency distribution of the coefficient of variation calculated between adjacent F_ST > threshold SNPs (line) and between random samples of F_ST < threshold SNPs (histogram) for five allele frequency groups (a–e).
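The caption does not say which quantity the coefficient of variation (CV = standard deviation / mean) is computed over. Purely as an illustration, the sketch assumes it is the physical gaps (bp) between consecutive high-F_ST SNPs, and builds the comparison distribution by drawing random samples of the same size from the low-F_ST SNPs. The positions, the number of draws, the seed, and the helper names are hypothetical.

```python
import random
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """CV = population standard deviation / mean."""
    return pstdev(values) / fmean(values)


def gap_cv(positions):
    """CV of the distances (bp) between consecutive SNP positions."""
    pos = sorted(positions)
    return coefficient_of_variation([b - a for a, b in zip(pos, pos[1:])])


def null_cvs(low_fst_positions, n, draws=1000, seed=0):
    """Gap CVs of `draws` random samples of n low-F_ST SNP positions."""
    rng = random.Random(seed)
    return [gap_cv(rng.sample(low_fst_positions, n)) for _ in range(draws)]


# Hypothetical positions: an evenly spaced high-F_ST set and a pool of
# low-F_ST SNPs to sample from.
high = [1_000, 2_000, 3_000, 4_000, 5_000]
low = list(range(0, 100_000, 37))
observed = gap_cv(high)
null = null_cvs(low, n=len(high), draws=200)
share_below = sum(cv <= observed for cv in null) / len(null)
```

Matching the sample size `n` to the high-F_ST set keeps the two CVs comparable, since the CV of a handful of gaps is noisier than that of many.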

    Distribution of locus-specific F_ST in three continental populations (CEU+TSI, CHB+CHD+JPT, LWK+MKK+YRI).

    F_ST values were obtained for (a) 2,823,367 autosomal SNPs and (b) 86,533 SNPs on the non-recombining region of the X chromosome and 1,264 SNPs in the pseudoautosomal region (PAR; inset). The histograms show the distribution across the bins indicated on the x-axis, and the line shows the cumulative distribution.
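The caption does not name the F_ST estimator. A minimal sketch, assuming Nei's heterozygosity-based definition F_ST = (H_T − H_S) / H_T with the three continental populations weighted equally, computes locus-specific values from allele frequencies and summarizes them as a histogram plus a cumulative distribution. The allele frequencies, the ten equal-width bins, and the function names are hypothetical.

```python
from itertools import accumulate


def locus_fst(freqs):
    """Nei-style F_ST for one biallelic SNP from per-population allele frequencies."""
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled (mean) frequency
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def histogram_with_cdf(values, n_bins=10):
    """Counts in equal-width bins over [0, 1] and the cumulative fraction up to each bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1  # F_ST = 1 goes in the last bin
    cumulative = [c / len(values) for c in accumulate(counts)]
    return counts, cumulative


# Hypothetical allele frequencies in three populations for three SNPs.
freqs_per_snp = [[0.5, 0.5, 0.5], [0.1, 0.5, 0.9], [0.0, 1.0, 0.0]]
fst_values = [locus_fst(f) for f in freqs_per_snp]
counts, cumulative = histogram_with_cdf(fst_values)
```

Because 2p(1 − p) is concave, H_S never exceeds H_T, so this estimator stays within [0, 1]; sample-size-corrected estimators such as Weir and Cockerham's can go negative and would need a wider binning range.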

    Minor allele frequency distributions for autosomal SNPs.

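A minimal sketch of how the minor allele frequency of a biallelic SNP can be obtained from diploid genotypes coded as 0, 1, or 2 copies of the alternate allele, and how SNPs can then be assigned to allele frequency groups. The group boundaries below are hypothetical; the five groups (a–e) used in the figures are not defined in this excerpt.

```python
from bisect import bisect_right


def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded 0/1/2 (alternate-allele count); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt = sum(called) / (2 * len(called))
    return min(alt, 1 - alt)


# Hypothetical upper boundaries for five MAF groups, labelled a-e.
GROUP_EDGES = [0.05, 0.1, 0.2, 0.3]
GROUP_LABELS = "abcde"


def maf_group(maf):
    """Label of the MAF group; a value equal to an edge goes to the higher group."""
    return GROUP_LABELS[bisect_right(GROUP_EDGES, maf)]
```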

    LD for five allele frequency groups as a function of physical distance in Africans.

    LD (r²) in African populations is plotted as a function of physical distance on a log scale for five allele frequency groups (a–e). To simplify the presentation, the mean and standard error of the mean of r² for F_ST > threshold SNPs (blue) and F_ST < threshold SNPs (red) are presented for different between-SNP distances (50 bp, 100 bp, 1 kb, 5 kb, 10 kb, 50 kb, 100 kb, 500 kb, and 1000 kb). F_ST > threshold SNPs are marked as green dots.
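For reference, the r² measure of LD between two biallelic SNPs can be computed from phased haplotypes as r² = D² / [p_A(1 − p_A) p_B(1 − p_B)], where D = p_AB − p_A p_B. The sketch below does this and then summarizes r² per distance bin as a mean and standard error of the mean. Using phased haplotypes rather than unphased genotypes, and treating each caption distance as the upper edge of a bin, are assumptions; the function names and test data are hypothetical.

```python
import math
from bisect import bisect_left
from statistics import fmean, stdev

# Between-SNP distances from the caption, treated here as bin upper edges (bp).
DISTANCE_EDGES_BP = [50, 100, 1_000, 5_000, 10_000, 50_000, 100_000, 500_000, 1_000_000]


def r_squared(hap_a, hap_b):
    """r^2 between two SNPs from phased haplotypes coded 0/1 (same haplotype order)."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # a monomorphic SNP leaves r^2 undefined
    d = p_ab - p_a * p_b
    return d * d / denom


def mean_sem_by_distance(pairs, edges=DISTANCE_EDGES_BP):
    """pairs: (distance_bp, r2). Assign each pair to the first edge >= its distance
    and return {edge: (mean r2, standard error of the mean)}."""
    groups = {}
    for dist, r2 in pairs:
        i = bisect_left(edges, dist)
        if i < len(edges) and not math.isnan(r2):
            groups.setdefault(edges[i], []).append(r2)
    summary = {}
    for edge, vals in sorted(groups.items()):
        sem = stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else float("nan")
        summary[edge] = (fmean(vals), sem)
    return summary
```

Running this separately for the F_ST > threshold and F_ST < threshold SNP pairs would give the two curves the caption describes.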

    A summary of the supporting evidence for the two phylogenetic hypotheses (Figure 1) using seven genetic attributes as selection criteria.

    A summary of the supporting evidence for the two phylogenetic hypotheses (Figure 1: http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003925#pcbi-1003925-g001) using seven genetic attributes as selection criteria.

    The cumulative distribution of homogeneous domain lengths on a log scale.

    For simplicity, the mean distributions for primates, murids, and laurasiatherians are shown. The inset shows that most domains are of medium-short length.
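A cumulative distribution like this can be computed as an empirical CDF evaluated on a log-spaced grid of lengths. The sketch below also averages several species' ECDFs point by point on that shared grid; that averaging scheme is only an assumption about what "mean distributions" means here, and the domain lengths, grid range, and function names are hypothetical.

```python
from bisect import bisect_right


def ecdf(lengths):
    """Return F(x) = fraction of lengths <= x."""
    data = sorted(lengths)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def mean_ecdf_on_log_grid(species_lengths, lo_exp=3, hi_exp=7, points_per_decade=4):
    """Average each species' ECDF at log-spaced lengths from 10**lo_exp to 10**hi_exp."""
    steps = (hi_exp - lo_exp) * points_per_decade
    grid = [10 ** (lo_exp + k / points_per_decade) for k in range(steps + 1)]
    cdfs = [ecdf(lengths) for lengths in species_lengths.values()]
    return [(x, sum(f(x) for f in cdfs) / len(cdfs)) for x in grid]


# Hypothetical domain lengths (bp) for two species in one group.
lengths = {"species_1": [2_000, 20_000, 300_000], "species_2": [5_000, 50_000, 5_000_000]}
curve = mean_ecdf_on_log_grid(lengths)
```

Evaluating every species on the same grid is what makes a point-by-point mean possible; pooling all lengths into one ECDF would instead weight species by how many domains they have.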