
    Proportions of sCSVs from each population observed on a held out haplotype.

    <p>Each row represents the ancestry of the haplotype that was held out, and each column represents the average number of sCSVs from the given population observed on the held-out haplotype. Each row is normalized by its maximum value, so the population with the most sCSVs observed has a value of 1. In each row, higher values are associated with populations in the same continental group, as would be expected. The IBS sample has only fourteen individuals, which makes determining IBS sCSVs extremely difficult.</p>
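The row normalization described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `normalize_rows_by_max` and the example counts are hypothetical, and the matrix layout (rows are held-out haplotype ancestries, columns are sCSV source populations) follows the caption.

```python
import numpy as np

def normalize_rows_by_max(counts):
    """Divide each row by its maximum so the largest entry in every row is 1."""
    counts = np.asarray(counts, dtype=float)
    row_max = counts.max(axis=1, keepdims=True)
    if np.any(row_max <= 0):
        raise ValueError("every row needs at least one positive count")
    return counts / row_max

# Hypothetical averages (not data from the paper):
# rows = ancestry of the held-out haplotype, columns = sCSV source population.
example_counts = np.array([
    [40.0, 10.0, 5.0],
    [8.0, 32.0, 4.0],
    [3.0, 6.0, 24.0],
])
normalized = normalize_rows_by_max(example_counts)
```

After normalization, each row's largest entry equals 1 and the remaining entries are fractions of it, which makes rows with very different absolute counts comparable.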

    The average number of sCSVs from each 1000 Genomes population observed per megabase on the African-African called local ancestry regions of the real ASW individuals on chromosome 10.

    <p>The large number of YRI sCSVs seen in these regions supports the hypothesis that the African admixture component in African Americans comes from western Africa. We plot the expected number of observed sCSVs per megabase on a YRI haplotype (red diamonds) and the expected number of observed sCSVs on an LWK haplotype (green squares). The observed counts more closely resemble the count profile expected from the YRI haplotypes.</p>

    The transition probabilities between ancestry pairs.

    <p>If A<sub>k</sub> represents a specific ancestry and θ<sub>k</sub> represents the admixture proportion of that ancestry in the admixed population, then these equations are the transition probabilities for all possible types of transitions, given a probability r<i><sub>j</sub></i> of one or more recombinations occurring between the previous informative CSV and the <i>j</i><sup>th</sup> informative CSV. The rows represent the ancestry state at the previous CSV and the columns represent the ancestry state being transitioned into at the <i>j</i><sup>th</sup> CSV.</p>
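The equations themselves are not reproduced in this caption. As a hedged illustration of how θ<sub>k</sub> and r<sub>j</sub> typically enter such a model, the sketch below builds a transition matrix for a single haplotype under a common assumption: with probability r<sub>j</sub> a recombination occurs and the new ancestry is drawn according to the admixture proportions, otherwise the ancestry is unchanged. This is a simplification and an assumption, not the ancestry-pair table from the paper, and the function name `haploid_transition_matrix` is hypothetical.

```python
import numpy as np

def haploid_transition_matrix(theta, r_j):
    """Single-haplotype ancestry transitions (assumed form, not the paper's pair-state table).

    theta: admixture proportions, one per ancestry, summing to 1.
    r_j:   probability of one or more recombinations since the previous informative CSV.
    Rows index the previous ancestry, columns the ancestry transitioned into.
    """
    theta = np.asarray(theta, dtype=float)
    if not np.isclose(theta.sum(), 1.0):
        raise ValueError("admixture proportions must sum to 1")
    if not 0.0 <= r_j <= 1.0:
        raise ValueError("r_j must be a probability")
    k = theta.size
    # No recombination: stay in the same ancestry (identity matrix).
    # Recombination: redraw the ancestry from theta, whatever the previous state was.
    return (1.0 - r_j) * np.eye(k) + r_j * np.tile(theta, (k, 1))

# Hypothetical two-way admixture with 80% / 20% proportions.
T = haploid_transition_matrix([0.8, 0.2], r_j=0.1)
```

Each row of `T` sums to 1, and as r<sub>j</sub> approaches 0 the matrix approaches the identity, so ancestry rarely changes between closely spaced CSVs.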

    Example of CSVs in a 2-way admixed individual (e.g. African American).

    <p>Lines denote the true local ancestry, while the dots denote CSVs. Different dot types denote the continental ancestry of each CSV. From visual inspection, it is relatively easy to discern the true ancestry from the three observed patterns. Spurious CSVs are those whose label disagrees with the true ancestry state.</p>

    Local ancestry inference accuracy in three simulated populations.

    <p>“Array data” denotes that a method was run only on the variants present on the Illumina 1 M genotyping array. “Full genome” denotes that methods were run using all the variants. RFMix requires phased haplotype input, which was inferred using Beagle; all other methods received unphased genotype data as input. Correlation values are the mean squared correlation across SNPs of the true vs. inferred ancestry across individuals. LAMP-LD and MULTIMIX were optimized to run with genotyping array data, possibly explaining the steep drop in accuracy when they are run using full sequencing data. MULTIMIX is not plotted when run on full sequencing data because it performed very poorly, possibly due to inaccurate parameters for sequencing data. Haploid and diploid errors are reported in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003555#pcbi-1003555-t002" target="_blank">Table 2</a>.</p>
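The accuracy metric named in this caption, the mean squared correlation across SNPs of true vs. inferred ancestry across individuals, can be sketched as below. The encoding is an assumption: each matrix entry is the number of copies (0, 1, or 2) of one ancestry carried by an individual at a SNP. How the paper combines ancestries and handles SNPs with no variation is not stated here, so skipping such SNPs is also an assumption, and the function name `mean_r2_across_snps` is hypothetical.

```python
import numpy as np

def mean_r2_across_snps(true_anc, inferred_anc):
    """Mean over SNPs of the squared Pearson correlation computed across individuals.

    Both inputs have shape (individuals, snps); entries are ancestry dosages
    for one ancestry (an assumed encoding).
    """
    true_anc = np.asarray(true_anc, dtype=float)
    inferred_anc = np.asarray(inferred_anc, dtype=float)
    if true_anc.shape != inferred_anc.shape:
        raise ValueError("true and inferred matrices must have the same shape")
    r2_values = []
    for j in range(true_anc.shape[1]):
        t, p = true_anc[:, j], inferred_anc[:, j]
        # Correlation is undefined when either column is constant; skip those SNPs.
        if t.std() == 0 or p.std() == 0:
            continue
        r = np.corrcoef(t, p)[0, 1]
        r2_values.append(r ** 2)
    return float(np.mean(r2_values)) if r2_values else float("nan")

# Hypothetical dosages for three individuals at two SNPs.
truth = np.array([[0, 1], [1, 2], [2, 0]])
perfect_score = mean_r2_across_snps(truth, truth)
```

A method that recovers the true ancestry at every SNP scores 1, and the score falls toward 0 as inferred ancestry becomes uncorrelated with the truth.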

    The average number of observed CSVs per haplotype per megabase from each ancestry.

    <p>Values in parentheses are the percentages of each ancestry type of CSV seen on a haplotype from a specific population, with their standard deviations. To estimate CSVs, we used TSI, LWK, and JPT individuals as proxies for the European, African, and Native American ancestries. We calculated the number of European, African, and Asian CSVs seen on CEU, YRI, and CHS+CHB haplotypes.</p>

    Accuracy as a function of sequencing coverage.

    <p>Accuracy increases fastest for African Americans, who have only two distinct ancestral populations.</p>

    Local ancestry accuracy in simulations of African Americans, Mexicans and Puerto Ricans.

    <p>Accuracy is reported as mean r<sup>2</sup> (haploid accuracy, diploid accuracy). “Array data” denotes that a method was run only on the variants present on the Illumina 1 M genotyping array. “Full genome” denotes that methods were run using all the variants. RFMix requires phased haplotype input, which was inferred using Beagle; all other methods received unphased genotype data as input. Correlation values are the mean squared correlation across SNPs of the true vs. inferred ancestry across individuals. LAMP-LD and MULTIMIX were optimized to run with genotyping array data, possibly explaining the steep drop in accuracy when they are run using full sequencing data.</p>

    sCSVs are able to assign the correct continental group to small haplotype segments with high accuracy.

    <p>This shows that most of the incorrect calls are still assigned to the correct continental group.</p>

    The average number of sCSVs from each 1000 Genomes population observed on the European-European called local ancestry regions of the real ASW individuals.
