24 research outputs found

    Transcriptional strand bias in mutational signatures.

    <p>We plot the log of the ratio of <i>f</i><sub><i>2</i></sub> mutations occurring on the untranscribed versus transcribed strand. Therefore, a positive value indicates that the C>T mutation is more common than the G>A mutation on the untranscribed (i.e. coding) strand. P-values in brackets are, respectively, ANOVA P-values for a difference between regions and t-test P-values for a difference between <b>i)</b> West Eurasia and other regions (excluding South Asia) in A&B and <b>ii)</b> 11 American samples with high rates of signature 2 mutations and other regions in C&D. <b>A</b>: Boxplot of per-individual strand bias for mutations in signature 1 (TCT>T, TCC>T, CCC>T and ACC>T). One sample (S_Mayan-2) with an extreme value (0.48) is not shown. <b>B</b>: Population-level means for each of the mutations comprising signature 1. <b>C,D</b>: as A&B but for signature 2. We separated out the 11 American samples with high rates of signature 2 mutations.</p>
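The strand-bias statistic described in this caption can be illustrated with a minimal sketch. The caption does not state the base of the logarithm, so the natural log is an assumption here, and the function name and count-based inputs are hypothetical rather than taken from the source.

```python
import math


def log_strand_ratio(untranscribed: int, transcribed: int) -> float:
    """Log ratio of mutation counts on the untranscribed vs transcribed strand.

    Assumption: natural log; the caption only says "log".
    A positive value means the mutation is more common on the
    untranscribed (coding) strand.
    """
    if untranscribed <= 0 or transcribed <= 0:
        raise ValueError("both counts must be positive")
    return math.log(untranscribed / transcribed)
```

Equal counts on both strands give a value of zero, so the sign of the statistic directly indicates the direction of the bias.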

    Dependence of signatures on genomic features.

    <p><b>A,B</b>: dependence on conservation, measured by <i>B</i> statistic (0 = lowest <i>B</i> statistic; highest conservation). <b>A</b>: Comparison of proportions of signature 1 <i>f</i><sub><i>2</i></sub> mutations between West Eurasia and other populations (excluding South Asia). <b>B</b>: Comparison of proportions of signature 2 <i>f</i><sub><i>2</i></sub> mutations between the 11 American samples with the highest proportions, and all other samples. <b>C,D</b>: As A&B, but showing dependence on recombination rate decile computed in 1 kb bins.</p>

    Algorithm and model for haplotypes.

    <p><b>A</b>: Algorithm for detecting haplotypes. For each variant in the sample (green), we scan left and right until we find inconsistent homozygote genotypes (red), then record the physical and genetic length of this region (blue) and the number of singletons (purple). <b>B</b>: Model for haplotype age. Consider the 4 chromosomes (grey) of the two individuals sharing a haplotype (blue). We model the total genetic length of the inferred haplotype as the sum of the true genetic length and an error. Similarly, we model the number of singletons as the sum of the number on the shared chromosome and the number on the unshared chromosomes. We ignore the fact that we overestimate the haplotype length and therefore that some of the singletons might lie in the unshared part of the chromosome.</p>

    Distribution and characterization of signatures 1 and 2 for <i>f</i><sub><i>2</i></sub> variants.

    <p><b>A</b>: Factor coefficients for these two signatures, for 300 individual samples colored by region. <b>B</b>: Geographic representation of the factor loadings from panel <b>A</b>. Darker colors represent higher loadings. <b>C</b>: Characterization of the signatures in terms of mutation intensity for each of 96 possible classes. Bars are scaled by the frequency of each trinucleotide in the human reference genome. Below, the most highly correlated signatures from the COSMIC database are shown for comparison.</p>
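For readers unfamiliar with the 96 classes mentioned in panel C, the sketch below enumerates them using the standard convention of six pyrimidine-centred substitutions in each of 16 flanking contexts. The label format (for example `TCT>T`) follows the notation used in these captions; the function name is hypothetical.

```python
from itertools import product

BASES = "ACGT"
# Six substitution types, expressed with the pyrimidine (C or T) as reference.
SUBSTITUTIONS = {"C": "AGT", "T": "ACG"}


def mutation_classes() -> list:
    """Enumerate trinucleotide-context mutation classes such as 'TCT>T'."""
    return [
        f"{five}{ref}{three}>{alt}"
        for ref, alts in SUBSTITUTIONS.items()
        for alt in alts
        for five, three in product(BASES, repeat=2)
    ]
```

Scaling by trinucleotide frequency, as the caption describes, would then amount to dividing each class count by the frequency of its three-base context in the reference genome.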

    Details of signature 1.

    <p>A: The proportion of variants that are in signature 1 for <i>f</i><sub><i>2</i></sub> variants on the x-axis, and all variants per-genome on the y-axis. Samples in panel B, processed in a different pipeline, are shown as triangles. B: Proportion of mutations in signature 1 as a function of derived allele count from 1 to 30. C: Signature 1, corrected to be robust to ancient DNA damage (Methods), for <i>f</i><sub><i>2</i></sub> variants in the SGDP and five high coverage ancient genomes. Solid lines show 5–95% bootstrap quantiles.</p>

    Short descriptions of the 1000 Genomes populations.

    <p>Short descriptions of the 1000 Genomes populations.</p>

    Signatures 1 and 2 in the 1000 Genomes.

    <p><b>A</b>: Proportions of <i>f</i><sub><i>2</i></sub> and <i>f</i><sub><i>3</i></sub> variants in signature 1 (here defined as TCT>T, TCC>T, CCC>T and ACC>T) in each 1000 Genomes individual, by population. <b>B</b>: Proportions of <i>f</i><sub><i>2</i></sub> and <i>f</i><sub><i>3</i></sub> variants in signature 2 (here defined as NCG>T, for any N) in each 1000 Genomes individual, by population (five outlying samples excluded).</p>
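The two signature definitions given in this caption translate directly into a membership test. The sketch below computes, for one individual, the proportion of variants falling in each signature; it assumes each variant has already been labelled with a class string such as `TCT>T`, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

# Signature 1 as defined in the caption.
SIGNATURE_1 = frozenset({"TCT>T", "TCC>T", "CCC>T", "ACC>T"})


def in_signature_2(cls: str) -> bool:
    """Signature 2 as defined in the caption: NCG>T for any base N."""
    return len(cls) == 5 and cls[0] in "ACGT" and cls[1:] == "CG>T"


def signature_proportions(classes: Iterable[str]) -> Tuple[float, float]:
    """Return (proportion in signature 1, proportion in signature 2)."""
    labels = list(classes)
    if not labels:
        raise ValueError("no variants supplied")
    sig1 = sum(c in SIGNATURE_1 for c in labels)
    sig2 = sum(in_signature_2(c) for c in labels)
    return sig1 / len(labels), sig2 / len(labels)
```

Applied separately to the <i>f</i><sub><i>2</i></sub> and <i>f</i><sub><i>3</i></sub> variants of each individual, this would give the per-sample values plotted in the two panels.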

    The estimated age distribution of haplotypes.

    <p><b>A</b>: The distribution of the MLE of the ages of haplotypes shared within each population. <b>B–F</b>: The distribution of the MLE of the ages of haplotypes shared between one population and all other populations, shown for each of GBR, JPT, LWK, ASW, and PUR. Populations are described in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004528#pgen-1004528-t001" target="_blank">Table 1</a>. Density estimates are computed in space, using the base R <i>density</i> function with a Gaussian kernel.</p>

    Estimating age from simulated data.

    <p>We simulated whole genomes for 100 individuals (200 chromosomes), with , and HapMap 2 recombination rates. <b>A</b>: Estimated age against true age. The grey dots are the MLEs for each detected haplotype. The blue line is a quantile-quantile (qq) plot for the MLEs (from the 1<i><sup>st</sup></i> to 99<i><sup>th</sup></i> percentile). <b>B–D</b>: Power to detect haplotypes as a function of <b>B</b>: genetic length, <b>C</b>: physical length and <b>D</b>: haplotype age; in each case the darker line represents the power to detect haplotypes with 100% power to detect variants, and the lighter line the power with 66% power.</p>