Illustration of the painting process to create the coancestry matrix.
<p>We show the process by which a haplotype (haplotype 1, black) is painted using the others. A) True underlying genealogies for eight simulated sequences at three locations along a genomic segment, produced using the program “ms” <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002453#pgen.1002453-Hudson1" target="_blank">[52]</a> and showing coalescence times between haplotypes at each position. B) The Time to the Most Recent Common Ancestor (TMRCA) between haplotype 1 and each other haplotype, as a function of sequence position. Note multiple haplotypes can share the same TMRCA and changes in TMRCA correspond to historical recombination sites. C) True distribution of the “nearest neighbour” haplotype. D) Sample “paintings” of the Li & Stephens algorithm. E) Expectation of the painting process, estimating the nearest neighbour distribution. F) Resulting row of the coancestry matrix, based on the expectation of the painting.</p>
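As a rough illustration of panels D to F, the sketch below counts the contiguous chunks copied from each donor in a sampled painting and averages those counts over paintings to obtain one row of a coancestry matrix. Averaging over samples is an assumption made here to approximate the expectation; the function names, donor labels, and toy paintings are all hypothetical and are not taken from the source.

```python
from collections import Counter
from itertools import groupby

def chunk_counts(painting, donors):
    """Count contiguous chunks copied from each donor in one sampled painting.

    painting: sequence of donor labels, one per site along the recipient.
    donors:   all donor labels, so donors that are never copied appear with 0.
    """
    counts = Counter({d: 0 for d in donors})
    # groupby collapses each run of identical consecutive labels into one chunk.
    for donor, _run in groupby(painting):
        counts[donor] += 1
    return [counts[d] for d in donors]

def coancestry_row(paintings, donors):
    """Average chunk counts over sampled paintings (assumed stand-in for the expectation)."""
    rows = [chunk_counts(p, donors) for p in paintings]
    return [sum(col) / len(rows) for col in zip(*rows)]

# Toy data (invented): haplotype 1 painted twice using donor haplotypes 2, 3 and 4.
donors = [2, 3, 4]
paintings = [
    [2, 2, 3, 3, 3, 2, 4, 4],  # chunks: 2, 3, 2, 4
    [2, 2, 2, 3, 3, 3, 4, 4],  # chunks: 2, 3, 4
]
row = coancestry_row(paintings, donors)
print(row)
```

Each entry of `row` is the mean number of chunks that haplotype 1 copies from the corresponding donor.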
Half-matching using correlations for HGDP data.
<p>For each continent, we show the proportion of times in which two sets of chromosomes of a particular individual are matched correctly based on similarity of their coancestry profile. Coancestry profiles are calculated using a training set as described in the text. Results for coancestry matrices are calculated using correlation between individuals based on the linked and unlinked models. Also shown is the expected success in clustering if individuals within the same label or same inferred (linked results) fineSTRUCTURE population each had the same ancestry profile.</p>
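The matching step described in this caption can be sketched as follows, assuming each individual has two coancestry profiles (one per set of chromosomes) and that a match is scored as correct when an individual's first profile has its highest Pearson correlation with its own second profile. The function names and the toy profiles are hypothetical; the exact procedure is described in the text of the original article.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def half_matching_rate(first_half, second_half):
    """Proportion of individuals whose two half-profiles are each other's best match.

    first_half[i] and second_half[i] are the profiles of individual i computed
    from two different sets of chromosomes against the same training set.
    """
    correct = 0
    for i, profile in enumerate(first_half):
        best = max(range(len(second_half)),
                   key=lambda j: pearson(profile, second_half[j]))
        correct += (best == i)
    return correct / len(first_half)

# Toy data (invented): three individuals, four donor groups in the training set.
first_half = [[5, 1, 1, 1], [1, 5, 1, 1], [1, 1, 5, 1]]
second_half = [[4, 2, 1, 1], [1, 4, 2, 1], [2, 1, 4, 1]]
rate = half_matching_rate(first_half, second_half)
print(rate)
```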
World HGDP results summary.
<p>A) Relationship between populations for the whole world data. Each tip corresponds to a population; labels include the number of individuals and are coloured red if all individuals within that label are found in a single clade. See text for an interpretation of the values on the edges; the cut defines the “sub-continents” discussed in the text. B) Transposed coancestry matrix for the Hazara and Burusho (in full: <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002453#pgen.1002453.s014" target="_blank">Figure S14</a>), showing CentralSouthAsia and EastAsia donors, which are each normalised to have mean donation rate of 1. The box shows the “diagonal” drift component.</p>
PCA for East Asia HGDP data.
<p>The first 2 PCA components of the East Asian “continent” as defined in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002453#pgen.1002453.s041" target="_blank">Table S1</a> are shown for A) the linked model and B) the unlinked model. Only the named labels are displayed for clarity; <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002453#pgen.1002453.s037" target="_blank">Figure S37</a> shows the full set. Further structure will be present in other principal components (not shown).</p>
Simulated data scenario and painting results.
<p>A) Effective population size and B) population splits used for creating the simulated data. C) Coancestry heatmaps for linked and unlinked model with regions and 20 individuals per population, showing for (bottom left) the unlinked model, and (top right) the linked model; note that the linked heatmap is slightly asymmetric. D) PCA applied to the dataset using Eigenstrat on the raw SNP data. E) PCA on the coancestry matrix assuming markers are unlinked and F) linked (see text for details).</p>
Coancestry heat map for the Europe sub-continent.
<p>A) (bottom left) population averages, (top right) the raw data matrix, and (left) chunks from other sub-continents. To symmetrise the matrices we show the average of the donor/recipient chunk counts; read the row <i>and</i> column for an individual to see their full profile. The tree has the same interpretation as <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002453#pgen-1002453-g004" target="_blank">Figure 4</a>, and the heatmap between individuals in Europe has the same interpretation as <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002453#pgen-1002453-g002" target="_blank">Figure 2C</a>, with extremely high (black) and low (white) values capped. Each continent has its own scale (top), with the lowest value in yellow and the highest in blue. B) ADMIXTURE barplot for the same dataset.</p>
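The symmetrising and capping steps mentioned in this caption might look like the minimal sketch below. It assumes a square matrix of chunk counts with recipients in rows and donors in columns; the cap thresholds and the toy matrix are invented for illustration and do not come from the source.

```python
def symmetrise(chunks):
    """Average the donor and recipient chunk counts: S[i][j] = (C[i][j] + C[j][i]) / 2."""
    n = len(chunks)
    return [[(chunks[i][j] + chunks[j][i]) / 2 for j in range(n)] for i in range(n)]

def cap(matrix, low, high):
    """Clamp every value into [low, high] so extremes do not dominate the colour scale."""
    return [[min(max(v, low), high) for v in row] for row in matrix]

# Toy chunk-count matrix (invented): rows are recipients, columns are donors.
chunks = [
    [0, 10, 2],
    [6, 0, 40],
    [4, 20, 0],
]
sym = symmetrise(chunks)
capped = cap(sym, low=1, high=25)  # thresholds are hypothetical
print(sym)
print(capped)
```

After symmetrising, the row and the column for an individual carry the same values, so the asymmetry of the raw counts is no longer visible in the heat map.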
Human recombination hot spots hidden within regions of strong marker association
The fine-scale distribution of meiotic recombination events in the human genome can be inferred from patterns of haplotype diversity in human populations but only directly studied by high-resolution sperm typing. Both approaches indicate that crossovers are heavily clustered into narrow recombination hot spots. However, our direct understanding of hot-spot properties and distributions is largely limited to sperm typing in the major histocompatibility complex (MHC). We now describe the analysis of an unremarkable 206 kb region on human chromosome 1, revealing localised regions of linkage disequilibrium (LD) breakdown that mark the locations of sperm crossover hot spots. The distribution, intensity and morphology of these hot spots are strikingly similar to those in the MHC. However, we also accidentally detected additional hot spots within regions of strong association. Coalescent analysis of genotype data detected most of the hot spots, but revealed significant differences between sperm crossover frequencies and “historical” recombination rates. This raises the possibility that some hot spots, in particular those in regions of strong association, may have evolved very recently and not left their full imprint on haplotype diversity. These results suggest that hot spots could be very abundant and possibly fluid features of the human genome.
Marginal Significance (−log<sub>10</sub> p-value as Determined by <i>t-</i>Test) of the Wavelet Coefficients from Four Annotations as Predictors of the Coefficients of the Decomposition of Human-Chimpanzee Divergence
<div><p>Red boxes highlight significant positive linear relationships, and blue boxes, negative. The intensity of the colour is proportional to the degree of significance.</p><p>(A) Smoothed coefficients.</p><p>(B) Detail coefficients.</p></div>
Quantile-Quantile Plots Showing the Difference in Allele Frequency Spectrum for AT→GC Mutations and GC→AT Mutations in Regions of Low and High Recombination
<p>If the two types of mutation were to have the same allele frequency distribution, we would expect to see a straight line. In both cases, AT→GC mutations are typically at higher frequencies than GC→AT mutations; however, the effect is more pronounced in regions of high recombination [(A), low recombination; (B), high recombination]. A quantification of the difference can be found in the text and supporting material.</p>
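A quantile-quantile comparison of this kind can be sketched by pairing matching quantiles of two samples of allele frequencies. The frequencies below are invented for illustration, and the helper name `qq_points` is hypothetical; the original analysis may have computed its quantiles differently.

```python
from statistics import quantiles

def qq_points(sample_a, sample_b, n=4):
    """Pair the n-1 cut points of each sample; identical distributions fall on y = x."""
    qa = quantiles(sample_a, n=n, method="inclusive")
    qb = quantiles(sample_b, n=n, method="inclusive")
    return list(zip(qa, qb))

# Toy derived-allele frequencies (invented, not from the study).
at_to_gc = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70, 0.90]
gc_to_at = [0.02, 0.05, 0.10, 0.20, 0.30, 0.50, 0.80]

points = qq_points(at_to_gc, gc_to_at)
for a, b in points:
    # a > b at a given quantile means the first class sits at higher frequency there.
    print(a, b, a > b)
```

Points that depart from the diagonal in one direction across all quantiles indicate that one class of mutation is systematically at higher frequency than the other.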
Power Spectra and Pairwise Correlations of Detail Wavelet Coefficients
<p>Diagonal plots show the power spectrum of the wavelet decomposition of each factor on the long (red) and short (blue) arms of Chromosome 20. Off-diagonal plots show the rank correlation coefficient between pairs of detail wavelet coefficients at each scale on the long (top right) and short (bottom left) arms. Red crosses indicate significant correlations (<i>p</i>-value < 0.01; Kendall's rank correlation). Scale is shown in kilobases.</p>
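The quantities in this caption can be illustrated with a minimal sketch that assumes a Haar wavelet (the caption does not name the wavelet family) and a signal whose length is a power of two. It computes detail coefficients per scale, the share of total detail energy at each scale as a simple power spectrum, and Kendall's tau-a between two sets of coefficients. No p-value is computed, and all names and data here are hypothetical.

```python
from math import sqrt

def haar_details(signal):
    """Detail coefficients per scale, finest first; len(signal) must be a power of two."""
    details = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / sqrt(2) for a, b in pairs])
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    return details

def power_spectrum(details):
    """Share of total detail energy at each scale (requires a non-constant signal)."""
    energy = [sum(d * d for d in level) for level in details]
    total = sum(energy)
    return [e / total for e in energy]

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs, no tie correction."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

# Toy signal (invented): a single step, so all variation sits at the broadest scale.
step = [1, 1, 1, 1, 5, 5, 5, 5]
spectrum = power_spectrum(haar_details(step))
tau = kendall_tau([1, 2, 3, 4], [2, 4, 6, 9])
print(spectrum, tau)
```

In practice the correlation would be computed scale by scale between the detail coefficients of two annotations, with a significance test applied to each.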