Performance of the model for estimating branch-specific sex ratios.
<p>All histories shown in (A) to (D) share the same topology ((1,2),3) but differ with respect to the simulated ESR. The root population was made of 50,000 males and 50,000 females. In (A), each branch in the topology corresponds to a population made of 500 males and 500 females. In (B), branch 2 was made of 250 females and 750 males (<i>ξ</i><sub>2</sub> = 0.25); in (C), branch 4 was made of 250 females and 750 males (<i>ξ</i><sub>4</sub> = 0.25); in (D), branch 3 was made of 250 females and 750 males (<i>ξ</i><sub>3</sub> = 0.25). Inset trees indicate which branch was simulated with a biased sex ratio. The two successive splits occurred 200 and 400 generations before present. The mutation rate was fixed at <i>μ</i> = 5 × 10<sup>−7</sup>. Fifty females per population were sampled for each dataset. We analyzed 50 replicate simulated datasets for each scenario, each with 5,000 autosomal SNPs and 5,000 X-linked SNPs. The boxplots summarize the distributions of the 50 posterior means of <i>ξ</i><sub><i>i</i></sub> for each of the four branches. The horizontal dashed segments indicate the true (simulated) values of <i>ξ</i><sub><i>i</i></sub>. The pie-charts indicate the fraction of significant support values (<i>S</i> < 0.01) against the hypothesis <i>ξ</i> = 0.5 (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.e028" target="_blank">Eq 4</a>).</p>
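<p>The ESR values quoted in this caption are consistent with the ESR being the proportion of females among the individuals of a branch (250 females and 750 males give 0.25). The following minimal sketch checks that arithmetic; the function name and the dictionary of scenarios are hypothetical and are not part of KimTree.</p>

```python
def effective_sex_ratio(n_females: int, n_males: int) -> float:
    """Proportion of females, N_f / (N_f + N_m).

    Assumption: this matches the ESR definition implied by the caption,
    where 250 females and 750 males correspond to an ESR of 0.25.
    """
    if n_females < 0 or n_males < 0:
        raise ValueError("counts must be non-negative")
    total = n_females + n_males
    if total == 0:
        raise ValueError("population must contain at least one individual")
    return n_females / total


# Branch compositions quoted in the caption, as (females, males).
scenarios = {"balanced branch": (500, 500), "biased branch": (250, 750)}

for name, (n_f, n_m) in scenarios.items():
    print(f"{name}: ESR = {effective_sex_ratio(n_f, n_m):.2f}")
```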
Directed acyclic graph (DAG) of the hierarchical Bayesian model for a three-population example tree.
<p>The square nodes characterize the data, i.e. the observed allele counts from autosomal and X-linked data in population <i>i</i> at SNP <i>j</i>. The circles and rounded rectangles represent the parameters to be estimated: the (unknown) allele frequency in population <i>i</i>; the length (on a diffusion time scale) of the branch leading to population <i>i</i>; and <i>α</i><sup>(Ω)</sup> and <i>β</i><sup>(Ω)</sup>, the two shape parameters of the beta distribution that describes the allele frequency distribution in the root population. Unidirectional edges (arrows) represent direct stochastic relationships within the model, i.e. the conditional dependency between connected nodes.</p>
Application example on human (HapMap) data.
<p>We re-analyzed the dataset from Keinan et al. [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.ref019" target="_blank">19</a>, <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.ref042" target="_blank">42</a>], with genotypes from European American individuals from Utah, USA (CEU), Asian individuals grouping Han Chinese from Beijing and Japanese from Tokyo (ASN), and Yoruba individuals from Ibadan, Nigeria (YRI) (see the <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#sec015" target="_blank">Materials and methods</a> section). The data consisted of 340,909 autosomal SNPs and 12,737 X-linked SNPs. For both genetic systems, we randomly subsampled 50 pseudo-replicated datasets from the full data, each made of 5,000 autosomal SNPs and 5,000 X-linked SNPs. We ran KimTree conditionally on the ((CEU,ASN),YRI) topology, represented in (A) with branch lengths set to their posterior mean estimates. (B) The boxplots summarize the distributions of the posterior means of the ESR for each branch in the tree, for the 50 pseudo-replicated datasets. The dotted line indicates the expectation for a balanced ESR (<i>ξ</i><sub><i>i</i></sub> = 0.5). The pie-charts indicate the fraction of significant support values (<i>S</i> < 0.01) against the hypothesis <i>ξ</i> = 0.5 (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.e028" target="_blank">Eq 4</a>).</p>
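<p>The subsampling step described in this caption could be sketched as follows. The caption only states that SNPs were subsampled at random; drawing without replacement within each pseudo-replicate and independently across pseudo-replicates is an assumption, and the function name and seeds are hypothetical.</p>

```python
import random


def subsample_snps(n_total: int, n_keep: int, n_replicates: int, seed: int = 0):
    """Return n_replicates sorted lists of n_keep distinct SNP indices.

    Assumption: indices are drawn without replacement within a replicate
    and independently across replicates.
    """
    if n_keep > n_total:
        raise ValueError("cannot keep more SNPs than are available")
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_total), n_keep)) for _ in range(n_replicates)]


# SNP counts quoted in the caption for the HapMap data.
autosomal = subsample_snps(340_909, 5_000, 50, seed=1)
x_linked = subsample_snps(12_737, 5_000, 50, seed=2)

print(len(autosomal), len(autosomal[0]), len(x_linked), len(x_linked[0]))
```

<p>Because 50 × 5,000 = 250,000 exceeds the 12,737 X-linked SNPs available, the X-linked subsamples necessarily share SNPs, which is why these datasets are pseudo-replicates rather than independent replicates.</p>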
Performance of the model for estimating branch-specific sex ratios in a four-population tree.
<p>We simulated a four-population tree with topology ((1,2),(3,4)). The root population was made of 50,000 males and 50,000 females, and the internal branches correspond to populations made of 5,000 males and 5,000 females. As depicted in (A), the four external branches were simulated with biased sex ratios: <i>ξ</i><sub>1</sub> = 0.1 for branch 1 (1,000 females), <i>ξ</i><sub>2</sub> = 0.2 for branch 2, <i>ξ</i><sub>3</sub> = 0.9 for branch 3 and <i>ξ</i><sub>4</sub> = 0.8 for branch 4. The two successive splits occurred 1,000 and 3,000 generations before present. The mutation rate was fixed at <i>μ</i> = 1.5 × 10<sup>−7</sup>. Fifty females per population were sampled for each dataset. We analyzed 50 replicate simulated datasets, each with 5,000 autosomal SNPs and 5,000 X-linked SNPs. The boxplots in (B) summarize the distributions of the 50 posterior means of <i>ξ</i><sub><i>i</i></sub> for each of the six branches. The horizontal dashed segments indicate the true (simulated) values of <i>ξ</i><sub><i>i</i></sub>. The pie-charts indicate the fraction of significant support values (<i>S</i> < 0.01) against the hypothesis <i>ξ</i> = 0.5 (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.e028" target="_blank">Eq 4</a>).</p>
Application example on whole-genome human sequence data.
<p>We re-analyzed a subset of the whole-genome sequence data from Pagani et al. [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.ref033" target="_blank">33</a>], with populations from NW-Europe (NWE), SE-Asia (SEA), Oceania (OCE) and the Americas (AME) (see the <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#sec015" target="_blank">Materials and methods</a> section for a detailed composition of populations). For both genetic systems, we randomly subsampled 50 pseudo-replicated datasets from the full data, each made of 5,000 autosomal SNPs and 5,000 X-linked SNPs. We ran KimTree considering the best-fitting tree topology (NWE,SEA,OCE,AME) (see the <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#sec015" target="_blank">Materials and methods</a> section), represented in (A) with branch lengths set to their posterior mean estimates. (B) The boxplots summarize the distributions of the posterior means of the ESR for each branch in the tree, for the 50 pseudo-replicated datasets. The dotted line indicates the expectation for a balanced ESR (<i>ξ</i><sub><i>i</i></sub> = 0.5). The pie-charts indicate the fraction of significant support values (<i>S</i> < 0.01) against the hypothesis <i>ξ</i> = 0.5 (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.e028" target="_blank">Eq 4</a>).</p>
Application example on cattle data.
<p>We analyzed 643,090 autosomal SNPs and 15,009 X-linked SNPs from a dairy cattle breed (HOL), the Angus beef cattle breed (ANG) and the N'Dama breed (NDA). For both genetic systems, we randomly subsampled 50 pseudo-replicated datasets from the full data, each made of 5,000 autosomal SNPs and 5,000 X-linked SNPs. We ran KimTree considering the tree topology ((HOL,ANG),NDA) [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.ref041" target="_blank">41</a>], represented in (A) with branch lengths set to their posterior mean estimates. (B) The boxplots summarize the distributions of the posterior means of the ESR for each branch in the tree, for the 50 pseudo-replicated datasets. The dotted line indicates the expectation for a balanced ESR (<i>ξ</i><sub><i>i</i></sub> = 0.5). The pie-charts indicate the fraction of significant support values (<i>S</i> < 0.01) against the hypothesis <i>ξ</i> = 0.5 (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.e028" target="_blank">Eq 4</a>).</p>
Robustness to violation of the model assumptions.
<p>We simulated four scenarios (A-D) based on a four-population tree with topology ((1,2),(3,4)), as depicted in the inset tree (top). In all scenarios, the root population was made of 50,000 males and 50,000 females, and the internal branches correspond to populations made of 5,000 males and 5,000 females. The two successive splits occurred 2,000 and 4,000 generations before present. The mutation rate was fixed at <i>μ</i> = 1.5 × 10<sup>−7</sup>. Fifty females per population were sampled for each dataset. In (A), the four external branches were made of equal numbers of females and males, so that the ESR was balanced (<i>ξ</i><sub><i>i</i></sub> = 0.5) throughout the tree ("control" scenario). In (B), we simulated an instantaneous 5-fold population growth in branch 1 and an instantaneous 5-fold bottleneck in branch 4, both events having occurred 400 generations before present. In (C), we simulated migration between populations 1 and 2, with equal rates for both sexes: <i>m</i><sub>f</sub> = <i>m</i><sub>m</sub> = 0.00025. In (D), we simulated female-biased migration between populations 1 and 2, with <i>m</i><sub>f</sub> = 0.00025 and <i>m</i><sub>m</sub> = 0. We analyzed 50 replicate simulated datasets for each scenario, each with 5,000 autosomal SNPs and 5,000 X-linked SNPs. The boxplots in (A-D) summarize the distributions of the 50 posterior means of <i>ξ</i><sub><i>i</i></sub> for each of the six branches. The horizontal dashed lines indicate the true (simulated) values of <i>ξ</i><sub><i>i</i></sub>. The pie-charts indicate the fraction of significant support values (<i>S</i> < 0.01) against the hypothesis <i>ξ</i> = 0.5 (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007191#pgen.1007191.e028" target="_blank">Eq 4</a>).</p>
Supplemental Material for Hivert et al., 2018
File S1 contains the detailed mathematical derivations of the model; Table S1 provides a comparison of pairwise <i>F</i><sub>ST</sub> estimates; Table S2 shows the effect of unequal sampling on pairwise <i>F</i><sub>ST</sub> estimates; Table S3 shows the effect of variable coverage on pairwise <i>F</i><sub>ST</sub> estimates; Figure S1 shows pairwise estimators of <i>F</i><sub>ST</sub>; Figure S2 shows the precision and accuracy of our estimator of <i>F</i><sub>ST</sub> as a function of pool size and coverage, with varying experimental error rate; Figure S3 shows the precision and accuracy of naive estimators of <i>F</i><sub>ST</sub> for Pool-seq data; Figure S4 shows the precision and accuracy of alternative estimators of <i>F</i><sub>ST</sub> with varying pool size, for various levels of differentiation.
Estimates of locus-specific effects <i>α<sub>i</sub></i> from BayeScan analyses, for each outlier locus in all the inter-host comparisons where it was detected as an outlier (in China and in France).
<p>The average of <i>α<sub>i</sub></i> over all these pairwise comparisons is also provided. These values are a proxy for the nature and strength of selection: positive <i>α<sub>i</sub></i> values suggest divergent selection while negative values suggest balancing selection. Population codes are defined in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0069211#pone-0069211-t001" target="_blank">Table 1</a>. Marker names are underlined (not underlined) for outliers detected at a 5% FDR (10%) threshold.</p>
Distributions of <i>F</i><sub>ST</sub> estimates between populations sampled on different host plants, across all AFLP markers in France (A, including all pairs of ECB and ABB populations) and in China (B, including all pairs of ACB and ABB populations).
<p>Mean values of the distributions are 0.042 and 0.063, respectively, as indicated by the vertical dashed lines. Both distributions are highly leptokurtic (i.e. with kurtosis > 3) and significantly different from one another (Kolmogorov-Smirnov test, <i>D</i> = 0.089, <i>P</i> < 10<sup>−5</sup>). Higher kurtosis is observed for the ECB/ABB <i>F</i><sub>ST</sub> distribution (kurtosis = 12.16) than for the ACB/ABB <i>F</i><sub>ST</sub> distribution (kurtosis = 11.46).</p>
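<p>The threshold of 3 used in this caption corresponds to the non-excess (Pearson) kurtosis, for which a normal distribution scores 3. The sketch below computes it from population moments; the caption does not say which estimator the authors used, so this choice is an assumption, and the toy sample is hypothetical rather than real <i>F</i><sub>ST</sub> data.</p>

```python
def pearson_kurtosis(values):
    """Non-excess kurtosis m4 / m2**2, using population (biased) moments.

    A normal distribution gives 3; values above 3 are leptokurtic.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant sample")
    return m4 / m2 ** 2


# Hypothetical toy sample: mostly zeros with two extreme values (heavy tails).
toy = [0.0] * 98 + [10.0, -10.0]
k = pearson_kurtosis(toy)
print(f"kurtosis = {k:.2f}, leptokurtic: {k > 3}")
```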