58 research outputs found

    Misclassification error, false discovery and false non-discovery rates for case A2 (results are averaged over 50 replicates)

    <p><b>Copyright information:</b></p><p>Taken from "Statistical tools for synthesizing lists of differentially expressed features in related experiments"</p><p>http://genomebiology.com/2007/8/4/R54</p><p>Genome Biology 2007;8(4):R54-R54.</p><p>Published online 11 Apr 2007</p><p>PMCID:PMC1896017.</p> The upper plot shows the false discovery rate (FDR) and the false non-discovery rate (FNR) for case A2. The FDR is calculated as the ratio of false positives to the number of genes called in common, while the FNR is calculated as the ratio of false negatives to the number of genes not called in common. The true differences are drawn from a (2, 0.5) and the experiment-specific noise component is 2 for the first experiment and 3 for the second. R(q) shows the minimum FDR. On the other hand, R(q) has a very large FDR and the improvement of the FNR is slight. As a compromise, the threshold q is close to q, so it guarantees a low FDR but returns a larger list. It approximately corresponds to the intersection point between the FDR and FNR curves. The lower plot shows the global error as the sum of FP and FN. The threshold associated with R(q) is very close to the minimum of the curve, that is, to the smallest global misclassification error.
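    The error rates described in this caption can be illustrated with a short sketch. It assumes two boolean vectors over the same genes, `called` (gene is in the list called in common) and `truly_de` (gene is truly differentially expressed in the simulation); both names and the function `fdr_fnr` are hypothetical and do not come from the paper.

```python
import numpy as np

def fdr_fnr(called, truly_de):
    """Return (FDR, FNR, global error) for one list of called genes.

    FDR = false positives / number of genes called in common
    FNR = false negatives / number of genes not called in common
    global error = FP + FN
    """
    called = np.asarray(called, dtype=bool)
    truly_de = np.asarray(truly_de, dtype=bool)

    fp = int(np.sum(called & ~truly_de))   # called but not truly different
    fn = int(np.sum(~called & truly_de))   # truly different but not called

    n_called = int(called.sum())
    n_not_called = int((~called).sum())

    # Guard against empty lists; the rate is taken as 0 in that case (an assumption).
    fdr = fp / n_called if n_called else 0.0
    fnr = fn / n_not_called if n_not_called else 0.0
    return fdr, fnr, fp + fn
```

    Evaluating such a function over a grid of thresholds would, under these assumptions, give the FDR, FNR and global error curves that the caption describes.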

    Log fold change (natural log) for the VILI experiment (left) and high-fat diet experiment (right)

    <p><b>Copyright information:</b></p><p>Taken from "Statistical tools for synthesizing lists of differentially expressed features in related experiments"</p><p>http://genomebiology.com/2007/8/4/R54</p><p>Genome Biology 2007;8(4):R54-R54.</p><p>Published online 11 Apr 2007</p><p>PMCID:PMC1896017.</p> The left plot shows the log fold changes for mouse versus rat averaged over the two replicates for each species. The right plot shows the log fold changes for fat versus muscle averaged over the three and four replicates for each species. The circles correspond to the genes highlighted by our analysis and by the method of Hwang et al.; they are characterized by a large log fold change in both species. The correlation of the two fold changes for this group is 0.4 (VILI experiment) and 0.8 (high-fat diet experiment). The crosses correspond to the genes highlighted only by Hwang et al.'s analysis; they are characterized by a large log fold change for one species and a small fold change for the other. The correlation of the two fold changes for this group is 0.06 (VILI experiment) and 0.36 (high-fat diet experiment).

    Biological cell-to-cell heterogeneity.

    <p>(a): Histogram of the posterior medians of gene-specific biological cell-to-cell heterogeneity variance contributions <i>σ</i><sub><i>i</i></sub> (defined in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.e008" target="_blank">Eq (6)</a>) across the 7,895 biological genes. (b): For each of the 7,895 biological genes, posterior medians of biological cell-to-cell heterogeneity term <i>δ</i><sub><i>i</i></sub> (log scale) against posterior medians of expression level <i>μ</i><sub><i>i</i></sub> (log scale). Red lines represent the contours in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.e009" target="_blank">Eq (7)</a>, related to HVG (log scale) at different levels of the variance contribution threshold <i>γ</i><sub><i>H</i></sub>. Blue lines represent the equivalent contours linked to LVG at different levels of the variance contribution threshold <i>γ</i><sub><i>L</i></sub>. These contours were estimated based on posterior medians of <i>ϕ</i><sub><i>j</i></sub>’s, <i>s</i><sub><i>j</i></sub>’s and <i>θ</i>.</p>

    Graphical representation of the hierarchical model implemented in BASiCS.

    <p>Diagram based on the expression counts of 2 genes (<i>i</i>: biological and <i>i</i>′: technical) at 2 cells (<i>j</i> and <i>j</i>′). Square and circular nodes denote known observed quantities (observed expression counts and added number of spike-in mRNA molecules) and unknown elements, respectively. Whereas black circular nodes represent the random effects that play an intermediate role in our hierarchical structure, red circular nodes relate to unknown model parameters in the top layer of the hierarchy in our model. Blue, green and grey areas highlight elements that are shared within a biological gene, technical gene or cell, respectively. BASiCS treats cell-specific normalising constants (<i>ϕ</i><sub><i>j</i></sub>’s and <i>s</i><sub><i>j</i></sub>’s) as model parameters, and estimates them by combining information across all genes. Unexplained technical noise is quantified via a single hyper-parameter <i>θ</i>, borrowing information across all genes and cells. Finally, BASiCS quantifies biological cell-to-cell variability via gene-specific hyper-parameters <i>δ</i><sub><i>i</i></sub>, borrowing information across all cells.</p>

    Normalisation.

    <p>(a) and (b): for each of the 41 mouse ESCs, vertical lines represent the 95% high posterior density interval (blue dot located at the posterior median) of cell-specific normalising constants <i>ϕ</i><sub><i>j</i></sub> (cellular mRNA content) and <i>s</i><sub><i>j</i></sub> (interpreted in terms of capture and reverse transcription efficiency for UMI counts), respectively. While BASiCS suggests substantial heterogeneity in the total amount of molecules per cell (<i>ϕ</i><sub><i>j</i></sub>), the scale of the technical counts remains stable among cells (<i>s</i><sub><i>j</i></sub>). This is expected when using UMI protocols, where counts should not be affected by sequencing depth and other amplification biases. Red dots are the values estimated by the stepwise method described in [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.ref005" target="_blank">5</a>]. The methods agree well in terms of cellular mRNA content (<i>ϕ</i><sub><i>j</i></sub>), but the estimates of <i>s</i><sub><i>j</i></sub> according to [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.ref005" target="_blank">5</a>] suggest stronger differences than expected when using UMI protocols. In (b), black dots represent the proportion of total spike-in molecules captured in each cell. Our estimates of the <i>s</i><sub><i>j</i></sub>’s are in better agreement with these empirical measurements (suggesting BASiCS infers a more adequate reverse transcription efficiency level). (c) and (d): histogram of a Markov Chain Monte Carlo sample from <i>s</i><sub>1</sub> and <i>s</i><sub>2</sub>, respectively. These posterior distributions are highly skewed and hence the posterior modes are a closer match to the empirical capture proportions than the corresponding posterior medians.</p>
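    As an illustration of the interval summaries mentioned in this caption, the sketch below computes a highest-density interval from a one-dimensional MCMC sample as the shortest window containing a given fraction of the sorted draws. This is a common empirical estimator and is only an assumption here; the source does not say how its intervals were computed, and the function name `hpd_interval` is hypothetical. The estimator is appropriate for unimodal posteriors.

```python
import numpy as np

def hpd_interval(sample, mass=0.95):
    """Shortest interval containing at least `mass` of the draws."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))          # number of draws the interval must cover
    if k < 1 or k > n:
        raise ValueError("mass must lie in (0, 1] and the sample must be non-empty")

    # Width of every window of k consecutive sorted draws.
    widths = x[k - 1:] - x[: n - k + 1]
    start = int(np.argmin(widths))      # the narrowest window wins
    return float(x[start]), float(x[start + k - 1])
```

    For a skewed sample, such an interval shifts towards the mode rather than being centred on the median, which is consistent with the caption's remark about skewed posteriors.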

    HVG and LVG detection.

    <p>(a) and (b): for each of the 7,895 biological genes, gene-specific expression rate <i>μ</i><sub><i>i</i></sub> (log scale) against the probability of being HVG (<i>π</i><sub><i>i</i></sub><sup><i>H</i></sup>(<i>γ</i><sub><i>H</i></sub>)) and the probability of being LVG (<i>π</i><sub><i>i</i></sub><sup><i>L</i></sup>(<i>γ</i><sub><i>L</i></sub>)), respectively. Setting the EFDR and the EFNR equal to 10%, the corresponding variance contribution thresholds are <i>γ</i><sub><i>H</i></sub> = 79% and <i>γ</i><sub><i>L</i></sub> = 41%. Black dashed lines are located at the optimal (i.e. when EFDR and EFNR coincide) evidence thresholds <i>α</i><sub><i>H</i></sub> = 0.7925 and <i>α</i><sub><i>L</i></sub> = 0.7650, respectively. The 133 and 589 genes classified as HVG and LVG are highlighted in red and blue, respectively.</p>
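    The choice of an evidence threshold at which EFDR and EFNR coincide can be sketched as follows. The sketch assumes the usual posterior-expected definitions (EFDR as the mean of one minus the posterior probability over selected genes, EFNR as the mean posterior probability over non-selected genes); these formulas, the grid, and the names `efdr_efnr` and `balanced_threshold` are assumptions for illustration and are not taken from the caption.

```python
import numpy as np

def efdr_efnr(prob, alpha):
    """Expected FDR and FNR when genes with prob > alpha are selected."""
    prob = np.asarray(prob, dtype=float)
    selected = prob > alpha
    # Selected genes contribute their probability of being a false call.
    efdr = float(np.mean(1.0 - prob[selected])) if selected.any() else 0.0
    # Non-selected genes contribute their probability of being a missed call.
    efnr = float(np.mean(prob[~selected])) if (~selected).any() else 0.0
    return efdr, efnr

def balanced_threshold(prob, grid=None):
    """Grid value of alpha where |EFDR - EFNR| is smallest."""
    if grid is None:
        grid = np.linspace(0.5, 0.995, 100)   # assumed search grid
    gaps = [abs(e[0] - e[1]) for e in (efdr_efnr(prob, a) for a in grid)]
    return float(grid[int(np.argmin(gaps))])
```

    Here `prob` would hold one posterior probability per gene (for example the HVG probabilities); a finer grid gives a threshold closer to the exact crossing point.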

    Comparison of HVG detection among different methods.

    <p>For each of the 7,895 biological genes, posterior medians of biological cell-to-cell heterogeneity term <i>δ</i><sub><i>i</i></sub> (log scale) against posterior medians of expression level <i>μ</i><sub><i>i</i></sub> (log scale). While the methods described in [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.ref016" target="_blank">16</a>] and [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.ref005" target="_blank">5</a>] only provide a characterisation of HVG, BASiCS is able to detect those genes whose expression rates are stable among cells.</p>

    Simulated performance of <i>s</i><sub><i>j</i></sub>’s estimates (method described in [5] and BASiCS).

    <p>Based on 400 simulated datasets from the model implemented in BASiCS with the same structure as in the mouse ESC dataset (simulated parameter values defined as posterior medians of the original model fit) and 6 different values for <i>θ</i>. (a) percentage of the simulated spike-in genes (out of 46) without zero counts (i.e. those that can be used when calculating the estimator proposed in [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.ref005" target="_blank">5</a>]) for different simulated values of <i>θ</i>. (b) and (c) estimates of <i>s</i><sub>1</sub> (first cell) across all simulated datasets for different simulated values of <i>θ</i> using the method described in [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.ref005" target="_blank">5</a>] and BASiCS (posterior medians), respectively. As the strength of unexplained technical noise increases (larger values of <i>θ</i>), estimates obtained using the approach described in [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.ref005" target="_blank">5</a>] become highly unstable (we illustrate this using the first simulated cell, but the same conclusion can be obtained based on any other cell). This is due to a larger proportion of zeros among the simulated expression counts, i.e. fewer spike-in genes can be used when estimating <i>s</i><sub>1</sub>. In contrast, the stability of the BASiCS estimates is not substantially affected by the strength of unexplained technical noise.</p>

    Cell-specific random effects linked to unexplained technical variability.

    <p>(a): for each of the 41 mouse ES cells, vertical lines represent the 95% high posterior density interval (blue dot located at the posterior median) of the random effects related to unexplained technical cell-to-cell variability (<i>ν</i><sub><i>j</i></sub>). (b): histogram of a Markov Chain Monte Carlo sample from <i>θ</i>. Posterior inference strongly suggests the presence of unexplained technical noise in gene expression measurements. In fact, the posterior distribution of <i>θ</i> is concentrated away from zero and, even though the posterior distributions of the <i>s</i><sub><i>j</i></sub>’s are highly homogeneous across cells (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004333#pcbi.1004333.g004" target="_blank">Fig 4(b)</a>), there is strong heterogeneity among the posterior distributions of the <i>ν</i><sub><i>j</i></sub>’s (evidenced by non-overlapping 95% high posterior density intervals).</p>