
    Simulation results – average VUS and FCAUC values as functions of the distance between the means of the true negative and true positive distributions and the class ratio (#negative/#positive).

    <p>The two distributions are both normal with standard deviation one. Black curves: values corresponding to integer distances between means.</p>
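As a rough companion to the simulation setup in this caption, here is a minimal sketch of how the ordinary two-class AUC (not the VUS or FCAUC of the figure) depends on the distance between the means of two unit-variance normal distributions. The function names, sample sizes, and seed are illustrative assumptions, not taken from the source. Because the difference between a positive and a negative draw is normal with mean d and variance 2, the closed form is Φ(d/√2); the Monte Carlo estimate uses the Mann-Whitney pair count and should not depend systematically on the class ratio.

```python
import math
import random
from statistics import NormalDist


def auc_closed_form(distance):
    """AUC for N(0, 1) negatives vs N(distance, 1) positives.

    The difference of one positive and one negative draw is N(distance, 2),
    so P(positive > negative) = Phi(distance / sqrt(2)).
    """
    return NormalDist().cdf(distance / math.sqrt(2))


def auc_monte_carlo(distance, n_pos=400, class_ratio=2.0, seed=0):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly.

    class_ratio is #negative/#positive; all defaults are hypothetical.
    """
    rng = random.Random(seed)
    n_neg = int(n_pos * class_ratio)
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    pos = [rng.gauss(distance, 1.0) for _ in range(n_pos)]
    wins = sum(p > q for p in pos for q in neg)
    return wins / (n_pos * n_neg)


if __name__ == "__main__":
    # Integer distances, mirroring the black curves described in the caption.
    for d in range(0, 5):
        print(d, round(auc_closed_form(d), 4), round(auc_monte_carlo(d), 4))
```

Agreement between the two columns for each distance is a quick sanity check before moving on to the three-class measures the figure actually reports.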

    A new dynamic correlation algorithm reveals novel functional aspects in single cell and bulk RNA-seq data

    <div><p>Dynamic correlations are pervasive in high-throughput data. Large numbers of gene pairs can change their correlation patterns in response to observed/unobserved changes in physiological states. Finding changes in correlation patterns can reveal important regulatory mechanisms. Currently there is no method that can effectively detect global dynamic correlation patterns in a dataset. Given the challenging nature of the problem, the currently available methods use genes as surrogate measurements of physiological states, which cannot faithfully represent true underlying biological signals. In this study we develop a new method, named DCA (Dynamic Correlation Analysis), that directly identifies strong latent dynamic correlation signals from the data matrix. At the center of the method is a new metric for the identification of pairs of variables that are highly likely to be dynamically correlated, without knowing the underlying physiological states that govern the dynamic correlation. We validate the performance of the method with extensive simulations. We apply the method to three real datasets: a single cell RNA-seq dataset, a bulk RNA-seq dataset, and a microarray gene expression dataset. In all three datasets, the method reveals novel latent factors with clear biological meaning, bringing new insights into the data.</p></div>

    Some example Dynamic Components from the cell cycle data.

    <p>Colors: the four cell cycle experiments. Red: alpha factor; green: CDC15; blue: CDC28; purple: elutriation.</p>

    Illustration of the construction of the ROC curve and the ROC surface (ROCS).

    <p>Illustration of the construction of the ROC curve and the ROC surface (ROCS).</p>

    Results from the TCGA BRCA dataset.

    <p>(a) Scatter plots of DC1, DC3, and DC7 scores. The points are colored based on the ER status of the subjects. DC1 separates ER+ and ER-, while DC3 and DC7 have a wide spread only for the ER- subjects. (b) DC1 captures information similar to that of the second principal component. (c) Kaplan–Meier curves of the ER-negative subjects, red: absolute factor score > 0.05.</p>

    Biological process pairs with excessive dynamic correlations related to DCs 2 and 5.

    <p>Gene pairs were selected using an FDR threshold of 0.01. Biological process pairs were selected using a p-value threshold of 0.001 and fold-change of 2. For simplicity, only nodes with connections above a certain threshold are shown. Node sizes reflect the total number of connections of each node. (a) Biological process pairs associated with DC2. (b) Biological process pairs associated with DC5. (c) Example plots of gene pairs with an LA relation with DC5. Red points: samples in the lower 33% of DC5 score; blue points: samples in the upper 33% of DC5 score.</p>

    The liquid association coefficient (LAC).

    <p><b>(a) Illustration of LAC using examples.</b> Left column: dynamic correlation with an unknown conditioning factor. When the factor is low, <i>x</i> and <i>y</i> are negatively correlated; when the factor is high, <i>x</i> and <i>y</i> are positively correlated. Second column from the left: independent case. Right two columns: correlated case. In all the cases, the marginal distributions of <i>X</i> and <i>Y</i> are standard normal. <b>(b) Empirical distributions of LAC score under conditions of dynamic correlation, simple correlation, or independence.</b> The densities are based on 1000 simulations. In the dynamic correlation cases, one-third of the data points follow a bivariate normal distribution with mean and variance-covariance matrix , one-third follow a bivariate normal distribution with mean and variance-covariance matrix , and another one-third follow independent standard normal distributions. In the correlated case, all data points follow a bivariate normal distribution with mean and variance-covariance matrix .</p>
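The dynamic-correlation scenario in panel (a) can be sketched as a three-part mixture. This is only an illustration under stated assumptions: the caption's mean vectors and covariance matrices are missing from the text, so the zero means, unit variances, and the correlation `rho = 0.8` below are placeholders, and the helper names are hypothetical. The sketch does not compute the LAC itself; it only shows why an ordinary pooled correlation misses this kind of signal, while the correlations within each level of the hidden factor are strong and opposite in sign.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def draw_pairs(n, rho, rng):
    """Draw n pairs from a bivariate normal with standard normal marginals
    and correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho * rho) * e)
    return xs, ys


def simulate(n_per_group=2000, rho=0.8, seed=0):
    """Mixture: one third negatively correlated (factor low), one third
    independent, one third positively correlated (factor high).
    rho = 0.8 is an assumed placeholder, not a value from the source."""
    rng = random.Random(seed)
    groups = {
        "low": draw_pairs(n_per_group, -rho, rng),
        "mid": draw_pairs(n_per_group, 0.0, rng),
        "high": draw_pairs(n_per_group, rho, rng),
    }
    result = {name: pearson(xs, ys) for name, (xs, ys) in groups.items()}
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    result["pooled"] = pearson(all_x, all_y)
    return result


if __name__ == "__main__":
    for name, r in simulate().items():
        print(name, round(r, 3))
```

Since each mixture component has standard normal marginals, so does the pooled sample, which matches the caption's statement about the marginal distributions.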

    Major biological processes associated with the DCs.

    <p>(a) DC1, (b) DC2, (c) DC3, and (d) DC5. Gene pairs were selected using an FDR threshold of 0.01. Biological process pairs were selected using a p-value threshold of 0.001 and fold-change of 4. All panels were limited to biological processes with 50 or more connections, except for DC2, for which the limit was 100 because of its excessive number of connections.</p>

    Testing the difference in VUS between different methods.

    <p>(a) Average p-values over the nine comparisons. (b) Fraction of the nine comparisons that are significant (p-value &lt; 0.05). Please note that the nine comparisons in Turro et al. <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0040598#pone.0040598-Turro1" target="_blank">[10]</a> are not independent.</p>

    Biological process pairs with excessive dynamic correlations related to DCs 3 and 7.

    <p>Gene pairs were selected using an FDR threshold of 0.01. Biological process pairs were selected using a p-value threshold of 0.001 and fold-change of 3. For simplicity, only nodes with connections above a certain threshold are shown. Node sizes reflect the total number of connections of each node. (a) Biological process pairs associated with the 3<sup>rd</sup> DC. (b) Biological process pairs associated with the 7<sup>th</sup> DC. Inset: scatterplot of LUMP (leukocytes unmethylation for purity) vs DC7 score. The correlation coefficient is -0.35.</p>