
55 research outputs found

Simulation results – average VUS and FCAUC values as functions of the distance between the means of the true negative and true positive distributions and of the class ratio (#negative/#positive).

Author: Tianwei Yu (17518)

The two distributions are both normal with standard deviation one. Black curves: values corresponding to integer distances between means.

FigShare
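The caption above describes negatives and positives drawn from unit-variance normal distributions separated by a distance d. As a minimal sketch of that setup (not the authors' simulation code; VUS and FCAUC are not reproduced here), the snippet estimates the ordinary two-class AUC with the Mann–Whitney statistic and compares it with the closed form Φ(d/√2), which holds for two unit-variance normals. The sample sizes, seed, and function names are hypothetical choices for illustration.

```python
import math

import numpy as np


def empirical_auc(neg, pos):
    """Mann-Whitney estimate of AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    neg = np.asarray(neg, dtype=float)[:, None]
    pos = np.asarray(pos, dtype=float)[None, :]
    return float(np.mean(pos > neg) + 0.5 * np.mean(pos == neg))


def theoretical_auc(distance):
    """AUC for N(0, 1) negatives vs N(distance, 1) positives: Phi(distance / sqrt(2))."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2)).
    return 0.5 * (1.0 + math.erf(distance / 2.0))


def simulate_auc(distance, class_ratio, n_pos, rng):
    """Draw one hypothetical dataset with #negative/#positive = class_ratio and score it."""
    n_neg = int(round(class_ratio * n_pos))
    neg = rng.normal(0.0, 1.0, size=n_neg)
    pos = rng.normal(distance, 1.0, size=n_pos)
    return empirical_auc(neg, pos)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for d in (0, 1, 2, 3):
        for ratio in (1, 4):
            est = simulate_auc(d, ratio, n_pos=200, rng=rng)
            print(f"d={d} ratio={ratio}: empirical={est:.3f} theory={theoretical_auc(d):.3f}")
```

Under this model the expected two-class AUC depends only on d; the class ratio mainly changes the variance of the estimate.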

A new dynamic correlation algorithm reveals novel functional aspects in single cell and bulk RNA-seq data

Author: Tianwei Yu (17518)
Publication date: 01/08/2018

Dynamic correlations are pervasive in high-throughput data. Large numbers of gene pairs can change their correlation patterns in response to observed or unobserved changes in physiological states. Finding changes in correlation patterns can reveal important regulatory mechanisms. Currently, no method can effectively detect global dynamic correlation patterns in a dataset. Given the challenging nature of the problem, the currently available methods use genes as surrogate measurements of physiological states, which cannot faithfully represent the true underlying biological signals. In this study we develop a new method, named DCA (Dynamic Correlation Analysis), that directly identifies strong latent dynamic correlation signals from the data matrix. At the center of the method is a new metric for identifying pairs of variables that are highly likely to be dynamically correlated, without knowledge of the underlying physiological states that govern the dynamic correlation. We validate the performance of the method with extensive simulations. We apply the method to three real datasets: a single-cell RNA-seq dataset, a bulk RNA-seq dataset, and a microarray gene expression dataset. In all three datasets, the method reveals novel latent factors with clear biological meaning, bringing new insights into the data.

Directory of Open Access Journals

FigShare

Some example Dynamic Components from the cell cycle data.

Author: Tianwei Yu (17518)

Colors: the four cell cycle experiments. Red: alpha factor; green: CDC15; blue: CDC28; purple: elutriation.

FigShare

Illustration of the construction of the ROC curve and the ROC surface (ROCS).

Author: Tianwei Yu (17518)


FigShare
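As a rough illustration of the construction named in the title above, the sketch below builds ROC points by sweeping a decision threshold over the scores (one point per distinct score value) and integrates them with the trapezoidal rule. For three ordered classes, it estimates the volume under the ROC surface (VUS) with the standard empirical estimator: the fraction of triples, one observation from each class, that are correctly ordered. Function names are hypothetical, and ties count as incorrect orderings in the VUS estimate, which is a simplification.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (fpr, tpr) arrays, starting at (0, 0), for a descending threshold sweep."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="mergesort")
    sorted_scores = scores[order]
    sorted_labels = labels[order]
    # Keep only the last position of each run of tied scores, so ties form one ROC step.
    last_of_run = np.append(sorted_scores[1:] != sorted_scores[:-1], True)
    tps = np.cumsum(sorted_labels)[last_of_run]
    fps = np.cumsum(~sorted_labels)[last_of_run]
    tpr = np.concatenate(([0.0], tps / labels.sum()))
    fpr = np.concatenate(([0.0], fps / (~labels).sum()))
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def empirical_vus(class1, class2, class3):
    """Fraction of triples (a, b, c), one per class, with a < b < c."""
    a = np.asarray(class1, dtype=float)[:, None, None]
    b = np.asarray(class2, dtype=float)[None, :, None]
    c = np.asarray(class3, dtype=float)[None, None, :]
    return float(np.mean((a < b) & (b < c)))


if __name__ == "__main__":
    fpr, tpr = roc_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
    print(trapezoid_auc(fpr, tpr))  # 0.75
    rng = np.random.default_rng(0)
    print(empirical_vus(rng.normal(0, 1, 50), rng.normal(1, 1, 50), rng.normal(2, 1, 50)))
```

A VUS of 1 means every triple is correctly ordered; a score unrelated to the classes gives about 1/6, the three-class analogue of an AUC of 0.5.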

Results from the TCGA BRCA dataset.

Author: Tianwei Yu (17518)

(a) Scatter plots of DC1, DC3, and DC7 scores. The points are colored by the ER status of the subjects. DC1 separates ER+ and ER- subjects, while DC3 and DC7 have a wide spread only among the ER- subjects. (b) DC1 captures information similar to that of the second principal component. (c) Kaplan–Meier curves of the ER-negative subjects; red: absolute factor score > 0.05.

FigShare
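Panel (c) compares survival between ER-negative subjects split by absolute factor score at 0.05. The following is a minimal sketch of that kind of comparison using the Kaplan–Meier product-limit estimator; the survival times, event indicators, and scores are simulated placeholders rather than TCGA data, the function names are hypothetical, and no log-rank test is included.

```python
import numpy as np


def kaplan_meier(time, event):
    """Product-limit estimate of S(t) at each distinct event time.

    time  : follow-up times
    event : 1 if the event (e.g. death) was observed, 0 if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)           # subjects still under observation at t
        deaths = np.sum((time == t) & event)  # events observed exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)


def split_by_abs_score(score, cutoff=0.05):
    """Boolean mask for the 'red' group in panel (c): |score| > cutoff."""
    return np.abs(np.asarray(score, dtype=float)) > cutoff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100
    score = rng.normal(0.0, 0.08, n)   # hypothetical factor scores
    time = rng.exponential(5.0, n)     # hypothetical follow-up times
    event = rng.random(n) < 0.7        # hypothetical event indicators
    high = split_by_abs_score(score)
    for name, mask in (("|score| > 0.05", high), ("|score| <= 0.05", ~high)):
        t, s = kaplan_meier(time[mask], event[mask])
        print(name, "subjects:", int(mask.sum()), "event times:", t.size)
```

Censored subjects leave the risk set without lowering the curve, which is why `at_risk` counts everyone with follow-up time of at least t.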

Biological process pairs with excessive dynamic correlations related to DCs 2 and 5.

Author: Tianwei Yu (17518)

Gene pairs were selected using an FDR threshold of 0.01. Biological process pairs were selected using a p-value threshold of 0.001 and a fold-change of 2. For simplicity, only nodes with connections above a certain threshold are shown. Node sizes reflect the total number of connections of each node. (a) Biological process pairs associated with DC2. (b) Biological process pairs associated with DC5. (c) Example plots of gene pairs with an LA relation to DC5. Red points: samples in the lower 33% of DC5 score; blue points: samples in the upper 33% of DC5 score.

FigShare
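Panel (c) contrasts samples in the lower and upper thirds of the DC5 score, and the caption refers to an "LA" relation. As a hedged sketch, the code below computes two related quantities on simulated data: the within-tercile Pearson correlations of a gene pair, and the classical liquid association score of Li (2002), the mean product of normal-score-transformed x, y, and z. That classical score needs a known conditioning variable z; it is a stand-in, not the DCA method or its LAC metric, and the simulated data and function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def normal_scores(v):
    """Rank-based normal scores Phi^-1(rank / (n + 1)); assumes no ties."""
    v = np.asarray(v, dtype=float)
    ranks = np.argsort(np.argsort(v)) + 1
    inv = NormalDist().inv_cdf
    return np.array([inv(r / (v.size + 1)) for r in ranks])


def liquid_association(x, y, z):
    """Classical LA score: mean of the product of normal-scored x, y and z."""
    return float(np.mean(normal_scores(x) * normal_scores(y) * normal_scores(z)))


def tercile_correlations(x, y, z):
    """Pearson r of (x, y) among samples in the lower and upper thirds of z."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    lo_cut, hi_cut = np.quantile(z, [1 / 3, 2 / 3])
    low, high = z <= lo_cut, z >= hi_cut
    r_low = np.corrcoef(x[low], y[low])[0, 1]
    r_high = np.corrcoef(x[high], y[high])[0, 1]
    return float(r_low), float(r_high)


def simulate_pair(n, rng):
    """Hypothetical pair whose correlation tanh(z) runs from negative to positive."""
    z = rng.normal(size=n)
    x = rng.normal(size=n)
    rho = np.tanh(z)
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    return x, y, z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y, z = simulate_pair(3000, rng)
    print("overall r:", round(float(np.corrcoef(x, y)[0, 1]), 3))
    print("lower/upper-third r:", tercile_correlations(x, y, z))
    print("LA score:", round(liquid_association(x, y, z), 3))
```

In this simulation the overall correlation is near zero even though the lower and upper thirds show clearly opposite correlations, the pattern that panel (c) visualizes with red and blue points.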

The liquid association coefficient (LAC).

Author: Tianwei Yu (17518)

(a) Illustration of LAC using examples. Left column: dynamic correlation with an unknown conditioning factor. When the factor is low, x and y are negatively correlated; when the factor is high, x and y are positively correlated. Second column from the left: independent case. Right two columns: correlated case. In all cases, the marginal distributions of x and y are standard normal. (b) Empirical distributions of the LAC score under dynamic correlation, simple correlation, or independence. The densities are based on 1000 simulations. In the dynamic correlation cases, one-third of the data points follow a bivariate normal distribution with mean and variance-covariance matrix , one-third follow a bivariate normal distribution with mean and variance-covariance matrix , and the remaining one-third follow independent standard normal distributions. In the correlated case, all data points follow a bivariate normal distribution with mean and variance-covariance matrix .

FigShare
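The caption above compares three simulated scenarios (dynamic correlation, independence, and simple correlation), but the means and variance-covariance matrices are missing from this text. The sketch below therefore uses hypothetical values (zero means, unit variances, and correlations of -ρ and +ρ with ρ = 0.8 in two of the three thirds) purely to show why an ordinary Pearson correlation cannot separate the dynamic case from independence: both center on zero, while every marginal stays standard normal. It does not compute the paper's LAC score.

```python
import numpy as np


def draw_bivariate(n, rho, rng):
    """n draws from a bivariate normal with zero means, unit variances, correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)


def simulate(case, n, rng, rho=0.8):
    """One dataset for 'dynamic', 'independent' or 'correlated' (rho is hypothetical)."""
    if case == "dynamic":
        k = n // 3
        # Thirds: negatively correlated, positively correlated, independent.
        return np.vstack([
            draw_bivariate(k, -rho, rng),
            draw_bivariate(k, rho, rng),
            draw_bivariate(n - 2 * k, 0.0, rng),
        ])
    if case == "independent":
        return draw_bivariate(n, 0.0, rng)
    if case == "correlated":
        return draw_bivariate(n, rho, rng)
    raise ValueError(f"unknown case: {case}")


def pearson_distribution(case, n=300, n_sim=1000, seed=0):
    """Pearson r of x and y across n_sim simulated datasets."""
    rng = np.random.default_rng(seed)
    rs = np.empty(n_sim)
    for i in range(n_sim):
        xy = simulate(case, n, rng)
        rs[i] = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
    return rs


if __name__ == "__main__":
    for case in ("dynamic", "independent", "correlated"):
        rs = pearson_distribution(case)
        print(f"{case:12s} mean r = {rs.mean():+.3f}, sd = {rs.std():.3f}")
```

Because each mixture component has standard normal marginals, the pooled x and y remain standard normal, matching the caption's setup, even though their joint structure differs sharply from independence.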

Major biological processes associated with the DCs.

Author: Tianwei Yu (17518)

(a) DC1, (b) DC2, (c) DC3, and (d) DC5. Gene pairs were selected using an FDR threshold of 0.01. Biological process pairs were selected using a p-value threshold of 0.001 and a fold-change of 4. All panels were limited to biological processes with 50 or more connections, except for DC2, for which the limit was 100 because of the excessive number of connections.

FigShare
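The captions describe two filtering steps: keeping gene pairs at an FDR threshold of 0.01, and keeping only biological processes with at least a given number of connections. The captions do not say how the FDR was computed; assuming the Benjamini–Hochberg procedure, a minimal sketch of both steps might look like this. The edge-list format and function names are hypothetical, and the p-value and fold-change tests on biological process pairs are not reproduced.

```python
from collections import Counter

import numpy as np


def benjamini_hochberg(pvalues, q=0.01):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Find the largest k with p_(k) <= q * k / m, then reject the k smallest p-values.
    below = p[order] <= q * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        keep[order[: k + 1]] = True
    return keep


def nodes_with_min_connections(edges, min_connections=50):
    """Nodes whose total number of connections (edge count) reaches the limit."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node for node, d in degree.items() if d >= min_connections}


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, q=0.05))
    edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
    print(nodes_with_min_connections(edges, min_connections=2))
```

The degree count here is the same quantity the captions use for node sizes, so one pass over the edge list serves both the filtering and the plotting.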

Testing the difference in VUS between different methods.

Author: Tianwei Yu (17518)

(a) Average p-values over the nine comparisons. (b) Fractions of the nine comparisons that were significant (p-value < 0.05). Note that the nine comparisons in Turro et al. [10] are not independent.

FigShare
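The caption does not specify which test produced the p-values. As one hedged possibility, the sketch below compares two methods' VUS on the same three-class samples with a stratified paired bootstrap, then reduces a list of comparison p-values to the two summaries plotted in panels (a) and (b): the average p-value and the fraction below 0.05. The bootstrap test, the scores, and all names here are assumptions; because the comparisons are not independent, these summaries are descriptive only.

```python
import numpy as np


def empirical_vus(c1, c2, c3):
    """Fraction of triples (a, b, c), one per class, with a < b < c."""
    a = np.asarray(c1, dtype=float)[:, None, None]
    b = np.asarray(c2, dtype=float)[None, :, None]
    c = np.asarray(c3, dtype=float)[None, None, :]
    return float(np.mean((a < b) & (b < c)))


def vus_from_labels(scores, labels):
    """VUS for classes labelled 0 < 1 < 2."""
    return empirical_vus(scores[labels == 0], scores[labels == 1], scores[labels == 2])


def paired_bootstrap_pvalue(score_a, score_b, labels, n_boot=200, seed=0):
    """Two-sided bootstrap p-value for VUS(a) - VUS(b), resampling within each class."""
    rng = np.random.default_rng(seed)
    score_a, score_b = np.asarray(score_a, float), np.asarray(score_b, float)
    labels = np.asarray(labels)
    by_class = [np.nonzero(labels == k)[0] for k in range(3)]
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # The same resampled subjects are scored by both methods (paired design).
        idx = np.concatenate([rng.choice(ix, size=ix.size, replace=True) for ix in by_class])
        diffs[i] = vus_from_labels(score_a[idx], labels[idx]) - vus_from_labels(score_b[idx], labels[idx])
    return float(min(1.0, 2.0 * min(np.mean(diffs <= 0), np.mean(diffs >= 0))))


def summarize(p_values, alpha=0.05):
    """Average p-value and fraction of comparisons with p < alpha."""
    p = np.asarray(p_values, dtype=float)
    return float(p.mean()), float(np.mean(p < alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = np.repeat([0, 1, 2], 30)
    good = labels + rng.normal(0, 0.5, labels.size)  # hypothetical informative method
    weak = labels + rng.normal(0, 3.0, labels.size)  # hypothetical noisy method
    p = paired_bootstrap_pvalue(good, weak, labels)
    print("p-value:", p)
    print("mean p, fraction significant:", summarize([p, 0.2, 0.03]))
```

Resampling within each class keeps the class sizes fixed, so every bootstrap replicate still contains all three classes and the VUS is always defined.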

Biological process pairs with excessive dynamic correlations related to DCs 3 and 7.

Author: Tianwei Yu (17518)

Gene pairs were selected using an FDR threshold of 0.01. Biological process pairs were selected using a p-value threshold of 0.001 and a fold-change of 3. For simplicity, only nodes with connections above a certain threshold are shown. Node sizes reflect the total number of connections of each node. (a) Biological process pairs associated with DC3. (b) Biological process pairs associated with DC7. Inset: scatterplot of LUMP (leukocytes unmethylation for purity) vs DC7 score. The correlation coefficient is -0.35.

FigShare