39 research outputs found

    Inference for High-dimensional Differential Correlation Matrices

    Motivated by differential co-expression analysis in genomics, we consider in this paper the estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. The minimax rate of convergence is established, and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, a subset of which has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test that is particularly well suited for testing against sparse alternatives is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of differential covariance matrices, and estimation of differential cross-correlation matrices, are also discussed. Comment: Accepted for publication in the Journal of Multivariate Analysis.

    Differential co-expression framework to quantify goodness of biclusters and compare biclustering algorithms

    Abstract. Background: Biclustering is an important analysis procedure for understanding biological mechanisms from microarray gene expression data. Several algorithms have been proposed to identify biclusters, but very little effort has been made to compare the performance of different algorithms on real datasets and to combine the resulting biclusters into one unified ranking. Results: In this paper we propose a differential co-expression framework and a differential co-expression scoring function to objectively quantify the quality, or goodness, of a bicluster of genes, based on the observation that genes in a bicluster are co-expressed in the conditions belonging to the bicluster and not co-expressed in the other conditions. Furthermore, we propose a scoring function to stratify biclusters into three types of co-expression. We used the proposed scoring functions to understand the performance and behavior of four well-established biclustering algorithms on six real datasets from different domains by combining their output into one unified ranking. Conclusions: The differential co-expression framework provides a quantitative and objective assessment of the goodness of biclusters of co-expressed genes and of the performance of biclustering algorithms in identifying co-expression biclusters. It also helps to combine the biclusters output by different algorithms into one unified ranking, i.e. meta-biclustering.

    CODC: A Copula-based model to identify differential coexpression

    Differential coexpression has recently emerged as a new way to establish a fundamental difference in expression pattern among a group of genes between two populations. Earlier methods used scoring techniques to detect changes in the correlation pattern of a gene pair across two conditions. However, modeling differential coexpression by finding differences in the dependence structure of the gene pair has hitherto not been carried out. We exploit a copula-based framework to model differential coexpression between gene pairs in two different conditions. The copula is used to model the dependency between the expression profiles of a gene pair. For a gene pair, the distance between the two joint distributions produced by the copula serves as the measure of differential coexpression. We used five pan-cancer TCGA RNA-Seq datasets to evaluate the model, which outperforms the existing state of the art. Moreover, the proposed model can detect a mild change in the coexpression pattern across two conditions. For noisy expression data, the proposed method perf
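The copula idea can be sketched with an empirical copula. This is an illustration under stated assumptions, not the CODC model itself: the abstract does not say which copula family or which distance is used, so the rank-based empirical copula, the maximum-absolute-difference distance, and the `grid_size` parameter below are all hypothetical choices.

```python
def pseudo_observations(values):
    """Map values into (0, 1) via rank / (n + 1).

    Ties are broken by order of appearance (an assumption of this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [r / (n + 1) for r in ranks]


def empirical_copula(u, v, a, b):
    """Fraction of pseudo-observations (u_i, v_i) with u_i <= a and v_i <= b."""
    return sum(1 for ui, vi in zip(u, v) if ui <= a and vi <= b) / len(u)


def copula_distance(x1, y1, x2, y2, grid_size=10):
    """Largest absolute gap between the two empirical copulas on a regular grid.

    (x1, y1) are the expression profiles of a gene pair in condition 1,
    (x2, y2) the profiles of the same pair in condition 2.
    """
    u1, v1 = pseudo_observations(x1), pseudo_observations(y1)
    u2, v2 = pseudo_observations(x2), pseudo_observations(y2)
    grid = [k / grid_size for k in range(1, grid_size + 1)]
    return max(
        abs(empirical_copula(u1, v1, a, b) - empirical_copula(u2, v2, a, b))
        for a in grid
        for b in grid
    )
```

Because the copula is built from ranks, the distance depends only on the dependence structure of the pair and not on the marginal expression levels, which is the property the abstract emphasizes.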

    Identifying set-wise differential co-expression in gene expression microarray data

    Abstract. Background: Previous differential coexpression analyses focused on the identification of differentially coexpressed gene pairs, revealing many insightful biological hypotheses. However, this approach could not detect coexpression relationships between pairs of gene sets. Considering the success of many set-wise analysis methods for microarray data, a coexpression analysis based on gene sets may elucidate underlying biological processes provoked by conditional changes. Here, we propose a differentially coexpressed gene sets (dCoxS) algorithm that identifies differentially coexpressed gene set pairs between conditions. Results: dCoxS is a two-step analysis method. In each condition, dCoxS measures the interaction score (IS), which represents the expression similarity between two gene sets, using Renyi relative entropy. When estimating the relative entropy, multivariate kernel density estimation was used to model the gene-gene correlation structure. Statistical tests for the conditional difference between the ISs determined the significance of differential coexpression of the gene set pair. Simulation studies supported the IS as a representative measure of similarity between gene expression matrices. Single-gene coexpression analysis of two publicly available microarray datasets detected no significant results. However, the dCoxS analysis of the same datasets revealed differentially coexpressed gene set pairs related to the biological conditions of the datasets. Conclusion: dCoxS identified differentially coexpressed gene set pairs not found by single-gene analysis. The results indicate that set-wise differential coexpression analysis is useful for understanding biological processes induced by conditional changes.

    Large-Scale Multiple Testing of Correlations

    Multiple testing of correlations arises in many applications, including gene coexpression network analysis and brain connectivity analysis. In this article, we consider large-scale simultaneous testing for correlations in both the one-sample and two-sample settings. New multiple testing procedures are proposed, and a bootstrap method is introduced for estimating the proportion of the nulls falsely rejected among all the true nulls. We investigate the properties of the proposed procedures both theoretically and numerically. It is shown that the procedures asymptotically control the overall false discovery rate and false discovery proportion at the nominal level. Simulation results show that the methods perform well numerically in terms of both the size and power of the test and that they significantly outperform two alternative methods. The two-sample procedure is also illustrated by an analysis of a prostate cancer dataset for the detection of changes in coexpression patterns between gene expression levels. Supplementary materials for this article are available online.
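For orientation, the two-sample setting can be sketched with a classical baseline: Fisher's z-transform for each pair of correlations, followed by the Benjamini-Hochberg step-up procedure. This is explicitly not the new procedure proposed in the article; it is the textbook approach, and both function names are assumptions. The z-test relies on approximate bivariate normality and requires |r| < 1 and more than three samples per group.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_pvalue(r1, n1, r2, n2):
    """Two-sided p-value for H0: rho1 == rho2 using Fisher's z-transform.

    r1, r2 are sample correlations from groups of size n1, n2 (each > 3).
    """
    z = (atanh(r1) - atanh(r2)) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value passes.
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

Tests on correlations that share a gene are dependent, so the usual guarantees for Benjamini-Hochberg may not hold exactly here.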

    Identifying differential correlation in gene/pathway combinations

    Abstract. Background: An important emerging trend in the analysis of microarray data is to incorporate known pathway information a priori. Expression level "summaries" for pathways, obtained from the expression data for the genes constituting the pathway, permit the inclusion of pathway information, reduce the high dimensionality of microarray data, and have the power to elucidate gene-interaction dependencies which are not already accounted for through known pathway identification. Results: We present a novel method for the analysis of microarray data that identifies joint differential expression in gene-pathway pairs. This method takes advantage of known gene pathway memberships to compute a summary expression level for each pathway as a whole. Correlations between the pathway expression summary and the expression levels of genes not already known to be associated with the pathway provide clues to gene interaction dependencies that are not already accounted for through known pathway identification, and statistically significant differences between gene-pathway correlations in phenotypically different cells (e.g., where the expression level of a single gene and a given pathway summary correlate strongly in normal cells but weakly in tumor cells) may indicate biologically relevant gene-pathway interactions. Here, we detail the methodology and present the results of this method applied to two gene-expression datasets, identifying gene-pathway pairs which exhibit differential joint expression by phenotype. Conclusion: The method described herein provides a means by which interactions between large numbers of genes may be identified by incorporating known pathway information to reduce the dimensionality of gene interactions. The method is efficient and easily applied to data sets of ~10^2 arrays. Application of this method to two publicly available cancer data sets yields suggestive and promising results. This method has the potential to complement gene-at-a-time analysis techniques for microarray analysis by indicating relationships between pathways and genes that have not previously been identified and which may play a role in disease.

    Differentially correlated genes in co-expression networks control phenotype transitions.

    Background: Co-expression networks are a tool widely used for the analysis of "Big Data" in biology, ranging from transcriptomes to proteomes, metabolomes, and more recently even microbiomes. Several methods have been proposed to answer biological questions by interrogating these networks. Differential co-expression analysis is a recent approach that measures how gene interactions change when a biological system transitions from one state to another. Although the importance of differentially co-expressed genes for identifying dysregulated pathways has been noted, their role in gene regulation is not well studied. Herein we investigated differentially co-expressed genes in a relatively simple mono-causal process (B lymphocyte deficiency) and in a complex multi-causal system (cervical cancer). Methods: Co-expression networks of B cell deficiency (Control and BcKO) were reconstructed using the Pearson correlation coefficient for two Mus musculus datasets: B10.A strain (12 normal, 12 BcKO) and BALB/c strain (10 normal, 10 BcKO). Co-expression networks of cervical cancer (normal and cancer) were reconstructed using a local partial correlation method for five datasets (total of 64 normal, 148 cancer). Differentially correlated pairs were identified along with the location of their genes in the BcKO and cancer networks. Minimum Shortest Path and Bi-partite Betweenness Centrality were statistically evaluated for differentially co-expressed genes in the corresponding networks. Results: We show that in B cell deficiency the differentially co-expressed genes are highly enriched with immunoglobulin genes (causal genes). In cancer we found that differentially co-expressed genes act as "bottlenecks" rather than causal drivers, with most flows from the key driver genes to the peripheral genes passing through differentially co-expressed genes. Using in vitro knockdown experiments for two out of 14 differentially co-expressed genes found in cervical cancer (FGFR2 and CACYBP), we showed that they play regulatory roles in cancer cell growth. Conclusion: Identifying differentially co-expressed genes in co-expression networks is an important tool for detecting regulatory genes involved in alterations of phenotype.

    A semi-parametric Bayesian model for unsupervised differential co-expression analysis

    Abstract. Background: Differential co-expression analysis is an emerging strategy for characterizing disease-related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples, such analysis aims at identifying genes that are co-expressed in one, but not in the other, set of samples. Results: We developed a novel probabilistic framework for jointly uncovering contexts (i.e. groups of samples) with specific co-expression patterns, and groups of genes with different co-expression patterns across such contexts. In contrast to current clustering and bi-clustering procedures, the implicit similarity measure in this model used for grouping biological samples is based on the clustering structure of genes within each sample and not on traditional measures of gene expression level similarities. Within this framework, biological samples with widely discordant expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant within these samples. To the best of our knowledge, this is the first method to date for unsupervised differential co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer, our method identified reproducible patterns of differential co-expression across several independent expression datasets. Sample groupings induced by these patterns were highly informative of the disease outcome. Expression patterns of differentially co-expressed genes provided new insights into the complex nature of the ERα regulatory network. Conclusions: We demonstrated that the use of the co-clustering structure as the similarity measure in the unsupervised analysis of sample gene expression profiles provides valuable information about expression regulatory networks.