39,208 research outputs found

    Differential expression analysis with global network adjustment

    Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene’s expression as a function of other genes, thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments.
    Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient “over-shrinkage” method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, and this results in a substantial increase in the signal-to-noise ratio, allowing more powerful inferences on differential gene expression and leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible to detect with standard differential expression methods.
    Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.
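    The penalty selection described in the abstract above can be illustrated with a minimal sketch. It assumes a single centered predictor with no intercept, which is far simpler than the thousands of regulator transcripts the paper considers, and it does not reproduce the authors' second round of regression. The function names ridge_slope, out_sample_mse and select_lambda are hypothetical.

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge estimate for one centered predictor, no intercept.

    Minimizes sum((y - beta * x)^2) + lam * beta^2, which gives
    beta = sum(x * y) / (sum(x^2) + lam).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def out_sample_mse(beta, x, y):
    """Mean squared prediction error of slope beta on a held-out sample."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


def select_lambda(train, held_out, grid):
    """Pick the penalty from grid with the lowest error on an independent sample."""
    x_tr, y_tr = train
    x_ho, y_ho = held_out
    return min(
        grid,
        key=lambda lam: out_sample_mse(ridge_slope(x_tr, y_tr, lam), x_ho, y_ho),
    )


# Toy data (made up for illustration): the training sample suggests a slope
# of 2, while the independent sample follows a slope of 1, so a non-zero
# penalty that shrinks the estimate predicts the held-out sample better.
train = ([-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0])
held_out = ([-1.0, 1.0], [-1.0, 1.0])
best = select_lambda(train, held_out, grid=[0.0, 2.0, 10.0])
```

    Scoring each candidate penalty on a sample that was not used for fitting is what pushes the choice towards stronger shrinkage than an in-sample fit would suggest.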

    Combined population dynamics and entropy modelling supports patient stratification in chronic myeloid leukemia

    Modelling the parameters of multistep carcinogenesis is key for a better understanding of cancer progression, biomarker identification and the design of individualized therapies. Using chronic myeloid leukemia (CML) as a paradigm for hierarchical disease evolution, we show that combined population dynamic modelling and CML patient biopsy genomic analysis enables patient stratification at unprecedented resolution. Linking CD34+ similarity as a disease progression marker to patient-derived gene expression entropy separated established CML progression stages and uncovered additional heterogeneity within disease stages. Importantly, our patient-data-informed model enables quantitative approximation of individual patients’ disease history within chronic phase (CP) and significantly separates “early” from “late” CP. Our findings provide a novel rationale for personalized and genome-informed disease progression risk assessment that is independent of and complementary to conventional measures of CML disease burden and prognosis.

    A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer

    Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite feature and single gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single gene sets is similar to the stability of composite feature sets. Based on these results, there is currently no reason to prefer prognostic classifiers based on composite features over single gene classifiers for predicting outcome in breast cancer.

    Circular RNAs in Clear Cell Renal Cell Carcinoma: Their Microarray-Based Identification, Analytical Validation, and Potential Use in a Clinico-Genomic Model to Improve Prognostic Accuracy

    Circular RNAs (circRNAs) may act as novel cancer biomarkers. However, a genome-wide evaluation of circRNAs in clear cell renal cell carcinoma (ccRCC) has yet to be conducted. Therefore, the objective of this study was to identify and validate circRNAs in ccRCC tissue with a focus on evaluating their potential as prognostic biomarkers. A genome-wide identification of circRNAs in total RNA extracted from ccRCC tissue samples was performed using microarray analysis. Three relevant differentially expressed circRNAs were selected (circEGLN3, circNOX4, and circRHOBTB3), their circular nature was experimentally confirmed, and their expression, along with that of their linear counterparts, was measured in 99 malignant and 85 adjacent normal tissue samples using specifically established RT-qPCR assays. The capacity of circRNAs to discriminate between malignant and adjacent normal tissue samples and their prognostic potential (with the endpoints cancer-specific, recurrence-free, and overall survival) after surgery were estimated by C-statistics, the Kaplan-Meier method, univariate and multivariate Cox regression analysis, decision curve analysis, and Akaike and Bayesian information criteria. CircEGLN3 discriminated malignant from normal tissue with 97% accuracy. We generated a prognostic signature for the three endpoints by multivariate Cox regression analysis that included circEGLN3, circRHOBTB3 and linRHOBTB3. The predictive outcome accuracy of the clinical models based on clinicopathological factors was improved in combination with this circRNA-based signature. Bootstrapping as well as Akaike and Bayesian information criteria confirmed the statistical significance and robustness of the combined models. Limitations of this study include its retrospective nature and the lack of external validation. The study demonstrated the promising potential of circRNAs as diagnostic and particularly prognostic biomarkers in ccRCC patients.

    Defining a robust biological prior from Pathway Analysis to drive Network Inference

    Inferring genetic networks from gene expression data is one of the most challenging tasks in the post-genomic era, partly due to the vast space of possible networks and the relatively small amount of data available. In this field, the Gaussian Graphical Model (GGM) provides a convenient framework for the discovery of biological networks. In this paper, we propose an original approach for inferring gene regulation networks using a robust biological prior on their structure in order to limit the set of candidate networks. Pathways, which represent biological knowledge on the regulatory networks, will be used as informative prior knowledge to drive Network Inference. This approach is based on the selection of a relevant set of genes, called the "molecular signature", associated with a condition of interest (for instance, the genes involved in disease development). In this context, differential expression analysis is a well-established strategy. However, outcome signatures are often not consistent and show little overlap between studies. Thus, we will dedicate the first part of our work to the improvement of the standard process of biomarker identification to guarantee the robustness and reproducibility of the molecular signature. Our approach makes it possible to compare the networks inferred between two conditions of interest (for instance case and control networks) and helps with the biological interpretation of results. It thus allows the identification of differential regulations that occur in these conditions. We illustrate the proposed approach by applying our method to a study of breast cancer's response to treatment.

    Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

    A volcano plot displays unstandardized signal (e.g. log-fold-change) against noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from the t test). We review the basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to other fields beyond microarrays. Comment: 8 figures
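    The contrast between the two selection criteria in the abstract above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the review's own procedure: the regularized statistic is taken to have the common form difference / (standard error + s0), and the constant s0, all thresholds, the function names and the toy expression values are hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_stats(a, b):
    """Return (difference of means, standard error, t-statistic) for two groups."""
    diff = mean(b) - mean(a)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, se, diff / se


def double_filter(diff, t, min_diff=1.0, min_t=2.0):
    """Two separate cutoffs: two perpendicular lines in the volcano plot."""
    return abs(diff) >= min_diff and abs(t) >= min_t


def regularized_t(diff, se, s0=0.5):
    """t-like statistic with a constant s0 added to the standard error."""
    return diff / (se + s0)


def joint_filter(diff, se, s0=0.5, min_d=1.5):
    """One cutoff on the regularized statistic: a curved boundary in the plot."""
    return abs(regularized_t(diff, se, s0)) >= min_d


# Two toy genes (log-scale values made up for illustration). Both have the
# same ordinary t-statistic, but only the second has a large fold change.
low_fc = two_sample_stats([5.0, 5.1, 4.9], [5.3, 5.4, 5.2])
high_fc = two_sample_stats([5.0, 6.0, 4.0], [8.0, 9.0, 7.0])

for diff, se, t in (low_fc, high_fc):
    print(round(diff, 2), round(t, 2), double_filter(diff, t), joint_filter(diff, se))
```

    Because s0 inflates the denominator most when the standard error is small, a gene with a tiny but very consistent difference is penalized by the single regularized cutoff, which is why that boundary bends instead of forming a corner.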

    Integration of microRNA changes in vivo identifies novel molecular features of muscle insulin resistance in type 2 diabetes

    Skeletal muscle insulin resistance (IR) is considered a critical component of type II diabetes, yet to date IR has evaded characterization at the global gene expression level in humans. MicroRNAs (miRNAs) are considered fine-scale rheostats of protein-coding gene product abundance. The relative importance and mode of action of miRNAs in human complex diseases remain to be fully elucidated. We produce a global map of coding and non-coding RNAs in human muscle IR with the aim of identifying novel disease biomarkers. We profiled >47,000 mRNA sequences and >500 human miRNAs using gene-chips and 118 subjects (n = 71 patients versus n = 47 controls). A tissue-specific gene-ranking system was developed to stratify thousands of miRNA target-genes, removing false positives, yielding a weighted inhibitor score, which integrated the net impact of both up- and down-regulated miRNAs. Both informatic and protein detection validation were used to verify the predictions of in vivo changes. The muscle mRNA transcriptome is invariant with respect to insulin or glucose homeostasis. In contrast, a third of miRNAs detected in muscle were altered in disease (n = 62), many changing prior to the onset of clinical diabetes. The novel ranking metric identified six canonical pathways with proven links to metabolic disease while the control data demonstrated no enrichment. The Benjamini-Hochberg adjusted Gene Ontology profile of the highest ranked targets was metabolic (P < 7.4 × 10^-8), post-translational modification (P < 9.7 × 10^-5) and developmental (P < 1.3 × 10^-6) processes. Protein profiling of six development-related genes validated the predictions. Brain-derived neurotrophic factor protein was detectable only in muscle satellite cells and was increased in diabetes patients compared with controls, consistent with the observation that global miRNA changes were opposite from those found during myogenic differentiation.
    We provide evidence that IR in humans may be related to coordinated changes in multiple microRNAs, which act to target relevant signaling pathways. It would appear that miRNAs can produce marked changes in target protein abundance in vivo by working in a combinatorial manner. Thus, miRNA detection represents a new molecular biomarker strategy for insulin resistance, where only micrograms of patient material are needed to monitor efficacy during drug or life-style interventions.
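    The abstract above reports Benjamini-Hochberg adjusted P values. As a reminder of what that adjustment does, here is a minimal sketch of the standard step-up procedure. The function name and the input P values are made up for illustration and are unrelated to the study's data.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The p-value of rank r (1 = smallest) among m tests is scaled by m / r,
    then a running minimum taken from the largest rank downwards keeps the
    adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

    A gene set would then be reported as enriched at a chosen false discovery rate, for example 5%, when its adjusted value falls below that rate.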

    Application of whole genome and RNA sequencing to investigate the genomic landscape of common variable immunodeficiency disorders.

    Common Variable Immunodeficiency Disorders (CVIDs) are the most prevalent cause of primary antibody failure. CVIDs are highly variable and a genetic cause has been identified in <5% of patients. Here, we performed whole genome sequencing (WGS) of 34 CVID patients (94% sporadic) and combined it with transcriptomic profiling (RNA-sequencing of B cells) from three patients and three healthy controls. We identified variants in CVID disease genes TNFRSF13B, TNFRSF13C, LRBA and NLRP12 and enrichment of variants in known and novel disease pathways. The pathways identified include B-cell receptor signalling, non-homologous end-joining, regulation of apoptosis, T cell regulation and ICOS signalling. Our data confirm the polygenic nature of CVID and suggest individual-specific aetiologies in many cases. Together, our data show that WGS in combination with RNA-sequencing allows for a better understanding of CVIDs and the identification of novel disease-associated pathways.