5,248 research outputs found

    A power law global error model for the identification of differentially expressed genes in microarray data

    BACKGROUND: High-density oligonucleotide microarray technology enables the discovery of genes that are transcriptionally modulated in different biological samples due to physiology, disease or intervention. Methods for the identification of these so-called "differentially expressed genes" (DEG) would largely benefit from a deeper knowledge of the intrinsic measurement variability. Though it is clear that variance of repeated measures is highly dependent on the average expression level of a given gene, there is still a lack of consensus on how signal reproducibility is linked to signal intensity. The aim of this study was to empirically model the variance versus mean dependence in microarray data to improve the performance of existing methods for identifying DEG. RESULTS: In the present work we used data generated by our lab as well as publicly available data sets to show that dispersion of repeated measures depends on location of the measures themselves following a power law. This enables us to construct a power law global error model (PLGEM) that is applicable to various Affymetrix GeneChip data sets. A new DEG identification method is therefore proposed, consisting of a statistic designed to make explicit use of model-derived measurement spread estimates and a resampling-based hypothesis testing algorithm. CONCLUSIONS: The new method provides a control of the false positive rate, a good sensitivity vs. specificity trade-off and consistent results with varying number of replicates and even using single samples
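
The power-law dependence of replicate spread on mean expression described above can be illustrated with a small sketch. It is not the authors' PLGEM fitting procedure, which the abstract does not spell out. It assumes a relation of the form sd = a * mean^b and estimates a and b by ordinary least squares on the log-log scale. The function names `fit_power_law` and `synthetic_gene` and the synthetic data are hypothetical.

```python
import math
from statistics import mean, stdev


def fit_power_law(replicates):
    """Fit sd = a * mean**b by least squares on log(sd) versus log(mean).

    `replicates` is a list of per-gene lists of replicate measurements.
    Genes with a non-positive mean or zero spread are skipped because
    their logarithm is undefined.
    """
    xs, ys = [], []
    for values in replicates:
        m, s = mean(values), stdev(values)
        if m > 0 and s > 0:
            xs.append(math.log(m))
            ys.append(math.log(s))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = power-law exponent
    a = math.exp(y_bar - b * x_bar)    # intercept = log of the prefactor
    return a, b


def synthetic_gene(m, a=0.5, b=0.8):
    """Two replicates whose mean is m and whose sample sd is a * m**b."""
    d = a * m ** b / math.sqrt(2)
    return [m - d, m + d]


genes = [synthetic_gene(m) for m in (10, 100, 1000, 10000)]
a_hat, b_hat = fit_power_law(genes)
print(a_hat, b_hat)
```

On real data the fitted spread could then serve as a model-derived noise estimate in a signal-to-noise statistic, which is the role the abstract assigns to the error model.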

    Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

    A volcano plot displays unstandardized signal (e.g. log-fold-change) against noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from the t test). We review the basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to other fields beyond microarrays. Comment: 8 figures
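
The contrast between the two selection criteria can be sketched in a few lines. This is a minimal illustration, assuming a pooled-variance two-sample t-statistic and a regularized statistic of the form diff / (se + s0), where the constant `s0` is an assumed tuning parameter. The function names, thresholds and data are hypothetical and are not taken from the review.

```python
import math
from statistics import mean, variance


def gene_stats(group_a, group_b, s0=0.0):
    """Return (difference of means, t-statistic, regularized t-statistic).

    Inputs are assumed to be log-scale expression values, so the
    difference of means plays the role of the log-fold-change.
    Assumes the pooled variance is non-zero.
    """
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_b) - mean(group_a)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff, diff / se, diff / (se + s0)


def double_filter(diff, t, min_diff, min_t):
    """Two perpendicular cut-offs: one on fold change, one on t."""
    return abs(diff) >= min_diff and abs(t) >= min_t


def joint_filter(t_reg, min_t_reg):
    """Single cut-off on the regularized statistic (a curved boundary)."""
    return abs(t_reg) >= min_t_reg


# A gene with a large change, and one with a tiny but very stable change.
large = gene_stats([1.0, 1.2, 0.8], [3.0, 3.2, 2.8], s0=0.5)
tiny = gene_stats([1.00, 1.01, 0.99], [1.10, 1.11, 1.09], s0=0.5)
print(large, tiny)
```

Rearranging |diff| / (se + s0) >= c with se = |diff| / |t| gives |diff| * (1 - c / |t|) >= c * s0, which is why the boundary bends in the (diff, t) plane instead of forming two straight lines.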

    aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data

    Data normalization and identification of significant differential expression represent crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that are often not met by real data, including the common assumption of symmetrical distribution of up- and down-regulated genes, the presence of only few differentially expressed genes and/or few outliers. Moreover, the cut-off for selecting significantly differentially expressed genes for further downstream analysis often depends on arbitrary choices

    Integrative analyses of transcriptome sequencing identify novel functional lncRNAs in esophageal squamous cell carcinoma.

    Long non-coding RNAs (lncRNAs) have a critical role in cancer initiation and progression, and thus may mediate oncogenic or tumor suppressing effects, as well as be a new class of cancer therapeutic targets. We performed high-throughput sequencing of RNA (RNA-seq) to investigate the expression level of lncRNAs and protein-coding genes in 30 esophageal samples, comprising 15 esophageal squamous cell carcinoma (ESCC) samples and their 15 paired non-tumor tissues. We further developed an integrative bioinformatics method, denoted URW-LPE, to identify key functional lncRNAs that regulate expression of downstream protein-coding genes in ESCC. A number of known onco-lncRNAs and many putative novel ones were effectively identified by URW-LPE. Importantly, we identified lncRNA625 as a novel regulator of ESCC cell proliferation, invasion and migration. ESCC patients with high lncRNA625 expression had significantly shorter survival time than those with low expression. LncRNA625 also showed specific prognostic value for patients with metastatic ESCC. Finally, we identified E1A-binding protein p300 (EP300) as a downstream executor of lncRNA625-induced transcriptional responses. These findings establish a catalog of novel cancer-associated functional lncRNAs, which will promote our understanding of lncRNA-mediated regulation in this malignancy

    Gene expression profiling and association with prion-related lesions in the medulla oblongata of symptomatic natural scrapie animals.

    The pathogenesis of natural scrapie and other prion diseases remains unclear. Examining transcriptome variations in infected versus control animals may highlight new genes potentially involved in some of the molecular mechanisms of prion-induced pathology. The aim of this work was to identify disease-associated alterations in the gene expression profiles of the caudal medulla oblongata (MO) in sheep presenting the symptomatic phase of natural scrapie. The gene expression patterns in the MO from 7 sheep that had been naturally infected with scrapie were compared with 6 controls using a Central Veterinary Institute (CVI) custom designed 4×44K microarray. The microarray consisted of a probe set on the previously sequenced ovine tissue library by CVI and was supplemented with all of the Ovis aries transcripts that are currently publicly available. Over 350 probe sets displayed greater than 2-fold changes in expression. We identified 148 genes from these probes, many of which encode proteins that are involved in the immune response, ion transport, cell adhesion, and transcription. Our results confirm previously published gene expression changes that were observed in murine models with induced scrapie. Moreover, we have identified new genes that exhibit differential expression in scrapie and could be involved in prion neuropathology. Finally, we have investigated the relationship between gene expression profiles and the appearance of the main scrapie-related lesions, including prion protein deposition, gliosis and spongiosis. In this context, the potential impacts of these gene expression changes in the MO on scrapie development are discussed

    Evaluation of Methods for Gene Selection in Melanoma Cell Lines

    A major objective in microarray experiments is to identify a panel of genes that are associated with a disease outcome or trait. Many statistical methods have been proposed for gene selection within the last fifteen years. While some of these methods have been compared, most comparisons concentrated on finding gene signatures based on two groups. This study evaluates four gene selection methods when the outcome of interest is continuous in nature. We provide a comparative review of four methods: the Statistical Analysis of Microarrays (SAM), the Linear Models for Microarray Analysis (LIMMA), the Lassoed Principal Components (LPC), and the Quantitative Trait Analysis (QTA). Comparison is based on the power to identify differentially expressed genes, the predictive ability of the genelists for a continuous outcome (G2 checkpoint function), and the prognostic properties of the genelists for distant metastasis-free survival. A simulated dataset and a publicly available melanoma cell lines dataset are used for simulations and validation, respectively. A primary melanoma dataset is used for assessment of prognosis. No common genes were found among the genelists from the four methods. While the SAM was generally the best in terms of power, the QTA genelist performed the best in the prediction of the G2 checkpoint function. Identification of genelists depends on the choice of the gene selection method. The QTA method would be preferred over the other approaches in predicting a quantitative outcome in melanoma research. We recommend the development of more robust statistical methods for differential gene expression analysis

    Using Generalized Procrustes Analysis (GPA) for normalization of cDNA microarray data

    BACKGROUND: Normalization is essential in dual-labelled microarray data analysis to remove non-biological variations and systematic biases. Many normalization methods have been used to remove such biases within slides (Global, Lowess) and across slides (Scale, Quantile and VSN). However, all these popular approaches make critical assumptions about data distribution, which are often not valid in practice. RESULTS: In this study, we propose a novel assumption-free normalization method based on the Generalized Procrustes Analysis (GPA) algorithm. Using experimental and simulated normal microarray data and boutique array data, we systematically evaluate the ability of the GPA method in normalization compared with six other popular normalization methods including Global, Lowess, Scale, Quantile, VSN, and one boutique array-specific housekeeping gene method. The assessment of these methods is based on three different empirical criteria: across-slide variability, the Kolmogorov-Smirnov (K-S) statistic and the mean square error (MSE). Compared with other methods, the GPA method performs effectively and consistently better in reducing across-slide variability and removing systematic bias. CONCLUSION: The GPA method is an effective normalization approach for microarray data analysis. In particular, it is free from the statistical and biological assumptions inherent in other normalization methods that are often difficult to validate. Therefore, the GPA method has a major advantage in that it can be applied to diverse types of array sets, especially to the boutique array where the majority of genes may be differentially expressed.
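
One of the evaluation criteria named above, the two-sample Kolmogorov-Smirnov statistic, is simple enough to sketch. The abstract does not say how the statistic is applied to the slides, so comparing the normalized intensity distributions of two slides is an assumption here, and the function name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest absolute difference
    between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The supremum is attained at one of the observed data points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


slide_1 = [1.0, 2.0, 3.0, 4.0]
slide_2 = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(slide_1, slide_2))
```

A value near 0 means the two distributions are nearly identical, and a value of 1 means they do not overlap at all, so under this reading a smaller statistic after normalization would indicate less remaining systematic difference between slides.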

    Modeling And Identification Of Differentially Regulated Genes Using Transcriptomics And Proteomics Data

    Photosynthetic organisms are complex dynamical systems, showing a remarkable ability to adapt to different environmental conditions for their survival. Mechanisms underlying the coordination between different cellular processes in these organisms are still poorly understood. In this dissertation we utilize various computational and modeling techniques to analyze transcriptomics and proteomics data sets from several photosynthetic organisms. We use changes in expression levels of genes to study responses of these organisms to various environmental conditions such as availability of nutrients, concentrations of chemicals in growth media, and temperature. Three specific problems studied here are transcriptomic modifications in photosynthetic organisms under reduction-oxidation (redox) stress conditions, circadian and diurnal rhythms of cyanobacteria and the effect of incident light patterns on these rhythms, and the coordination between biological processes in cyanobacteria under various growth conditions. Under redox stresses caused by high light treatments, a strong transcriptomic level response, spread across many biological processes, is discovered in the cyanobacterium Synechocystis sp. PCC 6803. Based on statistical tests, expression levels of about 20% of genes in Synechocystis 6803 are identified as significantly affected by high light. Gene clustering methods reveal that these responses can mainly be classified as transient and consistent responses, depending on the duration of modified behaviors. Many genes related to energy production as well as energy utilization are shown to be strongly affected. Analysis of microarray data under two stress conditions, high light and DCMU treatment, combined with data mining and motif finding algorithms led to the discovery of a novel transcription factor, RRTF1, that responds to redox stresses in Arabidopsis thaliana. Time course transcriptomics data from Cyanothece sp. ATCC 51142 have shown strong diurnal rhythms. By combining multiple experimental conditions and using gene classification algorithms based on Fourier scores and angular distances, it is shown that the majority of the diurnal genes are in fact light responding. Only about 10% of genes in the genome are categorized as being circadian controlled. A transcription control model based on dynamical systems is employed to identify the interactions between diurnal genes. A phase oscillator network is proposed to model the behavior of different biological processes. Both these models are shown to carry biologically meaningful features. To study the coordination between different biological processes under various environmental and genetic modifications, an interaction model is derived using a Bayesian network approach, combining all publicly available microarray data sets for Synechocystis sp. PCC 6803. Several novel relationships between biological processes are discovered from the model. The model is used to simulate several experimental conditions, and the response of the model is shown to agree with the experimentally observed behaviors