53,659 research outputs found

    Statistical modelling of transcript profiles of differentially regulated genes

    Background: The vast quantities of gene expression profiling data produced in microarray studies, and the more precise quantitative PCR, are often not statistically analysed to their full potential. Previous studies have summarised gene expression profiles using simple descriptive statistics, basic analysis of variance (ANOVA) and the clustering of genes based on simple models fitted to their expression profiles over time. We report the novel application of statistical non-linear regression modelling techniques to describe the shapes of expression profiles for the fungus Agaricus bisporus, quantified by PCR, and for E. coli and Rattus norvegicus, using microarray technology. The use of parametric non-linear regression models provides a more precise description of expression profiles, reducing the "noise" of the raw data to produce a clear "signal" given by the fitted curve, and describing each profile with a small number of biologically interpretable parameters. This approach then allows the direct comparison and clustering of the shapes of response patterns between genes and potentially enables a greater exploration and interpretation of the biological processes driving gene expression. Results: Quantitative reverse transcriptase PCR-derived time-course data of genes were modelled. "Splitline" or "broken-stick" regression identified the initial time of gene up-regulation, enabling the classification of genes into those with primary and secondary responses. Five-day profiles were modelled using the biologically-oriented, critical exponential curve, $y(t) = A + (B + Ct)R^{t} + \varepsilon$. This non-linear regression approach allowed the expression patterns for different genes to be compared in terms of curve shape, time of maximal transcript level and the decline and asymptotic response levels. Three distinct regulatory patterns were identified for the five genes studied.
Applying the regression modelling approach to microarray-derived time course data allowed 11% of the Escherichia coli features to be fitted by an exponential function, and 25% of the Rattus norvegicus features could be described by the critical exponential model, all with statistical significance of p < 0.05. Conclusion: The statistical non-linear regression approaches presented in this study provide detailed biologically oriented descriptions of individual gene expression profiles, using biologically variable data to generate a set of defining parameters. These approaches have application to the modelling and greater interpretation of profiles obtained across a wide range of platforms, such as microarrays. Through careful choice of appropriate model forms, such statistical regression approaches allow an improved comparison of gene expression profiles, and may provide an approach for the greater understanding of common regulatory mechanisms between genes.
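
The critical exponential curve named in this abstract can be explored with a few lines of Python. The sketch below is not the authors' fitting code: it only evaluates the curve A + (B + Ct)R^t (with the noise term omitted) and derives the time of the stationary point from dy/dt = 0, which gives t* = -1/ln(R) - B/C. The function names and the parameter values are hypothetical, chosen purely for illustration, and the sketch assumes a decaying curve with 0 < R < 1.

```python
import math


def critical_exponential(t, A, B, C, R):
    """Evaluate y(t) = A + (B + C*t) * R**t (noise term omitted)."""
    return A + (B + C * t) * R ** t


def peak_time(B, C, R):
    """Time of the stationary point of the critical exponential curve.

    dy/dt = R**t * (C + (B + C*t) * ln R), which is zero at
    t* = -1/ln(R) - B/C. For 0 < R < 1 this is a maximum when C > 0
    and a minimum when C < 0.
    """
    if C == 0 or not (0 < R < 1):
        raise ValueError("this sketch assumes C != 0 and 0 < R < 1")
    return -1.0 / math.log(R) - B / C


# Hypothetical parameter values, for illustration only.
A, B, C, R = 1.0, 0.0, 4.0, 0.5
t_star = peak_time(B, C, R)
y_star = critical_exponential(t_star, A, B, C, R)
print(f"peak at t = {t_star:.3f}, y = {y_star:.3f}; asymptote = {A}")
```

Under these assumptions A is the asymptotic response level, R controls the rate of decline, and the peak time follows from B, C and R, which matches the abstract's point that each profile is summarised by a few interpretable parameters.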

    pGQL: A probabilistic graphical query language for gene expression time courses

    Background: Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data. As queries can be very easily defined, an exploratory analysis of time course data is greatly facilitated. While timeboxes are effective, they have no provisions for dealing with noisy data or data with fluctuations along the time axis, which are very common in many applications. In particular, this is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. From a data mining point of view the robust handling of data through a sound statistical model is of great importance. Results: We propose probabilistic timeboxes, which correspond to a specific class of Hidden Markov Models (HMMs), an established method in data mining. Since HMMs are a particular class of probabilistic graphical models, we call our method Probabilistic Graphical Query Language. It is implemented in the free software package pGQL. We evaluate its effectiveness in exploratory analysis on a yeast sporulation data set. Conclusions: We introduce a new approach to define dynamic, statistical queries on time course data. It supports an interactive exploration of reasonably large amounts of data and enables users without expert knowledge to specify fairly complex statistical models with ease. By its statistical nature, our approach is more expressive and more robust with respect to amplitude and frequency fluctuation than the prior, deterministic timeboxes.
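
To make the contrast with the deterministic timeboxes concrete, here is a minimal Python sketch of a hard timebox query. It is not part of pGQL and does not implement the HMM-based method. It assumes the usual reading of a timebox as a rectangle (a time window and a value range) that a profile satisfies only if every sample inside the window lies inside the value range. The function names, the gene names and the data are all hypothetical.

```python
def matches_timebox(series, t_min, t_max, v_min, v_max):
    """True if all samples with t_min <= t <= t_max have v_min <= value <= v_max.

    `series` is a list of (time, value) pairs. A series with no sample in
    the window is treated as a non-match (an assumption of this sketch).
    """
    inside = [v for t, v in series if t_min <= t <= t_max]
    return bool(inside) and all(v_min <= v <= v_max for v in inside)


def query(profiles, boxes):
    """Names of the profiles that satisfy every timebox in `boxes`."""
    return [
        name
        for name, series in profiles.items()
        if all(matches_timebox(series, *box) for box in boxes)
    ]


# Hypothetical expression profiles sampled at uneven time points.
profiles = {
    "gene_a": [(0, 0.1), (2, 1.2), (4, 1.5), (8, 0.3)],
    "gene_b": [(0, 0.2), (2, 1.1), (4, 2.6), (8, 0.2)],  # one noisy spike at t = 4
}
boxes = [(2, 4, 1.0, 2.0)]  # (t_min, t_max, v_min, v_max)

hits = query(profiles, boxes)
print(hits)
```

In this toy data a single out-of-range measurement excludes gene_b even though its overall shape resembles gene_a, which is the kind of brittleness to noise that the abstract gives as motivation for a probabilistic formulation.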

    Dynamics of dendritic cell maturation are identified through a novel filtering strategy applied to biological time-course microarray replicates

    Background: Dendritic cells (DC) play a central role in primary immune responses and become potent stimulators of the adaptive immune response after undergoing the critical process of maturation. Understanding the dynamics of DC maturation would provide key insights into this important process. Time course microarray experiments can provide unique insights into DC maturation dynamics. Replicate experiments are necessary to address the issues of experimental and biological variability. Statistical methods and averaging are often used to identify significant signals. Here a novel strategy for filtering of replicate time course microarray data, which identifies consistent signals between the replicates, is presented and applied to a DC time course microarray experiment. Results: The temporal dynamics of DC maturation were studied by stimulating DC with poly(I:C) and following gene expression at 5 time points from 1 to 24 hours. The novel filtering strategy uses standard statistical and fold change techniques, along with the consistency of replicate temporal profiles, to identify those differentially expressed genes that were consistent in two biological replicate experiments. To address the issue of cluster reproducibility a consensus clustering method, which identifies clusters of genes whose expression varies consistently between replicates, was also developed and applied. Analysis of the resulting clusters revealed many known and novel characteristics of DC maturation, such as the up-regulation of specific immune response pathways. Intriguingly, more genes were down-regulated than up-regulated.
Results identify a more comprehensive program of down-regulation, including many genes involved in protein synthesis, metabolism, and housekeeping needed for maintenance of cellular integrity and metabolism. Conclusions: The new filtering strategy emphasizes the importance of consistent and reproducible results when analyzing microarray data and utilizes consistency between replicate experiments as a criterion in both feature selection and clustering, without averaging or otherwise combining replicate data. Observation of a significant down-regulation program during DC maturation indicates that DC are preparing for cell death and provides a path to better understand the process. This new filtering strategy can be adapted for use in analyzing other large-scale time course data sets with replicates.
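
The idea of keeping only genes whose temporal profiles agree between replicates, without averaging them, can be sketched as follows. This is an illustration, not the authors' procedure: it assumes each profile is a list of log2 ratios against a baseline at five time points, uses Pearson correlation as the consistency measure, and applies hypothetical thresholds (a 2-fold change in both replicates and r >= 0.8). All gene names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def consistent_genes(rep1, rep2, min_abs_log2_fc=1.0, min_r=0.8):
    """Genes that change in BOTH replicates and whose profiles correlate.

    Replicates are compared, never averaged. Thresholds are assumptions.
    """
    keep = []
    for gene in sorted(set(rep1) & set(rep2)):
        p1, p2 = rep1[gene], rep2[gene]
        changed = (max(abs(v) for v in p1) >= min_abs_log2_fc
                   and max(abs(v) for v in p2) >= min_abs_log2_fc)
        if changed and pearson(p1, p2) >= min_r:
            keep.append(gene)
    return keep


# Hypothetical log2 ratios at five time points for two biological replicates.
rep1 = {"g1": [0.1, 1.2, 2.0, 1.5, 0.4],
        "g2": [0.0, 1.5, 0.2, -1.1, 0.3],
        "g3": [0.1, 0.2, -0.1, 0.3, 0.0]}
rep2 = {"g1": [0.2, 1.0, 2.2, 1.3, 0.5],
        "g2": [0.1, -1.2, 0.4, 1.3, -0.2],
        "g3": [0.0, 0.3, 0.1, 0.2, 0.1]}

selected = consistent_genes(rep1, rep2)
print(selected)
```

Here g2 passes the fold change threshold in both replicates but its two profiles move in opposite directions, so it is dropped; averaging the replicates would instead have produced a flat, uninformative profile for it.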

    Gene expression profiles in the rat streptococcal cell wall-induced arthritis model identified using microarray analysis

    Experimental arthritis models are considered valuable tools for delineating mechanisms of inflammation and autoimmune phenomena. Use of microarray-based methods represents a new and challenging approach that allows molecular dissection of complex autoimmune diseases such as arthritis. In order to characterize the temporal gene expression profile in joints from the reactivation model of streptococcal cell wall (SCW)-induced arthritis in Lewis (LEW/N) rats, total RNA was extracted from ankle joints from naïve, SCW injected, or phosphate buffered saline injected animals (time course study) and gene expression was analyzed using Affymetrix oligonucleotide microarray technology (RAE230A). After normalization and statistical analysis of data, 631 differentially expressed genes were sorted into clusters based on their levels and kinetics of expression using Spotfire® profile search and K-means cluster analysis. Microarray-based data for a subset of genes were validated using real-time PCR TaqMan® analysis. Analysis of the microarray data identified 631 genes (441 upregulated and 190 downregulated) that were differentially expressed (Delta > 1.8, P < 0.01), showing specific levels and patterns of gene expression. The genes exhibiting the highest fold increase in expression on days -13.8, -13, or 3 were involved in chemotaxis, inflammatory response, cell adhesion and extracellular matrix remodelling. Transcriptome analysis identified 10 upregulated genes (Delta > 5), which have not previously been associated with arthritis pathology and are located in genomic regions associated with autoimmune disease. The majority of the downregulated genes were associated with metabolism, transport and regulation of muscle development. In conclusion, the present study describes the temporal expression of multiple disease-associated genes with potential pathophysiological roles in the reactivation model of SCW-induced arthritis in Lewis (LEW/N) rat.
These findings improve our understanding of the molecular events that underlie the pathology in this animal model, which is potentially a valuable comparator to human rheumatoid arthritis (RA).

    Time-course analysis of genome-wide gene expression data from hormone-responsive human breast cancer cells

    Background: Microarray experiments enable simultaneous measurement of the expression levels of virtually all transcripts present in cells, thereby providing a ‘molecular picture’ of the cell state. On the other hand, the genomic responses to a pharmacological or hormonal stimulus are dynamic molecular processes, where time influences gene activity and expression. The potential of statistical analysis of time series microarray data has not been fully exploited so far, because only a few methods are available that take proper account of temporal relationships between samples. Results: We compared four different methods to analyze data derived from a time course mRNA expression profiling experiment that studied the effects of estrogen on hormone-responsive human breast cancer cells. Gene expression was monitored with the innovative Illumina BeadArray platform, which includes an average of 30-40 replicates for each probe sequence randomly distributed on the chip surface. We present and discuss the results obtained by applying to these datasets different statistical methods for serial gene expression analysis. The influence of the normalization algorithm applied to the data, and of different parameter or threshold choices for the selection of differentially expressed transcripts, has also been evaluated. In most cases, the selection was found to be fairly robust with respect to changes in parameters and type of normalization. We then identified which genes showed an expression profile significantly affected by the hormonal treatment over time. The final list of differentially expressed genes underwent cluster analysis of functional type, to identify groups of genes with similar regulation dynamics. Conclusions: Several methods for processing time series gene expression data are presented, including evaluation of benefits and drawbacks of the different methods applied.
The resulting protocol for data analysis was applied to characterization of the gene expression changes induced by estrogen in human breast cancer ZR-75.1 cells over an entire cell cycle.

    Ganoderma lucidum polysaccharides in human monocytic leukemia cells: from gene expression to network construction

    Background: Ganoderma lucidum has been widely used as a herbal medicine for promoting health and longevity in China and other Asian countries. Polysaccharide extracts from Ganoderma lucidum have been reported to exhibit immuno-modulating and anti-tumor activities. In previous studies, F3, the active component of the polysaccharide extract, was found to activate various cytokines such as IL-1, IL-6, IL-12, and TNF-α. This gave rise to our investigation on how F3 stimulates immuno-modulating or anti-tumor effects in human leukemia THP-1 cells. Results: Here, we integrated time-course DNA microarray analysis, quantitative PCR assays, and bioinformatics methods to study the F3-induced effects in THP-1 cells. Pathways significantly disturbed by F3 were identified by statistical analysis of the microarray data. Apoptosis induction through the DR3 and DR4/5 death receptors was found to be one of the most significant pathways and to play a key role in THP-1 cells after F3 treatment. Based on time-course gene expression measurements of the identified pathway, we reconstructed a plausible regulatory network of the involved genes using a reverse-engineering computational approach. Conclusion: Our results showed that F3 may induce death receptor ligands to initiate signaling via receptor oligomerization, recruitment of specialized adaptor proteins and activation of caspase cascades.

    ARSyN: a method for the identification and removal of systematic noise in multifactorial time-course microarray experiments

    Transcriptomic profiling experiments that aim to identify responsive genes in specific biological conditions are commonly set up under defined experimental designs that try to assess the effects of factors and their interactions on gene expression. Data from these controlled experiments, however, may also contain sources of unwanted noise that can distort the signal under study, affect the residuals of applied statistical models, and hamper data analysis. Commonly, normalization methods are applied to transcriptomics data to remove technical artifacts, but these are normally based on general assumptions of transcript distribution and largely ignore both the characteristics of the experiment under consideration and the coordinative nature of gene expression. In this paper, we propose a novel methodology, ARSyN, for the preprocessing of microarray data that takes into account these last two aspects. By combining analysis of variance (ANOVA) modeling of gene expression values and multivariate analysis of estimated effects, the method identifies the nonstructured part of the signal associated with the experimental factors (the noise within the signal) and the structured variation of the ANOVA errors (the signal of the noise). By removing these noise fractions from the original data, we create a filtered data set that is rich in the information of interest and includes only the random noise required for inferential analysis. In this work, we focus on multifactorial time course microarray (MTCM) experiments with 2 factors: one quantitative, such as time or dosage, and the other qualitative, such as tissue, strain, or treatment. However, the method can be used in other situations such as experiments with only one factor or more complex designs with more than 2 factors. The filtered data obtained after applying ARSyN can be further analyzed with the appropriate statistical technique to obtain the biological information required.
To evaluate the performance of the filtering strategy, we have applied different statistical approaches for MTCM analysis to several real and simulated data sets, studying also the efficiency of these techniques. By comparing the results obtained with the original and ARSyN filtered data and also with other filtering techniques, we can conclude that the proposed method increases the statistical power to detect biological signals, especially in cases where there are high levels of structural noise. Software for ARSyN is freely available at http://www.ua.es/personal/mj.nueda
Spanish MICINN Project (BIO2008-04368-E and DPI2008-06880-C03-03/DPI).
Nueda, MJ.; Ferrer Riquelme, AJ.; Conesa, A. (2011). ARSyN: a method for the identification and removal of systematic noise in multifactorial time-course microarray experiments. Biostatistics. 13(3):553-566. doi:10.1093/biostatistics/kxr042

    Analysis of tiling array expression studies with flexible designs in Bioconductor (waveTiling)

    Background: Existing statistical methods for tiling array transcriptome data either focus on transcript discovery in one biological or experimental condition or on the detection of differential expression between two conditions. Increasingly often, however, biologists are interested in time-course studies, studies with more than two conditions or even multiple-factor studies. As these studies are currently analyzed with the traditional microarray analysis techniques, they do not exploit the genome-wide nature of tiling array data to its full potential. Results: We present an R Bioconductor package, waveTiling, which implements a wavelet-based model for analyzing transcriptome data and extends it towards more complex experimental designs. With waveTiling the user is able to discover (1) group-wise expressed regions and differentially expressed regions between any two groups in (2) single-factor studies and (3) multifactorial designs. Moreover, for time-course experiments it is also possible to detect (4) linear time effects and (5) a circadian rhythm of transcripts. By considering the expression values of the individual tiling probes as a function of genomic position, effect regions can be detected regardless of existing annotation. Three case studies with different experimental set-ups illustrate the use and the flexibility of the model-based transcriptome analysis. Conclusions: The waveTiling package provides the user with a convenient tool for the analysis of tiling array transcriptome data for a multitude of experimental set-ups. Regardless of the study design, the probe-wise analysis allows for the detection of transcriptional effects in exonic, intronic and intergenic regions, without prior consultation of existing annotation.