623 research outputs found

    Quadratic regression analysis for gene discovery and pattern recognition for non-cyclic short time-course microarray experiments

    BACKGROUND: Cluster analyses are used to analyze microarray time-course data for gene discovery and pattern recognition. However, in general, these methods do not take advantage of the fact that time is a continuous variable, and existing clustering methods often group biologically unrelated genes together. RESULTS: We propose a quadratic regression method for identification of differentially expressed genes and classification of genes based on their temporal expression profiles for non-cyclic short time-course microarray data. This method treats time as a continuous variable and therefore preserves actual time information. We applied this method to a microarray time-course study of gene expression at short time intervals following deafferentation of olfactory receptor neurons. Nine regression patterns were identified and shown to fit gene expression profiles better than k-means clusters. EASE analysis identified over-represented functional groups in each regression pattern and each k-means cluster, which further demonstrated that the regression method provided more biologically meaningful classifications of gene expression profiles than the k-means clustering method. Comparison with Peddada et al.'s order-restricted inference method showed that our method provides a different perspective on the temporal gene profiles. A reliability study indicates that regression patterns have the highest reliabilities. CONCLUSION: Our results demonstrate that the proposed quadratic regression method improves gene discovery and pattern recognition for non-cyclic short time-course microarray data. With a freely accessible Excel macro, investigators can readily apply this method to their microarray data.
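To make the idea of regression-based pattern classification concrete, here is a minimal sketch in Python. It fits expression ~ b0 + b1*t + b2*t^2 for one gene by ordinary least squares and labels the profile by the signs of the linear and quadratic coefficients. The function names, the `tol` threshold, and the two-character labels are assumptions made for illustration. The abstract does not specify how the nine patterns are defined or tested, so this is not the authors' procedure, and a real analysis would use a significance test in place of a fixed threshold.

```python
def fit_quadratic(times, values):
    """Least-squares fit of values ~ b0 + b1*t + b2*t**2; returns (b0, b1, b2)."""
    # Design matrix rows [1, t, t^2]; build the normal equations (X'X) b = X'y.
    rows = [[1.0, t, t * t] for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(aug[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (aug[i][3] - known) / aug[i][i]
    return tuple(coef)


def sign_label(x, tol):
    """Map a coefficient to '+', '-' or '0' using a hypothetical threshold."""
    if x > tol:
        return "+"
    if x < -tol:
        return "-"
    return "0"


def classify(times, values, tol=1e-6):
    """Label a profile by the signs of its linear and quadratic terms."""
    _, b1, b2 = fit_quadratic(times, values)
    return sign_label(b1, tol) + sign_label(b2, tol)
```

Under these assumptions, `classify([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])` labels a steadily rising profile as linear-up with no curvature, while a rise-then-fall profile gets a negative quadratic label. Because time enters as a number rather than an index, unevenly spaced sampling times are handled without any change to the code.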

    Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data

    BACKGROUND: Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not discriminate which of the multiple treatment groups differ from one another. Thus, post hoc tests, such as linear contrasts, template correlations, and pairwise comparisons, are used. Linear contrasts and template correlations work extremely well, especially when the researcher has a priori information pointing to a particular pattern/template among the different treatment groups. Further, all pairwise comparisons can be used to identify particular, treatment group-dependent patterns of gene expression. However, these approaches are biased by the researcher's assumptions, and some treatment-based patterns may fail to be detected using these approaches. Finally, different patterns may have different probabilities of occurring by chance, importantly influencing researchers' conclusions about a pattern and its constituent genes. RESULTS: We developed a four-step post hoc pattern matching (PPM) algorithm to automate single channel gene expression pattern identification/significance. First, 1-Way Analysis of Variance (ANOVA), coupled with post hoc 'all pairwise' comparisons, is calculated for all genes. Second, for each ANOVA-significant gene, all pairwise contrast results are encoded to create unique pattern ID numbers. The number of genes found in each pattern in the data is identified as that pattern's 'actual' frequency. Third, using Monte Carlo simulations, those patterns' frequencies are estimated in random data ('random' gene pattern frequency). Fourth, a Z-score for overrepresentation of the pattern is calculated ('actual' against 'random' gene pattern frequencies). We wrote a Visual Basic program (StatiGen) that automates the PPM procedure and constructs an Excel workbook with standardized graphs of overrepresented patterns and lists of the genes comprising each pattern. The Visual Basic code, installation files for StatiGen, and sample data are available as supplementary material. CONCLUSION: The PPM procedure is designed to augment current microarray analysis procedures by allowing researchers to incorporate all of the information from post hoc tests to establish unique, overarching gene expression patterns in which there is no overlap in gene membership. In our hands, PPM works well for studies using from three to six treatment groups in which the researcher is interested in treatment-related patterns of gene expression. Hardware/software limitations and the extreme number of theoretical expression patterns limit utility for larger numbers of treatment groups. Applied to a published microarray experiment, the StatiGen program successfully flagged patterns that had been manually assigned in prior work, and further identified other gene expression patterns that may be of interest. Thus, over a moderate range of treatment groups, PPM appears to work well. It allows researchers to assign statistical probabilities to patterns of gene expression that fit a priori expectations/hypotheses, it preserves the data's ability to show the researcher interesting, yet unanticipated gene expression patterns, and it assigns the majority of ANOVA-significant genes to non-overlapping patterns.
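The encoding and scoring steps of the PPM procedure (steps two and four) can be sketched briefly. The following Python is an illustration only and is not the StatiGen code. It assumes each pairwise comparison has already been reduced to -1 (first group lower), 0 (no significant difference), or +1 (first group higher), and that the comparisons are listed in a fixed order. The base-3 encoding, the function names, and the use of the sample standard deviation of the simulated counts are all assumptions, since the abstract says only that contrast results are "encoded to create unique pattern ID numbers".

```python
import math
from collections import Counter
from statistics import mean, stdev


def pattern_id(contrasts):
    """Encode a fixed-order sequence of -1/0/+1 contrast results as a base-3 integer."""
    pid = 0
    for c in contrasts:
        if c not in (-1, 0, 1):
            raise ValueError("each contrast result must be -1, 0 or +1")
        pid = pid * 3 + (c + 1)  # shift to digits 0, 1, 2
    return pid


def pattern_frequencies(contrast_table):
    """Count genes per pattern ID; contrast_table holds one contrast sequence per gene."""
    return Counter(pattern_id(c) for c in contrast_table)


def overrepresentation_z(actual_count, random_counts):
    """Z-score of the observed count against counts from simulated random data sets."""
    spread = stdev(random_counts)
    if spread == 0:
        return math.nan  # every simulation gave the same count, so Z is undefined
    return (actual_count - mean(random_counts)) / spread
```

For three treatment groups there are three pairwise comparisons, so IDs range over 3^3 = 27 codes, although not every code corresponds to a logically consistent ordering of the groups. The number of codes grows as 3 raised to the number of pairwise comparisons, which is consistent with the abstract's remark that the extreme number of theoretical patterns limits use with many treatment groups.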

    A novel fuzzy and multi-objective evolutionary algorithm based gene assignment for clustering short time series expression data

    Conventional clustering algorithms based on Euclidean distance or the Pearson correlation coefficient cannot include order information in the distance metric and cannot distinguish between random and real biological patterns. We present a template-based clustering algorithm for time series gene expression data. Template profiles are defined based on up-down regulation of genes between consecutive time points. Assignment of genes to templates is based on a fuzzy membership function. A multi-objective evolutionary algorithm is used to determine compact clusters with a varying number of templates. Statistical significance of each template is determined using a permutation-based non-parametric test. Statistically significant profiles are further tested for their biological relevance using gene ontology analysis. The algorithm was able to distinguish between real and noisy patterns when tested on artificial and real biological data. The proposed algorithm showed better or similar performance compared to STEM and better performance than k-means on real biological data.
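As a rough illustration of template profiles and fuzzy assignment, the Python sketch below enumerates templates as sequences of down/flat/up steps (-1, 0, +1) between consecutive time points and computes a membership for each template with the standard fuzzy c-means membership formula. Both choices are assumptions: the abstract gives neither the template construction nor the membership function in detail, and the multi-objective evolutionary search and permutation test are not covered here. All names and the fuzzifier `m` are hypothetical.

```python
import math
from itertools import product


def make_templates(n_points):
    """All up/flat/down step sequences for a series with n_points time points."""
    return list(product((-1, 0, 1), repeat=n_points - 1))


def fuzzy_memberships(profile, templates, m=2.0):
    """Membership of one expression profile in each template (values sum to 1)."""
    # Consecutive differences, scaled so the largest step has magnitude 1.
    diffs = [b - a for a, b in zip(profile, profile[1:])]
    scale = max(abs(d) for d in diffs) or 1.0
    steps = [d / scale for d in diffs]
    dists = [math.dist(steps, t) for t in templates]
    # An exact match takes full membership, which also avoids division by zero.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    # Fuzzy c-means style weights: u_i is proportional to d_i ** (-2 / (m - 1)).
    weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]
```

Because the templates are built from step directions, the order of the time points is part of the comparison, which is the property the abstract says plain Euclidean or correlation distances lack. A profile that lies between two templates receives split membership rather than a forced hard assignment.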

    h-Profile plots for the discovery and exploration of patterns in gene expression data with an application to time course data

    BACKGROUND: An ever increasing number of techniques are being used to find genes with similar profiles from microarray studies. Visualization of gene expression profiles can aid this process, potentially contributing to the identification of co-regulated genes and gene function as well as network development. RESULTS: We introduce the h-Profile plot to display gene expression profiles. Thumbnail versions of plots of gene expression profiles are plotted at coordinates such that profiles of similar shape are located in the same sector, with decreasing variance towards the origin. Negatively correlated profiles can easily be identified. A new method for selecting genes with fixed periodicity, but different phase and amplitude, is described and used to demonstrate the use of the plots on cell cycle data. CONCLUSION: Visualization tools for gene expression data are important and h-profile plots provide a timely contribution to the field. They allow the simultaneous visualization of many gene expression profiles and can be used for the identification of genes with similar or reversed profiles, the foundation step in many analyses.

    Functional assessment of time course microarray data

    MOTIVATION: Time-course microarray experiments study the progress of gene expression along time across one or several experimental conditions. Most developed analysis methods focus on the clustering or the differential expression analysis of genes and do not integrate functional information. The assessment of the functional aspects of time-course transcriptomics data requires the use of approaches that exploit the activation dynamics of the functional categories to which genes are annotated. METHODS: We present three novel methodologies for the functional assessment of time-course microarray data. i) maSigFun derives from the maSigPro method, a regression-based strategy to model time-dependent expression patterns and identify genes with differences across series. maSigFun fits a regression model for groups of genes labeled by a functional class and selects those categories which have a significant model. ii) PCA-maSigFun fits a PCA model of each functional class-defined expression matrix to extract orthogonal patterns of expression change, which are then assessed for their fit to a time-dependent regression model. iii) ASCA-functional uses the ASCA model to rank genes according to their correlation to principal time expression patterns and assesses functional enrichment in a GSA fashion. We used simulated and experimental datasets to study these novel approaches. Results were compared to alternative methodologies. RESULTS: Synthetic and experimental data showed that the different methods are able to capture different aspects of the relationship between genes, functions and co-expression that are biologically meaningful. The methods should not be considered as competing; rather, they provide different insights into the molecular and functional dynamic events taking place within the biological system under study.

    A Temporal -omic Study of Propionibacterium freudenreichii CIRM-BIA1T Adaptation Strategies in Conditions Mimicking Cheese Ripening in the Cold

    Propionibacterium freudenreichii is used as a ripening culture in Swiss cheese manufacture. It grows when cheeses are ripened in a warm room (about 24°C). Cheeses with an acceptable eye formation level are transferred to a cold room (about 4°C), inducing a marked slowdown of propionic fermentation, but P. freudenreichii remains active in the cold. To investigate the P. freudenreichii strategies of adaptation and survival in the cold, we performed the first global gene expression profile for this species. The time-course transcriptomic response of the P. freudenreichii CIRM-BIA1T strain was analyzed at five times of incubation, during growth at 30°C then for 9 days at 4°C, under conditions preventing nutrient starvation. Gene expression was also confirmed by RT-qPCR for 28 genes. In addition, proteomic experiments were carried out and the main metabolites were quantified. Microarray analysis revealed that 565 genes (25% of the protein-coding sequences of the P. freudenreichii genome) were differentially expressed during the transition from 30°C to 4°C (P<0.05 and |fold change|>1). At 4°C, a general slowing down was observed for genes implicated in the cell machinery. In contrast, the P. freudenreichii CIRM-BIA1T strain over-expressed genes involved in lactate, alanine and serine conversion to pyruvate, in gluconeogenesis, and in glycogen synthesis. Interestingly, the expression of different genes involved in the formation of important cheese flavor compounds remained unchanged at 4°C. This could explain the contribution of P. freudenreichii to cheese ripening even in the cold. In conclusion, P. freudenreichii remains metabolically active at 4°C and induces pathways to maintain its long-term survival.

    Effect of thyroid hormone concentration on the transcriptional response underlying induced metamorphosis in the Mexican axolotl (Ambystoma)

    BACKGROUND: Thyroid hormones (TH) induce gene expression programs that orchestrate amphibian metamorphosis. In contrast to anurans, many salamanders do not undergo metamorphosis in nature. However, they can be induced to undergo metamorphosis via exposure to thyroxine (T4). We induced metamorphosis in juvenile Mexican axolotls (Ambystoma mexicanum) using 5 and 50 nM T4, collected epidermal tissue from the head at four time points (Days 0, 2, 12, 28), and used microarray analysis to quantify mRNA abundances. RESULTS: Individuals reared in the higher T4 concentration initiated morphological and transcriptional changes earlier and completed metamorphosis by Day 28. In contrast, initiation of metamorphosis was delayed in the lower T4 concentration and none of the individuals completed metamorphosis by Day 28. We identified 402 genes that were statistically differentially expressed by ≥ two-fold between T4 treatments at one or more non-Day 0 sampling times. To complement this analysis, we used linear and quadratic regression to identify 542 and 709 genes that were differentially expressed by ≥ two-fold in the 5 and 50 nM T4 treatments, respectively. CONCLUSION: We found that T4 concentration affected the timing of gene expression and the shape of temporal gene expression profiles. However, essentially all of the identified genes were similarly affected by 5 and 50 nM T4. We discuss genes and biological processes that appear to be common to salamander and anuran metamorphosis, and also highlight clear transcriptional differences. Our results show that gene expression in axolotls is diverse and precise, and that axolotls provide new insights about amphibian metamorphosis.


    A model selection approach to discover age-dependent gene expression patterns using quantile regression models

    BACKGROUND: It has been a long-standing biological challenge to understand the molecular regulatory mechanisms behind mammalian ageing. Harnessing the availability of many ageing microarray datasets, a number of studies have shown that it is possible to identify genes that have age-dependent differential expression (DE) or differential variability (DV) patterns. The majority of the studies identify "interesting" genes using a linear regression approach, which is known to perform poorly in the presence of outliers or if the underlying age-dependent pattern is non-linear. Clearly a more robust and flexible approach is needed to identify genes with various age-dependent gene expression patterns. RESULTS: Here we present a novel model selection approach to discover genes with linear or non-linear age-dependent gene expression patterns from microarray data. To identify DE genes, our method fits three quantile regression models (constant, linear and piecewise linear models) to the expression profile of each gene, and selects the least complex model that best fits the available data. Similarly, DV genes are identified by fitting and comparing two quantile regression models (non-DV and DV models) to the expression profile of each gene. We show that our approach is much more robust than the standard linear regression approach in discovering age-dependent patterns. We also applied our approach to analyze two human brain ageing datasets and found many biologically interesting gene expression patterns, including some very interesting DV patterns, that have been overlooked in the original studies. Furthermore, we propose that our model selection approach can be extended to discover DE and DV genes from microarray datasets with discrete class labels, by considering different quantile regression models. CONCLUSION: In this paper, we present a novel application of quantile regression models to identify genes that have interesting linear or non-linear age-dependent expression patterns. One important contribution of this paper is to introduce a model selection approach to DE and DV gene identification, which is most commonly tackled by null hypothesis testing approaches. We show that our approach is robust in analyzing real and simulated datasets. We believe that our approach is applicable in many ageing or time-series data analysis tasks.
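The model selection idea can be illustrated with a deliberately small Python sketch that compares only two of the three DE models, constant and linear, at a single quantile. Several parts are assumptions rather than the authors' method: the BIC-like score, the `eps` guard against a zero loss, the function names, and the omission of the piecewise linear model. The linear fit uses a brute-force search over lines through pairs of data points, which relies on the fact that a two-parameter quantile regression has an optimal solution passing through at least two observations. This is practical only for small samples.

```python
import math
from itertools import combinations


def check_loss(residuals, tau):
    """Quantile (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return sum(tau * r if r >= 0 else (tau - 1.0) * r for r in residuals)


def fit_constant(y, tau):
    """Best constant at quantile tau; an optimal value is one of the observations."""
    best = min(y, key=lambda c: check_loss([v - c for v in y], tau))
    return best, check_loss([v - best for v in y], tau)


def fit_linear(x, y, tau):
    """Best line at quantile tau, searched over lines through pairs of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair cannot define a slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = check_loss([yv - (intercept + slope * xv) for xv, yv in zip(x, y)], tau)
        if best is None or loss < best[2]:
            best = (intercept, slope, loss)
    return best


def select_model(x, y, tau=0.5, eps=1e-12):
    """Pick 'constant' or 'linear' using a hypothetical BIC-like penalised score."""
    n = len(y)

    def score(loss, n_params):
        return 2.0 * n * math.log(max(loss, eps) / n) + n_params * math.log(n)

    _, loss_const = fit_constant(y, tau)
    _, _, loss_lin = fit_linear(x, y, tau)
    # Ties go to the simpler model, in the spirit of "least complex model".
    return "constant" if score(loss_const, 1) <= score(loss_lin, 2) else "linear"
```

With ages `[20, 30, 40, 50, 60, 70]` and expression values `[1, 2, 3, 4, 5, 20]`, the median line follows the first five points and leaves the last value as a large residual, whereas a least-squares line would be pulled towards it. That behaviour is the robustness to outliers the abstract attributes to quantile regression.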

    CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.

    MOTIVATION: Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. RESULTS: We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. AVAILABILITY AND IMPLEMENTATION: The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems and user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. This work was supported by the Turkish State Planning Organization [DPT09K120520 to B.K.]; the Bogazici University Research Fund [10A05D4 to B.K., 08A506 to B.K., 6882-12A01D5 to A.T.C.]; TUBITAK [106M444 to B.K., 110E292 to A.T.C.]; the Biotechnology and Biological Sciences Research Council [BRIC2.2 grant BB/K011138/1 to S.G.O.]; and the EU 7th Framework Programme [BIOLEDGE Contract No: 289126 to S.G.O.]. This is the final version of the article. It first appeared from Oxford University Press via http://dx.doi.org/10.1093/bioinformatics/btv53