6,016 research outputs found

    Statistical modelling of transcript profiles of differentially regulated genes

    Background: The vast quantities of gene expression profiling data produced in microarray studies, and the more precise quantitative PCR, are often not statistically analysed to their full potential. Previous studies have summarised gene expression profiles using simple descriptive statistics, basic analysis of variance (ANOVA) and the clustering of genes based on simple models fitted to their expression profiles over time. We report the novel application of statistical non-linear regression modelling techniques to describe the shapes of expression profiles for the fungus Agaricus bisporus, quantified by PCR, and for E. coli and Rattus norvegicus, using microarray technology. The use of parametric non-linear regression models provides a more precise description of expression profiles, reducing the "noise" of the raw data to produce a clear "signal" given by the fitted curve, and describing each profile with a small number of biologically interpretable parameters. This approach then allows the direct comparison and clustering of the shapes of response patterns between genes and potentially enables a greater exploration and interpretation of the biological processes driving gene expression. Results: Quantitative reverse transcriptase PCR-derived time-course data of genes were modelled. "Split-line" or "broken-stick" regression identified the initial time of gene up-regulation, enabling the classification of genes into those with primary and secondary responses. Five-day profiles were modelled using the biologically oriented critical exponential curve, y(t) = A + (B + Ct)R^t + ε. This non-linear regression approach allowed the expression patterns for different genes to be compared in terms of curve shape, time of maximal transcript level and the decline and asymptotic response levels. Three distinct regulatory patterns were identified for the five genes studied. 
Applying the regression modelling approach to microarray-derived time course data allowed 11% of the Escherichia coli features to be fitted by an exponential function, and 25% of the Rattus norvegicus features could be described by the critical exponential model, all with statistical significance of p < 0.05. Conclusion: The statistical non-linear regression approaches presented in this study provide detailed biologically oriented descriptions of individual gene expression profiles, using biologically variable data to generate a set of defining parameters. These approaches have application to the modelling and greater interpretation of profiles obtained across a wide range of platforms, such as microarrays. Through careful choice of appropriate model forms, such statistical regression approaches allow an improved comparison of gene expression profiles, and may provide an approach for the greater understanding of common regulatory mechanisms between genes
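
The critical exponential curve named in this abstract can be fitted with a short sketch like the one below. This is not the authors' procedure: it assumes a plain least-squares fit, uses the fact that the model is linear in A, B and C once R is fixed, and scans a hypothetical grid of R values between 0 and 1. The function names, the grid and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def critical_exponential(t, A, B, C, R):
    """Critical exponential curve: y(t) = A + (B + C*t) * R**t."""
    return A + (B + C * t) * R ** t

def fit_critical_exponential(t, y, r_grid=None):
    """Least-squares fit by scanning R (assumed grid) and solving for A, B, C.

    For a fixed R the model is linear in (A, B, C), so each candidate R
    needs only one linear least-squares solve.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if r_grid is None:
        r_grid = np.linspace(0.05, 0.95, 181)  # assumption: decaying curve, 0 < R < 1
    best = None
    for R in r_grid:
        X = np.column_stack([np.ones_like(t), R ** t, t * R ** t])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, coef, float(R))
    rss, (A, B, C), R = best
    return {"A": float(A), "B": float(B), "C": float(C), "R": R, "rss": rss}

def peak_time(B, C, R):
    """Stationary point of the curve: dy/dt = 0 gives t = -1/ln(R) - B/C.

    This is a maximum when C > 0 and 0 < R < 1.
    """
    return -1.0 / np.log(R) - B / C

# Synthetic, noise-free profile over five days (made-up parameter values).
t_obs = np.linspace(0.0, 5.0, 21)
y_obs = critical_exponential(t_obs, A=1.0, B=0.5, C=4.0, R=0.5)
fit = fit_critical_exponential(t_obs, y_obs)
print(fit, peak_time(fit["B"], fit["C"], fit["R"]))
```

The fitted A is the asymptotic level and the stationary point gives the time of maximal transcript level, which are the quantities the abstract uses to compare genes. A finer grid or a non-linear optimiser would be needed for real, noisy data.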

    Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects

    Time course microarray data provide insight about dynamic biological processes. While several clustering methods have been proposed for the analysis of these data structures, comparison and selection of appropriate clustering methods are seldom discussed. We compared 3 probabilistic-based clustering methods and 3 distance-based clustering methods for time course microarray data. Among probabilistic methods, we considered: smoothing spline clustering (SSC), also known as model-based functional data analysis (MFDA), functional clustering models for sparsely sampled data (FCM) and model-based clustering (MCLUST). Among distance-based methods, we considered: weighted gene co-expression network analysis (WGCNA), clustering with dynamic time warping distance (DTW) and clustering with autocorrelation-based distance (ACF). We studied these algorithms in both simulated settings and case study data. Our investigations showed that FCM performed very well when gene curves were short and sparse. DTW and WGCNA performed well when gene curves were medium or long (≥10 observations). SSC performed very well when there were clusters of gene curves similar to one another. Overall, ACF performed poorly in these applications. In terms of computation time, FCM, SSC and DTW were considerably slower than MCLUST and WGCNA. WGCNA outperformed MCLUST by generating more accurate and biologically meaningful clustering results. WGCNA and MCLUST are the best methods among the 6 methods compared, when performance and computation time are both taken into account. WGCNA outperforms MCLUST, but MCLUST provides model-based inference and an uncertainty measure of clustering results
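
As an illustration of one of the distance-based approaches listed above, the following is a minimal sketch of the classic dynamic time warping (DTW) distance between two expression curves. It assumes an absolute-difference local cost and no warping-window constraint. The abstract does not say which DTW variant or software the authors used, so the function names and the toy curves are illustrative only.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic DTW distance with absolute-difference local cost (assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # D[i, j] = cost of the best alignment of x[:i] with y[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # x advances, y repeats
                                 D[i, j - 1],      # y advances, x repeats
                                 D[i - 1, j - 1])  # both advance
    return float(D[n, m])

def pairwise_dtw(curves):
    """Symmetric matrix of DTW distances between all pairs of curves."""
    k = len(curves)
    out = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            out[a, b] = out[b, a] = dtw_distance(curves[a], curves[b])
    return out

# Toy curves (made up): the second is a delayed copy of the first.
curves = [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0], [2, 2, 2, 2, 2]]
print(pairwise_dtw(curves))
```

The resulting distance matrix could then be passed to any standard distance-based clustering routine. The quadratic cost per pair is in line with the abstract's observation that DTW was among the slower methods.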

    Developmental constraints on vertebrate genome evolution

    Constraints in embryonic development are thought to bias the direction of evolution by making some changes less likely, and others more likely, depending on their consequences on ontogeny. Here, we characterize the constraints acting on genome evolution in vertebrates. We used gene expression data from two vertebrates: zebrafish, using a microarray experiment spanning 14 stages of development, and mouse, using EST counts for 26 stages of development. We show that, in both species, genes expressed early in development (1) have a more dramatic effect of knock-out or mutation and (2) are more likely to revert to single copy after whole genome duplication, relative to genes expressed late. This supports high constraints on early stages of vertebrate development, making them less open to innovations (gene gain or gene loss). Results are robust to different sources of data: gene expression from microarrays, ESTs, or in situ hybridizations; and mutants from directed KO, transgenic insertions, point mutations, or morpholinos. We determine the pattern of these constraints, which differs from the model used to describe vertebrate morphological conservation (the "hourglass" model). While morphological constraints reach a maximum at mid-development (the "phylotypic" stage), genomic constraints appear to decrease monotonically over developmental time

    Spectral analysis of gene expression profiles using gene networks

    Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map a posteriori the results onto gene networks to elucidate the functions perturbed at the level of pathways. However, integrating a priori knowledge of the gene networks could help in the statistical analysis of gene expression data and in their biological interpretation. Here we propose a method to integrate a priori the knowledge of a gene network in the analysis of gene expression data. The approach is based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, resulting in an attenuation of the high-frequency components of the expression profiles with respect to the topology of the graph. We show how to derive unsupervised and supervised classification algorithms of expression profiles, resulting in classifiers with biological relevance. We applied the method to the analysis of a set of expression profiles from irradiated and non-irradiated yeast strains. It performed at least as well as the usual classification but provides much more biologically relevant results and allows a direct biological interpretation
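
The idea of decomposing an expression profile on the eigenfunctions of a gene network and attenuating its high-frequency components can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes an undirected network given as a symmetric adjacency matrix, the unnormalised graph Laplacian, and an exponential attenuation exp(-beta * eigenvalue). The filter form and the parameter `beta` are assumptions chosen for the example.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalised Laplacian L = D - W of a symmetric adjacency matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def graph_lowpass(x, W, beta=1.0):
    """Attenuate high-frequency components of a profile x on the graph.

    x holds one expression value per gene (one graph node per gene).
    The exponential filter and beta are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    lam, U = np.linalg.eigh(graph_laplacian(W))  # eigenvalues in ascending order
    coeffs = U.T @ x                             # spectral coefficients of x
    return U @ (np.exp(-beta * lam) * coeffs)    # damp large-eigenvalue modes

# Toy network (made up): three genes on a path 0 - 1 - 2.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
x = np.array([1.0, -1.0, 1.0])   # profile that alternates along the path
print(graph_lowpass(x, W, beta=0.5))
```

Components along eigenvectors with small eigenvalues vary slowly over the network and pass almost unchanged, while rapidly alternating components are damped. Filtered profiles could then be fed to an ordinary unsupervised or supervised classifier.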