6,016 research outputs found
Recommended from our members
Robust filtering for gene expression time series data with variance constraints
This is the post print version of the article. The official published version can be obtained from the link below - Copyright 2007 Taylor & Francis Ltd.In this paper, an uncertain discrete-time stochastic system is employed to represent a model for gene regulatory networks from time series data. A robust variance-constrained filtering problem is investigated for a gene expression model with stochastic disturbances and norm-bounded parameter uncertainties, where the stochastic perturbation is in the form of a scalar Gaussian white noise with constant variance and the parameter uncertainties enter both the system matrix and the output matrix. The purpose of the addressed robust filtering problem is to design a linear filter such that, for the admissible bounded uncertainties, the filtering error system is Schur stable and the individual error variance is less than a prespecified upper bound. By using the linear matrix inequality (LMI) technique, sufficient conditions are first derived for ensuring the desired filtering performance for the gene expression model. Then the filter gain is characterized in terms of the solution to a set of LMIs, which can easily be solved by using available software packages. A simulation example is exploited for a gene expression model in order to demonstrate the effectiveness of the proposed design procedures.This work was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grants GR/S27658/01 and EP/C524586/1, the Biotechnology and Biological Sciences Research Council (BBSRC) of the UK under Grants BB/C506264/1 and 100/EGM17735, the Nuffield Foundation of the UK under Grant NAL/00630/G, and the Alexander von Humboldt Foundation of Germany
Statistical modelling of transcript profiles of differentially regulated genes
Background: The vast quantities of gene expression profiling data produced in microarray studies, and
the more precise quantitative PCR, are often not statistically analysed to their full potential. Previous
studies have summarised gene expression profiles using simple descriptive statistics, basic analysis of
variance (ANOVA) and the clustering of genes based on simple models fitted to their expression profiles
over time. We report the novel application of statistical non-linear regression modelling techniques to
describe the shapes of expression profiles for the fungus Agaricus bisporus, quantified by PCR, and for E.
coli and Rattus norvegicus, using microarray technology. The use of parametric non-linear regression models
provides a more precise description of expression profiles, reducing the "noise" of the raw data to
produce a clear "signal" given by the fitted curve, and describing each profile with a small number of
biologically interpretable parameters. This approach then allows the direct comparison and clustering of
the shapes of response patterns between genes and potentially enables a greater exploration and
interpretation of the biological processes driving gene expression.
Results: Quantitative reverse transcriptase PCR-derived time-course data of genes were modelled. "Splitline"
or "broken-stick" regression identified the initial time of gene up-regulation, enabling the classification
of genes into those with primary and secondary responses. Five-day profiles were modelled using the
biologically-oriented, critical exponential curve, y(t) = A + (B + Ct)Rt + ε. This non-linear regression
approach allowed the expression patterns for different genes to be compared in terms of curve shape,
time of maximal transcript level and the decline and asymptotic response levels. Three distinct regulatory
patterns were identified for the five genes studied. Applying the regression modelling approach to
microarray-derived time course data allowed 11% of the Escherichia coli features to be fitted by an
exponential function, and 25% of the Rattus norvegicus features could be described by the critical
exponential model, all with statistical significance of p < 0.05.
Conclusion: The statistical non-linear regression approaches presented in this study provide detailed
biologically oriented descriptions of individual gene expression profiles, using biologically variable data to
generate a set of defining parameters. These approaches have application to the modelling and greater
interpretation of profiles obtained across a wide range of platforms, such as microarrays. Through careful
choice of appropriate model forms, such statistical regression approaches allow an improved comparison
of gene expression profiles, and may provide an approach for the greater understanding of common
regulatory mechanisms between genes
Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects
Time course microarray data provide insight about dynamic biological
processes. While several clustering methods have been proposed for the analysis
of these data structures, comparison and selection of appropriate clustering
methods are seldom discussed. We compared probabilistic based clustering
methods and distance based clustering methods for time course microarray
data. Among probabilistic methods, we considered: smoothing spline clustering
also known as model based functional data analysis (MFDA), functional
clustering models for sparsely sampled data (FCM) and model-based clustering
(MCLUST). Among distance based methods, we considered: weighted gene
co-expression network analysis (WGCNA), clustering with dynamic time warping
distance (DTW) and clustering with autocorrelation based distance (ACF). We
studied these algorithms in both simulated settings and case study data. Our
investigations showed that FCM performed very well when gene curves were short
and sparse. DTW and WGCNA performed well when gene curves were medium or long
( observations). SSC performed very well when there were clusters of gene
curves similar to one another. Overall, ACF performed poorly in these
applications. In terms of computation time, FCM, SSC and DTW were considerably
slower than MCLUST and WGCNA. WGCNA outperformed MCLUST by generating more
accurate and biological meaningful clustering results. WGCNA and MCLUST are the
best methods among the 6 methods compared, when performance and computation
time are both taken into account. WGCNA outperforms MCLUST, but MCLUST provides
model based inference and uncertainty measure of clustering results
Developmental constraints on vertebrate genome evolution
Constraints in embryonic development are thought to bias the direction of
evolution by making some changes less likely, and others more likely, depending
on their consequences on ontogeny. Here, we characterize the constraints acting
on genome evolution in vertebrates. We used gene expression data from two
vertebrates: zebrafish, using a microarray experiment spanning 14 stages of
development, and mouse, using EST counts for 26 stages of development. We show
that, in both species, genes expressed early in development (1) have a more
dramatic effect of knock-out or mutation and (2) are more likely to revert to
single copy after whole genome duplication, relative to genes expressed late.
This supports high constraints on early stages of vertebrate development,
making them less open to innovations (gene gain or gene loss). Results are
robust to different sources of data-gene expression from microarrays, ESTs, or
in situ hybridizations; and mutants from directed KO, transgenic insertions,
point mutations, or morpholinos. We determine the pattern of these constraints,
which differs from the model used to describe vertebrate morphological
conservation ("hourglass" model). While morphological constraints reach a
maximum at mid-development (the "phylotypic" stage), genomic constraints appear
to decrease in a monotonous manner over developmental time
Spectral analysis of gene expression profiles using gene networks
Microarrays have become extremely useful for analysing genetic phenomena, but
establishing a relation between microarray analysis results (typically a list
of genes) and their biological significance is often difficult. Currently, the
standard approach is to map a posteriori the results onto gene networks to
elucidate the functions perturbed at the level of pathways. However,
integrating a priori knowledge of the gene networks could help in the
statistical analysis of gene expression data and in their biological
interpretation. Here we propose a method to integrate a priori the knowledge of
a gene network in the analysis of gene expression data. The approach is based
on the spectral decomposition of gene expression profiles with respect to the
eigenfunctions of the graph, resulting in an attenuation of the high-frequency
components of the expression profiles with respect to the topology of the
graph. We show how to derive unsupervised and supervised classification
algorithms of expression profiles, resulting in classifiers with biological
relevance. We applied the method to the analysis of a set of expression
profiles from irradiated and non-irradiated yeast strains. It performed at
least as well as the usual classification but provides much more biologically
relevant results and allows a direct biological interpretation
- …