Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data
Background:
Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns.
Results:
Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints in the clustering and discuss the biological relevance of our results.
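To illustrate how pairwise constraints can steer a mixture-model clustering, here is a minimal sketch. It fits a one-dimensional Gaussian mixture by EM, and in the E-step each constraint re-weights a gene's posterior by its partner's current posterior: must-link pairs are pulled into the same component and cannot-link pairs are pushed apart. This mean-field style penalty, the function names, the weights `w_pos`/`w_neg` and the 1-D data are all assumptions made for illustration. It is not the authors' model or software, which clusters multivariate time courses and derives constraints from in situ images.

```python
import math

def log_gaussian(x, mean, var):
    # Log-density of a 1-D normal distribution N(mean, var) at x.
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def constrained_mixture_em(data, init_means, must_link, cannot_link,
                           w_pos=2.0, w_neg=2.0, n_iter=50):
    """Hypothetical 1-D Gaussian mixture fitted by EM with soft pairwise constraints.

    must_link / cannot_link hold index pairs (i, j). In the E-step the log
    posterior of item i for component c gains w_pos * post[j][c] for every
    must-link partner j and loses w_neg * post[j][c] for every cannot-link partner j.
    """
    n, k = len(data), len(init_means)
    means = [float(m) for m in init_means]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    partners = [[] for _ in range(n)]
    for i, j in must_link:
        partners[i].append((j, w_pos))
        partners[j].append((i, w_pos))
    for i, j in cannot_link:
        partners[i].append((j, -w_neg))
        partners[j].append((i, -w_neg))
    post = [[1.0 / k] * k for _ in range(n)]
    for _ in range(n_iter):
        # E-step, updated item by item so partners see the latest posteriors.
        for i, x in enumerate(data):
            scores = [math.log(weights[c]) + log_gaussian(x, means[c], variances[c])
                      + sum(w * post[j][c] for j, w in partners[i])
                      for c in range(k)]
            top = max(scores)  # subtract the maximum for numerical stability
            expd = [math.exp(s - top) for s in scores]
            total = sum(expd)
            post[i] = [e / total for e in expd]
        # M-step: usual weighted updates of mixing weights, means and variances.
        for c in range(k):
            resp = [post[i][c] for i in range(n)]
            nc = max(sum(resp), 1e-12)
            weights[c] = nc / n
            means[c] = sum(r * x for r, x in zip(resp, data)) / nc
            var = sum(r * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-3)  # floor keeps a component from collapsing
    labels = [max(range(k), key=lambda c: p[c]) for p in post]
    return labels, means
```

For example, with `data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 2.4]` and a must-link pair `(6, 3)`, the ambiguous value 2.4 is assigned to the same component as 5.0. Setting both weights to zero recovers an ordinary unconstrained EM fit.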
Conclusion:
Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.
Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks
BACKGROUND: The learning of global genetic regulatory networks from expression data is a severely under-constrained problem that is aided by reducing the dimensionality of the search space by means of clustering genes into putatively co-regulated groups, as opposed to those that are simply co-expressed. Because genes may be co-regulated only across a subset of all observed experimental conditions, biclustering (clustering of genes and conditions) is more appropriate than standard clustering. Co-regulated genes are also often functionally (physically, spatially, genetically, and/or evolutionarily) associated, and such a priori known or pre-computed associations can provide support for appropriately grouping genes. One important association is the presence of one or more common cis-regulatory motifs. In organisms where these motifs are not known, their de novo detection, integrated into the clustering algorithm, can help to guide the process towards more biologically parsimonious solutions. RESULTS: We have developed an algorithm, cMonkey, that detects putative co-regulated gene groupings by integrating the biclustering of gene expression data and various functional associations with the de novo detection of sequence motifs. CONCLUSION: We have applied this procedure to the archaeon Halobacterium NRC-1, as part of our efforts to decipher its regulatory network. In addition, we used cMonkey on public data for three organisms in the other two domains of life: Helicobacter pylori, Saccharomyces cerevisiae, and Escherichia coli. The biclusters detected by cMonkey both recapitulated known biology and enabled novel predictions (some for Halobacterium were subsequently confirmed in the laboratory). For example, it identified the bacteriorhodopsin regulon, assigned additional genes to this regulon with apparently unrelated function, and detected its known promoter motif.
We have performed a thorough comparison of cMonkey results against other clustering methods, and find that cMonkey biclusters are more parsimonious with all available evidence for co-regulation.
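Biclustering needs a score for how coherently a candidate set of genes behaves across a subset of conditions. The sketch below shows one generic expression-coherence score, the mean squared residue of Cheng and Church (2000), which is zero when the submatrix is purely additive (a row effect plus a column effect). It is included as an assumption for illustration only. It is not cMonkey's scoring function, which additionally integrates motif and functional-association evidence, and the function name is hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix matrix[rows][cols]."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            # Residual left after removing row, column and overall effects.
            resid = sub[i][j] - row_means[i] - col_means[j] + overall
            total += resid ** 2
    return total / (n_r * n_c)
```

For `expr = [[1, 2, 3, 9], [2, 3, 4, 0], [5, 6, 7, 1]]`, the genes-by-conditions block `rows=[0, 1, 2], cols=[0, 1, 2]` is additive and scores essentially zero, whereas adding condition 3 raises the residue sharply. A lower residue indicates a more coherent bicluster.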
Robust filtering for gene expression time series data with variance constraints
This is the postprint version of the article; the official published version is available from the publisher. Copyright 2007 Taylor & Francis Ltd.
In this paper, an uncertain discrete-time stochastic system is employed to represent a model for gene regulatory networks from time series data. A robust variance-constrained filtering problem is investigated for a gene expression model with stochastic disturbances and norm-bounded parameter uncertainties, where the stochastic perturbation is a scalar Gaussian white noise with constant variance and the parameter uncertainties enter both the system matrix and the output matrix. The aim of the robust filtering problem is to design a linear filter such that, for all admissible bounded uncertainties, the filtering error system is Schur stable and each individual error variance is less than a prespecified upper bound. Using the linear matrix inequality (LMI) technique, sufficient conditions are first derived that ensure the desired filtering performance for the gene expression model. The filter gain is then characterized in terms of the solution to a set of LMIs, which can readily be solved with available software packages. A simulation example for a gene expression model demonstrates the effectiveness of the proposed design procedures.
This work was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grants GR/S27658/01 and EP/C524586/1, the Biotechnology and Biological Sciences Research Council (BBSRC) of the UK under Grants BB/C506264/1 and 100/EGM17735, the Nuffield Foundation of the UK under Grant NAL/00630/G, and the Alexander von Humboldt Foundation of Germany.
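The two performance requirements above, Schur stability of the filtering error system and a bound on each individual error variance, can be checked for any given filter gain. The sketch below does this for a nominal two-state model (for instance mRNA and protein), assuming additive scalar process noise, no measurement noise and no parameter uncertainty. It does not reproduce the paper's LMI-based robust design, which needs a semidefinite programming solver; the function names are hypothetical.

```python
import cmath

def mat_mul(a, b):
    # Plain nested-list matrix product.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def spectral_radius_2x2(m):
    # Eigenvalues of a 2x2 matrix from its trace and determinant.
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

def steady_state_error_cov(A, C, K, D, noise_var, tol=1e-12, max_iter=100000):
    """Steady-state covariance of e_{k+1} = (A - K C) e_k + D w_k with Var(w_k) = noise_var.

    Assumes the filter x_hat_{k+1} = A x_hat_k + K (y_k - C x_hat_k) with y_k = C x_k.
    """
    KC = mat_mul(K, C)
    F = [[A[i][j] - KC[i][j] for j in range(2)] for i in range(2)]
    if spectral_radius_2x2(F) >= 1.0:
        raise ValueError("filtering error system is not Schur stable")
    Q = [[noise_var * v for v in row] for row in mat_mul(D, transpose(D))]
    P = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(max_iter):
        # Discrete Lyapunov recursion P <- F P F^T + Q; converges because F is Schur stable.
        FPFt = mat_mul(mat_mul(F, P), transpose(F))
        new_P = [[FPFt[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
        if max(abs(new_P[i][j] - P[i][j]) for i in range(2) for j in range(2)) < tol:
            return new_P
        P = new_P
    return P

def meets_variance_bounds(P, bounds):
    # Each individual error variance (diagonal entry) must not exceed its prespecified bound.
    return all(P[i][i] <= bounds[i] for i in range(len(bounds)))
```

The noise input `D` is a 2x1 column, matching a scalar white-noise disturbance. If only protein were measured, `C` would be `[[0.0, 1.0]]` and `K` a 2x1 column; the code handles both shapes because the products are computed generically.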
Statistical modelling of transcript profiles of differentially regulated genes
Background: The vast quantities of gene expression profiling data produced in microarray studies, and by the more precise quantitative PCR, are often not statistically analysed to their full potential. Previous studies have summarised gene expression profiles using simple descriptive statistics, basic analysis of variance (ANOVA) and the clustering of genes based on simple models fitted to their expression profiles over time. We report the novel application of statistical non-linear regression modelling techniques to describe the shapes of expression profiles for the fungus Agaricus bisporus, quantified by PCR, and for E. coli and Rattus norvegicus, using microarray technology. The use of parametric non-linear regression models provides a more precise description of expression profiles, reducing the "noise" of the raw data to produce a clear "signal" given by the fitted curve, and describing each profile with a small number of biologically interpretable parameters. This approach then allows the direct comparison and clustering of the shapes of response patterns between genes and potentially enables a greater exploration and interpretation of the biological processes driving gene expression.
Results: Quantitative reverse transcriptase PCR-derived time-course data of genes were modelled. "Splitline" or "broken-stick" regression identified the initial time of gene up-regulation, enabling the classification of genes into those with primary and secondary responses. Five-day profiles were modelled using the biologically oriented critical exponential curve, $y(t) = A + (B + Ct)R^{t} + \varepsilon$. This non-linear regression approach allowed the expression patterns for different genes to be compared in terms of curve shape, time of maximal transcript level, and the decline and asymptotic response levels. Three distinct regulatory patterns were identified for the five genes studied. Applying the regression modelling approach to microarray-derived time-course data allowed 11% of the Escherichia coli features to be fitted by an exponential function, and 25% of the Rattus norvegicus features could be described by the critical exponential model, all with statistical significance of p < 0.05.
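A split-line fit of the kind described above can be sketched as a grid search over the breakpoint. This sketch assumes a flat baseline before up-regulation followed by a linear rise, y = a + b * max(0, t - tau), which is one simple variant of broken-stick regression; for each candidate tau the model has a single regressor, so ordinary least squares has a closed form. The function names and the use of the observed time points as candidate breakpoints are assumptions, not the authors' implementation.

```python
def fit_flat_then_linear(times, values, tau):
    # Least squares for y = a + b * max(0, t - tau) with one regressor h.
    h = [max(0.0, t - tau) for t in times]
    n = len(times)
    h_mean = sum(h) / n
    y_mean = sum(values) / n
    sxx = sum((hi - h_mean) ** 2 for hi in h)
    if sxx == 0.0:
        return None  # breakpoint at or after the last time: slope not estimable
    b = sum((hi - h_mean) * (yi - y_mean) for hi, yi in zip(h, values)) / sxx
    a = y_mean - b * h_mean
    rss = sum((yi - a - b * hi) ** 2 for hi, yi in zip(h, values))
    return a, b, rss

def estimate_onset(times, values, candidates=None):
    """Grid search for the breakpoint tau minimising the residual sum of squares."""
    if candidates is None:
        candidates = sorted(set(times))
    best = None
    for tau in candidates:
        fit = fit_flat_then_linear(times, values, tau)
        if fit is not None and (best is None or fit[2] < best[3]):
            best = (tau,) + fit
    return best  # (tau, baseline a, slope b, rss)
```

The estimated `tau` plays the role of the initial time of up-regulation; comparing it across genes is one way to separate earlier (primary) from later (secondary) responders, though any cut-off for that split would be an additional assumption.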
Conclusion: The statistical non-linear regression approaches presented in this study provide detailed biologically oriented descriptions of individual gene expression profiles, using biologically variable data to generate a set of defining parameters. These approaches have application to the modelling and greater interpretation of profiles obtained across a wide range of platforms, such as microarrays. Through careful choice of appropriate model forms, such statistical regression approaches allow an improved comparison of gene expression profiles, and may provide an approach for the greater understanding of common regulatory mechanisms between genes.
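For the critical exponential curve $y(t) = A + (B + Ct)R^{t} + \varepsilon$, the defining parameters are A, B, C and R. Only R enters non-linearly: for a fixed R the model is linear in A, B and C. A minimal sketch therefore profiles R over a grid and solves an ordinary least-squares problem at each grid value. This profiling strategy, the grid and the function names are assumptions for illustration; the abstract does not state which fitting algorithm or software the authors used.

```python
import math

def solve_linear_system(m, v):
    # Gaussian elimination with partial pivoting for a small dense system m x = v.
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_critical_exponential(times, values, r_grid):
    """Fit y = A + (B + C t) R^t; returns (A, B, C, R, rss) for the best grid value of R."""
    best = None
    for R in r_grid:
        # Design matrix columns for fixed R: 1, R^t, t * R^t.
        X = [[1.0, R ** t, t * R ** t] for t in times]
        XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
        Xty = [sum(row[i] * y for row, y in zip(X, values)) for i in range(3)]
        try:
            A, B, C = solve_linear_system(XtX, Xty)
        except ZeroDivisionError:
            continue  # singular normal equations (e.g. R = 1): skip this grid value
        rss = sum((y - (A + (B + C * t) * R ** t)) ** 2 for t, y in zip(times, values))
        if best is None or rss < best[4]:
            best = (A, B, C, R, rss)
    return best

def time_of_maximum(B, C, R):
    # Setting dy/dt = R^t (C + (B + C t) ln R) = 0 gives t = -1/ln R - B/C (for C > 0, 0 < R < 1).
    return -1.0 / math.log(R) - B / C
```

Two of the comparison criteria named in the Results follow directly from the fitted parameters: the time of maximal transcript level is `time_of_maximum(B, C, R)` when a maximum exists, and for 0 < R < 1 the curve approaches the asymptotic level A as t grows.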