Bayesian Gene Set Analysis
Gene expression microarray technologies provide simultaneous measurements of the
expression of a large number of genes. Typical analyses of such data focus on the
individual genes, but recent work has demonstrated that evaluating changes in
expression across predefined sets of genes often increases statistical power
and produces more robust results. We introduce a new methodology for
identifying gene sets that are differentially expressed under varying
experimental conditions. Our approach uses a hierarchical Bayesian framework
where a hyperparameter measures the significance of each gene set. Using
simulated data, we compare our proposed method to alternative approaches, such
as Gene Set Enrichment Analysis (GSEA) and Gene Set Analysis (GSA). Our
approach provides the best overall performance in these simulations. We also discuss
the application of our method to experimental data on p53 mutation status.
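The hierarchical Bayesian model is not reproduced here. For orientation only, the sketch below computes the maxmean statistic that GSA, one of the comparison methods named above, uses to summarize per-gene scores within a set. The gene names, scores and set memberships are hypothetical.

```python
def maxmean(scores):
    """GSA-style maxmean statistic for one gene set, from per-gene scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("gene set is empty")
    # Average the positive parts and the negative parts over ALL genes in the set.
    s_pos = sum(max(z, 0.0) for z in scores) / n
    s_neg = sum(max(-z, 0.0) for z in scores) / n
    # Report the larger side, signed: positive for up-, negative for down-regulation.
    return s_pos if s_pos > s_neg else -s_neg


# Hypothetical per-gene scores (e.g. two-sample t statistics) and gene sets.
gene_scores = {"G1": 2.1, "G2": 1.4, "G3": -0.3, "G4": -2.5, "G5": 0.2}
gene_sets = {"setA": ["G1", "G2", "G3"], "setB": ["G3", "G4", "G5"]}

for name, members in gene_sets.items():
    print(name, round(maxmean([gene_scores[g] for g in members]), 3))
```

In GSA the observed statistic is then compared against a permutation-based reference distribution, which this sketch omits.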
Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures
Although clustering has rapidly become one of the standard computational approaches in the literature on microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to high-dimensional, non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of their expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles, which extend previously published cluster analyses of these data.
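A minimal sketch of the similarity and dissimilarity construction described above, assuming cluster allocations have already been drawn from a DPM sampler (the sampler itself is not shown, and the label samples are made up). It estimates each pair's co-clustering probability as the fraction of samples in which the two genes share a label, converts it to the dissimilarity 1 - p, and feeds that to a naive average-linkage routine; in practice a library linkage implementation would normally be used instead.

```python
from itertools import combinations


def posterior_similarity(label_samples):
    """Fraction of MCMC samples in which each pair of genes shares a cluster."""
    n_samples = len(label_samples)
    n_genes = len(label_samples[0])
    sim = [[0.0] * n_genes for _ in range(n_genes)]
    for labels in label_samples:
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    sim[i][j] += 1.0 / n_samples
    return sim


def average_linkage(dist):
    """Naive agglomerative clustering with average linkage; returns merge steps."""
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean between-cluster distance.
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Made-up allocations of 4 genes over 4 posterior samples.
samples = [[0, 0, 1, 1], [0, 0, 1, 2], [1, 1, 0, 0], [0, 0, 0, 1]]
psm = posterior_similarity(samples)
dissim = [[1.0 - p for p in row] for row in psm]
for step in average_linkage(dissim):
    print(step)  # (cluster_a, cluster_b, merge height)
```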
Inference of Temporally Varying Bayesian Networks
When analysing gene expression time series data, an often overlooked but
crucial aspect of the model is that the regulatory network structure may change
over time. Whilst some approaches have addressed this problem previously in the
literature, many are not well suited to the sequential nature of the data. Here
we present a method that allows us to infer regulatory network structures that
may vary between time points, utilising a set of hidden states that describe
the network structure at a given time point. To model the distribution of the
hidden states, we have applied the Hierarchical Dirichlet Process Hidden Markov
Model, a nonparametric extension of the traditional Hidden Markov Model that
does not require us to fix the number of hidden states in advance. We apply our
method to existing microarray expression data and also demonstrate its
efficacy on simulated test data.
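The inference procedure is not reproduced here. As a hedged illustration of the prior structure only, the sketch below draws from the weak-limit (truncated) approximation of an HDP-HMM, a standard finite approximation that is not necessarily the authors' construction: a global weight vector beta is shared by every transition row, and each hidden state indexes its own network. The truncation level L, the concentration parameters gamma and alpha, and the random per-state networks are all illustrative assumptions.

```python
import random


def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising independent Gamma variates."""
    # Floors keep very small shapes valid and avoid an all-zero draw.
    draws = [max(rng.gammavariate(max(a, 1e-6), 1.0), 1e-300) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_hdp_hmm_networks(n_times, n_genes, L=10, gamma=1.0, alpha=5.0,
                            edge_prob=0.2, seed=0):
    """Simulate hidden network states from a weak-limit (truncated) HDP-HMM prior."""
    rng = random.Random(seed)
    # Global state weights: beta ~ Dirichlet(gamma/L, ..., gamma/L).
    beta = dirichlet([gamma / L] * L, rng)
    # Transition rows: pi_j ~ Dirichlet(alpha * beta), so all rows share beta.
    trans = [dirichlet([alpha * b for b in beta], rng) for _ in range(L)]
    # Hypothetical network per state: random directed adjacency, no self-loops.
    networks = [
        [[int(i != j and rng.random() < edge_prob) for j in range(n_genes)]
         for i in range(n_genes)]
        for _ in range(L)
    ]
    # Markov chain over hidden states; the initial state is drawn from beta.
    states = [rng.choices(range(L), weights=beta)[0]]
    for _ in range(n_times - 1):
        states.append(rng.choices(range(L), weights=trans[states[-1]])[0])
    return states, [networks[s] for s in states]


states, nets = sample_hdp_hmm_networks(n_times=20, n_genes=4)
print("hidden states:", states)
print("distinct network structures used:", len(set(states)))
```

Because beta concentrates on a few components, only a handful of the L available states tend to be visited, which mirrors how the nonparametric prior avoids fixing the number of states in advance.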
GaGa: A parsimonious and flexible model for differential expression analysis
Hierarchical models are a powerful tool for high-throughput data with a small
to moderate number of replicates, as they allow sharing information across
units of information, for example, genes. We propose two such models and show
their increased sensitivity in microarray differential expression applications.
We build on the gamma–gamma hierarchical model introduced by Kendziorski et
al. [Statist. Med. 22 (2003) 3899–3914] and Newton et al. [Biostatistics 5
(2004) 155–176], by addressing important limitations that may have hampered
its performance and its more widespread use. The models parsimoniously describe
the expression of thousands of genes with a small number of hyper-parameters.
This makes them easy to interpret and analytically tractable. The first model
is a simple extension that improves the fit substantially with almost no
increase in complexity. We propose a second extension that uses a mixture of
gamma distributions to further improve the fit, at the expense of increased
computational burden. We derive several approximations that significantly
reduce the computational cost. We find that our models outperform the original
formulation of the model, as well as some other popular methods for
differential expression analysis. The improved performance is especially
noticeable for the small sample sizes commonly encountered in high-throughput
experiments. Our methods are implemented in the freely available Bioconductor
gaga package. Comment: Published at http://dx.doi.org/10.1214/09-AOAS244 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
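For reference, a sketch of the original gamma–gamma formulation that the abstract builds on, not the GaGa extensions (the authors' implementation is the gaga package). It assumes observations x_j | theta ~ Gamma(shape a, rate theta) with theta ~ Gamma(shape a0, rate nu), which gives a closed-form marginal likelihood. A gene is equally expressed (EE) if both groups share one theta and differentially expressed (DE) if each group has its own. The hyperparameters a, a0, nu and the prior probability p_de are fixed by hand here, whereas in practice they would be estimated from all genes; the expression values are hypothetical.

```python
from math import exp, lgamma, log


def log_marginal(x, a, a0, nu):
    """Log marginal likelihood of positive values x under the gamma-gamma model
    x_j | theta ~ Gamma(shape=a, rate=theta),  theta ~ Gamma(shape=a0, rate=nu)."""
    n, s = len(x), sum(x)
    return ((a - 1) * sum(log(v) for v in x) - n * lgamma(a)
            + a0 * log(nu) - lgamma(a0)
            + lgamma(n * a + a0) - (n * a + a0) * log(nu + s))


def prob_de(group1, group2, a, a0, nu, p_de):
    """Posterior probability that a gene is differentially expressed (DE)."""
    # EE: one theta shared by both groups; DE: an independent theta per group.
    log_ee = log(1 - p_de) + log_marginal(group1 + group2, a, a0, nu)
    log_de = log(p_de) + log_marginal(group1, a, a0, nu) + log_marginal(group2, a, a0, nu)
    # Normalise on the log scale to avoid underflow.
    m = max(log_ee, log_de)
    return exp(log_de - m) / (exp(log_de - m) + exp(log_ee - m))


# Hypothetical hyperparameters and expression values for two genes.
hyper = dict(a=10.0, a0=1.0, nu=1.0, p_de=0.1)
print(prob_de([10, 11, 9, 10], [10, 12, 9, 11], **hyper))      # similar groups
print(prob_de([10, 11, 9, 10], [100, 110, 90, 105], **hyper))  # about 10-fold shift
```

Sharing a single prior across genes is what lets the model borrow strength between genes when replicates are few.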
A temporal switch model for estimating transcriptional activity in gene expression
Motivation: The analysis and mechanistic modelling of time series gene expression data provided by techniques such as microarrays, NanoString, reverse transcription–polymerase chain reaction and advanced sequencing are invaluable for developing an understanding of the variation in key biological processes. We address this by proposing the estimation of a flexible dynamic model, which decouples temporal synthesis and degradation of mRNA and, hence, allows for transcriptional activity to switch between different states.
Results: The model is flexible enough to capture a variety of observed transcriptional dynamics, including oscillatory behaviour, in a way that is compatible with the demands imposed by the quality, time-resolution and quantity of the data. We show that the timing and number of switch events in transcriptional activity can be estimated alongside individual gene mRNA stability with the help of a Bayesian reversible jump Markov chain Monte Carlo algorithm. To demonstrate the methodology, we focus on modelling the wild-type behaviour of a selection of 200 circadian genes of the model plant Arabidopsis thaliana. The results support the idea that using a mechanistic model to identify transcriptional switch points is likely to contribute strongly to efforts to elucidate and understand key biological processes, such as transcription and degradation.
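The reversible jump sampler is not reproduced here. Assuming the linear mRNA balance equation dm/dt = s(t) - δ·m(t), with a synthesis rate s(t) that is constant between switch times τ_k and a constant degradation rate δ (an assumption about the exact model form), each segment has the closed-form solution m(t) = s_k/δ + (m(τ_k) - s_k/δ)·exp(-δ(t - τ_k)). The sketch evaluates only this forward model; the function name, switch times and rates are hypothetical.

```python
from math import exp


def mrna_trajectory(times, m0, delta, switch_times, rates):
    """Evaluate m(t) for dm/dt = s(t) - delta * m(t), with s(t) piecewise constant.

    switch_times: increasing switch points (the first segment starts at t = 0);
    rates: one synthesis rate per segment, i.e. len(switch_times) + 1 values.
    """
    if len(rates) != len(switch_times) + 1:
        raise ValueError("need exactly one synthesis rate per segment")
    starts = [0.0] + list(switch_times)
    # Propagate the mRNA level to the start of each segment using the exact solution.
    m_start = [m0]
    for k in range(len(switch_times)):
        steady = rates[k] / delta
        dt = starts[k + 1] - starts[k]
        m_start.append(steady + (m_start[k] - steady) * exp(-delta * dt))
    values = []
    for t in times:
        k = sum(1 for tau in switch_times if tau <= t)  # index of the active segment
        steady = rates[k] / delta
        values.append(steady + (m_start[k] - steady) * exp(-delta * (t - starts[k])))
    return values


# Hypothetical gene: synthesis on from t = 0, off at t = 12, on again at t = 24.
ts = [0, 6, 12, 18, 24, 30]
print(mrna_trajectory(ts, m0=0.0, delta=0.3, switch_times=[12, 24], rates=[3.0, 0.0, 3.0]))
```

Separating s(t) from δ is what lets the same observed profile be attributed either to a change in synthesis or to mRNA stability.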
Bayesian testing of many hypotheses × many genes: A study of sleep apnea
Substantial statistical research has recently been devoted to the analysis of
large-scale microarray experiments, which provide a measure of the simultaneous
expression of thousands of genes in a particular condition. A typical goal is
the comparison of gene expression between two conditions (e.g., diseased vs.
nondiseased) to detect genes which show differential expression. Classical
hypothesis testing procedures have been applied to this problem and more recent
work has employed sophisticated models that allow for the sharing of
information across genes. However, many recent gene expression studies have an
experimental design with several conditions that requires an even more involved
hypothesis testing approach. In this paper, we use a hierarchical Bayesian
model to address the situation where there are many hypotheses that must be
simultaneously tested for each gene. In addition to having many hypotheses
within each gene, our analysis also addresses the more typical multiple
comparison issue of testing many genes simultaneously. We illustrate our
approach with an application to a study of genes involved in obstructive sleep
apnea in humans. Comment: Published at http://dx.doi.org/10.1214/09-AOAS241 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
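The abstract does not state the paper's decision rule. As one common way to control errors when testing many genes at once in a Bayesian model, the sketch below applies a posterior expected false discovery rate threshold to pooled gene × hypothesis tests. It assumes posterior null probabilities are already available from a fitted hierarchical model; the probabilities shown are invented.

```python
def bayes_fdr_select(post_null, q=0.05):
    """Select the largest set of hypotheses whose posterior expected FDR is <= q.

    post_null maps a hypothesis id to its posterior probability of the null.
    """
    ranked = sorted(post_null.items(), key=lambda kv: kv[1])
    selected, total = [], 0.0
    for hyp, p0 in ranked:
        # Expected FDR of a list = mean posterior null probability of its members;
        # the running mean only grows along this ordering, so we can stop early.
        if (total + p0) / (len(selected) + 1) > q:
            break
        selected.append(hyp)
        total += p0
    return selected


# Hypothetical posterior null probabilities for (gene, hypothesis) pairs.
post_null = {
    ("gene1", "H1"): 0.01, ("gene1", "H2"): 0.60,
    ("gene2", "H1"): 0.02, ("gene2", "H2"): 0.20,
    ("gene3", "H1"): 0.90, ("gene3", "H2"): 0.03,
}
print(bayes_fdr_select(post_null, q=0.05))
```

Pooling all (gene, hypothesis) pairs into one ranking is a simplifying choice; a model that tests several hypotheses within each gene may instead make decisions gene by gene.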
Predictive response-relevant clustering of expression data provides insights into disease processes
This article describes and illustrates a novel method of microarray data analysis that couples model-based clustering and binary classification to form clusters of 'response-relevant' genes, that is, genes that are informative when discriminating between the different values of the response. Predictions are subsequently made using an appropriate statistical summary of each gene cluster, which we call the 'meta-covariate' representation of the cluster, in a probit regression model. We first illustrate this method by analysing a leukaemia expression dataset, before focusing closely on the meta-covariate analysis of a renal gene expression dataset in a rat model of salt-sensitive hypertension. We explore the biological insights provided by our analysis of these data. In particular, we identify a highly influential cluster of 13 genes, including the three transcription factors Arntl, Bhlhe41 and Npas2, that is implicated as being protective against hypertension in response to increased dietary sodium. Functional and canonical pathway analysis of this cluster using Ingenuity Pathway Analysis implicated transcriptional activation and circadian rhythm signalling, respectively. Although we illustrate our method using only expression data, the method is applicable to any high-dimensional dataset.
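A minimal sketch of the prediction step described above, assuming the cluster summary is the mean expression of the cluster's genes (the abstract only says "an appropriate statistical summary") and that the probit coefficients have already been estimated. The model-based clustering step is not shown, and the gene names, values and coefficients are hypothetical.

```python
from math import erf, sqrt


def meta_covariates(sample, clusters):
    """Summarise each gene cluster by the mean expression of its genes in one sample."""
    return [sum(sample[g] for g in genes) / len(genes) for genes in clusters]


def probit_prob(sample, clusters, intercept, coefs):
    """P(response = 1) = Phi(intercept + sum_k coef_k * meta_covariate_k)."""
    metas = meta_covariates(sample, clusters)
    if len(metas) != len(coefs):
        raise ValueError("need one coefficient per cluster")
    eta = intercept + sum(b * m for b, m in zip(coefs, metas))
    return 0.5 * (1.0 + erf(eta / sqrt(2.0)))  # standard normal CDF at eta


# Hypothetical clusters, expression values and fitted coefficients.
clusters = [["g1", "g2", "g3"], ["g4", "g5"]]
sample = {"g1": 1.2, "g2": 0.8, "g3": 1.0, "g4": -0.5, "g5": 0.1}
print(probit_prob(sample, clusters, intercept=-0.2, coefs=[0.9, -0.4]))
```

Using one summary per cluster keeps the number of regression coefficients equal to the number of clusters rather than the number of genes.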
Listen to genes: dealing with microarray data in the frequency domain
Background: We present a novel and systematic approach to analyze temporal microarray data. The approach includes
normalization, clustering and network analysis of genes.
Methodology: Genes are normalized using an error-model-based uniform normalization method aimed at identifying and
estimating the sources of variation. The model minimizes the correlation among error terms across replicates. The
normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger
causality is introduced to reveal interactions between sets of genes. Complex Granger causality along with partial Granger
causality is applied in both the time and frequency domains, to selected genes as well as to all genes, to reveal interesting
networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data on 31,000
genes observed at 22 time points over 22 days. Three circuits are analyzed in detail: a circadian gene circuit, an ethylene
circuit, and a new global circuit whose hierarchical structure is used to determine the initiators of leaf senescence.
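The normalization step is not reproduced here; the profiles below are assumed to be already normalized. As a sketch of clustering "in terms of their power spectrum density", the code uses a raw periodogram as the spectrum estimate and a Euclidean distance between unit-normalised spectra as the dissimilarity. Both choices are assumptions and may differ from the authors' estimator and clustering algorithm. The toy profiles use 22 time points to match the dataset size.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram |DFT_k|^2 / N of a mean-centred series, for k = 0..N//2."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    power = []
    for k in range(n // 2 + 1):
        coef = sum(xc[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coef) ** 2 / n)
    return power


def spectral_distance(x, y):
    """Euclidean distance between two periodograms normalised to unit total power."""
    px, py = periodogram(x), periodogram(y)
    sx, sy = sum(px) or 1.0, sum(py) or 1.0
    return math.sqrt(sum((a / sx - b / sy) ** 2 for a, b in zip(px, py)))


# Hypothetical normalised profiles at 22 time points.
n = 22
gene_a = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
gene_b = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]  # same period, shifted phase
gene_c = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]  # faster oscillation
print(spectral_distance(gene_a, gene_b), spectral_distance(gene_a, gene_c))
```

Because the periodogram discards phase, gene_a and gene_b come out as near neighbours even though their time courses are shifted, which is the point of clustering in the frequency domain.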
Conclusions: We use a fully data-driven approach to form biological hypotheses. Clustering using the power-spectrum
analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency
domains using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray
data, such methods can be useful tools for uncovering hidden biological interactions. We present our method in a
step-by-step manner with the help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of
potential interest to Arabidopsis researchers.
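As a baseline illustration of the idea behind Granger causality, the sketch below computes ordinary pairwise, time-domain Granger causality with a fixed lag: the log ratio of the residual variance of y predicted from its own past to the residual variance when the past of x is also included. This is not the complex or partial Granger causality introduced by the authors, and it does not cover their frequency-domain analysis. The lag and the simulated gene pair are assumptions.

```python
import math
import random


def _residual_variance(y, cols):
    """Mean squared residual of the least-squares fit y ~ cols (no intercept)."""
    p = len(cols)
    # Normal equations (X^T X) b = X^T y, solved by Gaussian elimination.
    A = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(p)] for i in range(p)]
    r = [sum(u * v for u, v in zip(cols[i], y)) for i in range(p)]
    for i in range(p):
        for k in range(i + 1, p):
            f = A[k][i] / A[i][i]
            for j in range(i, p):
                A[k][j] -= f * A[i][j]
            r[k] -= f * r[i]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (r[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    resid = [yt - sum(b[i] * cols[i][t] for i in range(p)) for t, yt in enumerate(y)]
    return sum(e * e for e in resid) / len(resid)


def granger(x, y, lag=1):
    """Time-domain Granger causality x -> y (series of equal length): log ratio of
    residual variances of y from its own past vs. its own past plus x's past."""
    n = len(y)
    mx, my = sum(x) / len(x), sum(y) / n
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    target = yc[lag:]
    own = [yc[lag - l:n - l] for l in range(1, lag + 1)]
    other = [xc[lag - l:n - l] for l in range(1, lag + 1)]
    return math.log(_residual_variance(target, own) / _residual_variance(target, own + other))


# Hypothetical pair of genes: x drives y with a one-step delay.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + 0.2 * rng.gauss(0, 1) for t in range(1, 300)]
print("x -> y:", granger(x, y), " y -> x:", granger(y, x))
```

The measure is asymmetric by construction, which is what allows it to suggest a direction of influence between genes.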
- …