A full Bayesian hierarchical mixture model for the variance of gene differential expression
Background: In many laboratory-based high-throughput microarray experiments, there are very few replicates of gene expression levels. Thus, estimates of gene variances are inaccurate. Visual inspection of graphical summaries of these data usually reveals that heteroscedasticity is present, and the standard approach to address this is to take a log2 transformation. In such circumstances, it is then common to assume that gene variability is constant when an analysis of these data is undertaken. However, this is perhaps too stringent an assumption. More careful inspection reveals that the simple log2 transformation does not remove the problem of heteroscedasticity. An alternative strategy is to assume independent gene-specific variances, although again this is problematic as variance estimates based on few replications are highly unstable. More meaningful and reliable comparisons of gene expression might be achieved, for different conditions or different tissue samples, where the test statistics are based on accurate estimates of gene variability, a crucial step in the identification of differentially expressed genes.
Results: We propose a Bayesian mixture model, which classifies genes according to similarity in their variance. The result is that genes in the same latent class share a similar variance, estimated from a larger number of replicates than purely those per gene, i.e. the total of all replicates of all genes in the same latent class. An example dataset, consisting of 9216 genes with four replicates per condition, resulted in four latent classes based on the similarity of their variances.
Conclusion: The mixture variance model provides a realistic and flexible estimate for the variance of gene expression data under limited replicates. We believe that in using the latent class variances, estimated from a larger number of genes in each derived latent group, the p-values obtained are more robust than those from either a constant gene variance or a gene-specific variance estimate.
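The pooling idea in this abstract can be illustrated with a minimal sketch. It assumes the latent class of each gene is already known, whereas the paper infers the classes with a Bayesian mixture model, so this is only a plain pooled-variance calculation and not the authors' method. The function name `pooled_class_variances` and its arguments are hypothetical.

```python
from statistics import variance


def pooled_class_variances(replicates, class_of_gene):
    """Pool within-gene variability across all genes in the same class.

    replicates:    dict mapping gene -> list of log-expression replicates
    class_of_gene: dict mapping gene -> class label (assumed given here;
                   the paper estimates class membership instead)
    """
    sum_squares = {}  # within-gene sum of squares accumulated per class
    deg_freedom = {}  # degrees of freedom accumulated per class
    for gene, values in replicates.items():
        label = class_of_gene[gene]
        n = len(values)
        # variance() is the sample variance, so multiplying by (n - 1)
        # recovers the within-gene sum of squared deviations.
        sum_squares[label] = sum_squares.get(label, 0.0) + variance(values) * (n - 1)
        deg_freedom[label] = deg_freedom.get(label, 0) + (n - 1)
    return {label: sum_squares[label] / deg_freedom[label] for label in sum_squares}
```

Each class variance is then based on the degrees of freedom of every gene in that class, rather than on the few replicates of a single gene.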
Bayesian testing of many hypotheses × many genes: A study of sleep apnea
Substantial statistical research has recently been devoted to the analysis of
large-scale microarray experiments which provide a measure of the simultaneous
expression of thousands of genes in a particular condition. A typical goal is
the comparison of gene expression between two conditions (e.g., diseased vs.
nondiseased) to detect genes which show differential expression. Classical
hypothesis testing procedures have been applied to this problem and more recent
work has employed sophisticated models that allow for the sharing of
information across genes. However, many recent gene expression studies have an
experimental design with several conditions that requires an even more involved
hypothesis testing approach. In this paper, we use a hierarchical Bayesian
model to address the situation where there are many hypotheses that must be
simultaneously tested for each gene. In addition to having many hypotheses
within each gene, our analysis also addresses the more typical multiple
comparison issue of testing many genes simultaneously. We illustrate our
approach with an application to a study of genes involved in obstructive sleep
apnea in humans.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS241 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
An Empirical Bayes Approach for Multiple Tissue eQTL Analysis
Expression quantitative trait loci (eQTL) analyses, which identify genetic
markers associated with the expression of a gene, are an important tool in the
understanding of diseases in human and other populations. While most eQTL
studies to date consider the connection between genetic variation and
expression in a single tissue, complex, multi-tissue data sets are now being
generated by the GTEx initiative. These data sets have the potential to improve
the findings of single tissue analyses by borrowing strength across tissues,
and the potential to elucidate the genotypic basis of differences between
tissues.
In this paper we introduce and study a multivariate hierarchical Bayesian
model (MT-eQTL) for multi-tissue eQTL analysis. MT-eQTL directly models the
vector of correlations between expression and genotype across tissues. It
explicitly captures patterns of variation in the presence or absence of eQTLs,
as well as the heterogeneity of effect sizes across tissues. Moreover, the
model is applicable to complex designs in which the set of donors can (i) vary
from tissue to tissue, and (ii) exhibit incomplete overlap between tissues. The
MT-eQTL model is marginally consistent, in the sense that the model for a
subset of tissues can be obtained from the full model via marginalization.
Fitting of the MT-eQTL model is carried out via empirical Bayes, using an
approximate EM algorithm. Inferences concerning eQTL detection and the
configuration of eQTLs across tissues are derived from adaptive thresholding of
local false discovery rates, and maximum a-posteriori estimation, respectively.
We investigate the MT-eQTL model through a simulation study, and rigorously
establish the FDR control of the local FDR testing procedure under mild
assumptions appropriate for dependent data.
Comment: accepted by Biostatistic
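The abstract mentions adaptive thresholding of local false discovery rates without giving the rule. A minimal sketch of one common version follows: reject the largest set of hypotheses whose average local FDR stays at or below a target level. Whether MT-eQTL uses exactly this rule is an assumption, and the function name `lfdr_adaptive_threshold` is hypothetical.

```python
def lfdr_adaptive_threshold(lfdr, alpha=0.05):
    """Return the indices of rejected hypotheses.

    lfdr:  list of local false discovery rates, one per hypothesis
    alpha: target level for the average local FDR of the rejected set
    """
    # Visit hypotheses from the smallest local FDR to the largest.
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum = 0.0
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        # The running mean never decreases along the sorted order, so the
        # last rank that satisfies the bound gives the largest rejected set.
        if running_sum / rank <= alpha:
            n_reject = rank
    return sorted(order[:n_reject])
```

The threshold is adaptive in the sense that the cut-off on the local FDR depends on the observed values rather than being fixed in advance.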
Bayesian Gene Set Analysis
Gene expression microarray technologies provide the simultaneous measurements
of a large number of genes. Typical analyses of such data focus on the
individual genes, but recent work has demonstrated that evaluating changes in
expression across predefined sets of genes often increases statistical power
and produces more robust results. We introduce a new methodology for
identifying gene sets that are differentially expressed under varying
experimental conditions. Our approach uses a hierarchical Bayesian framework
where a hyperparameter measures the significance of each gene set. Using
simulated data, we compare our proposed method to alternative approaches, such
as Gene Set Enrichment Analysis (GSEA) and Gene Set Analysis (GSA). Our
approach provides the best overall performance. We also discuss the application
of our method to experimental data based on p53 mutation status.
Non-parametric Bayesian modelling of digital gene expression data
Next-generation sequencing technologies provide a revolutionary tool for
generating gene expression data. Starting with a fixed RNA sample, they
construct a library of millions of differentially abundant short sequence tags
or "reads", which constitute a fundamentally discrete measure of the level of
gene expression. A common limitation in experiments using these technologies is
the low number or even absence of biological replicates, which complicates the
statistical analysis of digital gene expression data. Analysis of this type of
data has often been based on modified tests originally devised for analysing
microarrays; both these and even de novo methods for the analysis of RNA-seq
data are plagued by the common problem of low replication. We propose a novel,
non-parametric Bayesian approach for the analysis of digital gene expression
data. We begin with a hierarchical model for modelling over-dispersed count
data and a blocked Gibbs sampling algorithm for inferring the posterior
distribution of model parameters conditional on these counts. The algorithm
compensates for the problem of low numbers of biological replicates by
clustering together genes with tag counts that are likely sampled from a common
distribution and using this augmented sample for estimating the parameters of
this distribution. The number of clusters is not decided a priori, but it is
inferred along with the remaining model parameters. We demonstrate the ability
of this approach to model biological data with high fidelity by applying the
algorithm on a public dataset obtained from cancerous and non-cancerous neural
tissues.
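The idea of using an augmented sample from clustered genes can be sketched as follows. The sketch assumes a negative binomial model with variance mu + phi * mu^2 and a cluster that is already given. The paper instead infers the clusters with a blocked Gibbs sampler, and its exact count distribution is not stated in the abstract. The function name `pooled_dispersion` is hypothetical.

```python
from statistics import mean, variance


def pooled_dispersion(counts_by_gene, cluster):
    """Method-of-moments dispersion estimate from a pooled sample.

    counts_by_gene: dict mapping gene -> list of tag counts
    cluster:        iterable of genes assumed to share one distribution
    """
    # Pool the counts of every gene in the cluster into one larger sample.
    pooled = [count for gene in cluster for count in counts_by_gene[gene]]
    m = mean(pooled)
    if m == 0:
        return 0.0  # no counts observed, so no over-dispersion to estimate
    v = variance(pooled)
    # Solve v = m + phi * m^2 for phi (assumed negative binomial form),
    # truncating at zero when the sample is not over-dispersed.
    return max(0.0, (v - m) / (m * m))
```

With few replicates per gene, a per-gene estimate of this kind is very noisy. Pooling across the cluster is what makes it usable.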