When analyzing microarray data, hierarchical models are often used to share
information across genes when estimating means and variances or identifying
differential expression. Many methods utilize some form of the two-level
hierarchical model structure suggested by Kendziorski et al. [Stat. Med. (2003)
22 3899-3914] in which the first level describes the distribution of latent
mean expression levels among genes and among differentially expressed
treatments within a gene. The second level describes the conditional
distribution, given a latent mean, of repeated observations for a single gene
and treatment. Many of these models, including those used in the EBarrays
package of Kendziorski et al. [Stat. Med. (2003) 22 3899-3914], assume that expression
level changes due to treatment effects have the same distribution as expression
level changes from gene to gene. We present empirical evidence that this
assumption is often inadequate and propose three-level hierarchical models as
extensions to the two-level log-normal based EBarrays models to address this
inadequacy. We demonstrate that use of our three-level models dramatically
changes analysis results for a variety of microarray data sets and verify the
validity and improved performance of our suggested method in a series of
simulation studies. We also illustrate the importance of accounting for the
uncertainty of gene-specific error variance estimates when using hierarchical
models to identify differentially expressed genes.

Comment: Published at http://dx.doi.org/10.1214/12-AOAS535 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)