Bayesian hierarchical modeling for signaling pathway inference from single cell interventional data
Recent technological advances have made it possible to simultaneously measure
multiple protein activities at the single cell level. With such data collected
under different stimulatory or inhibitory conditions, it is possible to infer
the causal relationships among proteins from single cell interventional data.
In this article we propose a Bayesian hierarchical modeling framework to infer
the signaling pathway based on the posterior distributions of parameters in the
model. Under this framework, we consider network sparsity and model the
existence of an association between two proteins both at the overall level
across all experiments and at each individual experimental level. This allows
us to infer the pairs of proteins that are associated with each other and their
causal relationships. We also explicitly consider both intrinsic noise and
measurement error. Markov chain Monte Carlo is implemented for statistical
inference. We demonstrate that this hierarchical modeling can effectively pool
information from different interventional experiments through simulation
studies and real data analysis.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS425
Posterior Contraction Rates of the Phylogenetic Indian Buffet Processes
By expressing prior distributions as general stochastic processes,
nonparametric Bayesian methods provide a flexible way to incorporate prior
knowledge and constrain the latent structure in statistical inference. The
Indian buffet process (IBP) is one such example: it can be used to define a
prior distribution on infinite binary features under the assumption that
subjects are exchangeable. The phylogenetic Indian buffet process (pIBP), a
derivative of IBP, enables the modeling of non-exchangeability among subjects
through a stochastic process on a rooted tree, which is similar to that used in
phylogenetics, to describe relationships among the subjects. In this paper, we
study the theoretical properties of IBP and pIBP under a binary factor model.
We establish the posterior contraction rates for both IBP and pIBP and
substantiate the theoretical results through simulation studies. This is the
first work addressing the frequentist properties of the posterior behavior of
IBP and pIBP. We also demonstrate its practical usefulness by applying the pIBP
prior to a real data example arising in the field of cancer genomics, where the
exchangeability among subjects is violated.
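
As a rough illustration of the prior on binary features that the abstract refers to, the sketch below draws a feature matrix from the standard (exchangeable) IBP using the usual sequential "restaurant" construction: subject i takes each existing feature k with probability m_k / i, where m_k counts earlier subjects holding it, and then adds Poisson(alpha / i) new features. It covers only the plain IBP, not the tree-based pIBP. The function names, the parameter `alpha`, and the hand-rolled Poisson sampler are assumptions made for this example and do not come from the paper.

```python
import math
import random


def draw_poisson(rate, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_subjects, alpha, rng):
    """Return a 0/1 matrix (list of rows) drawn from the standard IBP prior."""
    feature_counts = []  # m_k: number of earlier subjects holding feature k
    owned_by_subject = []
    for i in range(1, num_subjects + 1):
        # Existing features are picked up in proportion to their popularity.
        owned = [k for k, m in enumerate(feature_counts) if rng.random() < m / i]
        for k in owned:
            feature_counts[k] += 1
        # Each subject also introduces a Poisson(alpha / i) number of new features.
        num_new = draw_poisson(alpha / i, rng)
        first_new = len(feature_counts)
        owned.extend(range(first_new, first_new + num_new))
        feature_counts.extend([1] * num_new)
        owned_by_subject.append(owned)
    num_features = len(feature_counts)
    matrix = [[0] * num_features for _ in range(num_subjects)]
    for i, owned in enumerate(owned_by_subject):
        for k in owned:
            matrix[i][k] = 1
    return matrix


if __name__ == "__main__":
    for row in sample_ibp(num_subjects=6, alpha=2.0, rng=random.Random(1)):
        print(row)
```

Under this construction the rows are exchangeable in distribution, which is the assumption the pIBP relaxes by tying subjects together through a rooted tree.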
NBLDA: Negative Binomial Linear Discriminant Analysis for RNA-Seq Data
RNA-sequencing (RNA-Seq) has become a powerful technology to characterize
gene expression profiles because it is more accurate and comprehensive than
microarrays. Although statistical methods that have been developed for
microarray data can be applied to RNA-Seq data, they are not ideal due to the
discrete nature of RNA-Seq data. The Poisson distribution and negative binomial
distribution are commonly used to model count data. Recently, Witten (2011)
proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson
assumption may be less appropriate than the negative binomial assumption when
biological replicates are available and overdispersion is present
(i.e., when the variance is larger than the mean). However, it is more
complicated to model negative binomial variables because they involve a
dispersion parameter that needs to be estimated. In this paper, we propose a
negative binomial linear discriminant analysis for RNA-Seq data. By Bayes'
rule, we construct the classifier by fitting a negative binomial model, and
propose some plug-in rules to estimate the unknown parameters in the
classifier. The relationship between the negative binomial classifier and the
Poisson classifier is explored, with a numerical investigation of the impact of
dispersion on the discriminant score. Simulation results show the superiority
of our proposed method. We also analyze four real RNA-Seq data sets to
demonstrate the advantage of our method in real-world applications.
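
To make the Bayes-rule construction concrete, here is a minimal sketch of a negative binomial plug-in classifier. It is not the authors' NBLDA implementation. It assumes a single dispersion `phi` that is known and shared by all genes (the abstract notes that it has to be estimated in practice), ignores library-size normalization, and adds a small pseudo-count to the class means so that no mean is exactly zero. The parameterization has mean `mu` and variance `mu + phi * mu**2`, so the variance exceeds the mean whenever `phi > 0`, and the log-probability approaches the Poisson one as `phi` tends to 0. All function and parameter names are hypothetical.

```python
import math


def nb_log_pmf(x, mu, phi):
    """Log-probability of count x under NB with mean mu and dispersion phi > 0."""
    r = 1.0 / phi
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + x * math.log(phi * mu / (1.0 + phi * mu))
            - r * math.log(1.0 + phi * mu))


def fit_class_means(counts, labels, pseudo=0.5):
    """Plug-in estimate: per-class, per-gene mean count (pseudo-count is an assumption)."""
    means = {}
    for k in set(labels):
        rows = [c for c, y in zip(counts, labels) if y == k]
        means[k] = [(sum(col) + pseudo) / len(rows) for col in zip(*rows)]
    return means


def classify(x, means, priors, phi):
    """Assign x to the class with the largest log prior plus NB log-likelihood."""
    scores = {
        k: math.log(priors[k])
        + sum(nb_log_pmf(xg, mg, phi) for xg, mg in zip(x, means[k]))
        for k in means
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy counts for two genes and two classes, made up for illustration only.
    train = [[10, 1], [12, 0], [9, 2], [1, 11], [0, 9], [2, 13]]
    labels = ["A", "A", "A", "B", "B", "B"]
    means = fit_class_means(train, labels)
    print(classify([11, 1], means, {"A": 0.5, "B": 0.5}, phi=0.5))
```

Genes are treated as independent given the class, so the discriminant score is a sum of per-gene log-likelihoods plus the log prior.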