Variable Selection for Nonparametric Gaussian Process Priors: Models and Computational Strategies
This paper presents a unified treatment of Gaussian process models that
extends to data from the exponential dispersion family and to survival data.
Our specific interest is in the analysis of data sets with predictors that have
an a priori unknown form of possibly nonlinear associations to the response.
The modeling approach we describe incorporates Gaussian processes in a
generalized linear model framework to obtain a class of nonparametric
regression models where the covariance matrix depends on the predictors. We
consider, in particular, continuous, categorical and count responses. We also
look into models that account for survival outcomes. We explore alternative
covariance formulations for the Gaussian process prior and demonstrate the
flexibility of the construction. Next, we focus on the important problem of
selecting variables from the set of possible predictors and describe a general
framework that employs mixture priors. We compare alternative MCMC strategies
for posterior inference and achieve a computationally efficient and practical
approach. We demonstrate performance on simulated and benchmark data sets.
Comment: Published at http://dx.doi.org/10.1214/11-STS354 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
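As a rough illustration of how a covariance matrix that depends on the predictors can support variable selection, the sketch below builds an exponential-type covariance in which each predictor k has its own parameter rho_k in (0, 1]. Setting rho_k = 1 gives that predictor zero weight, so it drops out of the covariance. The parameterization, the function name gp_covariance and the jitter value are assumptions made for illustration; they are not taken from the abstract.

```python
import numpy as np

def gp_covariance(X, rho, jitter=1e-8):
    """Covariance exp(-sum_k w_k (x_ik - x_jk)^2) with w_k = -log(rho_k).

    Hypothetical parameterization: rho_k = 1 gives w_k = 0, which
    removes predictor k from the covariance.
    """
    X = np.asarray(X, dtype=float)
    weights = -np.log(np.asarray(rho, dtype=float))
    diff = X[:, None, :] - X[None, :, :]      # pairwise differences, shape (n, n, p)
    sq_dist = (diff ** 2) @ weights           # weighted squared distances, shape (n, n)
    return np.exp(-sq_dist) + jitter * np.eye(X.shape[0])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# The second predictor has rho = 1, so it should have no effect.
C_full = gp_covariance(X, rho=[0.5, 1.0, 0.8])
C_drop = gp_covariance(X[:, [0, 2]], rho=[0.5, 0.8])
same = np.allclose(C_full, C_drop)
```

In a mixture-prior setting, a point mass at rho_k = 1 would then correspond to excluding predictor k, which is one way to read the selection framework described above.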
A hierarchical Bayesian model for inference of copy number variants and their association to gene expression
A number of statistical models have been successfully developed for the
analysis of high-throughput data from a single source, but few methods are
available for integrating data from different sources. Here we focus on
integrating gene expression levels with comparative genomic hybridization (CGH)
array measurements collected on the same subjects. We specify a measurement
error model that relates the gene expression levels to latent copy number
states which, in turn, are related to the observed surrogate CGH measurements
via a hidden Markov model. We employ selection priors that exploit the
dependencies across adjacent copy number states and investigate MCMC stochastic
search techniques for posterior inference. Our approach results in a unified
modeling framework for simultaneously inferring copy number variants (CNV) and
identifying their significant associations with mRNA transcript abundance. We
show performance on simulated data and illustrate an application to data from a
genomic study on human cancer cell lines.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS705 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
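To make the hidden Markov model component concrete, the following minimal sketch evaluates the likelihood of a short sequence of CGH-like measurements with the standard scaled forward recursion. The three states (loss, neutral, gain), the Gaussian emissions and all numerical values are assumptions chosen for illustration. The paper's full model, with its measurement error component and selection priors, is not reproduced here.

```python
import numpy as np

def gaussian_pdf(x, mean, sd):
    """Normal density, vectorized over the state-specific means and sds."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def forward_loglik(obs, init, trans, means, sds):
    """Log-likelihood of obs under a Gaussian-emission HMM (scaled forward pass)."""
    alpha = init * gaussian_pdf(obs[0], means, sds)
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transition matrix, then weight
            # by the emission density of the current observation.
            alpha = (alpha @ trans) * gaussian_pdf(obs[t], means, sds)
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale                 # rescale to avoid underflow
    return loglik

# Hypothetical copy number states: loss, neutral, gain.
init = np.array([0.1, 0.8, 0.1])
trans = np.array([[0.90, 0.10, 0.00],
                  [0.05, 0.90, 0.05],
                  [0.00, 0.10, 0.90]])
means = np.array([-0.5, 0.0, 0.5])
sds = np.array([0.2, 0.2, 0.2])
obs = np.array([0.05, -0.10, 0.45, 0.60, 0.50])

loglik = forward_loglik(obs, init, trans, means, sds)
```

The "sticky" transition matrix is one simple way to encode dependence between adjacent copy number states.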
Wavelet-Based Bayesian Estimation of Partially Linear Regression Models with Long Memory Errors
In this paper we focus on partially linear regression models with long memory errors, and propose a wavelet-based Bayesian procedure that allows the simultaneous estimation of the model parameters and the nonparametric part of the model. Employing discrete wavelet transforms is crucial in order to simplify the dense variance-covariance matrix of the long memory error. We achieve fully Bayesian inference by adopting a Metropolis algorithm within a Gibbs sampler. We evaluate the performance of the proposed method on simulated data. In addition, we present an application to Northern Hemisphere temperature data, a benchmark in the long memory literature.
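The point about simplifying a dense covariance matrix can be illustrated with a small numerical sketch. It builds the covariance of a long memory process, applies an orthonormal Haar wavelet transform, and compares how much of the matrix's energy lies off the diagonal before and after. Fractional Gaussian noise with Hurst parameter 0.9, the Haar basis and the series length of 64 are assumptions made here for illustration; the abstract does not say which error process or wavelet family the authors use.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix for n a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        m = H.shape[0]
        top = np.kron(H, [1.0, 1.0])              # coarser-scale averages
        bottom = np.kron(np.eye(m), [1.0, -1.0])  # finest-scale differences
        H = np.vstack([top, bottom]) / np.sqrt(2.0)
    return H

def fgn_covariance(n, hurst):
    """Covariance matrix of fractional Gaussian noise (assumed error process)."""
    idx = np.arange(n)
    k = np.abs(np.subtract.outer(idx, idx)).astype(float)
    e = 2.0 * hurst
    return 0.5 * ((k + 1) ** e - 2 * k ** e + np.abs(k - 1) ** e)

def offdiag_fraction(S):
    """Share of squared Frobenius norm that lies off the diagonal."""
    return 1.0 - np.sum(np.diag(S) ** 2) / np.sum(S ** 2)

n = 64
Sigma = fgn_covariance(n, hurst=0.9)
W = haar_matrix(n)
Sigma_wavelet = W @ Sigma @ W.T   # covariance of the wavelet coefficients

off_before = offdiag_fraction(Sigma)
off_after = offdiag_fraction(Sigma_wavelet)
```

Because the transform is orthonormal it preserves the total energy of the matrix, so a drop in the off-diagonal share means the wavelet coefficients are closer to uncorrelated than the original errors.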
Bayesian Image-on-Scalar Regression with a Spatial Global-Local Spike-and-Slab Prior
In this article, we propose a novel spatial global-local spike-and-slab
selection prior for image-on-scalar regression. We consider a Bayesian
hierarchical Gaussian process model for image smoothing that uses a flexible
Inverse-Wishart process prior to handle within-image dependency, and propose a
general global-local spatial selection prior that extends a rich class of
well-studied selection priors. Unlike existing constructions, we achieve
simultaneous global (i.e., at the covariate level) and local (i.e., at the
pixel/voxel level) selection by introducing "participation rate" parameters
that measure the probability for the individual covariates to affect the
observed images. This, along with a hard-thresholding strategy, leads to
dependency between selections at the two levels, introduces extra sparsity at
the local level, and allows the global selection to be informed by the local
selection, all in a model-based manner. We design an efficient Gibbs sampler
that allows inference for large image data. We show on simulated data that
parameters are interpretable and lead to efficient selection. Finally, we
demonstrate performance of the proposed model by using data from the Autism
Brain Imaging Data Exchange (ABIDE) study. To the best of our knowledge, the
proposed model construction is the first in the Bayesian literature to
simultaneously achieve image smoothing, parameter estimation and a two-level
variable selection for image-on-scalar regression.
Semiparametric Latent ANOVA Model for Event-Related Potentials
Event-related potentials (ERPs) extracted from electroencephalography (EEG)
data in response to stimuli are widely used in psychological and neuroscience
experiments. A major goal is to link ERP characteristic components to
subject-level covariates. Existing methods typically follow two-step
approaches, first identifying ERP components using peak detection methods and
then relating them to the covariates. This approach, however, can lead to loss
of efficiency due to inaccurate estimates in the initial step, especially
considering the low signal-to-noise ratio of EEG data. To address this
challenge, we propose a semiparametric latent ANOVA model (SLAM) that unifies
inference on ERP components and their association to covariates. SLAM models
ERP waveforms via a structured Gaussian process prior that encodes ERP latency
in its derivative and links the subject-level latencies to covariates using a
latent ANOVA. This unified Bayesian framework provides estimation at both
population and subject levels, improving the efficiency of the inference by
leveraging information across subjects. We automate posterior inference and
hyperparameter tuning using a Monte Carlo expectation-maximization algorithm.
We demonstrate the advantages of SLAM over competing methods via simulations.
Our method allows us to examine how factors or covariates affect the magnitude
and/or latency of ERP components, which in turn reflect cognitive,
psychological or neural processes. We exemplify this via an application to data
from an ERP experiment on speech recognition, where we assess the effect of age
on two components of interest. Our results corroborate the scientific finding
that older people take longer to respond to external stimuli because of delays
in perception and brain processes.
Semiparametric Bayesian Inference for Local Extrema of Functions in the Presence of Noise
There is a wide range of applications where the local extrema of a function
are the key quantity of interest. However, there is surprisingly little work on
methods to infer local extrema with uncertainty quantification in the presence
of noise. By viewing the function as an infinite-dimensional nuisance
parameter, a semiparametric formulation of this problem poses daunting
challenges, both methodologically and theoretically, as (i) the number of local
extrema may be unknown, and (ii) the induced shape constraints associated with
local extrema are highly irregular. In this article, we address these
challenges by suggesting an encompassing strategy that eliminates the need to
specify the number of local extrema, which leads to a remarkably simple, fast
semiparametric Bayesian approach for inference on local extrema. We provide
a closed-form characterization of the posterior distribution and study its
large-sample behavior under this encompassing regime. We show a multi-modal
Bernstein-von Mises phenomenon in which the posterior measure converges to a
mixture of Gaussians with the number of components matching the underlying
truth, leading to posterior exploration that accounts for multi-modality. We
illustrate the method through simulations and a real data application to
event-related potential analysis.
Spiked Dirichlet Process Priors for Gaussian Process Models
We expand a framework for Bayesian variable selection for Gaussian process
(GP) models by employing spiked Dirichlet process (DP) prior constructions over
set partitions containing covariates. Our approach results in a nonparametric
treatment of the distribution of the covariance parameters of the GP covariance
matrix that in turn induces a clustering of the covariates. We evaluate two
prior constructions: the first one employs a mixture of a point-mass and a
continuous distribution as the centering distribution for the DP prior, thereby
clustering all covariates. The second one employs a mixture of a spike and a DP
prior with a continuous distribution as the centering distribution, which
induces clustering of the selected covariates only. DP models borrow
information across covariates through model-based clustering. Our simulation
results, in particular, show a reduction in posterior sampling variability and,
in turn, enhanced prediction performance. In our model formulations, we
accomplish posterior inference by employing novel combinations and extensions
of existing algorithms for inference with DP prior models and compare
performance under the two prior constructions.
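The first prior construction can be sketched by drawing covariate-level parameters from a Pólya urn whose centering distribution mixes a point mass with a continuous distribution. In this minimal example the point mass sits at 1, read here as "covariate excluded", and the continuous part is Uniform(0, 1). Those choices, together with the function names and the values of alpha and spike_weight, are assumptions for illustration and are not stated in the abstract.

```python
import numpy as np

def sample_base(rng, spike_weight):
    """Draw from the assumed centering distribution: spike at 1 or Uniform(0, 1)."""
    if rng.random() < spike_weight:
        return 1.0
    return rng.uniform(0.0, 1.0)

def polya_urn(p, alpha, spike_weight, rng):
    """Draw p parameters from a DP prior via the Polya urn scheme."""
    values = []
    for k in range(p):
        if rng.random() < alpha / (alpha + k):
            # Open a new cluster with a fresh draw from the centering distribution.
            values.append(sample_base(rng, spike_weight))
        else:
            # Join an existing cluster by copying an earlier covariate's value.
            values.append(values[rng.integers(k)])
    return np.array(values)

rng = np.random.default_rng(1)
rho = polya_urn(p=10, alpha=1.0, spike_weight=0.5, rng=rng)

clusters = np.unique(rho)                # distinct parameter values across covariates
excluded = np.flatnonzero(rho == 1.0)    # covariates that landed on the spike
```

Because the spike is part of the centering distribution, excluded covariates form a cluster like any other, which matches the description of the first construction as clustering all covariates.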
A Bayesian Joint Model for Compositional Mediation Effect Selection in Microbiome Data
Analyzing multivariate count data generated by high-throughput sequencing
technology in microbiome research studies is challenging due to the
high-dimensional and compositional structure of the data and overdispersion. In
practice, researchers are often interested in investigating how the microbiome
may mediate the relation between an assigned treatment and an observed
phenotypic response. Existing approaches designed for compositional mediation
analysis are unable to simultaneously determine the presence of direct effects,
marginal indirect effects, overall indirect effects, as well as potential
confounders, while quantifying their uncertainty. We propose a
formulation of a Bayesian joint model for compositional data that allows for
the identification, estimation, and uncertainty quantification of various
causal estimands in high-dimensional mediation analysis. We conduct simulation
studies and compare our method's mediation effects selection performance with
existing methods. Finally, we apply our method to a benchmark data set
investigating the sub-therapeutic antibiotic treatment effect on body weight in
early-life mice.