220 research outputs found

    Variable Selection for Nonparametric Gaussian Process Priors: Models and Computational Strategies

    This paper presents a unified treatment of Gaussian process models that extends to data from the exponential dispersion family and to survival data. Our specific interest is in the analysis of data sets with predictors that have an a priori unknown form of possibly nonlinear associations to the response. The modeling approach we describe incorporates Gaussian processes in a generalized linear model framework to obtain a class of nonparametric regression models where the covariance matrix depends on the predictors. We consider, in particular, continuous, categorical and count responses. We also look into models that account for survival outcomes. We explore alternative covariance formulations for the Gaussian process prior and demonstrate the flexibility of the construction. Next, we focus on the important problem of selecting variables from the set of possible predictors and describe a general framework that employs mixture priors. We compare alternative MCMC strategies for posterior inference and achieve a computationally efficient and practical approach. We demonstrate performance on simulated and benchmark data sets. Comment: Published at http://dx.doi.org/10.1214/11-STS354 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A hierarchical Bayesian model for inference of copy number variants and their association to gene expression

    A number of statistical models have been successfully developed for the analysis of high-throughput data from a single source, but few methods are available for integrating data from different sources. Here we focus on integrating gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. We specify a measurement error model that relates the gene expression levels to latent copy number states which, in turn, are related to the observed surrogate CGH measurements via a hidden Markov model. We employ selection priors that exploit the dependencies across adjacent copy number states and investigate MCMC stochastic search techniques for posterior inference. Our approach results in a unified modeling framework for simultaneously inferring copy number variants (CNV) and identifying their significant associations with mRNA transcript abundance. We show performance on simulated data and illustrate an application to data from a genomic study on human cancer cell lines. Comment: Published at http://dx.doi.org/10.1214/13-AOAS705 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Wavelet-Based Bayesian Estimation of Partially Linear Regression Models with Long Memory Errors

    In this paper we focus on partially linear regression models with long memory errors, and propose a wavelet-based Bayesian procedure that allows the simultaneous estimation of the model parameters and the nonparametric part of the model. Employing discrete wavelet transforms is crucial in order to simplify the dense variance-covariance matrix of the long memory error. We achieve a fully Bayesian inference by adopting a Metropolis algorithm within a Gibbs sampler. We evaluate the performance of the proposed method on simulated data. In addition, we present an application to Northern Hemisphere temperature data, a benchmark in the long memory literature.

    Bayesian Image-on-Scalar Regression with a Spatial Global-Local Spike-and-Slab Prior

    In this article, we propose a novel spatial global-local spike-and-slab selection prior for image-on-scalar regression. We consider a Bayesian hierarchical Gaussian process model for image smoothing that uses a flexible Inverse-Wishart process prior to handle within-image dependency, and propose a general global-local spatial selection prior that extends a rich class of well-studied selection priors. Unlike existing constructions, we achieve simultaneous global (i.e., at covariate level) and local (i.e., at pixel/voxel level) selection by introducing 'participation rate' parameters that measure the probability for the individual covariates to affect the observed images. This, along with a hard-thresholding strategy, leads to dependency between selections at the two levels, introduces extra sparsity at the local level, and allows the global selection to be informed by the local selection, all in a model-based manner. We design an efficient Gibbs sampler that allows inference for large image data. We show on simulated data that parameters are interpretable and lead to efficient selection. Finally, we demonstrate performance of the proposed model by using data from the Autism Brain Imaging Data Exchange (ABIDE) study. To the best of our knowledge, the proposed model construction is the first in the Bayesian literature to simultaneously achieve image smoothing, parameter estimation and a two-level variable selection for image-on-scalar regression.

    Semiparametric Latent ANOVA Model for Event-Related Potentials

    Event-related potentials (ERPs) extracted from electroencephalography (EEG) data in response to stimuli are widely used in psychological and neuroscience experiments. A major goal is to link ERP characteristic components to subject-level covariates. Existing methods typically follow two-step approaches, first identifying ERP components using peak detection methods and then relating them to the covariates. This approach, however, can lead to loss of efficiency due to inaccurate estimates in the initial step, especially considering the low signal-to-noise ratio of EEG data. To address this challenge, we propose a semiparametric latent ANOVA model (SLAM) that unifies inference on ERP components and their association to covariates. SLAM models ERP waveforms via a structured Gaussian process prior that encodes ERP latency in its derivative and links the subject-level latencies to covariates using a latent ANOVA. This unified Bayesian framework provides estimation at both the population and subject levels, improving the efficiency of the inference by leveraging information across subjects. We automate posterior inference and hyperparameter tuning using a Monte Carlo expectation-maximization algorithm. We demonstrate the advantages of SLAM over competing methods via simulations. Our method allows us to examine how factors or covariates affect the magnitude and/or latency of ERP components, which in turn reflect cognitive, psychological or neural processes. We exemplify this via an application to data from an ERP experiment on speech recognition, where we assess the effect of age on two components of interest. Our results verify the scientific findings that older people take longer to respond to external stimuli because of delays in perception and brain processes.

    Semiparametric Bayesian Inference for Local Extrema of Functions in the Presence of Noise

    There is a wide range of applications where the local extrema of a function are the key quantity of interest. However, there is surprisingly little work on methods to infer local extrema with uncertainty quantification in the presence of noise. By viewing the function as an infinite-dimensional nuisance parameter, a semiparametric formulation of this problem poses daunting challenges, both methodologically and theoretically, as (i) the number of local extrema may be unknown, and (ii) the induced shape constraints associated with local extrema are highly irregular. In this article, we address these challenges by suggesting an encompassing strategy that eliminates the need to specify the number of local extrema, which leads to a remarkably simple, fast semiparametric Bayesian approach for inference on local extrema. We provide a closed-form characterization of the posterior distribution and study its large-sample behavior under this encompassing regime. We show a multi-modal Bernstein-von Mises phenomenon in which the posterior measure converges to a mixture of Gaussians with the number of components matching the underlying truth, leading to posterior exploration that accounts for multi-modality. We illustrate the method through simulations and a real data application to event-related potential analysis.

    Spiked Dirichlet Process Priors for Gaussian Process Models

    We expand a framework for Bayesian variable selection for Gaussian process (GP) models by employing spiked Dirichlet process (DP) prior constructions over set partitions containing covariates. Our approach results in a nonparametric treatment of the distribution of the covariance parameters of the GP covariance matrix that in turn induces a clustering of the covariates. We evaluate two prior constructions: the first one employs a mixture of a point-mass and a continuous distribution as the centering distribution for the DP prior, thereby clustering all covariates. The second one employs a mixture of a spike and a DP prior with a continuous distribution as the centering distribution, which induces clustering of the selected covariates only. DP models borrow information across covariates through model-based clustering. Our simulation results, in particular, show a reduction in posterior sampling variability and, in turn, enhanced prediction performance. In our model formulations, we accomplish posterior inference by employing novel combinations and extensions of existing algorithms for inference with DP prior models and compare performance under the two prior constructions.

    A Bayesian Joint Model for Compositional Mediation Effect Selection in Microbiome Data

    Analyzing multivariate count data generated by high-throughput sequencing technology in microbiome research studies is challenging due to the high-dimensional and compositional structure of the data and overdispersion. In practice, researchers are often interested in investigating how the microbiome may mediate the relation between an assigned treatment and an observed phenotypic response. Existing approaches designed for compositional mediation analysis are unable to simultaneously determine the presence of direct effects, marginal indirect effects, overall indirect effects, as well as potential confounders, while also quantifying their uncertainty. We propose a formulation of a Bayesian joint model for compositional data that allows for the identification, estimation, and uncertainty quantification of various causal estimands in high-dimensional mediation analysis. We conduct simulation studies and compare our method's mediation effects selection performance with existing methods. Finally, we apply our method to a benchmark data set investigating the sub-therapeutic antibiotic treatment effect on body weight in early-life mice.