Search CORE

29,241 research outputs found

Smoothing and mean-covariance estimation of functional data with a Bayesian hierarchical model

Author: Choi Taeryon
Cox Dennis D.
Yang Jingjing
Zhu Hongxiao
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 07/07/2015
Field of study

Functional data, with basic observational units being functions (e.g., curves, surfaces) varying over a continuum, are frequently encountered in various applications. While many statistical tools have been developed for functional data analysis, the issue of smoothing all functional observations simultaneously is less studied. Existing methods often focus on smoothing each individual function separately, at the risk of removing important systematic patterns common across functions. We propose a nonparametric Bayesian approach to smooth all functional observations simultaneously and nonparametrically. In the proposed approach, we assume that the functional observations are independent Gaussian processes subject to a common level of measurement errors, enabling the borrowing of strength across all observations. Unlike most Gaussian process regression models that rely on pre-specified structures for the covariance kernel, we adopt a hierarchical framework by assuming a Gaussian process prior for the mean function and an Inverse-Wishart process prior for the covariance function. These prior assumptions induce an automatic mean-covariance estimation in the posterior inference in addition to the simultaneous smoothing of all observations. Such a hierarchical framework is flexible enough to incorporate functional data with different characteristics, including data measured on either common or uncommon grids, and data with either stationary or nonstationary covariance structures. Simulations and real data analysis demonstrate that, in comparison with alternative methods, the proposed Bayesian approach achieves better smoothing accuracy and comparable mean-covariance estimation results. Furthermore, it can successfully retain the systematic patterns in the functional observations that are usually neglected by the existing functional data analyses based on individual-curve smoothing.Comment: Submitted to Bayesian Analysi

arXiv.org e-Print Archive

Crossref

DSpace at Rice University

High-Dimensional Bayesian Geostatistics

Author: Banerjee Sudipto
Publication venue
Publication date: 01/01/2017
Field of study

With the growing capabilities of Geographic Information Systems (GIS) and user-friendly software, statisticians today routinely encounter geographically referenced data containing observations from a large number of spatial locations and time points. Over the last decade, hierarchical spatiotemporal process models have become widely deployed statistical tools for researchers to better understand the complex nature of spatial and temporal variability. However, fitting hierarchical spatiotemporal models often involves expensive matrix computations with complexity increasing in cubic order for the number of spatial locations and temporal points. This renders such models unfeasible for large data sets. This article offers a focused review of two methods for constructing well-defined highly scalable spatiotemporal stochastic processes. Both these processes can be used as "priors" for spatiotemporal random fields. The first approach constructs a low-rank process operating on a lower-dimensional subspace. The second approach constructs a Nearest-Neighbor Gaussian Process (NNGP) that ensures sparse precision matrices for its finite realizations. Both processes can be exploited as a scalable prior embedded within a rich hierarchical modeling framework to deliver full Bayesian inference. These approaches can be described as model-based solutions for big spatiotemporal datasets. The models ensure that the algorithmic complexity has

\sim n

floating point operations (flops), where

n

the number of spatial locations (per iteration). We compare these methods and provide some insight into their methodological underpinnings

arXiv.org e-Print Archive

Ezid

eScholarship - University of California

Bayesian Nonstationary Spatial Modeling for Very Large Datasets

Author: Banerjee
Banerjee
Berliner
Bevilacqua
Calder
Cressie
Cressie
Cressie
Cressie
Cressie
Curriero
Dyk
Eidsvik
Finley
Furrer
Geman
Gilbert
Gneiting
Gneiting
Green
Guhaniyogi
Haario
Hastings
Henderson
Higdon
Holmes
Holmes
Kang
Kang
Kanter
Kass
Katzfuss
Katzfuss
Kaufman
Knuth
Lemos
Lindgren
Lindsay
Lopes
Mardia
Metropolis
Paciorek
Pracilio
Sang
Sang
Shaby
Sherman
Shi
Stein
Stein
Taylor
Viscarra Rossel
Wikle
Wikle
Wikle
Xu
Publication venue: 'Wiley'
Publication date: 21/12/2012
Field of study

With the proliferation of modern high-resolution measuring instruments mounted on satellites, planes, ground-based vehicles and monitoring stations, a need has arisen for statistical methods suitable for the analysis of large spatial datasets observed on large spatial domains. Statistical analyses of such datasets provide two main challenges: First, traditional spatial-statistical techniques are often unable to handle large numbers of observations in a computationally feasible way. Second, for large and heterogeneous spatial domains, it is often not appropriate to assume that a process of interest is stationary over the entire domain. We address the first challenge by using a model combining a low-rank component, which allows for flexible modeling of medium-to-long-range dependence via a set of spatial basis functions, with a tapered remainder component, which allows for modeling of local dependence using a compactly supported covariance function. Addressing the second challenge, we propose two extensions to this model that result in increased flexibility: First, the model is parameterized based on a nonstationary Matern covariance, where the parameters vary smoothly across space. Second, in our fully Bayesian model, all components and parameters are considered random, including the number, locations, and shapes of the basis functions used in the low-rank component. Using simulated data and a real-world dataset of high-resolution soil measurements, we show that both extensions can result in substantial improvements over the current state-of-the-art.Comment: 16 pages, 2 color figure

arXiv.org e-Print Archive

Crossref

A Bayesian alternative to mutual information for the hierarchical clustering of dependent random variables

Author: Bellec Pierre
Marrelec Guillaume
Messé Arnaud
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015
Field of study

The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering

arXiv.org e-Print Archive

CiteSeerX

HAL-Inserm

Directory of Open Access Journals

PubMed Central

FigShare