High-Dimensional Bayesian Geostatistics
With the growing capabilities of Geographic Information Systems (GIS) and
user-friendly software, statisticians today routinely encounter geographically
referenced data containing observations from a large number of spatial
locations and time points. Over the last decade, hierarchical spatiotemporal
process models have become widely deployed statistical tools for researchers to
better understand the complex nature of spatial and temporal variability.
However, fitting hierarchical spatiotemporal models often involves expensive
matrix computations whose complexity grows cubically with the number of
spatial locations and temporal points. This renders such models infeasible for
large data sets. This article offers a focused review of two methods for
constructing well-defined highly scalable spatiotemporal stochastic processes.
Both these processes can be used as "priors" for spatiotemporal random fields.
The first approach constructs a low-rank process operating on a
lower-dimensional subspace. The second approach constructs a Nearest-Neighbor
Gaussian Process (NNGP) that ensures sparse precision matrices for its finite
realizations. Both processes can be exploited as a scalable prior embedded
within a rich hierarchical modeling framework to deliver full Bayesian
inference. These approaches can be described as model-based solutions for big
spatiotemporal datasets. The models ensure that the algorithmic complexity is
on the order of n floating point operations (flops) per iteration, where n is
the number of spatial locations. We compare these methods and provide some
insight into their methodological underpinnings.
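To make the first approach concrete, the following is a minimal sketch of one common low-rank construction, in which the process is projected onto a small set of knots. The abstract does not specify the construction, so the exponential covariance, the knot count, and all names (`exp_cov`, `knots`, `low_rank_cov`) are assumptions chosen for illustration only.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * distance) between two point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

rng = np.random.default_rng(0)
locs = rng.uniform(size=(500, 2))    # n = 500 observed locations (simulated)
knots = rng.uniform(size=(25, 2))    # r = 25 knots (hypothetical choice)

C_nk = exp_cov(locs, knots)          # n x r cross-covariance
C_kk = exp_cov(knots, knots)         # r x r covariance among the knots

# Low-rank covariance C_nk C_kk^{-1} C_kn: only an r x r system is solved,
# so no n x n matrix is ever factorized.
low_rank_cov = C_nk @ np.linalg.solve(C_kk, C_nk.T)
rank = np.linalg.matrix_rank(low_rank_cov)
```

The resulting n x n covariance has rank at most r, which is what allows the cubic cost in n to be replaced by a cost that is cubic only in r.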
Bayesian Nonstationary Spatial Modeling for Very Large Datasets
With the proliferation of modern high-resolution measuring instruments
mounted on satellites, planes, ground-based vehicles and monitoring stations, a
need has arisen for statistical methods suitable for the analysis of large
spatial datasets observed on large spatial domains. Statistical analyses of
such datasets provide two main challenges: First, traditional
spatial-statistical techniques are often unable to handle large numbers of
observations in a computationally feasible way. Second, for large and
heterogeneous spatial domains, it is often not appropriate to assume that a
process of interest is stationary over the entire domain.
We address the first challenge by using a model combining a low-rank
component, which allows for flexible modeling of medium-to-long-range
dependence via a set of spatial basis functions, with a tapered remainder
component, which allows for modeling of local dependence using a compactly
supported covariance function. Addressing the second challenge, we propose two
extensions to this model that result in increased flexibility: First, the model
is parameterized based on a nonstationary Matern covariance, where the
parameters vary smoothly across space. Second, in our fully Bayesian model, all
components and parameters are considered random, including the number,
locations, and shapes of the basis functions used in the low-rank component.
Using simulated data and a real-world dataset of high-resolution soil
measurements, we show that both extensions can result in substantial
improvements over the current state-of-the-art.
Comment: 16 pages, 2 color figures
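The role of a compactly supported covariance function can be illustrated with a short sketch of covariance tapering. This is not the model of the abstract: it simply multiplies an assumed exponential covariance by a spherical taper, and the taper range, decay rate, and function names are hypothetical.

```python
import numpy as np

def pairwise_dist(x):
    """Euclidean distances between all pairs of rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def spherical_taper(d, taper_range):
    """Spherical correlation: positive definite in up to 3 dimensions,
    and exactly zero once the distance reaches taper_range."""
    t = np.minimum(d / taper_range, 1.0)
    return (1.0 - t) ** 2 * (1.0 + 0.5 * t)

rng = np.random.default_rng(1)
locs = rng.uniform(size=(400, 2))        # simulated locations on the unit square
d = pairwise_dist(locs)

base_cov = np.exp(-2.0 * d)              # exponential covariance (assumed)
tapered = base_cov * spherical_taper(d, taper_range=0.15)

# Fraction of entries that are exactly zero after tapering.
sparsity = np.mean(tapered == 0.0)
```

Because the elementwise product of two positive semidefinite matrices is positive semidefinite, the tapered matrix remains a valid covariance while most of its entries vanish, which is what makes sparse-matrix algorithms applicable to the local-dependence component.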
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets
Spatial process models for analyzing geostatistical data entail computations
that become prohibitive as the number of spatial locations becomes large. This
manuscript develops a class of highly scalable Nearest Neighbor Gaussian
Process (NNGP) models to provide fully model-based inference for large
geostatistical datasets. We establish that the NNGP is a well-defined spatial
process providing legitimate finite-dimensional Gaussian densities with sparse
precision matrices. We embed the NNGP as a sparsity-inducing prior within a
rich hierarchical modeling framework and outline how computationally efficient
Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or
decomposing large matrices. The number of floating point operations (flops)
per iteration of this algorithm is linear in the number of spatial locations,
thereby delivering substantial scalability. We illustrate the computational and
inferential benefits of the NNGP over competing methods using simulation
studies and also analyze forest biomass from a massive United States Forest
Inventory dataset at a scale that precludes alternative dimension-reducing
methods.
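The sparse precision matrix mentioned in this abstract can be sketched as follows, assuming the usual nearest-neighbor construction in which each location is conditioned on at most m of the locations that precede it in a fixed ordering. The exponential covariance with unit variance, the ordering, and the names `exp_cov` and `nngp_precision` are illustrative assumptions; dense arrays are used only to keep the sketch short, whereas a practical implementation would store A in sparse form.

```python
import numpy as np

def exp_cov(a, b, phi=3.0):
    """Exponential correlation exp(-phi * distance), unit variance (assumed)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-phi * d)

def nngp_precision(locs, m):
    """Return (Q, A) with Q = (I - A)^T D^{-1} (I - A), stored densely here."""
    n = locs.shape[0]
    A = np.zeros((n, n))     # strictly lower triangular, at most m nonzeros per row
    D = np.ones(n)           # conditional variances; D[0] is the marginal variance
    for i in range(1, n):
        dist = np.linalg.norm(locs[:i] - locs[i], axis=1)
        nbrs = np.argsort(dist)[:m]              # nearest earlier locations
        C_nn = exp_cov(locs[nbrs], locs[nbrs])   # at most m x m system
        c_in = exp_cov(locs[i:i + 1], locs[nbrs])[0]
        w = np.linalg.solve(C_nn, c_in)          # kriging weights
        A[i, nbrs] = w
        D[i] = 1.0 - w @ c_in                    # conditional variance
    I_minus_A = np.eye(n) - A
    return I_minus_A.T @ (I_minus_A / D[:, None]), A

rng = np.random.default_rng(2)
locs = rng.uniform(size=(50, 2))
Q, A = nngp_precision(locs, m=5)
```

Each location requires solving only an m x m system, so the total work grows linearly in the number of locations for fixed m. When m is set to n - 1, every location conditions on all earlier ones and the construction recovers the exact inverse of the full covariance matrix.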
Practical Bayesian Modeling and Inference for Massive Spatial Datasets On Modest Computing Environments
With continued advances in Geographic Information Systems and related
computational technologies, statisticians are often required to analyze very
large spatial datasets. This has generated substantial interest over the last
decade, already too vast to be summarized here, in scalable methodologies for
analyzing large spatial datasets. Scalable spatial process models have been
found especially attractive due to their richness and flexibility and,
particularly so in the Bayesian paradigm, due to their presence in hierarchical
model settings. However, the vast majority of research articles in this
domain have been geared toward innovative theory or more complex model
development. Very limited attention has been accorded to approaches for easily
implementable scalable hierarchical models for the practicing scientist or
spatial analyst. This article is submitted to the Practice section of the
journal with the aim of developing massively scalable Bayesian approaches that
can rapidly deliver Bayesian inference on spatial processes that is practically
indistinguishable from inference obtained using more expensive alternatives. A
key emphasis is on implementation within very standard (modest) computing
environments (e.g., a standard desktop or laptop) using easily available
statistical software packages without requiring message-passing interfaces or
parallel programming paradigms. Key insights are offered regarding assumptions
and approximations concerning practical efficiency.
Comment: 20 pages, 4 figures, 2 tables