Practical Bayesian Modeling and Inference for Massive Spatial Datasets On Modest Computing Environments
With continued advances in Geographic Information Systems and related
computational technologies, statisticians are often required to analyze very
large spatial datasets. This has generated over the last decade a substantial
literature, already too vast to be summarized here, on scalable methodologies
for analyzing large spatial datasets. Scalable spatial process models have been
found especially attractive due to their richness and flexibility and,
particularly so in the Bayesian paradigm, due to their presence in hierarchical
model settings. However, the vast majority of research articles in this
domain have been geared toward innovative theory or more complex model
development. Very limited attention has been given to easily implementable,
scalable hierarchical models for the practicing scientist or
spatial analyst. This article is submitted to the Practice section of the
journal with the aim of developing massively scalable Bayesian approaches that
can rapidly deliver Bayesian inference on spatial processes that is
practically indistinguishable from inference obtained using more expensive
alternatives. A
key emphasis is on implementation within very standard (modest) computing
environments (e.g., a standard desktop or laptop) using easily available
statistical software packages without requiring message-passing interfaces or
parallel programming paradigms. Key insights are offered regarding assumptions
and approximations concerning practical efficiency.
Comment: 20 pages, 4 figures, 2 tables
High-Dimensional Bayesian Geostatistics
With the growing capabilities of Geographic Information Systems (GIS) and
user-friendly software, statisticians today routinely encounter geographically
referenced data containing observations from a large number of spatial
locations and time points. Over the last decade, hierarchical spatiotemporal
process models have become widely deployed statistical tools for researchers to
better understand the complex nature of spatial and temporal variability.
However, fitting hierarchical spatiotemporal models often involves expensive
matrix computations whose complexity increases in cubic order with the number
of spatial locations and temporal points. This renders such models infeasible
for
large data sets. This article offers a focused review of two methods for
constructing well-defined highly scalable spatiotemporal stochastic processes.
Both these processes can be used as "priors" for spatiotemporal random fields.
The first approach constructs a low-rank process operating on a
lower-dimensional subspace. The second approach constructs a Nearest-Neighbor
Gaussian Process (NNGP) that ensures sparse precision matrices for its finite
realizations. Both processes can be exploited as a scalable prior embedded
within a rich hierarchical modeling framework to deliver full Bayesian
inference. These approaches can be described as model-based solutions for big
spatiotemporal datasets. The models ensure that the algorithmic complexity is
~ O(n) floating point operations (flops), where n is the number of spatial
locations (per iteration). We compare these methods and provide some insight
into their methodological underpinnings.
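The first of the two constructions reviewed above can be illustrated concretely: the process is projected onto k knot locations, so the induced n x n covariance has rank at most k, and downstream solves cost O(n k^2) rather than O(n^3) flops. The following is a minimal NumPy sketch of that low-rank idea; the one-dimensional exponential covariance, the knot placement, and all variable names are illustrative assumptions, not taken from the article.

```python
import numpy as np

def exp_cov(x, y, phi=1.0, sigma2=1.0):
    """Exponential covariance between 1-D location sets x (n,) and y (k,)."""
    return sigma2 * np.exp(-phi * np.abs(x[:, None] - y[None, :]))

rng = np.random.default_rng(0)
n, k = 500, 25                        # n observed locations, k << n knots
locs = rng.uniform(0, 10, n)
knots = np.linspace(0, 10, k)         # hypothetical knot grid

C_nk = exp_cov(locs, knots)           # cross-covariance, n x k
C_kk = exp_cov(knots, knots)          # knot covariance, k x k
# Low-rank (predictive-process-style) approximation of the n x n covariance:
C_low = C_nk @ np.linalg.solve(C_kk, C_nk.T)
```

Because C_low is a product of an n x k and a k x n matrix, its rank is bounded by the number of knots, which is what makes the subsequent linear algebra scalable.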
On Simplified Bayesian Modeling for Massive Geostatistical Datasets: Conjugacy and Beyond
With continued advances in Geographic Information Systems and related computational technologies, researchers in diverse fields such as forestry, environmental health, and climate science have a growing interest in analyzing large-scale data sets measured at a substantial number of geographic locations. Geostatistical models used to capture the spatially varying relationships in such data are often accompanied by onerous computations that prohibit the analysis of large-scale spatial data sets. Less burdensome alternatives proposed recently for analyzing massive spatial datasets often lead to inaccurate inference or require a slow sampling process. Bayesian inference, while attractive for accommodating uncertainties through hierarchical structures, can become computationally onerous for modeling massive spatial data sets because of its reliance on iterative estimation algorithms. My dissertation research aims at developing computationally scalable Bayesian geostatistical models that provide valid inference through a highly accelerated sampling process. We also study the asymptotic properties of estimators in spatial analysis. In Chapters 2 and 3, we develop conjugate Bayesian frameworks for analyzing univariate and multivariate spatial data. We propose a conjugate latent Nearest-Neighbor Gaussian Process (NNGP) model in Chapter 2, which uses analytically tractable posterior distributions to obtain posterior inferences, including the high-dimensional latent process. In Chapter 3, we focus on building conjugate Bayesian frameworks for analyzing multivariate spatial data. We utilize the Matrix-Normal Inverse-Wishart (MNIW) prior to propose conjugate Bayesian frameworks and algorithms that can incorporate a family of scalable spatial modeling methodologies. In Chapter 4, we pursue general Bayesian modeling methodologies beyond conjugate Bayesian hierarchical modeling.
We build scalable versions of a hierarchical linear model of coregionalization (LMC) and spatial factor models, and propose a highly accelerated block-update MCMC algorithm. Using the proposed Bayesian LMC model, we extend scalable modeling strategies for a single process to the multivariate process case. All proposed frameworks are tested on simulated data and fit to real data sets with observed locations numbering in the millions. Our contribution is to offer practicing scientists and spatial analysts practical and flexible scalable hierarchical models for analyzing massive spatial data sets. In Chapter 5, we investigate the asymptotic properties of estimators in spatial analysis. We formally establish results on the identifiability and consistency of the nugget in spatial models based upon the Gaussian process within the framework of in-fill asymptotics, i.e., the sample size increases within a bounded sampling domain. We establish the identifiability of parameters in the Matérn covariance function and the consistency of their maximum likelihood estimators in the presence of discontinuities due to the nugget.
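The conjugate frameworks described above rest on posterior distributions that are available in closed form, so no iterative sampler is needed for the basic update. As a generic illustration of that principle (the standard Normal-Inverse-Gamma conjugate update for a Bayesian linear model, not the dissertation's latent NNGP or MNIW constructions), here is a short sketch; the simulated data, priors, and names are hypothetical.

```python
import numpy as np

def nig_posterior(X, y, m0, V0, a0, b0):
    """Exact posterior for the conjugate Bayesian linear model
    y = X b + e, e ~ N(0, s2 I), b | s2 ~ N(m0, s2 V0), s2 ~ IG(a0, b0)."""
    V0_inv = np.linalg.inv(V0)
    prec_n = V0_inv + X.T @ X                   # posterior precision factor
    Vn = np.linalg.inv(prec_n)
    mn = Vn @ (V0_inv @ m0 + X.T @ y)           # posterior mean of b
    an = a0 + len(y) / 2.0                      # updated IG shape
    bn = b0 + 0.5 * (y @ y + m0 @ V0_inv @ m0 - mn @ prec_n @ mn)  # IG scale
    return mn, Vn, an, bn

# Hypothetical example: recover coefficients from simulated data in one step.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=200)
mn, Vn, an, bn = nig_posterior(X, y, np.zeros(2), 100.0 * np.eye(2), 2.0, 1.0)
```

With a single linear solve the posterior is fully characterized, which is the appeal of conjugacy over MCMC for massive data sets.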
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade.
Comment: Major revision, to appear in SIAM Review
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets
Spatial process models for analyzing geostatistical data entail computations
that become prohibitive as the number of spatial locations becomes large. This
manuscript develops a class of highly scalable Nearest Neighbor Gaussian
Process (NNGP) models to provide fully model-based inference for large
geostatistical datasets. We establish that the NNGP is a well-defined spatial
process providing legitimate finite-dimensional Gaussian densities with sparse
precision matrices. We embed the NNGP as a sparsity-inducing prior within a
rich hierarchical modeling framework and outline how computationally efficient
Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or
decomposing large matrices. The floating point operations (flops) per iteration
of this algorithm scale linearly in the number of spatial locations, delivering
substantial scalability. We illustrate the computational and
inferential benefits of the NNGP over competing methods using simulation
studies and also analyze forest biomass from a massive United States Forest
Inventory dataset at a scale that precludes alternative dimension-reducing
methods.
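The sparse-precision construction at the heart of the NNGP can be sketched in a few lines: each location is conditioned on at most m nearest preceding neighbors, which yields kriging weights A and conditional variances D such that the precision factors as (I - A)^T diag(1/D) (I - A). The toy below is a simplified 1-D illustration with an exponential covariance; the covariance choice, parameter values, and names are assumptions for illustration, not taken from the manuscript.

```python
import numpy as np

def exp_cov(d, phi=1.0, sigma2=1.0):
    """Exponential covariance as a function of distance."""
    return sigma2 * np.exp(-phi * d)

def nngp_factors(locs, m=5, phi=1.0, sigma2=1.0):
    """Factors of a toy 1-D NNGP precision, Q = (I - A)^T diag(1/D) (I - A).
    Row i of A holds kriging weights on at most m nearest *preceding*
    locations, so each row costs O(m^3) flops and the whole factorization
    is O(n m^3) rather than O(n^3)."""
    n = len(locs)
    A = np.zeros((n, n))   # dense here for clarity; sparse in real implementations
    D = np.empty(n)
    D[0] = sigma2
    for i in range(1, n):
        d = np.abs(locs[:i] - locs[i])
        nb = np.argsort(d)[:m]                      # m nearest preceding neighbors
        C_nb = exp_cov(np.abs(locs[nb][:, None] - locs[nb][None, :]), phi, sigma2)
        c_i = exp_cov(d[nb], phi, sigma2)
        w = np.linalg.solve(C_nb, c_i)              # conditional-mean weights
        A[i, nb] = w
        D[i] = sigma2 - c_i @ w                     # conditional variance
    return A, D

locs = np.sort(np.random.default_rng(2).uniform(0, 10, 200))
A, D = nngp_factors(locs, m=5)
Q = (np.eye(200) - A).T @ np.diag(1.0 / D) @ (np.eye(200) - A)
```

Because A is strictly lower triangular, det(I - A) = 1 and the log-determinant of Q is simply -sum(log D), which is one reason such algorithms can run without storing or decomposing large matrices.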
A Survey of Bayesian Statistical Approaches for Big Data
The modern era is characterised as an era of information or Big Data. This
has motivated a huge literature on new methods for extracting information and
insights from these data. A natural question is how these approaches differ
from those that were available prior to the advent of Big Data. We present a
review of published studies that present Bayesian statistical approaches
specifically for Big Data and discuss the reported and perceived benefits of
these approaches. We conclude by addressing the question of whether focusing
only on improving computational algorithms and infrastructure will be enough to
face the challenges of Big Data.