13 research outputs found
Speeding up neighborhood search in local Gaussian process prediction
Recent implementations of local approximate Gaussian process models have
pushed computational boundaries for non-linear, non-parametric prediction
problems, particularly when deployed as emulators for computer experiments.
Their flavor of spatially independent computation accommodates massive
parallelization, meaning that they can handle designs two or more orders of
magnitude larger than previously. However, accomplishing that feat can still
require massive supercomputing resources. Here we aim to ease that burden. We
study how predictive variance is reduced as local designs are built up for
prediction. We then observe how the exhaustive and discrete nature of an
important search subroutine involved in building such local designs may be
overly conservative. Rather, we suggest that searching the space radially,
i.e., continuously along rays emanating from the predictive location of
interest, is a far thriftier alternative. Our empirical work demonstrates that
ray-based search yields predictors with accuracy comparable to exhaustive
search, but in a fraction of the time - bringing a supercomputer implementation
back onto the desktop.Comment: 24 pages, 5 figures, 4 table
Gaussian Process Regression for Prediction of Sulfate Content in Lakes of China
In recent years, environmental pollution has become more and more serious, especially water pollution. In this study, the method of Gaussian process regression was used to build a prediction model for the sulphate content of lakes using several water quality variables as inputs. The sulphate content and other variable water quality data from 100 stations operated at lakes along the middle and lower reaches of the Yangtze River were used for developing the four models. The selected water quality data, consisting of water temperature, transparency, pH, dissolved oxygen conductivity, chlorophyll, total phosphorus, total nitrogen and ammonia nitrogen, were used as inputs for several different Gaussian process regression models. The experimental results showed that the Gaussian process regression model using an exponential kernel had the smallest prediction error. Its mean absolute error (MAE) of 5.0464 and root mean squared error (RMSE) of 7.269 were smaller than those of the other three Gaussian process regression models. By contrast, in the experiment, the model used in this study had a smaller error than linear regression, decision tree, support vector regression, Boosting trees, Bagging trees and other models, making it more suitable for prediction of the sulphate content in lakes. The method proposed in this paper can effectively predict the sulphate content in water, providing a new kind of auxiliary method for water detection
High-Dimensional Bayesian Geostatistics
With the growing capabilities of Geographic Information Systems (GIS) and
user-friendly software, statisticians today routinely encounter geographically
referenced data containing observations from a large number of spatial
locations and time points. Over the last decade, hierarchical spatiotemporal
process models have become widely deployed statistical tools for researchers to
better understand the complex nature of spatial and temporal variability.
However, fitting hierarchical spatiotemporal models often involves expensive
matrix computations with complexity increasing in cubic order for the number of
spatial locations and temporal points. This renders such models unfeasible for
large data sets. This article offers a focused review of two methods for
constructing well-defined highly scalable spatiotemporal stochastic processes.
Both these processes can be used as "priors" for spatiotemporal random fields.
The first approach constructs a low-rank process operating on a
lower-dimensional subspace. The second approach constructs a Nearest-Neighbor
Gaussian Process (NNGP) that ensures sparse precision matrices for its finite
realizations. Both processes can be exploited as a scalable prior embedded
within a rich hierarchical modeling framework to deliver full Bayesian
inference. These approaches can be described as model-based solutions for big
spatiotemporal datasets. The models ensure that the algorithmic complexity has
floating point operations (flops), where the number of spatial
locations (per iteration). We compare these methods and provide some insight
into their methodological underpinnings
Bottom-up estimation and top-down prediction: Solar energy prediction combining information from multiple sources
Accurately forecasting solar power using the data from multiple sources is an important but challenging problem. Our goal is to combine two different physics model forecasting outputs with real measurements from an automated monitoring network so as to better predict solar power in a timely manner. To this end, we propose a new approach of analyzing large-scale multilevel models with great computational efficiency requiring minimum monitoring and intervention. This approach features a division of the large scale data set into smaller ones with manageable sizes, based on their physical locations, and fit a local model in each area. The local model estimates are then combined sequentially from the specified multilevel models using our novel bottom-up approach for parameter estimation. The prediction, on the other hand, is implemented in a top-down matter. The proposed method is applied to the solar energy prediction problem for the U.S. Department of Energy’s SunShot Initiative
Multi-Resolution Functional ANOVA for Large-Scale, Many-Input Computer Experiments
The Gaussian process is a standard tool for building emulators for both
deterministic and stochastic computer experiments. However, application of
Gaussian process models is greatly limited in practice, particularly for
large-scale and many-input computer experiments that have become typical. We
propose a multi-resolution functional ANOVA model as a computationally feasible
emulation alternative. More generally, this model can be used for large-scale
and many-input non-linear regression problems. An overlapping group lasso
approach is used for estimation, ensuring computational feasibility in a
large-scale and many-input setting. New results on consistency and inference
for the (potentially overlapping) group lasso in a high-dimensional setting are
developed and applied to the proposed multi-resolution functional ANOVA model.
Importantly, these results allow us to quantify the uncertainty in our
predictions. Numerical examples demonstrate that the proposed model enjoys
marked computational advantages. Data capabilities, both in terms of sample
size and dimension, meet or exceed best available emulation tools while meeting
or exceeding emulation accuracy
Bayesian Anal
With the growing capabilities of Geographic Information Systems (GIS) and user-friendly software, statisticians today routinely encounter geographically referenced data containing observations from a large number of spatial locations and time points. Over the last decade, hierarchical spatiotemporal process models have become widely deployed statistical tools for researchers to better understand the complex nature of spatial and temporal variability. However, fitting hierarchical spatiotemporal models often involves expensive matrix computations with complexity increasing in cubic order for the number of spatial locations and temporal points. This renders such models unfeasible for large data sets. This article offers a focused review of two methods for constructing well-defined highly scalable spatiotemporal stochastic processes. Both these processes can be used as "priors" for spatiotemporal random fields. The first approach constructs a low-rank process operating on a lower-dimensional subspace. The second approach constructs a Nearest-Neighbor Gaussian Process (NNGP) that ensures sparse precision matrices for its finite realizations. Both processes can be exploited as a scalable prior embedded within a rich hierarchical modeling framework to deliver full Bayesian inference. These approaches can be described as model-based solutions for big spatiotemporal datasets. The models ensure that the algorithmic complexity has ~ | floating point operations (flops), where | the number of spatial locations (per iteration). We compare these methods and provide some insight into their methodological underpinnings.R01 ES027027/ES/NIEHS NIH HHS/United StatesR01 OH010093/OH/NIOSH CDC HHS/United StatesRC1 GM092400/GM/NIGMS NIH HHS/United States2018-01-30T00:00:00Z29391920PMC5790125vault:2616