Recent Developments in Complex and Spatially Correlated Functional Data
The large-scale collection of high-dimensional and high-frequency data is driving the
development of new statistical models. Functional data analysis provides the statistical
methods needed for large-scale and complex data by treating the data as continuous functions,
e.g., realizations of a continuous process (curves) or of a continuous random field (surfaces),
with each curve or surface considered a single observation. Here, we provide an overview of
functional data analysis when the data are complex and spatially correlated. We provide
definitions and estimators of the first and second moments of the corresponding functional
random variable.
We present two main approaches. The first assumes that the data are realizations of
a functional random field, i.e., each observation is a curve with a spatial
component; we call these 'spatial functional data'. The second assumes
that the data are continuous deterministic fields observed over time. In this case,
each observation is a surface or manifold, and we call these 'surface time
series'. For both approaches, we describe software available for the
statistical analysis. We also present a data illustration, using a simulated
high-resolution wind speed dataset, as an example of the two
approaches. The functional data approach offers a new paradigm of data
analysis, in which continuous processes or random fields are treated as
single entities. We consider this approach very valuable in the context of
big data.
Comment: Some typos fixed and new references added
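As an illustration of the moment estimators mentioned above, the following is a minimal Python sketch for spatial functional data. It assumes every curve is observed on the same grid of time points. It computes the pointwise mean function and an empirical trace-variogram (integrated squared differences between curves, binned by inter-site distance), which is one common second-moment estimator in functional geostatistics. The abstract does not say which estimators the overview uses, and the function names are hypothetical.

```python
import math
from itertools import combinations


def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of `values` over `grid`."""
    return sum(
        0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )


def mean_function(curves):
    """First moment: pointwise sample mean of curves on a common grid."""
    n = len(curves)
    return [sum(c[k] for c in curves) / n for k in range(len(curves[0]))]


def trace_variogram(sites, curves, grid, bin_edges):
    """Second moment (spatial part): empirical trace-variogram.

    For each distance bin, gamma(h) is the sum over site pairs in the bin of
    the integral of (X_i(t) - X_j(t))^2 dt, divided by 2 * (number of pairs).
    Empty bins return NaN.
    """
    nbins = len(bin_edges) - 1
    sums, counts = [0.0] * nbins, [0] * nbins
    for i, j in combinations(range(len(sites)), 2):
        h = math.dist(sites[i], sites[j])
        for b in range(nbins):
            if bin_edges[b] <= h < bin_edges[b + 1]:
                sq = [(a - c) ** 2 for a, c in zip(curves[i], curves[j])]
                sums[b] += trapezoid(sq, grid)
                counts[b] += 1
                break  # each pair falls in at most one bin
    return [s / (2 * c) if c else float("nan") for s, c in zip(sums, counts)]


# Toy example: three sites, each curve observed at t = 0, 0.5 and 1.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
grid = [0.0, 0.5, 1.0]
curves = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
print(mean_function(curves))                                  # [1.0, 1.0, 1.0]
print(trace_variogram(sites, curves, grid, [0.0, 1.5, 3.0]))  # [0.5, 1.25]
```

For surface time series the same idea applies with the integral taken over the spatial domain of each surface instead of over time.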
Positive definite nonparametric regression using an evolutionary algorithm with application to covariance function estimation
We propose a novel nonparametric regression framework subject to the positive
definiteness constraint. It offers a highly modular approach for estimating
covariance functions of stationary processes. Our method can impose positive
definiteness, as well as isotropy and monotonicity, on the estimators, and its
hyperparameters can be selected by cross-validation. We define our estimators
by taking integral transforms of kernel-based distribution surrogates. We then
use the iterated density estimation evolutionary algorithm, a variant of
estimation of distribution algorithms, to fit the estimators. We also extend
our method to estimate covariance functions for point-referenced data. Compared
to alternative approaches, our method provides more reliable estimates of
long-range dependence. We perform several numerical studies to demonstrate
the efficacy and performance of our method. We also illustrate our method
using precipitation data from the Spatial Interpolation Comparison 97 project.
Comment: Accepted at the 2023 Genetic and Evolutionary Computation Conference
(GECCO) as a full paper. 14 pages with references and appendices, 11 figures
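Integral transforms yield valid covariance functions because of Bochner's theorem: the cosine transform of a finite nonnegative spectral measure is positive definite. Purely for illustration, the sketch below assumes that the distribution surrogate is a Gaussian mixture over frequencies (the familiar spectral-mixture construction). This choice gives a closed-form covariance, and the sketch then checks positive definiteness with a Cholesky factorization. It is not the authors' estimator. The weights are fixed by hand rather than fitted with the iterated density estimation evolutionary algorithm or cross-validation, and all names and parameter values are hypothetical.

```python
import math


def spectral_mixture_cov(h, weights, means, scales):
    """Covariance as the cosine transform of a Gaussian-mixture spectral surrogate.

    Integrating cos(omega * h) against N(omega; mu_k, s_k^2) gives
    exp(-s_k^2 h^2 / 2) * cos(mu_k h). With nonnegative weights the mixture is a
    finite nonnegative measure, so by Bochner's theorem C is a valid
    (positive definite) covariance function.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    return sum(
        w * math.exp(-0.5 * (s * h) ** 2) * math.cos(mu * h)
        for w, mu, s in zip(weights, means, scales)
    )


def cholesky(a):
    """Lower-triangular Cholesky factor; raises ValueError if `a` is not PD."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


# Hypothetical spectral parameters (a fitted method would learn these from data).
weights, means, scales = [1.0, 0.5], [0.0, 2.0], [1.0, 0.5]
xs = [float(k) for k in range(8)]  # 1-D locations with unit spacing
K = [[spectral_mixture_cov(abs(x - y), weights, means, scales) for y in xs] for x in xs]
L = cholesky(K)  # succeeds because K is positive definite
```

A Cholesky failure (the `ValueError`) is a quick numerical signal that a candidate covariance matrix is not positive definite. Estimators built as transforms of nonnegative measures avoid that failure by construction.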
Bootstrap based uncertainty bands for prediction in functional kriging
The increasing interest in spatially correlated functional data has led to
the development of appropriate geostatistical techniques that make it possible to predict
a curve at an unmonitored location using a functional kriging with external
drift model, which takes into account the effect of exogenous variables (either
scalar or functional). Nevertheless, uncertainty evaluation for functional
spatial prediction remains an open issue. We propose a semi-parametric
bootstrap for spatially correlated functional data that makes it possible to evaluate the
uncertainty of a predicted curve while ensuring that the spatial dependence
structure is maintained in the bootstrap samples. The performance of the
proposed methodology is assessed via a simulation study. Moreover, the approach
is illustrated on the well-known Canadian temperature data set and on a real
data set of PM concentrations in the Piemonte region, Italy. The results show
that the method is computationally feasible and
suitable for quantifying the uncertainty around a predicted curve.
Supplementary material including R code is available upon request.
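The Python sketch below shows one common way a semi-parametric spatial bootstrap can preserve spatial dependence: decorrelate, resample, recorrelate. It is not necessarily the scheme proposed in the paper. It assumes an exponential covariance between sites, with a hypothetical sill and range rather than fitted values, and a covariance that is separable in space and time. Residual curves are stored one per row on a common time grid. With the spatial covariance factored as Sigma = L L^T, the residuals are decorrelated (U = L^-1 E), whole curves are resampled with replacement, and the result is recorrelated (E* = L U*). All function names are hypothetical.

```python
import math
import random


def exp_cov_matrix(sites, sill, range_):
    """Assumed exponential spatial covariance between sites."""
    return [[sill * math.exp(-math.dist(p, q) / range_) for q in sites] for p in sites]


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def decorrelate(L, E):
    """Solve L U = E by forward substitution; E holds one residual curve per row."""
    n, T = len(E), len(E[0])
    U = [[0.0] * T for _ in range(n)]
    for t in range(T):
        for i in range(n):
            s = E[i][t] - sum(L[i][k] * U[k][t] for k in range(i))
            U[i][t] = s / L[i][i]
    return U


def recorrelate(L, U):
    """Compute E = L U, reintroducing the spatial covariance."""
    n, T = len(U), len(U[0])
    return [[sum(L[i][k] * U[k][t] for k in range(i + 1)) for t in range(T)]
            for i in range(n)]


def bootstrap_residual_curves(L, E, rng):
    """One replicate: decorrelate, resample whole curves with replacement, recorrelate."""
    U = decorrelate(L, E)
    U_star = [list(rng.choice(U)) for _ in range(len(U))]
    return recorrelate(L, U_star)


# Hypothetical setup: 4 sites, residual curves observed at 3 time points.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
L = cholesky(exp_cov_matrix(sites, sill=1.0, range_=0.5))
E = [[0.2, -0.1, 0.0], [0.4, 0.1, -0.3], [-0.2, 0.3, 0.1], [0.1, -0.4, 0.2]]
E_star = bootstrap_residual_curves(L, E, random.Random(0))
```

Each replicate could then be added to the fitted drift and passed through the functional kriging predictor (not shown). Repeating this gives a bootstrap distribution of predicted curves, from which pointwise uncertainty bands can be read off.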
On Simplified Bayesian Modeling for Massive Geostatistical Datasets: Conjugacy and Beyond
With continued advances in Geographic Information Systems and related computational technologies, researchers in diverse fields such as forestry, environmental health and climate science have a growing interest in analyzing large-scale data sets measured at a substantial number of geographic locations. Geostatistical models used to capture the space-varying relationships in such data often entail onerous computations that prohibit the analysis of large-scale spatial data sets. Less burdensome alternatives proposed recently for analyzing massive spatial data sets often lead to inaccurate inference or require a slow sampling process. Bayesian inference, while attractive for accommodating uncertainty through its hierarchical structure, can become computationally onerous for modeling massive spatial data sets because of its reliance on iterative estimation algorithms. My dissertation research aims at developing computationally scalable Bayesian geostatistical models that provide valid inference through a highly accelerated sampling process. We also study the asymptotic properties of estimators in spatial analysis.
In Chapters 2 and 3, we develop conjugate Bayesian frameworks for analyzing univariate and multivariate spatial data. In Chapter 2, we propose a conjugate latent Nearest-Neighbor Gaussian Process (NNGP) model, which uses analytically tractable posterior distributions to obtain posterior inference, including for the high-dimensional latent process. In Chapter 3, we focus on building conjugate Bayesian frameworks for analyzing multivariate spatial data. We use a Matrix-Normal Inverse-Wishart (MNIW) prior to propose conjugate Bayesian frameworks and algorithms that can incorporate a family of scalable spatial modeling methodologies.
In Chapter 4, we pursue general Bayesian modeling methodologies beyond conjugate Bayesian hierarchical models.
We build scalable versions of a hierarchical linear model of coregionalization (LMC) and of spatial factor models, and propose a highly accelerated block-update MCMC algorithm. Using the proposed Bayesian LMC model, we extend scalable modeling strategies for a single process to the multivariate case. All proposed frameworks are tested on simulated data and fitted to real data sets with millions of observed locations. Our contribution is to offer practicing scientists and spatial analysts practical, flexible and scalable hierarchical models for analyzing massive spatial data sets.
In Chapter 5, we investigate the asymptotic properties of estimators in spatial analysis. We formally establish results on the identifiability and consistency of the nugget in spatial models based on the Gaussian process within the framework of in-fill asymptotics, i.e., the sample size increases within a bounded sampling domain. We establish the identifiability of the parameters of the Matérn covariance function and the consistency of their maximum likelihood estimators in the presence of the discontinuity due to the nugget.
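In an NNGP, each location conditions only on up to m nearest previously ordered locations. The joint density therefore factors into small conditional Gaussians, w_i | w_N(i) ~ N(b_i · w_N(i), f_i). The Python sketch below computes these neighbor sets and the conditional weights b_i and variances f_i for a latent process with a Matérn covariance. Closed forms are used, so only smoothness 0.5, 1.5 and 2.5 are supported. The sketch also shows the nugget discontinuity at the origin discussed in Chapter 5. It does not implement the dissertation's conjugate or MNIW models, and the ordering, m and covariance parameters are hypothetical.

```python
import math


def matern(d, sigma2, rho, nu, tau2=0.0):
    """Matérn covariance for nu in {0.5, 1.5, 2.5} plus an optional nugget.

    The nugget tau2 is added only at d == 0, so C(0) = sigma2 + tau2 while
    C(d) -> sigma2 as d -> 0+: the discontinuity at the origin.
    """
    r = d / rho
    if nu == 0.5:
        c = math.exp(-r)
    elif nu == 1.5:
        c = (1.0 + math.sqrt(3.0) * r) * math.exp(-math.sqrt(3.0) * r)
    elif nu == 2.5:
        c = (1.0 + math.sqrt(5.0) * r + 5.0 * r * r / 3.0) * math.exp(-math.sqrt(5.0) * r)
    else:
        raise ValueError("this sketch only handles nu in {0.5, 1.5, 2.5}")
    return sigma2 * c + (tau2 if d == 0 else 0.0)


def neighbor_sets(coords, m):
    """Indices of up to m nearest neighbors of each location among earlier locations."""
    sets = []
    for i, p in enumerate(coords):
        earlier = sorted(range(i), key=lambda j: math.dist(p, coords[j]))
        sets.append(sorted(earlier[:m]))
    return sets


def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(a)
    M = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def nngp_factors(coords, m, cov):
    """For each location: (neighbor indices, weights b_i, conditional variance f_i)."""
    factors = []
    for i, nbrs in enumerate(neighbor_sets(coords, m)):
        c_ii = cov(0.0)
        if not nbrs:
            factors.append((nbrs, [], c_ii))
            continue
        C_NN = [[cov(math.dist(coords[j], coords[k])) for k in nbrs] for j in nbrs]
        c_iN = [cov(math.dist(coords[i], coords[j])) for j in nbrs]
        b = solve(C_NN, c_iN)
        f = c_ii - sum(bj * cj for bj, cj in zip(b, c_iN))
        factors.append((nbrs, b, f))
    return factors


def latent_cov(d):
    # The latent process excludes the nugget; in a latent NNGP it enters the data model.
    return matern(d, sigma2=1.0, rho=0.5, nu=1.5)


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
factors = nngp_factors(coords, m=2, cov=latent_cov)
```

Because each conditional involves at most m neighbors, the cost grows linearly in the number of locations rather than cubically, which is what makes the NNGP usable for millions of locations.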
Linear and non-linear resource estimation techniques applied in the Kanzi Phosphate project in the Democratic Republic of Congo
The main aim of this project was to conduct a comparative analysis of the linear and non-linear
estimation techniques used for the Kanzi Phosphate Project in the Democratic Republic of Congo.
The Kanzi phosphate is an elongated sedimentary unit with a north-south strike and a fairly
flat dip. It was deposited between two graben structures.
The Kanzi phosphate was divided into North and South areas, which were treated as separate
domains because they are far apart. The geology and assay results of the intersected phosphate
mineralization were used to define the layers. Layering was observed in the South Geo-Zone,
which was therefore sub-divided vertically into three layers, namely the Top, Middle and Bottom
layers. The Top and Bottom layers had lower P2O5 grades and higher SiO2 grades than the Middle
layer. The Middle layer was the most laterally extensive of the three.
Drilling was carried out using the Aircore technique, and samples were taken at 1 m
intervals. No compositing was done because all samples carried equal statistical weight in terms
of length and density measurements. Declustering was not done because the drillholes were
well spread.
The statistical evaluation of the domains showed that P2O5 is correlated with all the other major
variables (CaO, Al2O3, TiO2 and SiO2). It was therefore decided to conduct mineral resource
estimation on P2O5 only; the other block variables were estimated from P2O5 using linear
regression relationships.
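The regression step can be sketched as follows. The sketch assumes ordinary least squares of each secondary variable on P2O5, fitted on paired samples and then applied to the estimated block P2O5 grades. The sample values are made up, chosen to be exactly linear so the fit is easy to verify, and are not Kanzi data. The function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def regress_block_values(block_p2o5, coef):
    """Apply the fitted relationship to estimated block P2O5 grades."""
    a, b = coef
    return [a + b * g for g in block_p2o5]


# Made-up sample pairs (P2O5 %, SiO2 %), exactly linear for easy checking.
p2o5 = [10.0, 15.0, 20.0, 25.0]
sio2 = [35.0, 27.5, 20.0, 12.5]
coef = fit_line(p2o5, sio2)  # (50.0, -1.5)
sio2_blocks = regress_block_values([12.0, 18.0], coef)
```

A separate fit would be made for each secondary variable (CaO, Al2O3, TiO2, SiO2) and, where the statistics differ, for each domain.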
A three-dimensional geological model was constructed for each domain and filled with blocks.
The block sizes were based on neighbourhood analysis, drillhole spacing and mining requirements:
half the drillhole spacing was used for the X (125 m) and Y (125 m) dimensions, and a thickness
of 5 m was used for the Z dimension.
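The block dimensions follow from simple arithmetic. If 125 m is half the drillhole spacing, the implied spacing is 250 m in X and Y, and each block has a volume of 125 × 125 × 5 = 78,125 m³. The sketch below builds block centroids for a regular block model. The extent and origin are hypothetical, the function names are illustrative, and a real model would also be clipped to the geological wireframe of each domain.

```python
import math


def block_size_from_spacing(spacing_x, spacing_y, thickness):
    """Half the drillhole spacing in X and Y; fixed thickness in Z."""
    return spacing_x / 2.0, spacing_y / 2.0, thickness


def block_centroids(origin, extent, size):
    """Centroids of a regular block model covering `extent` from `origin`."""
    (x0, y0, z0), (ex, ey, ez), (dx, dy, dz) = origin, extent, size
    nx, ny, nz = (math.ceil(e / d) for e, d in ((ex, dx), (ey, dy), (ez, dz)))
    return [
        (x0 + (i + 0.5) * dx, y0 + (j + 0.5) * dy, z0 + (k + 0.5) * dz)
        for k in range(nz) for j in range(ny) for i in range(nx)
    ]


size = block_size_from_spacing(250.0, 250.0, 5.0)  # (125.0, 125.0, 5.0)
block_volume = size[0] * size[1] * size[2]          # 78125.0 m^3
# Hypothetical 500 m x 250 m x 10 m extent with its origin at (0, 0, 0).
blocks = block_centroids((0.0, 0.0, 0.0), (500.0, 250.0, 10.0), size)
```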
Traditional variograms were calculated for all the domains, and downhole variograms were used to
determine the nugget effect. All variograms were omnidirectional and were fitted with spherical
models. The variogram ranges were used to guide the search volumes for both Ordinary Kriging (OK)
and Inverse Distance Weighting (IDW). The estimates from the OK and IDW techniques
were comparable.
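The spherical model and the IDW and OK estimators compared above can be sketched in a few lines of Python. The nugget, sill, range, IDW power (2), sample coordinates and grades are hypothetical. Coordinates are 2-D for brevity, and the variogram is isotropic to match the omnidirectional fits. Every sample is used instead of a range-limited search volume. OK solves the usual system with a Lagrange multiplier, which also yields the kriging variance.

```python
import math


def spherical(h, nugget, sill, a):
    """Spherical variogram with nugget, partial sill and range a (gamma(0) = 0)."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + sill
    r = h / a
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)


def idw(target, pts, vals, power=2.0):
    """Inverse distance weighting; returns the sample value if target hits a sample."""
    d = [math.dist(target, p) for p in pts]
    for di, v in zip(d, vals):
        if di == 0:
            return v
    w = [1.0 / di ** power for di in d]
    return sum(wi * v for wi, v in zip(w, vals)) / sum(w)


def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(a)
    M = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ordinary_kriging(target, pts, vals, vario):
    """Solve [Gamma 1; 1^T 0][w; mu] = [gamma_0; 1]; return (estimate, variance, w)."""
    n = len(pts)
    A = [[vario(math.dist(pts[i], pts[j])) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [vario(math.dist(target, p)) for p in pts] + [1.0]
    sol = solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = sum(wi * v for wi, v in zip(w, vals))
    variance = sum(wi * gi for wi, gi in zip(w, b[:n])) + mu  # OK variance
    return estimate, variance, w


def vario(h):
    # Hypothetical model: nugget 0.1, partial sill 0.9, range 250 m.
    return spherical(h, nugget=0.1, sill=0.9, a=250.0)


pts = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
vals = [10.0, 12.0, 14.0, 16.0]  # hypothetical P2O5 grades (%)
ok_est, ok_var, ok_w = ordinary_kriging((50.0, 50.0), pts, vals, vario)
idw_est = idw((50.0, 50.0), pts, vals)
```

In this symmetric layout both methods return the sample mean. They diverge when samples are clustered or the nugget is large, since only OK uses the variogram, and only OK provides a variance.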
The data were pre-processed for Indicator Kriging (IK). Median cut-offs were selected and median
indicator variograms were calculated; all other indicators were assumed to have variograms
similar to the median indicator variogram. For estimation purposes, the cut-offs
selected were 7.5%, 12.5%, 17.5%, 22.5% and 27.5% P2O5. These cut-offs were guided by the
processing characteristics of the Kanzi phosphate.
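In median IK, a single indicator variogram (the median) is used for every cut-off. One set of kriging weights therefore applies to all indicators, and the estimated conditional distribution follows directly. The sketch below codes samples as indicators at the cut-offs quoted above and builds the estimated cumulative distribution for one block. It then applies a simple order-relation correction (clipping and a running maximum) and converts the probability above a cut-off into tonnage. The kriging weights, sample grades and dry density (2.0 t/m³) are hypothetical. The block volume uses the 125 m × 125 m × 5 m block size.

```python
CUTOFFS = [7.5, 12.5, 17.5, 22.5, 27.5]  # P2O5 cut-offs (%) quoted in the text


def indicators(grade, cutoffs):
    """Indicator coding: 1 if grade <= cut-off, else 0, for each cut-off."""
    return [1.0 if grade <= z else 0.0 for z in cutoffs]


def median_ik_ccdf(weights, grades, cutoffs):
    """Median IK: one set of kriging weights is reused at every cut-off.

    Returns F(z_k), an estimate of P(Z <= z_k), clipped to [0, 1] and forced to
    be non-decreasing; the correction only matters if some weights are negative.
    """
    codes = [indicators(g, cutoffs) for g in grades]
    ccdf, running = [], 0.0
    for k in range(len(cutoffs)):
        f = sum(w * c[k] for w, c in zip(weights, codes))
        running = max(running, min(max(f, 0.0), 1.0))
        ccdf.append(running)
    return ccdf


def tonnage_above(ccdf, cutoffs, cutoff, block_volume_m3, density_t_per_m3):
    """Tonnage of a block above `cutoff` = volume * density * P(Z > cutoff)."""
    return block_volume_m3 * density_t_per_m3 * (1.0 - ccdf[cutoffs.index(cutoff)])


# Hypothetical kriging weights and nearby sample grades for one block.
weights = [0.25, 0.25, 0.25, 0.25]
grades = [10.0, 15.0, 20.0, 30.0]
ccdf = median_ik_ccdf(weights, grades, CUTOFFS)  # [0.0, 0.25, 0.5, 0.75, 0.75]
tonnes = tonnage_above(ccdf, CUTOFFS, 17.5, 125 * 125 * 5, density_t_per_m3=2.0)
```

Because the estimate is a distribution rather than a single grade, the same block can report different tonnages at each processing cut-off. This is the property that distinguished IK from the smoothed OK and IDW estimates.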
The results of the three estimation techniques (IDW, OK and IK) were analysed. The OK and
IDW methods produced smoothed estimates but defined the global resources well. The OK measure
of uncertainty was not clearly defined, partly because of the widely spaced data.
Median Indicator Kriging produced more useful results than the OK and IDW methods and
minimized smoothing. As a probabilistic method, Median Indicator Kriging defined the proportion
of tonnage above the processing cut-offs.
The estimation methods were compared and ranked. Median Indicator Kriging was the
preferred estimation technique and was ranked highest. OK and IDW produced identical
results and were ranked low. OK performed like IDW because there were moderately mixed
sample populations that were spatially integrated.
Recommendations were made to conduct conditional simulation, drill additional boreholes, estimate
the other variables using co-kriging and perform further processing studies. These steps will help
reduce risk and improve the geostatistical understanding of the phosphate resources.