High-Dimensional Bayesian Geostatistics
With the growing capabilities of Geographic Information Systems (GIS) and
user-friendly software, statisticians today routinely encounter geographically
referenced data containing observations from a large number of spatial
locations and time points. Over the last decade, hierarchical spatiotemporal
process models have become widely deployed statistical tools for researchers to
better understand the complex nature of spatial and temporal variability.
However, fitting hierarchical spatiotemporal models often involves expensive
matrix computations with complexity increasing in cubic order for the number of
spatial locations and temporal points. This renders such models unfeasible for
large data sets. This article offers a focused review of two methods for
constructing well-defined highly scalable spatiotemporal stochastic processes.
Both these processes can be used as "priors" for spatiotemporal random fields.
The first approach constructs a low-rank process operating on a
lower-dimensional subspace. The second approach constructs a Nearest-Neighbor
Gaussian Process (NNGP) that ensures sparse precision matrices for its finite
realizations. Both processes can be exploited as a scalable prior embedded
within a rich hierarchical modeling framework to deliver full Bayesian
inference. These approaches can be described as model-based solutions for big
spatiotemporal datasets. The models ensure that the algorithmic complexity has
~n floating point operations (flops), where n is the number of spatial
locations (per iteration). We compare these methods and provide some insight
into their methodological underpinnings.
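As a rough illustration of the nearest-neighbor idea, the sketch below builds the precision matrix implied by regressing each location on a few earlier neighbors. It is not the authors' implementation: the exponential kernel, the 1-D locations, and the names exp_cov and nngp_precision are assumptions made for this example, and the result is stored densely only to keep the code short.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance between two sets of 1-D locations (assumed kernel)."""
    return sigma2 * np.exp(-phi * np.abs(a[:, None] - b[None, :]))

def nngp_precision(locs, m=3, sigma2=1.0, phi=3.0):
    """Precision (I - A)^T D^{-1} (I - A) of a nearest-neighbor approximation.

    Each location is regressed on at most m nearest *earlier* locations in
    the given ordering, so every row of A has at most m nonzero entries.
    """
    n = len(locs)
    A = np.zeros((n, n))
    d = np.empty(n)
    d[0] = sigma2  # the first location has no neighbors to condition on
    for i in range(1, n):
        prev = np.arange(i)
        # indices of the (at most) m nearest earlier locations
        nb = prev[np.argsort(np.abs(locs[prev] - locs[i]))[:m]]
        C_nn = exp_cov(locs[nb], locs[nb], sigma2, phi)
        c_in = exp_cov(locs[[i]], locs[nb], sigma2, phi).ravel()
        w = np.linalg.solve(C_nn, c_in)   # only an m x m solve per location
        A[i, nb] = w                      # kriging weights on the neighbors
        d[i] = sigma2 - c_in @ w          # conditional variance
    I_minus_A = np.eye(n) - A
    return I_minus_A.T @ (I_minus_A / d[:, None])

locs = np.linspace(0.0, 1.0, 30)
Q = nngp_precision(locs, m=3)
```

Each location needs only an m x m solve, which is where the linear growth in the number of locations comes from once A is stored sparsely. On a sorted 1-D grid with an exponential kernel the process is Markov, so here Q happens to match the exact precision; in general it is only an approximation.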
Skellam shrinkage: Wavelet-based intensity estimation for inhomogeneous Poisson data
The ubiquity of integrating detectors in imaging and other applications
implies that a variety of real-world data are well modeled as Poisson random
variables whose means are in turn proportional to an underlying vector-valued
signal of interest. In this article, we first show how the so-called Skellam
distribution arises from the fact that Haar wavelet and filterbank transform
coefficients corresponding to measurements of this type are distributed as sums
and differences of Poisson counts. We then provide two main theorems on Skellam
shrinkage, one showing the near-optimality of shrinkage in the Bayesian setting
and the other providing for unbiased risk estimation in a frequentist context.
These results serve to yield new estimators in the Haar transform domain,
including an unbiased risk estimate for shrinkage of Haar-Fisz
variance-stabilized data, along with accompanying low-complexity algorithms for
inference. We conclude with a simulation study demonstrating the efficacy of
our Skellam shrinkage estimators both for the standard univariate wavelet test
functions as well as a variety of test images taken from the image processing
literature, confirming that they offer substantial performance improvements
over existing alternatives.
Comment: 27 pages, 8 figures, slight formatting changes; submitted for
publication
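The link between Haar coefficients and the Skellam distribution can be checked numerically. The sketch below, which assumes two hypothetical intensities and uses an unnormalized one-level Haar step (the helper name haar_pairs is invented for this example), simulates Poisson pairs and compares the moments of their differences with the Skellam mean mu1 - mu2 and variance mu1 + mu2.

```python
import numpy as np

def haar_pairs(counts):
    """One level of the unnormalized Haar transform: pairwise sums and differences."""
    counts = np.asarray(counts)
    even, odd = counts[..., 0::2], counts[..., 1::2]
    return even + odd, even - odd

rng = np.random.default_rng(0)
mu = np.array([4.0, 1.5])                  # hypothetical Poisson intensities
x = rng.poisson(mu, size=(200_000, 2))     # independent Poisson pairs
s, d = haar_pairs(x)

# Sums of independent Poisson counts are Poisson(mu1 + mu2);
# differences follow a Skellam law with mean mu1 - mu2 and variance mu1 + mu2.
diff_mean, diff_var = d.mean(), d.var()
sum_mean = s.mean()
```

With these intensities the difference coefficients should have mean close to 2.5 and variance close to 5.5, up to Monte Carlo error.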
Spatial modeling using graphical Markov models and wavelets
Graphical Markov models use graphs to represent possible dependencies among random variables. This class of models is extremely rich and includes, inter alia, causal Markov models and Markov random fields. In this dissertation, we develop a very efficient optimal-prediction algorithm for graphical Markov models. The algorithm is a generalization of the Kalman-filter algorithm for temporal processes, and it can in principle be applied to any Gaussian undirected graphical model and any Gaussian acyclic directed graphical model.
We also propose a new class of multiscale models for stochastic processes in terms of scale-recursive dynamics defined on acyclic directed graphs. The models are an extension of multiscale tree-structured models. The optimal prediction can be obtained using the newly developed generalized Kalman-filter algorithm referred to above, and the parameters can be estimated by maximum likelihood via the EM algorithm. A subclass of these models consists of multiscale wavelet models, for which we show that the optimal predictors of hidden state variables can be obtained by a level-dependent (scale-dependent) wavelet shrinkage rule.
In a series of papers, D. Donoho and I. Johnstone develop wavelet shrinkage methods to solve statistical problems. We propose a new rationale for wavelet shrinkage, based on the assumption that the underlying process can be decomposed into a large-scale deterministic trend plus a small-scale Gaussian process. Our approach has several advantages over current shrinkage methods. It takes the dependencies of empirical wavelet coefficients, both within scales and across scales, into account. Moreover, it does not rely on asymptotic properties for its justification, so it is also appropriate when the sample size is small.
Finally, we introduce partially ordered Markov models, which are acyclic directed graphical models for spatial problems. The model can be regarded as a Markov random field with neighborhood structures derivable from an associated partially ordered set. We use a martingale approach to derive the asymptotic properties of maximum (composite) likelihood estimators for partially ordered Markov models. We prove that the maximum (composite) likelihood estimators are consistent, asymptotically normal, and also asymptotically efficient under checkable conditions.
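To make the notion of a level-dependent shrinkage rule concrete, here is a minimal sketch under simplifying assumptions that are not taken from the dissertation: an orthonormal Haar transform, white Gaussian noise with variance noise_var, and independent zero-mean Gaussian detail coefficients with one variance per level. In that simplified setting the posterior mean multiplies each detail coefficient by tau_j^2 / (tau_j^2 + sigma^2). All function names are hypothetical.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a signal whose length is a power of two."""
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details          # details[0] is the finest level

def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def level_shrink(y, signal_var, noise_var):
    """Linear shrinkage with one factor per level (finest level first)."""
    approx, details = haar_forward(y)
    shrunk = [d * (t / (t + noise_var)) for d, t in zip(details, signal_var)]
    return haar_inverse(approx, shrunk)

y = np.array([2.0, 4.0, 1.0, 3.0, 5.0, 7.0, 6.0, 8.0])
smooth = level_shrink(y, signal_var=[0.1, 1.0, 4.0], noise_var=1.0)
```

Here the finest level is shrunk most heavily because its assumed signal variance is smallest; the coarsest approximation coefficient is left untouched, so the sample mean is preserved.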
Spatial Joint Species Distribution Modeling using Dirichlet Processes
Species distribution models usually attempt to explain presence-absence or
abundance of a species at a site in terms of the environmental features
(so-called abiotic features) present at the site. Historically, such models have
considered species individually. However, it is well-established that species
interact to influence presence-absence and abundance (envisioned as biotic
factors). As a result, there has been substantial recent interest in joint
species distribution models with various types of response, e.g.,
presence-absence, continuous and ordinal data. Such models incorporate
dependence between species response as a surrogate for interaction.
The challenge we focus on here is how to address such modeling in the context
of a large number of species (e.g., order 10^2) across sites numbering in the
order of 10^2 or 10^3 when, in practice, only a few species are found at any
observed site. Again, there is some recent literature to address this; we adopt
a dimension reduction approach. The novel wrinkle we add here is spatial
dependence. That is, we have a collection of sites over a relatively small
spatial region so it is anticipated that species distribution at a given site
would be similar to that at a nearby site. Specifically, we handle dimension
reduction through Dirichlet processes joined with spatial dependence through
Gaussian processes.
We use both simulated data and a plant communities dataset for the Cape
Floristic Region (CFR) of South Africa to demonstrate our approach. The latter
consists of presence-absence measurements for 639 tree species on 662
locations. Through both data examples we are able to demonstrate improved
predictive performance using the foregoing specification.
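One common way a Dirichlet process yields dimension reduction is by clustering many items into a small number of shared groups. The sketch below illustrates only that generic mechanism with a truncated stick-breaking construction; the concentration alpha, the truncation level K, and the function name are assumptions for this example, and it does not reproduce the model or the spatial Gaussian process component described in the abstract.

```python
import numpy as np

def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # close the stick so the K weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(1)
n_species, K = 639, 50          # 639 species as in the CFR data; K is assumed
w = stick_breaking(alpha=2.0, K=K, rng=rng)

# Assign each species to a latent cluster; species in the same cluster would
# share parameters, so the effective dimension is the number of occupied clusters.
labels = rng.choice(K, size=n_species, p=w)
n_clusters = np.unique(labels).size
```

Because the stick-breaking weights decay quickly, the number of occupied clusters is typically far smaller than the number of species, which is the sense in which the construction reduces dimension.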
Wavelet regression estimation in nonparametric mixed effect models
We show that a nonparametric estimator of a regression function, obtained as the solution of a specific regularization problem, is the best linear unbiased predictor in some nonparametric mixed effect model. Since this estimator is intractable from a numerical point of view, we propose a tight approximation of it that is easy and fast to implement. This second estimator achieves the usual optimal rate of convergence of the mean integrated squared error over a Sobolev class for both equispaced and nonequispaced designs. Numerical experiments are presented on both simulated data and real ERP data.
Reassessing the Paradigms of Statistical Model-Building
Statistical model-building is the science of constructing models from data and from information about the data-generation process, with the aim of analysing those data and drawing inference from that analysis. Many statistical tasks are undertaken during this analysis; they include classification, forecasting, prediction and testing. Model-building has assumed substantial importance, as new technologies enable data on highly complex phenomena to be gathered in very large quantities. This creates a demand for more complex models, and requires the model-building process itself to be adaptive. The word “paradigm” refers to philosophies, frameworks and methodologies for developing and interpreting statistical models, in the context of data, and applying them for inference. In order to solve contemporary statistical problems it is often necessary to combine techniques from previously separate paradigms. The workshop addressed model-building paradigms that are at the frontiers of modern statistical research. It tried to create synergies, by delineating the connections and collisions among different paradigms. It also endeavoured to shape the future evolution of paradigms.