Gaussian Process Decentralized Data Fusion Meets Transfer Learning in Large-Scale Distributed Cooperative Perception
This paper presents novel Gaussian process decentralized data fusion
algorithms exploiting the notion of agent-centric support sets for distributed
cooperative perception of large-scale environmental phenomena. To overcome the
limitations of scale in existing works, our proposed algorithms allow every
mobile sensing agent to choose a different support set and dynamically switch
to another during execution for encapsulating its own data into a local summary
that, perhaps surprisingly, can still be assimilated with the other agents'
local summaries (i.e., based on their current choices of support sets) into a
globally consistent summary to be used for predicting the phenomenon. To
achieve this, we propose a novel transfer learning mechanism for a team of
agents capable of sharing and transferring information encapsulated in a
summary based on a support set to that utilizing a different support set with
some loss that can be theoretically bounded and analyzed. To alleviate the
issue of information loss accumulating over multiple instances of transfer
learning, we propose a new information sharing mechanism to be incorporated
into our algorithms in order to achieve memory-efficient lazy transfer
learning. Empirical evaluation on real-world datasets shows that our
algorithms outperform the state-of-the-art methods.
Comment: 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), Extended
version with proofs, 14 pages
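The agent-centric summaries described above behave like the sufficient statistics of sparse Gaussian process approximations: each agent compresses its private data against a support set, and the compressed terms add up across agents into a globally consistent summary. A minimal single-support-set sketch in NumPy (Subset-of-Regressors style; the shared support set, RBF kernel, and all numbers are illustrative assumptions, not the paper's algorithm, which additionally lets agents use and switch between different support sets):

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # squared-exponential kernel on 1-D inputs (illustrative choice)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

rng = np.random.default_rng(0)
S = np.linspace(0, 1, 10)        # shared support set; the paper lets each
                                 # agent pick (and dynamically switch) its own
summaries = []
for _ in range(3):               # each mobile agent holds private observations
    X = rng.uniform(0, 1, 50)
    y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(50)
    Ksx = rbf(S, X)
    # local summary: (X, y) encapsulated against the support set S
    summaries.append((Ksx @ Ksx.T, Ksx @ y))

# Fuse the local summaries additively into a globally consistent one
A = sum(a for a, _ in summaries)
b = sum(v for _, v in summaries)

# Predict the phenomenon at query locations from the fused summary only
noise = 0.1
Xq = np.linspace(0, 1, 5)
mu = rbf(Xq, S) @ np.linalg.solve(noise**2 * rbf(S, S) + A, b)
```

The property this sketch isolates is that the fused prediction depends on each agent's data only through its additive summary, so raw observations never need to be exchanged.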
Practical Bayesian Modeling and Inference for Massive Spatial Datasets On Modest Computing Environments
With continued advances in Geographic Information Systems and related
computational technologies, statisticians are often required to analyze very
large spatial datasets. Over the last decade this has generated a body of work
on scalable methodologies for analyzing large spatial datasets that is already
too vast to be summarized here. Scalable spatial process models have been
found especially attractive due to their richness and flexibility and,
particularly so in the Bayesian paradigm, due to their presence in hierarchical
model settings. However, the vast majority of research articles present in this
domain have been geared toward innovative theory or more complex model
development. Very limited attention has been accorded to approaches for easily
implementable scalable hierarchical models for the practicing scientist or
spatial analyst. This article is submitted to the Practice section of the
journal with the aim of developing massively scalable Bayesian approaches that
can rapidly deliver Bayesian inference on spatial processes that are practically
indistinguishable from inference obtained using more expensive alternatives. A
key emphasis is on implementation within very standard (modest) computing
environments (e.g., a standard desktop or laptop) using easily available
statistical software packages without requiring message-passing interfaces or
parallel programming paradigms. Key insights are offered regarding assumptions
and approximations concerning practical efficiency.
Comment: 20 pages, 4 figures, 2 tables
A multi-resolution approximation for massive spatial datasets
Automated sensing instruments on satellites and aircraft have enabled the
collection of massive amounts of high-resolution observations of spatial fields
over large spatial regions. If these datasets can be efficiently exploited,
they can provide new insights on a wide variety of issues. However, traditional
spatial-statistical techniques such as kriging are not computationally feasible
for big datasets. We propose a multi-resolution approximation (M-RA) of
Gaussian processes observed at irregular locations in space. The M-RA process
is specified as a linear combination of basis functions at multiple levels of
spatial resolution, which can capture spatial structure from very fine to very
large scales. The basis functions are automatically chosen to approximate a
given covariance function, which can be nonstationary. All computations
involving the M-RA, including parameter inference and prediction, are highly
scalable for massive datasets. Crucially, the inference algorithms can also be
parallelized to take full advantage of large distributed-memory computing
environments. In comparisons using simulated data and a large satellite
dataset, the M-RA outperforms a related state-of-the-art method.
Comment: 23 pages; to be published in Journal of the American Statistical
Association
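The idea of representing a spatial field with basis functions at several resolutions can be illustrated with a toy 1-D regression: a coarse level captures large-scale structure and a fine level captures local detail. This is only a caricature of the M-RA (whose basis functions are derived automatically from the target covariance, and whose inference is fully scalable and parallelizable); the hand-picked bump bases, ridge fit, and all numbers here are illustrative assumptions:

```python
import numpy as np

def bump(x, centers, width):
    # compactly supported local basis functions (illustrative choice;
    # the M-RA derives its bases from the target covariance function)
    d = np.abs(x[:, None] - centers[None, :]) / width
    return np.where(d < 1, (1 - d) ** 2, 0.0)

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = np.sin(6 * x) + 0.05 * rng.standard_normal(200)

# Two resolution levels: 5 wide bumps for very large scales,
# 20 narrow bumps for fine-scale structure
coarse = np.linspace(0, 1, 5)
fine = np.linspace(0, 1, 20)
B = np.hstack([bump(x, coarse, 0.5), bump(x, fine, 0.1)])

# Fit the basis-function weights by ridge regression
w = np.linalg.solve(B.T @ B + 1e-6 * np.eye(B.shape[1]), B.T @ y)

# Predict at interior query points
xq = np.linspace(0.1, 0.9, 9)
Bq = np.hstack([bump(xq, coarse, 0.5), bump(xq, fine, 0.1)])
pred = Bq @ w
```

Because each fine-level basis function touches only a few observations, the design matrix is sparse; that sparsity is what the M-RA exploits, at every level, to keep inference scalable on distributed-memory machines.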
High-Dimensional Bayesian Geostatistics
With the growing capabilities of Geographic Information Systems (GIS) and
user-friendly software, statisticians today routinely encounter geographically
referenced data containing observations from a large number of spatial
locations and time points. Over the last decade, hierarchical spatiotemporal
process models have become widely deployed statistical tools for researchers to
better understand the complex nature of spatial and temporal variability.
However, fitting hierarchical spatiotemporal models often involves expensive
matrix computations with complexity increasing in cubic order for the number of
spatial locations and temporal points. This renders such models infeasible for
large data sets. This article offers a focused review of two methods for
constructing well-defined highly scalable spatiotemporal stochastic processes.
Both these processes can be used as "priors" for spatiotemporal random fields.
The first approach constructs a low-rank process operating on a
lower-dimensional subspace. The second approach constructs a Nearest-Neighbor
Gaussian Process (NNGP) that ensures sparse precision matrices for its finite
realizations. Both processes can be exploited as a scalable prior embedded
within a rich hierarchical modeling framework to deliver full Bayesian
inference. These approaches can be described as model-based solutions for big
spatiotemporal datasets. The models ensure that the algorithmic complexity
involves ~n floating point operations (flops) per iteration, where n is the
number of spatial locations. We compare these methods and provide some insight
into their methodological underpinnings.
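The NNGP's sparse precision matrices arise from a Vecchia-type factorization: each location is conditioned only on a small set of nearest, previously ordered neighbors. A toy 1-D sketch (the exponential covariance, m = 3 neighbor count, and gridded locations are illustrative assumptions, not the article's specification):

```python
import numpy as np

def expcov(a, b, ls=0.3):
    # exponential covariance on 1-D inputs (illustrative choice)
    return np.exp(-np.abs(a[:, None] - b[None, :]) / ls)

s = np.linspace(0, 1, 8)   # ordered spatial locations
n, m = len(s), 3           # condition each point on m previous neighbors

# Factorize the joint density as prod_i p(y_i | y_{N(i)}), where N(i) holds
# the m nearest previously ordered locations; this gives a sparse strictly
# lower-triangular matrix B and conditional variances F
B = np.zeros((n, n))
F = np.ones(n)
for i in range(1, n):
    N = list(range(max(0, i - m), i))
    C_NN = expcov(s[N], s[N])
    c_iN = expcov(s[i:i + 1], s[N]).ravel()
    w = np.linalg.solve(C_NN, c_iN)   # kriging weights on the neighbors
    B[i, N] = w
    F[i] = 1.0 - c_iN @ w             # conditional variance

# Sparse precision of the finite realization: Q = (I - B)^T F^{-1} (I - B)
Q = (np.eye(n) - B).T @ np.diag(1.0 / F) @ (np.eye(n) - B)
```

In 1-D the exponential covariance is Markov, so conditioning on the immediate predecessor already makes Q the exact precision; in general the NNGP precision is only an approximation, but it remains sparse, which is what keeps the per-iteration flops linear in the number of spatial locations.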