Modified Linear Projection for Large Spatial Data Sets
Recent developments in engineering techniques for spatial data collection
such as geographic information systems have resulted in an increasing need for
methods to analyze large spatial data sets. These sorts of data sets can be
found in various fields of the natural and social sciences. However, model
fitting and spatial prediction using these large spatial data sets are
impractically time-consuming, because of the necessary matrix inversions.
Various methods have been developed to deal with this problem, including a
reduced rank approach and a sparse matrix approximation. In this paper, we
propose a modification to an existing reduced rank approach to capture both the
large- and small-scale spatial variations effectively. We have used simulated
examples and an empirical data analysis to demonstrate that our proposed
approach consistently performs well when compared with other methods. In
particular, the performance of our new method does not depend on the dependence
properties of the spatial covariance functions. Comment: 29 pages, 5 figures, 4 tables
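The computational argument behind reduced rank approaches can be illustrated with a minimal sketch. It is not the modification proposed in the paper: it only shows how a generic low-rank-plus-nugget covariance lets the Woodbury identity replace an n x n solve with an r x r one. The 1-D locations, the exponential covariance, and the names `exp_cov` and `low_rank_solve` are assumptions made for illustration.

```python
import numpy as np

def exp_cov(a, b, range_=0.3):
    """Exponential covariance between two sets of 1-D locations (assumed form)."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / range_)

def low_rank_solve(locs, knots, y, nugget=0.1):
    """Solve (B K^{-1} B^T + nugget * I) x = y without forming the n x n matrix.

    B is the n x r cross-covariance between data locations and knots,
    K is the r x r covariance among the knots.
    """
    B = exp_cov(locs, knots)
    K = exp_cov(knots, knots)
    # Woodbury identity:
    # (nugget*I + B K^-1 B^T)^-1 = (I - B (nugget*K + B^T B)^-1 B^T) / nugget
    inner = nugget * K + B.T @ B          # only an r x r system is solved
    return (y - B @ np.linalg.solve(inner, B.T @ y)) / nugget
```

With r knots and n observations the cost is roughly O(n r^2) instead of O(n^3), which is why the number of knots r controls the trade-off between speed and how much small-scale variation is retained.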
A multi-resolution approximation for massive spatial datasets
Automated sensing instruments on satellites and aircraft have enabled the
collection of massive amounts of high-resolution observations of spatial fields
over large spatial regions. If these datasets can be efficiently exploited,
they can provide new insights on a wide variety of issues. However, traditional
spatial-statistical techniques such as kriging are not computationally feasible
for big datasets. We propose a multi-resolution approximation (M-RA) of
Gaussian processes observed at irregular locations in space. The M-RA process
is specified as a linear combination of basis functions at multiple levels of
spatial resolution, which can capture spatial structure from very fine to very
large scales. The basis functions are automatically chosen to approximate a
given covariance function, which can be nonstationary. All computations
involving the M-RA, including parameter inference and prediction, are highly
scalable for massive datasets. Crucially, the inference algorithms can also be
parallelized to take full advantage of large distributed-memory computing
environments. In comparisons using simulated data and a large satellite
dataset, the M-RA outperforms a related state-of-the-art method. Comment: 23 pages; to be published in Journal of the American Statistical
Association
Distributed Dictionary Learning
The paper studies distributed Dictionary Learning (DL) problems where the
learning task is distributed over a multi-agent network with time-varying
(nonsymmetric) connectivity. This formulation is relevant, for instance, in
big-data scenarios where massive amounts of data are collected/stored in
different spatial locations and it is infeasible to aggregate and/or process
all the data in a fusion center, due to resource limitations, communication
overhead or privacy considerations. We develop a general distributed
algorithmic framework for the (nonconvex) DL problem and establish its
asymptotic convergence. The new method hinges on Successive Convex
Approximation (SCA) techniques coupled with i) a gradient tracking mechanism
instrumental to locally estimate the missing global information; and ii) a
consensus step, as a mechanism to distribute the computations among the agents.
To the best of our knowledge, this is the first distributed algorithm with
provable convergence for the DL problem and, more generally, bi-convex
optimization problems over (time-varying) directed graphs.
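The two mechanisms named in the abstract, gradient tracking and a consensus step, can be sketched on a much simpler problem. The example below is an assumption-laden illustration, not the paper's algorithm: it uses plain gradient steps instead of Successive Convex Approximation, a convex least-squares objective instead of dictionary learning, and a fixed doubly stochastic mixing matrix `W` instead of a time-varying directed graph. The function name `gradient_tracking` and all parameters are hypothetical.

```python
import numpy as np

def gradient_tracking(A_list, b_list, W, step, iters):
    """Minimise sum_i 0.5 * ||A_i x - b_i||^2 with one (A_i, b_i) per agent.

    Each agent keeps a local estimate X[i] and a tracker Y[i] that estimates
    the network-average gradient, which no single agent can compute alone.
    """
    n_agents = len(A_list)
    dim = A_list[0].shape[1]

    def grad(i, x):
        return A_list[i].T @ (A_list[i] @ x - b_list[i])

    X = np.zeros((n_agents, dim))
    G = np.stack([grad(i, X[i]) for i in range(n_agents)])
    Y = G.copy()                              # trackers start at local gradients
    for _ in range(iters):
        X_new = W @ X - step * Y              # consensus step plus descent along tracker
        G_new = np.stack([grad(i, X_new[i]) for i in range(n_agents)])
        Y = W @ Y + G_new - G                 # gradient tracking update
        X, G = X_new, G_new
    return X                                  # one row per agent
```

In this simplified setting all rows of the returned array approach the centralised least-squares solution when the step size is small enough, although no agent ever sees another agent's data.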
Hierarchical Bayesian auto-regressive models for large space time data with applications to ozone concentration modelling
Increasingly large volumes of space-time data are collected everywhere by mobile computing applications, and in many of these cases temporal data are obtained by registering events, for example telecommunication or web traffic data. Having both the spatial and temporal dimensions adds substantial complexity to data analysis and inference tasks. The computational complexity increases rapidly for fitting Bayesian hierarchical models, as such a task involves repeated inversion of large matrices. The primary focus of this paper is on developing space-time auto-regressive models under the hierarchical Bayesian setup. To handle large data sets, a recently developed Gaussian predictive process approximation method (Banerjee et al. [1]) is extended to include auto-regressive terms of latent space-time processes. Specifically, a space-time auto-regressive process, supported on a smaller set of knot locations, is spatially interpolated to approximate the original space-time process. The resulting model is specified within a hierarchical Bayesian framework and Markov chain Monte Carlo techniques are used for inference. The proposed model is applied to analyse the daily maximum 8-hour average ground level ozone concentration data from 1997 to 2006 from a large study region in the eastern United States. The developed methods allow accurate spatial prediction of a temporally aggregated ozone summary, known as the primary ozone standard, along with its uncertainty, at any unmonitored location during the study period. Trends in spatial patterns of many features of the posterior predictive distribution of the primary standard, such as the probability of non-compliance with respect to the standard, are obtained and illustrated.
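The construction described above, an auto-regressive process on a few knots that is then spatially interpolated, can be sketched as a forward simulation. This is a minimal sketch under stated assumptions, not the paper's model: locations are 1-D, the covariance is exponential, the AR(1) coefficient `rho` is fixed, and the hierarchical Bayesian layers and MCMC inference are omitted. The function name `simulate_knot_ar` and its parameters are hypothetical.

```python
import numpy as np

def simulate_knot_ar(locs, knots, n_times, rho=0.8, range_=0.5, seed=0):
    """Simulate an AR(1) process on knots and interpolate it to all locations.

    Returns (w, field): w has shape (n_times, n_knots) and holds the latent
    knot process; field has shape (n_times, n_locs).
    """
    rng = np.random.default_rng(seed)

    def cov(a, b):
        # Exponential covariance between 1-D location sets (assumed form).
        return np.exp(-np.abs(a[:, None] - b[None, :]) / range_)

    K = cov(knots, knots)                               # knot covariance, m x m
    interp = np.linalg.solve(K, cov(knots, locs)).T     # C(locs, knots) K^-1, n x m
    chol = np.linalg.cholesky(K)

    w = np.zeros((n_times, len(knots)))
    w[0] = chol @ rng.standard_normal(len(knots))
    for t in range(1, n_times):
        # AR(1) in time with spatially correlated innovations on the knots.
        w[t] = rho * w[t - 1] + chol @ rng.standard_normal(len(knots))
    return w, w @ interp.T
```

Only the m x m knot covariance is ever factorised, so the cost of each time step depends on the number of knots rather than on the number of observation locations.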
Fast joint detection-estimation of evoked brain activity in event-related fMRI using a variational approach
In standard clinical within-subject analyses of event-related fMRI data, two
steps are usually performed separately: detection of brain activity and
estimation of the hemodynamic response. Because these two steps are inherently
linked, we adopt the so-called region-based Joint Detection-Estimation (JDE)
framework that addresses this joint issue using a multivariate inference for
detection and estimation. JDE is built by making use of a regional bilinear
generative model of the BOLD response and constraining the parameter estimation
by physiological priors using temporal and spatial information in a Markovian
modeling. In contrast to previous works that use Markov Chain Monte Carlo
(MCMC) techniques to approximate the resulting intractable posterior
distribution, we recast the JDE into a missing data framework and derive a
Variational Expectation-Maximization (VEM) algorithm for its inference. A
variational approximation is used to approximate the Markovian model in the
unsupervised spatially adaptive JDE inference, which allows fine automatic
tuning of spatial regularisation parameters. The result is a new algorithm that
exhibits interesting properties compared to the previously used MCMC-based
approach. Experiments on artificial and real data show that VEM-JDE is robust
to model mis-specification and provides computational gain while maintaining
good performance in terms of activation detection and hemodynamic shape
recovery.