    Modified Linear Projection for Large Spatial Data Sets

    Recent developments in engineering techniques for spatial data collection such as geographic information systems have resulted in an increasing need for methods to analyze large spatial data sets. These sorts of data sets can be found in various fields of the natural and social sciences. However, model fitting and spatial prediction using these large spatial data sets are impractically time-consuming, because of the necessary matrix inversions. Various methods have been developed to deal with this problem, including a reduced rank approach and a sparse matrix approximation. In this paper, we propose a modification to an existing reduced rank approach to capture both the large- and small-scale spatial variations effectively. We have used simulated examples and an empirical data analysis to demonstrate that our proposed approach consistently performs well when compared with other methods. In particular, the performance of our new method does not depend on the dependence properties of the spatial covariance functions. Comment: 29 pages, 5 figures, 4 tables
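
To illustrate why a reduced rank structure avoids the expensive matrix inversions mentioned above, the following is a generic sketch, not the paper's specific modification. It assumes a covariance of the form low-rank term plus nugget, where the symbols $B$ (an $n \times r$ basis matrix), $K$ (an $r \times r$ covariance) and $\tau^2$ (a nugget variance) are illustrative names that do not appear in the abstract. The Woodbury identity then replaces an $n \times n$ inversion with an $r \times r$ one:

```latex
\Sigma = B K B^{\top} + \tau^{2} I_{n}, \qquad r \ll n,
\\[4pt]
\Sigma^{-1} = \tau^{-2} I_{n}
  - \tau^{-2} B \left( K^{-1} + \tau^{-2} B^{\top} B \right)^{-1} B^{\top} \tau^{-2}.
```

Under these assumptions the cost falls from $O(n^{3})$ for a dense inversion to roughly $O(n r^{2})$, which is what makes such approaches practical for large $n$.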

    A multi-resolution approximation for massive spatial datasets

    Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method. Comment: 23 pages; to be published in Journal of the American Statistical Association

    Distributed Dictionary Learning

    The paper studies distributed Dictionary Learning (DL) problems where the learning task is distributed over a multi-agent network with time-varying (nonsymmetric) connectivity. This formulation is relevant, for instance, in big-data scenarios where massive amounts of data are collected/stored in different spatial locations and it is infeasible to aggregate and/or process all the data in a fusion center, due to resource limitations, communication overhead or privacy considerations. We develop a general distributed algorithmic framework for the (nonconvex) DL problem and establish its asymptotic convergence. The new method hinges on Successive Convex Approximation (SCA) techniques coupled with i) a gradient tracking mechanism instrumental to locally estimate the missing global information; and ii) a consensus step, as a mechanism to distribute the computations among the agents. To the best of our knowledge, this is the first distributed algorithm with provable convergence for the DL problem and, more generally, bi-convex optimization problems over (time-varying) directed graphs.
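
The interplay of a consensus step and gradient tracking described above can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a fixed, undirected ring with doubly stochastic weights (rather than time-varying directed graphs), scalar convex quadratic costs (rather than the nonconvex DL objective), and a plain gradient step in place of the SCA surrogate. The names `gradient_tracking`, `RING`, `targets`, `weights`, `step` and `iters` are hypothetical.

```python
def gradient_tracking(targets, weights, step=0.1, iters=500):
    """Toy gradient tracking: agents jointly minimize sum_i 0.5*(x - targets[i])**2.

    Each agent i only knows targets[i] and exchanges values with neighbours
    through the mixing matrix `weights` (assumed doubly stochastic).
    """
    n = len(targets)

    def grad(i, x):
        # Gradient of agent i's local cost 0.5 * (x - targets[i])**2.
        return x - targets[i]

    def mix(values):
        # Consensus step: weighted average of neighbours' values.
        return [sum(weights[i][j] * values[j] for j in range(n)) for i in range(n)]

    x = [0.0] * n
    # y[i] is agent i's running estimate of the network-average gradient.
    y = [grad(i, x[i]) for i in range(n)]
    for _ in range(iters):
        # Consensus on x, then a descent step along the tracked gradient.
        x_new = [m - step * y_i for m, y_i in zip(mix(x), y)]
        # Consensus on y, corrected by the change in the local gradient.
        y = [m + grad(i, x_new[i]) - grad(i, x[i]) for i, m in enumerate(mix(y))]
        x = x_new
    return x


# Ring of four agents: each averages itself and its two neighbours.
RING = [
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [1 / 3, 0.0, 1 / 3, 1 / 3],
]

if __name__ == "__main__":
    print(gradient_tracking([1.0, 2.0, 3.0, 6.0], RING))
```

In this toy setting every agent's estimate approaches the average of the targets, the minimizer of the summed cost, even though no agent ever sees the other agents' data directly.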

    Hierarchical Bayesian auto-regressive models for large space time data with applications to ozone concentration modelling

    Increasingly large volumes of space-time data are collected everywhere by mobile computing applications, and in many of these cases temporal data are obtained by registering events, for example telecommunication or web traffic data. Having both the spatial and temporal dimensions adds substantial complexity to data analysis and inference tasks. The computational complexity increases rapidly for fitting Bayesian hierarchical models, as such a task involves repeated inversion of large matrices. The primary focus of this paper is on developing space-time auto-regressive models under the hierarchical Bayesian setup. To handle large data sets, a recently developed Gaussian predictive process approximation method (Banerjee et al. [1]) is extended to include auto-regressive terms of latent space-time processes. Specifically, a space-time auto-regressive process, supported on a set of a smaller number of knot locations, is spatially interpolated to approximate the original space-time process. The resulting model is specified within a hierarchical Bayesian framework and Markov chain Monte Carlo techniques are used for inference. The proposed model is applied to analyse the daily maximum 8-hour average ground level ozone concentration data from 1997 to 2006 from a large study region in the eastern United States. The developed methods allow accurate spatial prediction of a temporally aggregated ozone summary, known as the primary ozone standard, along with its uncertainty, at any unmonitored location during the study period. Trends in spatial patterns of many features of the posterior predictive distribution of the primary standard, such as the probability of non-compliance with respect to the standard, are obtained and illustrated.
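
The knot-based interpolation described above can be written out as a rough sketch. The notation is assumed for illustration and is not taken from the abstract: knots $s_1^*, \dots, s_m^*$ with $m \ll n$, a spatial covariance function $C(\cdot,\cdot)$, and a first-order autoregression with a scalar coefficient $\rho$. The paper's exact specification may differ.

```latex
% Autoregressive latent process on the m knots (assumed AR(1) form)
w_t^{*} = \rho \, w_{t-1}^{*} + \eta_t, \qquad
\eta_t \sim N\!\left(0, C^{*}\right), \qquad
C^{*} = \left[ C(s_i^{*}, s_j^{*}) \right]_{i,j=1}^{m},
\\[4pt]
% Spatial interpolation from the knots to an arbitrary location s
\tilde{w}_t(s) = c(s)^{\top} \left(C^{*}\right)^{-1} w_t^{*}, \qquad
c(s) = \left( C(s, s_1^{*}), \dots, C(s, s_m^{*}) \right)^{\top}.
```

Under this form only the $m \times m$ matrix $C^{*}$ needs to be inverted at each step, rather than a matrix whose size grows with the number of monitoring sites.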

    Fast joint detection-estimation of evoked brain activity in event-related fMRI using a variational approach

    In standard clinical within-subject analyses of event-related fMRI data, two steps are usually performed separately: detection of brain activity and estimation of the hemodynamic response. Because these two steps are inherently linked, we adopt the so-called region-based Joint Detection-Estimation (JDE) framework that addresses this joint issue using a multivariate inference for detection and estimation. JDE is built by making use of a regional bilinear generative model of the BOLD response and constraining the parameter estimation by physiological priors using temporal and spatial information in a Markovian modeling. In contrast to previous works that use Markov Chain Monte Carlo (MCMC) techniques to approximate the resulting intractable posterior distribution, we recast the JDE into a missing data framework and derive a Variational Expectation-Maximization (VEM) algorithm for its inference. A variational approximation is used to approximate the Markovian model in the unsupervised spatially adaptive JDE inference, which allows fine automatic tuning of spatial regularisation parameters. The result is a new algorithm that exhibits interesting properties compared to the previously used MCMC-based approach. Experiments on artificial and real data show that VEM-JDE is robust to model mis-specification and provides computational gain while maintaining good performance in terms of activation detection and hemodynamic shape recovery.