Large-scale linear regression: Development of high-performance routines
In statistics, series of ordinary least squares problems (OLS) are used to
study the linear correlation among sets of variables of interest; in many
studies, the number of such variables is at least in the millions, and the
corresponding datasets occupy terabytes of disk space. As the availability of
large-scale datasets increases regularly, so does the challenge in dealing with
them. Indeed, traditional solvers, which rely on "black-box"
routines optimized for a single OLS problem, are highly inefficient and fail to
provide a viable solution for big-data analyses. As a case study, in this paper
we consider a linear regression consisting of two-dimensional grids of related
OLS problems that arise in the context of genome-wide association analyses, and
give a careful walkthrough of the development of ols-grid, a
high-performance routine for shared-memory architectures; analogous steps are
relevant for tailoring OLS solvers to other applications. In particular, we
first illustrate the design of efficient algorithms that exploit the structure
of the OLS problems and eliminate redundant computations; then, we show how to
effectively deal with datasets that do not fit in main memory; finally, we
discuss how to cast the computation in terms of efficient kernels and how to
achieve scalability. Importantly, each design decision along the way is
justified by simple performance models. ols-grid enables the solution of
correlated OLS problems operating on terabytes of data in a matter of
hours.
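To make the idea of eliminating redundant computations across a grid of related OLS problems concrete, here is a minimal sketch. It is not the ols-grid routine: it assumes the simplest possible structure (one predictor plus an intercept per problem), and the names `center`, `dot`, and `ols_grid_slopes` are hypothetical. The point is only that quantities depending on the predictor alone can be computed once and reused for every response.

```python
def center(values):
    """Subtract the mean so the intercept drops out of the slope formula."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def ols_grid_slopes(predictors, responses):
    """Slope of response j regressed on predictor i, for every pair (i, j).

    Assumption for illustration: each OLS problem has a single predictor
    and an intercept, so slope = <xc, yc> / <xc, xc> on centered data.
    """
    xc = [center(x) for x in predictors]
    yc = [center(y) for y in responses]
    # <xc, xc> depends only on the predictor: compute it once per row of
    # the grid instead of once per (i, j) problem.
    sxx = [dot(x, x) for x in xc]
    return [
        [dot(xc[i], yc[j]) / sxx[i] for j in range(len(yc))]
        for i in range(len(xc))
    ]


if __name__ == "__main__":
    grid = ols_grid_slopes(
        [[0, 1, 2, 3], [1, 0, 1, 0]],
        [[1, 3, 5, 7], [2, 2, 4, 4]],
    )
    for row in grid:
        print(row)
```

A naive solver would recenter and renormalize each predictor for every response; here that work is shared across each row of the grid, which is the kind of saving the abstract refers to, on a much smaller scale.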
Dynamic Bayesian Predictive Synthesis in Time Series Forecasting
We discuss model and forecast combination in time series forecasting. A
foundational Bayesian perspective based on agent opinion analysis theory
defines a new framework for density forecast combination, and encompasses
several existing forecast pooling methods. We develop a novel class of dynamic
latent factor models for time series forecast synthesis; simulation-based
computation enables implementation. These models can dynamically adapt to
time-varying biases, miscalibration and inter-dependencies among multiple
models or forecasters. A macroeconomic forecasting study highlights the dynamic
relationships among synthesized forecast densities, as well as the potential
for improved forecast accuracy at multiple horizons.
Auxiliary Likelihood-Based Approximate Bayesian Computation in State Space Models
A computationally simple approach to inference in state space models is
proposed, using approximate Bayesian computation (ABC). ABC avoids evaluation
of an intractable likelihood by matching summary statistics for the observed
data with statistics computed from data simulated from the true process, based
on parameter draws from the prior. Draws that produce a 'match' between
observed and simulated summaries are retained, and used to estimate the
inaccessible posterior. With no reduction to a low-dimensional set of
sufficient statistics being possible in the state space setting, we define the
summaries as the maximum of an auxiliary likelihood function, and thereby
exploit the asymptotic sufficiency of this estimator for the auxiliary
parameter vector. We derive conditions under which this approach - including a
computationally efficient version based on the auxiliary score - achieves
Bayesian consistency. To reduce the well-documented inaccuracy of ABC in
multi-parameter settings, we propose the separate treatment of each parameter
dimension using an integrated likelihood technique. Three stochastic volatility
models for which exact Bayesian inference is either computationally
challenging, or infeasible, are used for illustration. We demonstrate that our
approach compares favorably against an extensive set of approximate and exact
comparators. An empirical illustration completes the paper.
Comment: This paper is forthcoming at the Journal of Computational and
Graphical Statistics. It also supersedes the earlier arXiv paper "Approximate
Bayesian Computation in State Space Models" (arXiv:1409.8363
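The accept/reject mechanism described in this abstract can be sketched on a toy problem. This is a generic rejection-ABC illustration, not the authors' algorithm: it assumes an i.i.d. Normal(theta, 1) model rather than a state space model, a Uniform(-5, 5) prior, and the sample mean as the summary statistic (which is the maximum likelihood estimate under a Gaussian auxiliary model with known variance). The function name `abc_rejection` and its parameters are hypothetical.

```python
import random


def abc_rejection(observed, n_draws=5000, tol=0.05, seed=0):
    """Rejection ABC for the mean of a Normal(theta, 1) toy model.

    Assumptions for illustration: Uniform(-5, 5) prior on theta, and the
    sample mean as the summary statistic.
    """
    rng = random.Random(seed)
    n = len(observed)
    s_obs = sum(observed) / n
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)                    # draw from the prior
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]
        s_sim = sum(simulated) / n                        # summary of simulated data
        if abs(s_sim - s_obs) < tol:                      # keep draws that "match"
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    data = [1.5 + 0.5 * (-1) ** k for k in range(30)]     # synthetic data, mean 1.5
    draws = abc_rejection(data)
    print(len(draws), sum(draws) / len(draws))
```

The retained draws approximate the posterior of theta given the summary. Shrinking `tol` tightens the approximation at the cost of fewer accepted draws.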
Parallel Sequential Monte Carlo for Efficient Density Combination: The DeCo MATLAB Toolbox
This paper presents the MATLAB package DeCo (Density Combination), which is based on the paper by Billio et al. (2013), where a constructive Bayesian approach is presented for combining predictive densities originating from different models or other sources of information. The combination weights are time-varying and may depend on past predictive forecasting performance and other learning mechanisms. The core algorithm is the function DeCo, which applies banks of parallel Sequential Monte Carlo algorithms to filter the time-varying combination weights. The DeCo procedure has been implemented both for standard CPU computing and for Graphics Processing Unit (GPU) parallel computing. For the GPU implementation we use the MATLAB Parallel Computing Toolbox and show how to use general-purpose GPU computing almost effortlessly. This GPU implementation provides a speedup in execution time of up to seventy times compared to a standard CPU MATLAB implementation on a multicore CPU. We show the use of the package and the computational gain of the GPU version through some simulation experiments and empirical application
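As a rough illustration of combination weights that vary over time and depend on past predictive performance, the sketch below uses a simple forgetting-factor rule in the spirit of dynamic model averaging. This is a swapped-in technique, not the Sequential Monte Carlo filter that DeCo implements, and it is written in Python rather than MATLAB. The Gaussian forecasts, the `discount` value, and all function names are assumptions made for the example.

```python
import math


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def update_weights(weights, densities_at_y, discount=0.95):
    """One step of a forgetting-factor weight update.

    Each weight is flattened by raising it to `discount`, then multiplied
    by the density its model assigned to the realized observation, and the
    result is renormalized. (Assumed rule for illustration only.)
    """
    raw = [(w ** discount) * p for w, p in zip(weights, densities_at_y)]
    total = sum(raw)
    return [r / total for r in raw]


def pooled_density(y, weights, forecasts):
    """Linear pool of Gaussian predictive densities given as (mean, sd)."""
    return sum(w * normal_pdf(y, m, s) for w, (m, s) in zip(weights, forecasts))


if __name__ == "__main__":
    forecasts = [(0.0, 1.0), (3.0, 1.0)]   # two hypothetical forecasters
    weights = [0.5, 0.5]
    for y in [0.1, -0.2, 0.0, 0.3, -0.1]:
        scores = [normal_pdf(y, m, s) for m, s in forecasts]
        weights = update_weights(weights, scores)
    print(weights, pooled_density(0.0, weights, forecasts))
```

With `discount` below 1, old performance is gradually forgotten, so the weights can move back toward a model that starts forecasting well again.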
Getting Started with Particle Metropolis-Hastings for Inference in Nonlinear Dynamical Models
This tutorial provides a gentle introduction to the particle
Metropolis-Hastings (PMH) algorithm for parameter inference in nonlinear
state-space models together with a software implementation in the statistical
programming language R. We employ a step-by-step approach to develop an
implementation of the PMH algorithm (and the particle filter within) together
with the reader. This final implementation is also available as the package
pmhtutorial in the CRAN repository. Throughout the tutorial, we provide some
intuition as to how the algorithm operates and discuss some solutions to
problems that might occur in practice. To illustrate the use of PMH, we
consider parameter inference in a linear Gaussian state-space model with
synthetic data and a nonlinear stochastic volatility model with real-world
data.
Comment: 41 pages, 7 figures. In press for Journal of Statistical Software.
Source code for R, Python and MATLAB available at:
https://github.com/compops/pmh-tutoria
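The particle filter at the core of PMH can be sketched in a few lines. The code below is not taken from the pmhtutorial package; it is a minimal bootstrap particle filter in Python under an assumed parameterization of the linear Gaussian state-space model, x[t+1] = phi * x[t] + sigma_v * v[t] and y[t] = x[t] + sigma_e * e[t] with standard normal noise. The function names and default settings are hypothetical. It returns the log-likelihood estimate that a PMH sampler would plug into its acceptance ratio.

```python
import math
import random


def simulate_lgss(n_obs, phi, sigma_v, sigma_e, seed=0):
    """Simulate observations from the assumed linear Gaussian model."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma_v / math.sqrt(1.0 - phi ** 2))
    ys = []
    for _ in range(n_obs):
        ys.append(x + sigma_e * rng.gauss(0.0, 1.0))
        x = phi * x + sigma_v * rng.gauss(0.0, 1.0)
    return ys


def bootstrap_pf_loglik(y, phi, sigma_v, sigma_e, n_particles=500, seed=0):
    """Bootstrap particle filter estimate of the log-likelihood.

    Requires |phi| < 1 because particles start from the stationary
    distribution of the state process.
    """
    rng = random.Random(seed)
    sd0 = sigma_v / math.sqrt(1.0 - phi ** 2)
    particles = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    log_norm = -0.5 * math.log(2.0 * math.pi * sigma_e ** 2)
    loglik = 0.0
    for obs in y:
        # Weight each particle by the observation density.
        logw = [log_norm - 0.5 * ((obs - x) / sigma_e) ** 2 for x in particles]
        max_logw = max(logw)                 # subtract the max for stability
        w = [math.exp(lw - max_logw) for lw in logw]
        loglik += max_logw + math.log(sum(w) / n_particles)
        # Multinomial resampling, then propagation through the state equation.
        particles = rng.choices(particles, weights=w, k=n_particles)
        particles = [phi * x + sigma_v * rng.gauss(0.0, 1.0) for x in particles]
    return loglik


if __name__ == "__main__":
    data = simulate_lgss(200, 0.75, 1.0, 1.0, seed=1)
    print(bootstrap_pf_loglik(data, 0.75, 1.0, 1.0))
    print(bootstrap_pf_loglik(data, -0.75, 1.0, 1.0))
```

In a PMH sampler, a proposed parameter value would be accepted or rejected by comparing such estimates (plus prior and proposal terms) at the proposed and current values.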