A Bayesian Multivariate Functional Dynamic Linear Model
We present a Bayesian approach for modeling multivariate, dependent
functional data. To account for the three dominant structural features in the
data--functional, time dependent, and multivariate components--we extend
hierarchical dynamic linear models for multivariate time series to the
functional data setting. We also develop Bayesian spline theory in a more
general constrained optimization framework. The proposed methods identify a
time-invariant functional basis for the functional observations, which is
smooth and interpretable, and can be made common across multivariate
observations for additional information sharing. The Bayesian framework permits
joint estimation of the model parameters, provides exact inference (up to MCMC
error) on specific parameters, and allows generalized dependence structures.
Sampling from the posterior distribution is accomplished with an efficient
Gibbs sampling algorithm. We illustrate the proposed framework with two
applications: (1) multi-economy yield curve data from the recent global
recession, and (2) local field potential brain signals in rats, for which we
develop a multivariate functional time series approach for multivariate
time-frequency analysis. Supplementary materials, including R code and the
multi-economy yield curve data, are available online.
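To make the sampling step concrete, here is a minimal sketch of a Gibbs sampler for a toy univariate local-level dynamic linear model. It illustrates the general mechanics only, not the authors' efficient multivariate functional sampler; the function name, the single-site state updates, and the inverse-gamma prior settings a0 and b0 are assumptions of this sketch.

```r
## Toy Gibbs sampler for the local-level DLM y_t = theta_t + v_t,
## theta_t = theta_{t-1} + w_t, with v_t ~ N(0, sv) and w_t ~ N(0, sw).
gibbs_dlm <- function(y, n_iter = 2000, a0 = 2, b0 = 1) {
  n <- length(y)
  theta <- y                          # initialize states at the data
  sv <- sw <- 1                       # observation / evolution variances
  draws <- matrix(NA_real_, n_iter, n)
  for (g in seq_len(n_iter)) {
    for (t in seq_len(n)) {           # single-site state updates
      prec <- 1 / sv + (if (t > 1) 1 / sw else 0) + (if (t < n) 1 / sw else 0)
      m <- y[t] / sv +
        (if (t > 1) theta[t - 1] / sw else 0) +
        (if (t < n) theta[t + 1] / sw else 0)
      theta[t] <- rnorm(1, m / prec, sqrt(1 / prec))
    }
    ## Conjugate inverse-gamma updates for the two variances:
    sv <- 1 / rgamma(1, a0 + n / 2, b0 + sum((y - theta)^2) / 2)
    sw <- 1 / rgamma(1, a0 + (n - 1) / 2, b0 + sum(diff(theta)^2) / 2)
    draws[g, ] <- theta
  }
  draws
}
```

A posterior mean trajectory is then colMeans(draws) after discarding burn-in; per the abstract, the paper's sampler works at the same Gibbs level but jointly estimates the functional basis and the multivariate dependence structure.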
RAPTT: An Exact Two-Sample Test in High Dimensions Using Random Projections
In high dimensions, the classical Hotelling's $T^2$ test tends to have low
power or becomes undefined due to singularity of the sample covariance matrix.
In this paper, this problem is overcome by projecting the data matrix onto
lower dimensional subspaces through multiplication by random matrices. We
propose RAPTT (RAndom Projection T-Test), an exact test for equality of means
of two normal populations based on projected lower dimensional data. RAPTT does
not require any constraints on the dimension of the data or the sample size. A
simulation study indicates that in high dimensions the power of this test is
often greater than that of competing tests. The advantage of RAPTT is
illustrated on high-dimensional gene expression data involving the
discrimination of tumor and normal colon tissues.
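As a hedged illustration of the random-projection idea, one can project both samples through a common random matrix and apply the classical two-sample Hotelling's $T^2$ test in the low-dimensional projected space, where the pooled covariance is invertible. The projection dimension k, the Gaussian projection entries, and the averaging of p-values over repeated projections below are assumptions of this sketch, not RAPTT's exact recipe.

```r
## Two-sample Hotelling T^2 test after a random projection. X, Y: samples
## with p columns each; k: projected dimension (assumed k < n1 + n2 - 1).
rp_t2_pval <- function(X, Y, k) {
  p <- ncol(X); n1 <- nrow(X); n2 <- nrow(Y)
  R  <- matrix(rnorm(p * k), p, k)          # common random projection matrix
  Xp <- X %*% R; Yp <- Y %*% R              # projected samples, now n x k
  d  <- colMeans(Xp) - colMeans(Yp)
  S  <- ((n1 - 1) * cov(Xp) + (n2 - 1) * cov(Yp)) / (n1 + n2 - 2)
  t2 <- (n1 * n2 / (n1 + n2)) * drop(crossprod(d, solve(S, d)))
  fstat <- t2 * (n1 + n2 - k - 1) / (k * (n1 + n2 - 2))
  pf(fstat, k, n1 + n2 - k - 1, lower.tail = FALSE)  # exact under normality
}
## Illustrative combination over repeated projections:
## mean(replicate(100, rp_t2_pval(X, Y, k = 5)))
```

Because the projection is drawn independently of the data, the projected $T^2$ statistic keeps its exact null F distribution under normality, which is what permits an exact test without constraints on the original dimension or sample size.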
Multilevel Bayesian framework for modeling the production, propagation and detection of ultra-high energy cosmic rays
Ultra-high energy cosmic rays (UHECRs) are atomic nuclei with energies over
ten million times the energies accessible to human-made particle accelerators.
Evidence suggests that they originate from relatively nearby extragalactic
sources, but the nature of the sources is unknown. We develop a multilevel
Bayesian framework for assessing association of UHECRs and candidate source
populations, and Markov chain Monte Carlo algorithms for estimating model
parameters and comparing models by computing, via Chib's method, marginal
likelihoods and Bayes factors. We demonstrate the framework by analyzing
measurements of 69 UHECRs observed by the Pierre Auger Observatory (PAO) from
2004-2009, using a volume-complete catalog of 17 local active galactic nuclei
(AGN) out to 15 megaparsecs as candidate sources. An early portion of the data
("period 1," with 14 events) was used by PAO to set an energy cut maximizing
the anisotropy in period 1; the 69 measurements include this "tuned" subset,
and subsequent "untuned" events with energies above the same cutoff. Also,
measurement errors are approximately summarized. These factors are problematic
for independent analyses of PAO data. Within the context of "standard candle"
source models (i.e., with a common isotropic emission rate), and considering
only the 55 untuned events, there is no significant evidence favoring
association of UHECRs with local AGN vs. an isotropic background. The
highest-probability associations are with the two nearest, adjacent AGN,
Centaurus A and NGC 4945. If the association model is adopted, the fraction of
UHECRs that may be associated is likely nonzero but is well below 50%. Our
framework enables estimation of the angular scale for deflection of cosmic rays
by cosmic magnetic fields; relatively modest deflection scales are favored.
Models that assign a large fraction of UHECRs to a
single nearby source (e.g., Centaurus A) are ruled out unless very large
deflection scales are specified a priori, and even then they are disfavored.
However, including the period 1 data alters the conclusions significantly, and
a simulation study supports the idea that the period 1 data are anomalous,
presumably due to the tuning. Accurate and optimal analysis of future data will
likely require more complete disclosure of the data.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS654 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
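The model-comparison step rests on Chib's identity, log m(y) = log f(y | theta*) + log pi(theta*) - log pi(theta* | y), with the posterior ordinate pi(theta* | y) estimated from Gibbs output. The sketch below applies the identity to a toy conjugate normal model rather than the paper's multilevel UHECR model; the priors, the choice of theta* as the posterior means, and all names are assumptions of this illustration.

```r
## Chib's method for y_i ~ N(mu, s2), priors mu ~ N(m0, t0), s2 ~ IG(a0, b0).
log_dinvgamma <- function(x, a, b) a * log(b) - lgamma(a) - (a + 1) * log(x) - b / x

chib_logml <- function(y, G = 5000, m0 = 0, t0 = 100, a0 = 2, b0 = 1) {
  n <- length(y); mu <- mean(y); s2 <- var(y)
  mu_d <- s2_d <- numeric(G)
  for (g in seq_len(G)) {                       # two-block Gibbs run
    v  <- 1 / (1 / t0 + n / s2)
    mu <- rnorm(1, v * (m0 / t0 + sum(y) / s2), sqrt(v))
    s2 <- 1 / rgamma(1, a0 + n / 2, b0 + sum((y - mu)^2) / 2)
    mu_d[g] <- mu; s2_d[g] <- s2
  }
  mu_s <- mean(mu_d); s2_s <- mean(s2_d)        # high-density point theta*
  ## Rao-Blackwellized ordinate pi_hat(mu* | y), averaged over the s2 draws:
  v_g <- 1 / (1 / t0 + n / s2_d)
  log_post_mu <- log(mean(dnorm(mu_s, v_g * (m0 / t0 + sum(y) / s2_d), sqrt(v_g))))
  ## pi(s2* | mu*, y) is closed-form inverse-gamma; no reduced run is needed:
  log_post_s2 <- log_dinvgamma(s2_s, a0 + n / 2, b0 + sum((y - mu_s)^2) / 2)
  sum(dnorm(y, mu_s, sqrt(s2_s), log = TRUE)) +       # log f(y | theta*)
    dnorm(mu_s, m0, sqrt(t0), log = TRUE) +           # log prior of mu*
    log_dinvgamma(s2_s, a0, b0) -                     # log prior of s2*
    (log_post_mu + log_post_s2)                       # minus log ordinate
}
```

A Bayes factor between two candidate models is then the exponentiated difference of their estimated log marginal likelihoods, each computed under its own sampler.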
Fast Covariance Estimation for High-dimensional Functional Data
For smoothing covariance functions, we propose two fast algorithms that scale
linearly with the number of observations per function. Most available methods
and software cannot smooth covariance matrices of dimension $J \times J$ with
$J > 500$; the recently introduced sandwich smoother is an exception, but it is
not adapted to smooth covariance matrices of large dimensions such as
$J \ge 10,000$. Covariance matrices of order $J = 10,000$, and even
$J = 100,000$, are
becoming increasingly common, e.g., in 2- and 3-dimensional medical imaging and
high-density wearable sensor data. We introduce two new algorithms that can
handle very large covariance matrices: 1) FACE: a fast implementation of the
sandwich smoother and 2) SVDS: a two-step procedure that first applies singular
value decomposition to the data matrix and then smoothes the eigenvectors.
Compared to existing techniques, these new algorithms are at least an order of
magnitude faster in high dimensions and drastically reduce memory requirements.
The new algorithms provide instantaneous (a few seconds) smoothing for matrices
of dimension $J = 10,000$ and very fast ($<$ 10 minutes) smoothing for
$J = 100,000$. Although SVDS is simpler than FACE, we provide ready-to-use,
scalable R software for FACE. When incorporated into the R package refund,
FACE improves the speed of penalized functional regression by an order of
magnitude, even for data of normal size ($J < 500$). We recommend that FACE be
used in practice for the analysis of noisy and high-dimensional functional
data.
Comment: 35 pages, 4 figures.
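To make the two-step SVDS idea concrete, here is a hedged sketch; the rank K, the use of smooth.spline as the eigenvector smoother, and the function name are assumptions of this illustration, while the ready-to-use FACE implementation is the refund package software mentioned above.

```r
## SVDS-style covariance smoothing: SVD the centered data matrix, smooth the
## leading right singular vectors, then rebuild a smoothed covariance.
svds_cov <- function(Y, K = 10) {              # Y: n curves x J grid points
  Y <- sweep(Y, 2, colMeans(Y))                # center at each grid point
  s <- svd(Y, nu = 0, nv = K)                  # right vectors ~ eigenfunctions
  V <- apply(s$v, 2, function(v) smooth.spline(v)$y)   # smooth each vector
  lam <- s$d[1:K]^2 / (nrow(Y) - 1)            # eigenvalue estimates
  V %*% diag(lam, nrow = K) %*% t(V)           # smoothed J x J covariance
}
```

For very large J, one would return the rank-K factors V and lam instead of forming the J x J product, which is the natural route to the drastic memory savings described above.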
Smoothness-Penalized Deconvolution (SPeD) of a Density Estimate
This paper addresses the deconvolution problem of estimating a
square-integrable probability density from observations contaminated with
additive measurement errors having a known density. The estimator begins with a
density estimate of the contaminated observations and minimizes a
reconstruction error penalized by an integrated squared $m$-th derivative.
Theory for deconvolution has mainly focused on kernel- or wavelet-based
techniques, but other methods including spline-based techniques and this
smoothness-penalized estimator have been found to outperform kernel methods in
simulation studies. This paper fills part of that theoretical gap by establishing
asymptotic guarantees for the smoothness-penalized approach. Consistency is
established in mean integrated squared error, and rates of convergence are
derived for Gaussian, Cauchy, and Laplace error densities, attaining some lower
bounds already in the literature. The assumptions are weak for most results;
the estimator can be used with a broader class of error densities than the
deconvoluting kernel. Our application example estimates the density of the mean
cytotoxicity of certain bacterial isolates under random sampling; this mean
cytotoxicity can only be measured experimentally with additive error, leading
to the deconvolution problem. We also describe a method for approximating the
solution by a cubic spline, which reduces to a quadratic program.
Comment: Revisions: added new theorem in Section 6; added list of assumptions;
other, more minor revisions throughout.
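The objective named in the abstract, a reconstruction error plus an integrated squared derivative penalty, has a direct discretized analogue. The sketch below is an unconstrained ridge-type version on a grid; the grid, the second-difference penalty, the tuning parameter lambda, and the plain linear solve (in place of the paper's cubic-spline quadratic program) are all assumptions of this illustration.

```r
## Discretized smoothness-penalized deconvolution: minimize
## || A g - f_hat ||^2 + lambda * || D g ||^2, where A convolves with the
## known error density and D takes second differences.
sped_sketch <- function(f_hat, x, err_dens, lambda = 1e-2) {
  J  <- length(x); dx <- x[2] - x[1]           # assumes an equally spaced grid
  A  <- outer(x, x, function(u, v) err_dens(u - v)) * dx  # convolution matrix
  D  <- diff(diag(J), differences = 2)         # second-difference penalty
  drop(solve(crossprod(A) + lambda * crossprod(D), crossprod(A, f_hat)))
}
## Illustrative use: start from a kernel density estimate of the contaminated
## observations y_obs, with assumed N(0, 0.2^2) measurement error:
## x     <- seq(-4, 4, length.out = 200)
## kde   <- density(y_obs)
## f_hat <- approx(kde$x, kde$y, x, rule = 2)$y
## g     <- sped_sketch(f_hat, x, function(u) dnorm(u, sd = 0.2))
```

With additional linear constraints (for example nonnegativity of the estimated density), the same objective becomes the kind of quadratic program the abstract mentions for the cubic-spline approximation.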