A Bayesian Multivariate Functional Dynamic Linear Model
We present a Bayesian approach for modeling multivariate, dependent
functional data. To account for the three dominant structural features in the
data--functional, time dependent, and multivariate components--we extend
hierarchical dynamic linear models for multivariate time series to the
functional data setting. We also develop Bayesian spline theory in a more
general constrained optimization framework. The proposed methods identify a
time-invariant functional basis for the functional observations, which is
smooth and interpretable, and can be made common across multivariate
observations for additional information sharing. The Bayesian framework permits
joint estimation of the model parameters, provides exact inference (up to MCMC
error) on specific parameters, and allows generalized dependence structures.
Sampling from the posterior distribution is accomplished with an efficient
Gibbs sampling algorithm. We illustrate the proposed framework with two
applications: (1) multi-economy yield curve data from the recent global
recession, and (2) local field potential brain signals in rats, for which we
develop a multivariate functional time series approach for multivariate
time-frequency analysis. Supplementary materials, including R code and the
multi-economy yield curve data, are available online.
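To make the sampling step concrete, here is a minimal sketch of a Gibbs sampler for a toy univariate local-level dynamic linear model. It illustrates the general mechanics only, not the authors' efficient multivariate functional sampler; the function name, the single-site state updates, and the inverse-gamma prior settings a0 and b0 are assumptions of this sketch.

```r
## Toy Gibbs sampler for the local-level DLM y_t = theta_t + v_t,
## theta_t = theta_{t-1} + w_t, with v_t ~ N(0, sv) and w_t ~ N(0, sw).
gibbs_dlm <- function(y, n_iter = 2000, a0 = 2, b0 = 1) {
  n <- length(y)
  theta <- y                          # initialize states at the data
  sv <- sw <- 1                       # observation / evolution variances
  draws <- matrix(NA_real_, n_iter, n)
  for (g in seq_len(n_iter)) {
    for (t in seq_len(n)) {           # single-site state updates
      prec <- 1 / sv + (if (t > 1) 1 / sw else 0) + (if (t < n) 1 / sw else 0)
      m <- y[t] / sv +
        (if (t > 1) theta[t - 1] / sw else 0) +
        (if (t < n) theta[t + 1] / sw else 0)
      theta[t] <- rnorm(1, m / prec, sqrt(1 / prec))
    }
    ## Conjugate inverse-gamma updates for the two variances:
    sv <- 1 / rgamma(1, a0 + n / 2, b0 + sum((y - theta)^2) / 2)
    sw <- 1 / rgamma(1, a0 + (n - 1) / 2, b0 + sum(diff(theta)^2) / 2)
    draws[g, ] <- theta
  }
  draws
}
```

A posterior mean trajectory is then colMeans(draws) after discarding burn-in; per the abstract, the paper's sampler works at the same Gibbs level but jointly estimates the functional basis and the multivariate dependence structure.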
RAPTT: An Exact Two-Sample Test in High Dimensions Using Random Projections
In high dimensions, the classical Hotelling's $T^2$ test tends to have low
power or becomes undefined due to singularity of the sample covariance matrix.
In this paper, this problem is overcome by projecting the data matrix onto
lower dimensional subspaces through multiplication by random matrices. We
propose RAPTT (RAndom Projection T-Test), an exact test for equality of means
of two normal populations based on projected lower dimensional data. RAPTT does
not require any constraints on the dimension of the data or the sample size. A
simulation study indicates that in high dimensions the power of this test is
often greater than that of competing tests. The advantage of RAPTT is
illustrated on high-dimensional gene expression data involving the
discrimination of tumor and normal colon tissues.
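As a hedged illustration of the random-projection idea, one can project both samples through a common random matrix and apply the classical two-sample Hotelling's $T^2$ test in the low-dimensional projected space, where the pooled covariance is invertible. The projection dimension k, the Gaussian projection entries, and the averaging of p-values over repeated projections below are assumptions of this sketch, not RAPTT's exact recipe.

```r
## Two-sample Hotelling T^2 test after a random projection. X, Y: samples
## with p columns each; k: projected dimension (assumed k < n1 + n2 - 1).
rp_t2_pval <- function(X, Y, k) {
  p <- ncol(X); n1 <- nrow(X); n2 <- nrow(Y)
  R  <- matrix(rnorm(p * k), p, k)          # common random projection matrix
  Xp <- X %*% R; Yp <- Y %*% R              # projected samples, now n x k
  d  <- colMeans(Xp) - colMeans(Yp)
  S  <- ((n1 - 1) * cov(Xp) + (n2 - 1) * cov(Yp)) / (n1 + n2 - 2)
  t2 <- (n1 * n2 / (n1 + n2)) * drop(crossprod(d, solve(S, d)))
  fstat <- t2 * (n1 + n2 - k - 1) / (k * (n1 + n2 - 2))
  pf(fstat, k, n1 + n2 - k - 1, lower.tail = FALSE)  # exact under normality
}
## Illustrative combination over repeated projections:
## mean(replicate(100, rp_t2_pval(X, Y, k = 5)))
```

Because the projection is drawn independently of the data, the projected $T^2$ statistic keeps its exact null F distribution under normality, which is what permits an exact test without constraints on the original dimension or sample size.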
Multilevel Bayesian framework for modeling the production, propagation and detection of ultra-high energy cosmic rays
Ultra-high energy cosmic rays (UHECRs) are atomic nuclei with energies over
ten million times the energies accessible to human-made particle accelerators.
Evidence suggests that they originate from relatively nearby extragalactic
sources, but the nature of the sources is unknown. We develop a multilevel
Bayesian framework for assessing association of UHECRs and candidate source
populations, and Markov chain Monte Carlo algorithms for estimating model
parameters and comparing models by computing, via Chib's method, marginal
likelihoods and Bayes factors. We demonstrate the framework by analyzing
measurements of 69 UHECRs observed by the Pierre Auger Observatory (PAO) from
2004-2009, using a volume-complete catalog of 17 local active galactic nuclei
(AGN) out to 15 megaparsecs as candidate sources. An early portion of the data
("period 1," with 14 events) was used by PAO to set an energy cut maximizing
the anisotropy in period 1; the 69 measurements include this "tuned" subset,
and subsequent "untuned" events with energies above the same cutoff. Also,
measurement errors are approximately summarized. These factors are problematic
for independent analyses of PAO data. Within the context of "standard candle"
source models (i.e., with a common isotropic emission rate), and considering
only the 55 untuned events, there is no significant evidence favoring
association of UHECRs with local AGN vs. an isotropic background. The
highest-probability associations are with the two nearest, adjacent AGN,
Centaurus A and NGC 4945. If the association model is adopted, the fraction of
UHECRs that may be associated is likely nonzero but is well below 50%. Our
framework enables estimation of the angular scale for deflection of cosmic rays
by cosmic magnetic fields; relatively modest deflection scales are favored.
Models that assign a large fraction of UHECRs to a
single nearby source (e.g., Centaurus A) are ruled out unless very large
deflection scales are specified a priori, and even then they are disfavored.
However, including the period 1 data alters the conclusions significantly, and
a simulation study supports the idea that the period 1 data are anomalous,
presumably due to the tuning. Accurate and optimal analysis of future data will
likely require more complete disclosure of the data.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS654 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
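The model-comparison step rests on Chib's identity, log m(y) = log f(y | theta*) + log pi(theta*) - log pi(theta* | y), with the posterior ordinate pi(theta* | y) estimated from Gibbs output. The sketch below applies the identity to a toy conjugate normal model rather than the paper's multilevel UHECR model; the priors, the choice of theta* as the posterior means, and all names are assumptions of this illustration.

```r
## Chib's method for y_i ~ N(mu, s2), priors mu ~ N(m0, t0), s2 ~ IG(a0, b0).
log_dinvgamma <- function(x, a, b) a * log(b) - lgamma(a) - (a + 1) * log(x) - b / x

chib_logml <- function(y, G = 5000, m0 = 0, t0 = 100, a0 = 2, b0 = 1) {
  n <- length(y); mu <- mean(y); s2 <- var(y)
  mu_d <- s2_d <- numeric(G)
  for (g in seq_len(G)) {                       # two-block Gibbs run
    v  <- 1 / (1 / t0 + n / s2)
    mu <- rnorm(1, v * (m0 / t0 + sum(y) / s2), sqrt(v))
    s2 <- 1 / rgamma(1, a0 + n / 2, b0 + sum((y - mu)^2) / 2)
    mu_d[g] <- mu; s2_d[g] <- s2
  }
  mu_s <- mean(mu_d); s2_s <- mean(s2_d)        # high-density point theta*
  ## Rao-Blackwellized ordinate pi_hat(mu* | y), averaged over the s2 draws:
  v_g <- 1 / (1 / t0 + n / s2_d)
  log_post_mu <- log(mean(dnorm(mu_s, v_g * (m0 / t0 + sum(y) / s2_d), sqrt(v_g))))
  ## pi(s2* | mu*, y) is closed-form inverse-gamma; no reduced run is needed:
  log_post_s2 <- log_dinvgamma(s2_s, a0 + n / 2, b0 + sum((y - mu_s)^2) / 2)
  sum(dnorm(y, mu_s, sqrt(s2_s), log = TRUE)) +       # log f(y | theta*)
    dnorm(mu_s, m0, sqrt(t0), log = TRUE) +           # log prior of mu*
    log_dinvgamma(s2_s, a0, b0) -                     # log prior of s2*
    (log_post_mu + log_post_s2)                       # minus log ordinate
}
```

A Bayes factor between two candidate models is then the exponentiated difference of their estimated log marginal likelihoods, each computed under its own sampler.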
Fast Covariance Estimation for High-dimensional Functional Data
For smoothing covariance functions, we propose two fast algorithms that scale
linearly with the number of observations per function. Most available methods
and software cannot smooth covariance matrices of dimension $J \times J$ with
$J > 500$; the recently introduced sandwich smoother is an exception, but it is
not adapted to smooth covariance matrices of large dimensions such as
$J \ge 10,000$. Covariance matrices of order $J = 10,000$, and even
$J = 100,000$, are
becoming increasingly common, e.g., in 2- and 3-dimensional medical imaging and
high-density wearable sensor data. We introduce two new algorithms that can
handle very large covariance matrices: 1) FACE: a fast implementation of the
sandwich smoother and 2) SVDS: a two-step procedure that first applies singular
value decomposition to the data matrix and then smoothes the eigenvectors.
Compared to existing techniques, these new algorithms are at least an order of
magnitude faster in high dimensions and drastically reduce memory requirements.
The new algorithms provide instantaneous (a few seconds) smoothing for matrices
of dimension $J = 10,000$ and very fast ($<$ 10 minutes) smoothing for
$J = 100,000$. Although SVDS is simpler than FACE, we provide ready-to-use,
scalable R software for FACE. When incorporated into the R package refund,
FACE improves the speed of penalized functional regression by an order of
magnitude, even for data of normal size ($J < 500$). We recommend that FACE be
used in practice for the analysis of noisy and high-dimensional functional
data.
Comment: 35 pages, 4 figures.
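To make the two-step SVDS idea concrete, here is a hedged sketch; the rank K, the use of smooth.spline as the eigenvector smoother, and the function name are assumptions of this illustration, while the ready-to-use FACE implementation is the refund package software mentioned above.

```r
## SVDS-style covariance smoothing: SVD the centered data matrix, smooth the
## leading right singular vectors, then rebuild a smoothed covariance.
svds_cov <- function(Y, K = 10) {              # Y: n curves x J grid points
  Y <- sweep(Y, 2, colMeans(Y))                # center at each grid point
  s <- svd(Y, nu = 0, nv = K)                  # right vectors ~ eigenfunctions
  V <- apply(s$v, 2, function(v) smooth.spline(v)$y)   # smooth each vector
  lam <- s$d[1:K]^2 / (nrow(Y) - 1)            # eigenvalue estimates
  V %*% diag(lam, nrow = K) %*% t(V)           # smoothed J x J covariance
}
```

For very large J, one would return the rank-K factors V and lam instead of forming the J x J product, which is the natural route to the drastic memory savings described above.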
Smoothness-Penalized Deconvolution (SPeD) of a Density Estimate
This paper addresses the deconvolution problem of estimating a
square-integrable probability density from observations contaminated with
additive measurement errors having a known density. The estimator begins with a
density estimate of the contaminated observations and minimizes a
reconstruction error penalized by an integrated squared $m$-th derivative.
Theory for deconvolution has mainly focused on kernel- or wavelet-based
techniques, but other methods including spline-based techniques and this
smoothness-penalized estimator have been found to outperform kernel methods in
simulation studies. This paper fills part of that theoretical gap by establishing
asymptotic guarantees for the smoothness-penalized approach. Consistency is
established in mean integrated squared error, and rates of convergence are
derived for Gaussian, Cauchy, and Laplace error densities, attaining some lower
bounds already in the literature. The assumptions are weak for most results;
the estimator can be used with a broader class of error densities than the
deconvoluting kernel. Our application example estimates the density of the mean
cytotoxicity of certain bacterial isolates under random sampling; this mean
cytotoxicity can only be measured experimentally with additive error, leading
to the deconvolution problem. We also describe a method for approximating the
solution by a cubic spline, which reduces to a quadratic program.
Comment: Revisions: added new theorem in Section 6; added list of assumptions;
other, more minor revisions throughout.
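The objective named in the abstract, a reconstruction error plus an integrated squared derivative penalty, has a direct discretized analogue. The sketch below is an unconstrained ridge-type version on a grid; the grid, the second-difference penalty, the tuning parameter lambda, and the plain linear solve (in place of the paper's cubic-spline quadratic program) are all assumptions of this illustration.

```r
## Discretized smoothness-penalized deconvolution: minimize
## || A g - f_hat ||^2 + lambda * || D g ||^2, where A convolves with the
## known error density and D takes second differences.
sped_sketch <- function(f_hat, x, err_dens, lambda = 1e-2) {
  J  <- length(x); dx <- x[2] - x[1]           # assumes an equally spaced grid
  A  <- outer(x, x, function(u, v) err_dens(u - v)) * dx  # convolution matrix
  D  <- diff(diag(J), differences = 2)         # second-difference penalty
  drop(solve(crossprod(A) + lambda * crossprod(D), crossprod(A, f_hat)))
}
## Illustrative use: start from a kernel density estimate of the contaminated
## observations y_obs, with assumed N(0, 0.2^2) measurement error:
## x     <- seq(-4, 4, length.out = 200)
## kde   <- density(y_obs)
## f_hat <- approx(kde$x, kde$y, x, rule = 2)$y
## g     <- sped_sketch(f_hat, x, function(u) dnorm(u, sd = 0.2))
```

With additional linear constraints (for example nonnegativity of the estimated density), the same objective becomes the kind of quadratic program the abstract mentions for the cubic-spline approximation.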