Covariance Estimation: The GLM and Regularization Perspectives
Finding an unconstrained and statistically interpretable reparameterization
of a covariance matrix is still an open problem in statistics. Its solution is
of central importance in covariance estimation, particularly in the recent
high-dimensional data environment where enforcing the positive-definiteness
constraint could be computationally expensive. We provide a survey of the
progress made in modeling covariance matrices from two relatively complementary
perspectives: (1) generalized linear models (GLM) or parsimony and use of
covariates in low dimensions, and (2) regularization or sparsity for
high-dimensional data. An emerging, unifying and powerful trend in both
perspectives is that of reducing a covariance estimation problem to that of
estimating a sequence of regression problems. We point out several instances of
the regression-based formulation. A notable case is in sparse estimation of a
precision matrix or a Gaussian graphical model leading to the fast graphical
LASSO algorithm. Some advantages and limitations of the regression-based
Cholesky decomposition relative to the classical spectral (eigenvalue) and
variance-correlation decompositions are highlighted. The former provides an
unconstrained and statistically interpretable reparameterization, and
guarantees the positive-definiteness of the estimated covariance matrix. It
reduces the unintuitive task of covariance estimation to that of modeling a
sequence of regressions at the cost of imposing an a priori order among the
variables. Elementwise regularization of the sample covariance matrix such as
banding, tapering and thresholding has desirable asymptotic properties and the
sparse estimated covariance matrix is positive definite with probability
tending to one for large samples and dimensions.

Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/11-STS358
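The regression-based Cholesky reparameterization described in this abstract can be sketched in a few lines. The following is an illustrative numpy sketch (not the authors' code, and the simulated data are made up): each variable is regressed on its predecessors in the imposed order, the negated coefficients fill a unit lower-triangular T and the residual variances a diagonal D, and the reconstruction Sigma = T^{-1} D T^{-T} is positive definite whenever every entry of D is positive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n observations of a p-variate Gaussian (illustrative data).
n, p = 500, 5
A = rng.normal(size=(p, p))
true_cov = A @ A.T + p * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), true_cov, size=n)
X -= X.mean(axis=0)

# Modified Cholesky decomposition: T Sigma T' = D, where row j of T
# holds the negated coefficients from regressing variable j on
# variables 1..j-1 (an a priori order among variables is imposed).
T = np.eye(p)
D = np.empty(p)
D[0] = X[:, 0].var()
for j in range(1, p):
    Z = X[:, :j]                       # predictors: earlier variables
    phi, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
    T[j, :j] = -phi
    resid = X[:, j] - Z @ phi
    D[j] = resid.var()                 # innovation (prediction) variance

# Reconstruct the estimate: Sigma = T^{-1} D T^{-T}.
Tinv = np.linalg.inv(T)
Sigma_hat = Tinv @ np.diag(D) @ Tinv.T

# Positive definiteness is automatic as long as every D[j] > 0.
assert np.all(np.linalg.eigvalsh(Sigma_hat) > 0)
```

With unregularized least squares, as here, the reconstruction reproduces the sample covariance exactly; regularizing the regression coefficients or innovation variances instead yields a sparse or shrunken estimate that remains positive definite by construction.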
A Bayesian Multivariate Functional Dynamic Linear Model
We present a Bayesian approach for modeling multivariate, dependent
functional data. To account for the three dominant structural features in the
data--functional, time dependent, and multivariate components--we extend
hierarchical dynamic linear models for multivariate time series to the
functional data setting. We also develop Bayesian spline theory in a more
general constrained optimization framework. The proposed methods identify a
time-invariant functional basis for the functional observations, which is
smooth and interpretable, and can be made common across multivariate
observations for additional information sharing. The Bayesian framework permits
joint estimation of the model parameters, provides exact inference (up to MCMC
error) on specific parameters, and allows generalized dependence structures.
Sampling from the posterior distribution is accomplished with an efficient
Gibbs sampling algorithm. We illustrate the proposed framework with two
applications: (1) multi-economy yield curve data from the recent global
recession, and (2) local field potential brain signals in rats, for which we
develop a multivariate functional time series approach for multivariate
time-frequency analysis. Supplementary materials, including R code and the
multi-economy yield curve data, are available online.
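The full hierarchical multivariate functional model is beyond a short sketch, but its basic building block, a dynamic linear model with closed-form state inference, can be illustrated. The univariate local-level model below is a minimal stand-in written in numpy (not the paper's multivariate functional Gibbs sampler); the variances V and W are assumed known here, whereas a full Bayesian treatment would sample them within the MCMC scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

# Minimal univariate dynamic linear model (local level):
#   y_t     = theta_t + v_t,       v_t ~ N(0, V)
#   theta_t = theta_{t-1} + w_t,   w_t ~ N(0, W)
V, W = 0.5, 0.1
T = 200
theta = np.cumsum(rng.normal(0, np.sqrt(W), T))   # latent random-walk state
y = theta + rng.normal(0, np.sqrt(V), T)          # noisy observations

# Kalman filter: exact Gaussian posterior for the state at each time,
# the conditional update used inside Gibbs samplers for DLMs.
m, C = 0.0, 1e6            # diffuse prior on theta_0
m_filt = np.empty(T)
for t in range(T):
    R = C + W              # prior variance for theta_t given y_1..y_{t-1}
    K = R / (R + V)        # Kalman gain
    m = m + K * (y[t] - m) # posterior mean given y_1..y_t
    C = (1 - K) * R        # posterior variance
    m_filt[t] = m
```

The filtered means track the latent state with lower error than the raw observations; the hierarchical model in the paper stacks many such state equations across functions and economies.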
Multi-State Models for Panel Data: The msm Package for R
Panel data are observations of a continuous-time process at arbitrary times, for example, visits to a hospital to diagnose disease status. Multi-state models for such data are generally based on the Markov assumption. This article reviews the range of Markov models and their extensions which can be fitted to panel-observed data, and their implementation in the msm package for R. Transition intensities may vary between individuals, or with piecewise-constant time-dependent covariates, giving an inhomogeneous Markov model. Hidden Markov models can be used for multi-state processes which are misclassified or observed only through a noisy marker. The package is intended to be straightforward to use, flexible and comprehensively documented. Worked examples are given of the use of msm to model chronic disease progression and screening. Assessment of model fit, and potential future developments of the software, are also discussed.
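The core computation behind such continuous-time Markov models for panel data can be sketched briefly. The likelihood for panel-observed data multiplies interval transition probabilities P(t) = exp(tQ), where Q is the transition intensity matrix. The following Python/scipy sketch is illustrative only, not code from the msm package (which is written in R); the three-state illness-death generator Q is a made-up example.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 3-state illness-death model: 0 = healthy, 1 = ill, 2 = dead.
# Q holds transition intensities; each row sums to zero.
Q = np.array([
    [-0.15,  0.10,  0.05],   # healthy -> ill / dead
    [ 0.00, -0.20,  0.20],   # ill -> dead (no recovery in this sketch)
    [ 0.00,  0.00,  0.00],   # dead is absorbing
])

# For a time-homogeneous Markov model, the transition probability
# matrix over an interval of length t is the matrix exponential:
#     P(t) = expm(t * Q)
# A panel-data likelihood multiplies one such probability per gap
# between successive observation times.
t = 2.0
P = expm(t * Q)

# Each row of P is a probability distribution over states at time t.
assert np.allclose(P.sum(axis=1), 1.0)
assert np.all(P >= 0)
```

Because state 0 cannot be re-entered here, its occupancy probability reduces to the survival probability exp(-0.15 t), a useful sanity check on the matrix exponential.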
Modeling Covariate Effects in Group Independent Component Analysis with Applications to Functional Magnetic Resonance Imaging
Independent component analysis (ICA) is a powerful computational tool for
separating independent source signals from their linear mixtures. ICA has been
widely applied in neuroimaging studies to identify and characterize underlying
brain functional networks. An important goal in such studies is to assess the
effects of subjects' clinical and demographic covariates on the spatial
distributions of the functional networks. Currently, covariate effects are not
incorporated in existing group ICA decomposition methods. Hence, they can only
be evaluated through ad-hoc approaches which may not be accurate in many cases.
In this paper, we propose a hierarchical covariate ICA model that provides a
formal statistical framework for estimating and testing covariate effects in
ICA decomposition. A maximum likelihood method is proposed for estimating the
covariate ICA model. We develop two expectation-maximization (EM) algorithms to
obtain maximum likelihood estimates. The first is an exact EM algorithm, which
has analytically tractable E-step and M-step. Additionally, we propose a
subspace-based approximate EM algorithm, which can significantly reduce computational
time while still retaining high model-fitting accuracy. Furthermore, to test
covariate effects on the functional networks, we develop a voxel-wise
approximate inference procedure which eliminates the need for computationally
expensive covariance estimation. The performance of the proposed methods is
evaluated via simulation studies. The application is illustrated through an
fMRI study of Zen meditation.

Comment: 36 pages, 5 figures
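The basic source-separation step that such models build on can be illustrated with a small sketch. The following is a generic FastICA-style deflation with the cubic nonlinearity, written from scratch in numpy; it is not the hierarchical covariate ICA model or the EM algorithms proposed in the paper, and the two toy sources are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ICA problem: two independent non-Gaussian sources, linearly mixed.
n = 5000
S = np.vstack([
    rng.uniform(-1, 1, n),              # uniform source (sub-Gaussian)
    rng.laplace(0, 1, n),               # Laplace source (super-Gaussian)
])
A = np.array([[1.0, 0.5], [0.6, 1.0]])  # "unknown" mixing matrix
X = A @ S

# Whiten the mixtures so their sample covariance is the identity.
X -= X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Z = E @ np.diag(d ** -0.5) @ E.T @ X

# FastICA-style deflation with g(u) = u^3 (kurtosis-based contrast).
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        u = w @ Z
        # Fixed-point update: w <- E[z g(u)] - E[g'(u)] w
        w_new = (Z * u ** 3).mean(axis=1) - 3 * (u ** 2).mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # deflate found components
        w_new /= np.linalg.norm(w_new)
        converged = abs(w_new @ w) > 1 - 1e-9
        w = w_new
        if converged:
            break
    W[i] = w

S_hat = W @ Z   # recovered sources (up to order, sign, and scale)
```

Recovery is identifiable only up to permutation, sign, and scale, which is why covariate effects on the spatial maps require the kind of formal model the paper proposes rather than post-hoc comparisons of independently estimated decompositions.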
Likelihood-Based Inference for Discretely Observed Birth-Death-Shift Processes, with Applications to Evolution of Mobile Genetic Elements
Continuous-time birth-death-shift (BDS) processes are frequently used in
stochastic modeling, with many applications in ecology and epidemiology. In
particular, such processes can model evolutionary dynamics of transposable
elements - important genetic markers in molecular epidemiology. Estimation of
the effects of individual covariates on the birth, death, and shift rates of
the process can be accomplished by analyzing patient data, but inferring these
rates in a discretely and unevenly observed setting presents computational
challenges. We propose a multi-type branching process approximation to BDS
processes and develop a corresponding expectation maximization (EM) algorithm,
where we use spectral techniques to reduce calculation of expected sufficient
statistics to low dimensional integration. These techniques yield an efficient
and robust optimization routine for inferring the rates of the BDS process, and
apply more broadly to multi-type branching processes where rates can depend on
many covariates. After rigorously testing our methodology in simulation
studies, we apply our method to study the intrapatient time evolution of the
IS6110 transposable element, a genetic marker frequently used in estimating
epidemiological clusters of Mycobacterium tuberculosis infections.

Comment: 31 pages, 7 figures, 1 table
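A BDS trajectory is straightforward to simulate directly, which is how such inference methods are typically checked in simulation studies. The Gillespie-style sketch below is illustrative only (the rates and the simulate_bds helper are hypothetical, not the authors' code): each element independently gives birth at rate lam, dies at rate mu, or shifts to a new genomic location at rate sigma.

```python
import random

def simulate_bds(n0, lam, mu, sigma, t_max, seed=0):
    """Gillespie simulation of a birth-death-shift process.

    Each of the n current elements independently gives birth (rate lam),
    dies (rate mu), or shifts to a new location (rate sigma).
    Returns (n_elements, n_shift_events) at time t_max.
    """
    rng = random.Random(seed)
    t, n, shifts = 0.0, n0, 0
    while n > 0:
        total_rate = n * (lam + mu + sigma)
        t += rng.expovariate(total_rate)  # time to the next event
        if t > t_max:
            break
        u = rng.random() * (lam + mu + sigma)
        if u < lam:
            n += 1           # birth: an element copies itself
        elif u < lam + mu:
            n -= 1           # death: an element is lost
        else:
            shifts += 1      # shift: count unchanged, location changes
    return n, shifts

# Example run with made-up rates (per element, per unit time).
n_final, n_shifts = simulate_bds(10, lam=0.2, mu=0.15, sigma=0.05, t_max=5.0)
```

The inferential difficulty the abstract describes arises because panel data reveal only the state at discrete, unevenly spaced times, not the event-by-event trajectory this simulator produces.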