Covariance Estimation: The GLM and Regularization Perspectives
Finding an unconstrained and statistically interpretable reparameterization
of a covariance matrix is still an open problem in statistics. Its solution is
of central importance in covariance estimation, particularly in the recent
high-dimensional data environment where enforcing the positive-definiteness
constraint could be computationally expensive. We provide a survey of the
progress made in modeling covariance matrices from two relatively complementary
perspectives: (1) generalized linear models (GLM) or parsimony and use of
covariates in low dimensions, and (2) regularization or sparsity for
high-dimensional data. An emerging, unifying and powerful trend in both
perspectives is that of reducing a covariance estimation problem to that of
estimating a sequence of regression problems. We point out several instances of
the regression-based formulation. A notable case is in sparse estimation of a
precision matrix or a Gaussian graphical model leading to the fast graphical
LASSO algorithm. Some advantages and limitations of the regression-based
Cholesky decomposition relative to the classical spectral (eigenvalue) and
variance-correlation decompositions are highlighted. The former provides an
unconstrained and statistically interpretable reparameterization, and
guarantees the positive-definiteness of the estimated covariance matrix. It
reduces the unintuitive task of covariance estimation to that of modeling a
sequence of regressions at the cost of imposing an a priori order among the
variables. Elementwise regularization of the sample covariance matrix such as
banding, tapering and thresholding has desirable asymptotic properties and the
sparse estimated covariance matrix is positive definite with probability
tending to one for large samples and dimensions.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics, DOI: http://dx.doi.org/10.1214/11-STS358
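To make the regression-based formulation concrete, here is a minimal NumPy sketch (ours, not the survey's) of the modified Cholesky decomposition T Σ T' = D: row j of the unit lower-triangular T holds the negated coefficients from regressing variable j on its predecessors, D collects the innovation variances, and any estimates of these unconstrained quantities reassemble into a positive-definite covariance matrix.

```python
import numpy as np

def modified_cholesky(sigma):
    """Regression-based (modified) Cholesky decomposition.

    Writes T @ sigma @ T.T = D, where T is unit lower-triangular and
    D is diagonal: row j of T holds the negated coefficients from
    regressing variable j on variables 0..j-1, and D[j, j] is the
    corresponding innovation (residual) variance.
    """
    p = sigma.shape[0]
    T = np.eye(p)
    d = np.zeros(p)
    d[0] = sigma[0, 0]
    for j in range(1, p):
        # Population regression of variable j on its predecessors,
        # read off the leading (j x j) block of sigma.
        phi = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
        T[j, :j] = -phi
        d[j] = sigma[j, j] - sigma[:j, j] @ phi
    return T, np.diag(d)

# Any coefficients and positive innovation variances reassemble into a
# positive-definite matrix: sigma = inv(T) @ D @ inv(T).T
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
sigma = A @ A.T + 5 * np.eye(5)
T, D = modified_cholesky(sigma)
assert np.allclose(np.linalg.inv(T) @ D @ np.linalg.inv(T).T, sigma)
```

The graphical LASSO mentioned above is available off the shelf, e.g. as sklearn.covariance.GraphicalLasso in scikit-learn.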
Partial Coherence Estimation via Spectral Matrix Shrinkage under Quadratic Loss
Partial coherence is an important quantity derived from spectral or precision
matrices and is used in seismology, meteorology, oceanography, neuroscience and
elsewhere. If the number of complex degrees of freedom only slightly exceeds
the dimension of the multivariate stationary time series, spectral matrices are
poorly conditioned and shrinkage techniques suggest themselves. When the true partial coherencies are large, we show empirically that, for shrinkage estimators of the diagonal-weighting kind, minimizing risk under quadratic loss (QL) yields oracle partial coherence estimators superior to those obtained by minimizing risk under Hilbert-Schmidt (HS) loss. When the true partial coherencies are small, the two losses behave similarly. We derive two new QL estimators for spectral matrices, and new QL and HS estimators for precision matrices. In addition, for the full-estimation (non-oracle) case, where certain trace expressions must also be estimated, we examine the behaviour of three different QL estimators; the precision-matrix estimator proves particularly robust and reliable. For the empirical study we carry out exact simulations derived from real EEG data for two individuals, one having large, and the other small, partial coherencies. This ensures our study covers cases of real-world relevance.
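For reference (this is the underlying quantity, not the paper's estimators), partial coherence at a single frequency can be computed from the inverse of the spectral matrix, and a generic diagonal-weighting shrinkage of the kind alluded to above is sketched alongside. The function names and the fixed weight lam are our own illustrative choices.

```python
import numpy as np

def diagonal_shrinkage(S_hat, lam):
    """Generic diagonal-weighting shrinkage (illustrative): pull the
    estimated spectral matrix toward its own diagonal to improve
    conditioning when the complex degrees of freedom only slightly
    exceed the dimension of the series."""
    return (1.0 - lam) * S_hat + lam * np.diag(np.diag(S_hat))

def partial_coherence(S):
    """Partial coherences from a Hermitian positive-definite spectral
    matrix S at one frequency: invert to get the precision matrix G,
    then form |G_ij|^2 / (G_ii * G_jj), the squared association between
    series i and j after removing the influence of all other series."""
    G = np.linalg.inv(S)
    d = np.real(np.diag(G))
    pc = np.abs(G) ** 2 / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc
```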
Nonparametric Stein-type Shrinkage Covariance Matrix Estimators in High-Dimensional Settings
Estimating a covariance matrix is an important task in applications where the
number of variables is larger than the number of observations. Shrinkage
approaches for estimating a high-dimensional covariance matrix are often
employed to circumvent the limitations of the sample covariance matrix. A new
family of nonparametric Stein-type shrinkage covariance estimators is proposed
whose members are written as a convex linear combination of the sample
covariance matrix and of a predefined invertible target matrix. Under the
Frobenius norm criterion, the optimal shrinkage intensity that defines the best
convex linear combination depends on the unobserved covariance matrix and it
must be estimated from the data. A simple but effective estimation process that
produces nonparametric and consistent estimators of the optimal shrinkage
intensity for three popular target matrices is introduced. In simulations, the
proposed Stein-type shrinkage covariance matrix estimator based on a scaled
identity matrix appeared to be up to 80% more efficient than existing ones in
extreme high-dimensional settings. A colon cancer dataset was analyzed to
demonstrate the utility of the proposed estimators. A rule of thumb for ad hoc selection among the three commonly used target matrices is recommended.
Comment: To appear in Computational Statistics and Data Analysis
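The shape of such an estimator is easy to state. The sketch below shrinks the sample covariance toward the scaled-identity target with a simple plug-in intensity in the spirit of Ledoit and Wolf (2004); it is a simplification, not the paper's consistent nonparametric intensity estimators.

```python
import numpy as np

def shrinkage_to_scaled_identity(X):
    """Stein-type shrinkage toward a scaled-identity target.

    Returns (1 - lam) * S + lam * nu * I, where S is the sample
    covariance, nu = trace(S) / p is the target scale, and lam is a
    plug-in shrinkage intensity (Ledoit-Wolf style, for illustration).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    nu = np.trace(S) / p                      # scale of the identity target
    # Intensity: variability of S around its expectation, relative to
    # the squared Frobenius distance between S and the target.
    d2 = np.sum((S - nu * np.eye(p)) ** 2)
    b2 = sum(np.sum((np.outer(x, x) - S) ** 2) for x in Xc) / n**2
    lam = min(b2 / d2, 1.0) if d2 > 0 else 1.0
    return (1.0 - lam) * S + lam * nu * np.eye(p)
```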
Covariance regularization by thresholding
This paper considers regularizing a covariance matrix of $p$ variables estimated from $n$ observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and $(\log p)/n \to 0$, and obtain explicit rates. The results are
uniform over families of covariance matrices which satisfy a fairly natural
notion of sparsity. We discuss an intuitive resampling scheme for threshold
selection and prove a general cross-validation result that justifies this
approach. We also compare thresholding to other covariance estimators in
simulations and on an example from climate data.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics, DOI: http://dx.doi.org/10.1214/08-AOS600
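The thresholding operation itself is a one-liner, sketched below; the substantive part is the choice of threshold (e.g. by the resampling scheme the paper justifies), which is not reproduced here.

```python
import numpy as np

def hard_threshold(S, t):
    """Hard-threshold a sample covariance matrix: zero every
    off-diagonal entry with magnitude at most t, keep the diagonal.
    Thresholds of order sqrt(log(p) / n) give the rates in the paper;
    in practice t would be chosen by cross-validation/resampling."""
    T = np.where(np.abs(S) > t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T
```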
Bayesian Nonstationary Spatial Modeling for Very Large Datasets
With the proliferation of modern high-resolution measuring instruments
mounted on satellites, planes, ground-based vehicles and monitoring stations, a
need has arisen for statistical methods suitable for the analysis of large
spatial datasets observed on large spatial domains. Statistical analyses of such datasets pose two main challenges: First, traditional
spatial-statistical techniques are often unable to handle large numbers of
observations in a computationally feasible way. Second, for large and
heterogeneous spatial domains, it is often not appropriate to assume that a
process of interest is stationary over the entire domain.
We address the first challenge by using a model combining a low-rank
component, which allows for flexible modeling of medium-to-long-range
dependence via a set of spatial basis functions, with a tapered remainder
component, which allows for modeling of local dependence using a compactly
supported covariance function. Addressing the second challenge, we propose two
extensions to this model that result in increased flexibility: First, the model
is parameterized based on a nonstationary Matérn covariance, where the
parameters vary smoothly across space. Second, in our fully Bayesian model, all
components and parameters are considered random, including the number,
locations, and shapes of the basis functions used in the low-rank component.
Using simulated data and a real-world dataset of high-resolution soil
measurements, we show that both extensions can result in substantial
improvements over the current state-of-the-art.
Comment: 16 pages, 2 color figures
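To fix ideas on the covariance structure only (the fully Bayesian treatment of the basis functions is beyond a few lines), a sketch of the low-rank-plus-tapered-remainder form follows. The names basis, K, local_cov, and taper_range are our illustrative placeholders, and the spherical taper stands in for any compactly supported correlation function.

```python
import numpy as np

def lowrank_plus_taper_cov(locs, basis, K, local_cov, taper_range):
    """Covariance of the form  B K B' + T o C_local  (illustrative).

    basis(locs) -> (n, r) matrix B of spatial basis functions;
    K           -> (r, r) covariance of the basis-function weights;
    local_cov   -> maps a distance matrix to the remainder covariance.
    The elementwise taper T is compactly supported (spherical here),
    so the remainder term is sparse and captures local dependence,
    while the low-rank term carries medium-to-long-range dependence.
    """
    B = basis(locs)                                       # (n, r)
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    h = np.clip(d / taper_range, 0.0, 1.0)
    taper = (1.0 - h) ** 2 * (1.0 + h / 2.0)              # zero beyond range
    return B @ K @ B.T + taper * local_cov(d)
```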