
    Covariance Estimation: The GLM and Regularization Perspectives

    Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM), or parsimony and use of covariates, in low dimensions, and (2) regularization, or sparsity, for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regressions. We point out several instances of this regression-based formulation. A notable case is the sparse estimation of a precision matrix, or Gaussian graphical model, leading to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The former provides an unconstrained and statistically interpretable reparameterization and guarantees the positive-definiteness of the estimated covariance matrix; it reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions, at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix, such as banding, tapering and thresholding, has desirable asymptotic properties, and the sparse estimated covariance matrix is positive definite with probability tending to one for large samples and dimensions. Comment: Published at http://dx.doi.org/10.1214/11-STS358 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
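    As a concrete illustration of the regression-based Cholesky idea, the minimal NumPy sketch below (our own illustration, not code from the survey; the function name, the a priori column ordering and the toy data are all assumptions) estimates a covariance matrix by regressing each variable on its predecessors. Positive-definiteness is automatic whenever the fitted innovation variances are positive.

```python
import numpy as np

def cholesky_regression_estimate(X):
    """Estimate a covariance matrix via the modified Cholesky decomposition,
    i.e. by fitting a sequence of regressions of each variable on its
    predecessors (an order among the columns is assumed a priori)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center the data
    T = np.eye(p)                    # unit lower-triangular matrix
    d = np.empty(p)                  # innovation (prediction-error) variances
    d[0] = Xc[:, 0].var()
    for j in range(1, p):
        Z = Xc[:, :j]                                    # predecessors of variable j
        phi, *_ = np.linalg.lstsq(Z, Xc[:, j], rcond=None)
        T[j, :j] = -phi                                  # negatives of regression coefficients
        d[j] = (Xc[:, j] - Z @ phi).var()                # residual variance
    # T Sigma T' = D  implies  Sigma = T^{-1} D T^{-T}: positive definite whenever d > 0
    Tinv = np.linalg.inv(T)
    return Tinv @ np.diag(d) @ Tinv.T

# toy usage (hypothetical data)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Sigma_hat = cholesky_regression_estimate(X)
print(np.all(np.linalg.eigvalsh(Sigma_hat) > 0))   # True: positive definite
```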

    Partial Coherence Estimation via Spectral Matrix Shrinkage under Quadratic Loss

    Partial coherence is an important quantity derived from spectral or precision matrices and is used in seismology, meteorology, oceanography, neuroscience and elsewhere. If the number of complex degrees of freedom only slightly exceeds the dimension of the multivariate stationary time series, spectral matrices are poorly conditioned and shrinkage techniques suggest themselves. When the true partial coherencies are large, we show empirically that, for shrinkage estimators of the diagonal-weighting kind, minimizing risk under quadratic loss (QL) leads to oracle partial coherence estimators superior to those derived by minimizing risk under Hilbert-Schmidt (HS) loss. When the true partial coherencies are small, the methods behave similarly. We derive two new QL estimators for spectral matrices, and new QL and HS estimators for precision matrices. In addition, for the full-estimation (non-oracle) case, where certain trace expressions must also be estimated, we examine the behaviour of three different QL estimators; the precision-matrix estimator seems particularly robust and reliable. For the empirical study we carry out exact simulations derived from real EEG data for two individuals, one having large, and the other small, partial coherencies. This ensures our study covers cases of real-world relevance.
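    To fix ideas, the sketch below shows how (squared) partial coherence is obtained from the inverse of a spectral matrix at a single frequency, together with one simple shrinkage toward a diagonal target. The function names, the toy Hermitian spectral matrix and the fixed shrinkage weight are illustrative assumptions; this is not the estimator family analysed in the paper.

```python
import numpy as np

def partial_coherence(S):
    """Squared partial coherence from a (complex) spectral matrix S at one
    frequency: |G_ij|^2 / (G_ii G_jj), where G = S^{-1} is the precision matrix."""
    G = np.linalg.inv(S)
    d = np.real(np.diag(G))
    P = np.abs(G) ** 2 / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

def shrink_spectral_matrix(S, lam):
    """A simple shrinkage estimator: convex combination of S and its diagonal
    (lam is taken as given here, not estimated as in the paper)."""
    return (1.0 - lam) * S + lam * np.diag(np.diag(S))

# toy usage with a small Hermitian positive-definite spectral matrix
S = np.array([[2.0,        0.5 + 0.3j, 0.1       ],
              [0.5 - 0.3j, 1.5,        0.2 + 0.1j],
              [0.1,        0.2 - 0.1j, 1.0       ]])
print(partial_coherence(shrink_spectral_matrix(S, 0.2)))
```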

    Nonparametric Stein-type Shrinkage Covariance Matrix Estimators in High-Dimensional Settings

    Estimating a covariance matrix is an important task in applications where the number of variables is larger than the number of observations. Shrinkage approaches for estimating a high-dimensional covariance matrix are often employed to circumvent the limitations of the sample covariance matrix. A new family of nonparametric Stein-type shrinkage covariance estimators is proposed whose members are written as a convex linear combination of the sample covariance matrix and of a predefined invertible target matrix. Under the Frobenius norm criterion, the optimal shrinkage intensity that defines the best convex linear combination depends on the unobserved covariance matrix and must be estimated from the data. A simple but effective estimation process that produces nonparametric and consistent estimators of the optimal shrinkage intensity for three popular target matrices is introduced. In simulations, the proposed Stein-type shrinkage covariance matrix estimator based on a scaled identity matrix appeared to be up to 80% more efficient than existing ones in extreme high-dimensional settings. A colon cancer dataset was analyzed to demonstrate the utility of the proposed estimators. A rule of thumb for ad hoc selection among the three commonly used target matrices is recommended. Comment: To appear in Computational Statistics and Data Analysis.
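    The sketch below illustrates the general form of such an estimator with a scaled-identity target. The plug-in shrinkage intensity shown is a simple Ledoit-Wolf-style choice used only for illustration; it is not the nonparametric estimator proposed in the paper, and the function name and toy data are assumptions.

```python
import numpy as np

def shrinkage_covariance(X, lam=None):
    """Stein-type shrinkage estimator: a convex combination of the sample
    covariance S and a scaled-identity target T = (tr(S)/p) * I.
    If lam is None, a simple Ledoit-Wolf-style plug-in intensity is used
    (an illustration only, not the paper's nonparametric estimator)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                         # sample covariance (MLE scaling)
    T = (np.trace(S) / p) * np.eye(p)         # scaled-identity target
    if lam is None:
        d2 = np.sum((S - T) ** 2)             # squared Frobenius distance S -> target
        b2 = sum(np.sum((np.outer(x, x) - S) ** 2) for x in Xc) / n ** 2
        lam = min(1.0, b2 / d2) if d2 > 0 else 1.0
    return (1.0 - lam) * S + lam * T, lam

# toy usage in a p >> n setting
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 100))
Sigma_star, lam_hat = shrinkage_covariance(X)
# positive definite even though p > n, because lam_hat > 0
print(lam_hat, np.all(np.linalg.eigvalsh(Sigma_star) > 0))
```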

    Covariance regularization by thresholding

    This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and we obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data. Comment: Published at http://dx.doi.org/10.1214/08-AOS600 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
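    A minimal sketch of hard thresholding and a split-sample threshold choice is given below. It follows the resampling idea only in spirit: the split sizes, the Frobenius-norm comparison, the threshold grid and the function names are our assumptions rather than the paper's exact scheme.

```python
import numpy as np

def hard_threshold(S, t):
    """Hard-threshold the off-diagonal entries of a covariance matrix:
    entries with |s_ij| <= t are set to zero; the diagonal is kept."""
    St = np.where(np.abs(S) > t, S, 0.0)
    np.fill_diagonal(St, np.diag(S))
    return St

def choose_threshold(X, grid, n_splits=20, rng=None):
    """Pick the threshold by repeated random splitting: threshold the sample
    covariance of one half and compare it to the raw covariance of the other
    half in squared Frobenius norm, averaging over splits."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    risks = np.zeros(len(grid))
    for _ in range(n_splits):
        idx = rng.permutation(n)
        a, b = idx[: n // 2], idx[n // 2 :]
        S1 = np.cov(X[a], rowvar=False)
        S2 = np.cov(X[b], rowvar=False)
        for k, t in enumerate(grid):
            risks[k] += np.sum((hard_threshold(S1, t) - S2) ** 2)
    return grid[int(np.argmin(risks))]

# toy usage (hypothetical data and grid)
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 30))
t_hat = choose_threshold(X, grid=np.linspace(0.0, 0.5, 26), rng=3)
S_hat = hard_threshold(np.cov(X, rowvar=False), t_hat)
```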

    Bayesian Nonstationary Spatial Modeling for Very Large Datasets

    With the proliferation of modern high-resolution measuring instruments mounted on satellites, planes, ground-based vehicles and monitoring stations, a need has arisen for statistical methods suitable for the analysis of large spatial datasets observed on large spatial domains. Statistical analyses of such datasets pose two main challenges: First, traditional spatial-statistical techniques are often unable to handle large numbers of observations in a computationally feasible way. Second, for large and heterogeneous spatial domains, it is often not appropriate to assume that a process of interest is stationary over the entire domain. We address the first challenge by using a model combining a low-rank component, which allows for flexible modeling of medium-to-long-range dependence via a set of spatial basis functions, with a tapered remainder component, which allows for modeling of local dependence using a compactly supported covariance function. Addressing the second challenge, we propose two extensions to this model that result in increased flexibility: First, the model is parameterized based on a nonstationary Matérn covariance, where the parameters vary smoothly across space. Second, in our fully Bayesian model, all components and parameters are considered random, including the number, locations, and shapes of the basis functions used in the low-rank component. Using simulated data and a real-world dataset of high-resolution soil measurements, we show that both extensions can result in substantial improvements over the current state-of-the-art. Comment: 16 pages, 2 color figures.
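    The covariance structure just described can be sketched as a low-rank basis-function component plus a tapered remainder. The code below is an assumption-laden illustration: Gaussian basis functions, an exponential remainder and a spherical taper on one-dimensional locations stand in for the paper's nonstationary Matérn parameterization and its fully Bayesian treatment, and all names and parameter values are hypothetical.

```python
import numpy as np

def spherical_taper(h, gamma):
    """Compactly supported spherical correlation; identically zero beyond range gamma."""
    u = np.clip(h / gamma, 0.0, 1.0)
    return (1.0 - 1.5 * u + 0.5 * u ** 3) * (h < gamma)

def lowrank_plus_taper_cov(locs, basis, K, sigma2, phi, gamma):
    """Covariance of the form  Phi K Phi' + (exponential covariance * taper):
    a low-rank component built from spatial basis functions plus a tapered
    remainder with a compactly supported covariance."""
    Phi = basis(locs)                              # n x r matrix of basis evaluations
    h = np.abs(locs[:, None] - locs[None, :])      # 1-D distances, for simplicity
    remainder = sigma2 * np.exp(-h / phi) * spherical_taper(h, gamma)
    return Phi @ K @ Phi.T + remainder

# toy usage: 50 locations on a line, 5 Gaussian basis functions
locs = np.linspace(0.0, 10.0, 50)
centers = np.linspace(0.0, 10.0, 5)
basis = lambda s: np.exp(-0.5 * ((s[:, None] - centers[None, :]) / 2.0) ** 2)
Sigma = lowrank_plus_taper_cov(locs, basis, K=np.eye(5), sigma2=1.0, phi=1.5, gamma=2.0)
print(np.all(np.linalg.eigvalsh(Sigma) > -1e-10))  # positive (semi-)definite
```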