
    Verification Under Increasing Dimensionality

    Verification decisions are often based on second-order statistics estimated from a set of samples. Ongoing growth of computational resources allows more and more features to be considered, increasing the dimensionality of the samples. If the dimensionality is of the same order as the number of samples used in the estimation, or even higher, the accuracy of the estimate decreases significantly. In particular, the eigenvalues of the covariance matrix are estimated with a bias, and the estimated eigenvectors differ considerably from the true eigenvectors. We show how a classical verification approach in high dimensions is severely affected by these problems, and how bias-correction methods can reduce them.
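
    A minimal numerical sketch of the phenomenon (illustrative only, not the paper's verification method or its bias corrections): for white Gaussian data every population eigenvalue is 1, yet with the dimensionality p close to the sample size n the sample eigenvalues spread widely.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 80                       # dimensionality close to sample size
    X = rng.standard_normal((n, p))      # true covariance: identity, so every
                                         # population eigenvalue equals 1
    S = X.T @ X / n                      # sample covariance matrix
    eigvals = np.linalg.eigvalsh(S)
    print("population eigenvalues: all 1.0")
    print(f"sample eigenvalue range: [{eigvals.min():.2f}, {eigvals.max():.2f}]")
    # The largest sample eigenvalues are biased upward and the smallest
    # downward, so downstream decisions based on them inherit the bias.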

    Discriminant analysis under the common principal components model

    For two or more populations whose covariance matrices have a common set of eigenvectors but different sets of eigenvalues, the common principal components (CPC) model is appropriate. Pepler et al. (2015) proposed a regularised CPC covariance matrix estimator and showed that this estimator outperforms the unbiased and pooled estimators in situations where the CPC model is applicable. This paper extends their work to the context of discriminant analysis for two groups by plugging the regularised CPC estimator into the ordinary quadratic discriminant function. Monte Carlo simulation results show that CPC discriminant analysis offers significant improvements in misclassification error rates in certain situations, and at worst performs similarly to ordinary quadratic and linear discriminant analysis. Based on these results, CPC discriminant analysis is recommended for situations where the sample size is small compared to the number of variables, in particular when there is uncertainty about the population covariance matrix structures.
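
    A minimal sketch of the plug-in idea under simplifying assumptions: the common eigenvector basis is taken from a pooled eigendecomposition rather than Pepler et al.'s regularised estimator, and all names below are illustrative.

    import numpy as np

    def cpc_qda_fit(X1, X2):
        """Common eigenvector basis from the pooled covariance; each group
        keeps its own eigenvalues in that basis (CPC structure)."""
        mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
        S1 = np.cov(X1, rowvar=False)
        S2 = np.cov(X2, rowvar=False)
        _, B = np.linalg.eigh(S1 + S2)            # shared eigenvectors B
        lam1 = np.diag(B.T @ S1 @ B).clip(1e-8)   # group 1 eigenvalues
        lam2 = np.diag(B.T @ S2 @ B).clip(1e-8)   # group 2 eigenvalues
        return (mu1, B, lam1), (mu2, B, lam2)

    def qda_score(x, mu, B, lam):
        """Ordinary quadratic discriminant score, Sigma = B diag(lam) B'."""
        z = B.T @ (x - mu)
        return -0.5 * np.sum(np.log(lam)) - 0.5 * np.sum(z ** 2 / lam)

    rng = np.random.default_rng(1)
    X1 = rng.standard_normal((30, 5))             # group 1 training sample
    X2 = rng.standard_normal((30, 5)) + 1.0       # group 2: shifted mean
    g1, g2 = cpc_qda_fit(X1, X2)
    x = X2[0]                                     # classify one observation
    print("assigned to group", 1 if qda_score(x, *g1) > qda_score(x, *g2) else 2)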

    Spectrum Estimation: A Unified Framework for Covariance Matrix Estimation and PCA in Large Dimensions

    Covariance matrix estimation and principal component analysis (PCA) are two cornerstones of multivariate analysis. Classic textbook solutions perform poorly when the dimension of the data is of a magnitude similar to the sample size, or even larger. In such settings, there is a common remedy for both statistical problems: nonlinear shrinkage of the eigenvalues of the sample covariance matrix. The optimal nonlinear shrinkage formula depends on unknown population quantities and is thus not available. It is, however, possible to consistently estimate an oracle nonlinear shrinkage, which is motivated on asymptotic grounds. A key tool to this end is consistent estimation of the set of eigenvalues of the population covariance matrix (also known as the spectrum), an interesting and challenging problem in its own right. Extensive Monte Carlo simulations demonstrate that our methods have desirable finite-sample properties and outperform previous proposals.
    Comment: 40 pages, 8 figures, 5 tables, University of Zurich, Department of Economics, Working Paper No. 105, Revised version, July 201
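
    The shared recipe behind both applications, sketched under a strong simplification: keep the sample eigenvectors and replace the sample eigenvalues by shrunk ones. The rule below shrinks linearly toward the grand mean; it is a stand-in, not the paper's oracle nonlinear shrinkage formula.

    import numpy as np

    def eigenvalue_shrinkage(S, intensity=0.7):
        """Keep sample eigenvectors, shrink eigenvalues toward their mean
        (a linear stand-in for the paper's nonlinear formula)."""
        lam, U = np.linalg.eigh(S)
        lam_shrunk = (1 - intensity) * lam + intensity * lam.mean()
        return U @ np.diag(lam_shrunk) @ U.T   # rotation-equivariant estimate

    rng = np.random.default_rng(2)
    n, p = 60, 50                              # dimension close to sample size
    X = rng.standard_normal((n, p))            # true covariance: identity
    S = np.cov(X, rowvar=False)

    # Frobenius loss against the true covariance:
    print("sample loss:", np.linalg.norm(S - np.eye(p)))
    print("shrunk loss:", np.linalg.norm(eigenvalue_shrinkage(S) - np.eye(p)))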

    Statistical eigen-inference from large Wishart matrices

    We consider settings where the observations are drawn from a zero-mean multivariate (real or complex) normal distribution with the population covariance matrix having eigenvalues of arbitrary multiplicity. We assume that the eigenvectors of the population covariance matrix are unknown and focus on inferential procedures that are based on the sample eigenvalues alone (i.e., "eigen-inference"). Results found in the literature establish the asymptotic normality of the fluctuation in the trace of powers of the sample covariance matrix. We develop concrete algorithms for analytically computing the limiting quantities and the covariance of the fluctuations. We exploit the asymptotic normality of the trace of powers of the sample covariance matrix to develop eigenvalue-based procedures for testing and estimation. Specifically, we formulate a simple test of hypotheses for the population eigenvalues and a technique for estimating the population eigenvalues in settings where the cumulative distribution function of the (nonrandom) population eigenvalues has a staircase structure. Monte Carlo simulations are used to demonstrate the superiority of the proposed methodologies over classical techniques and the robustness of the proposed techniques in high-dimensional, (relatively) small sample size settings. The improved performance results from the fact that the proposed inference procedures are "global" (in a sense that we describe) and exploit "global" information, thereby overcoming the inherent biases that cripple classical inference procedures, which are "local" and rely on "local" information.
    Comment: Published at http://dx.doi.org/10.1214/07-AOS583 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
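
    A hedged illustration of the raw ingredients of eigen-inference: moments tr(S^k)/N computed from the sample eigenvalues alone, compared with hypothesized population moments for a "staircase" spectrum. The paper's central-limit machinery for test thresholds is not reproduced.

    import numpy as np

    rng = np.random.default_rng(3)
    N, n = 200, 400
    # population spectrum with a "staircase" CDF: half 1's, half 3's
    pop_eigs = np.concatenate([np.ones(N // 2), 3 * np.ones(N // 2)])
    X = rng.standard_normal((n, N)) * np.sqrt(pop_eigs)   # diagonal covariance
    lam = np.linalg.eigvalsh(X.T @ X / n)                 # sample eigenvalues

    for k in (1, 2):
        print(f"tr(S^{k})/N = {np.mean(lam ** k):.3f}   "
              f"population moment = {np.mean(pop_eigs ** k):.3f}")
    # The first moments agree; the second sample moment exceeds the
    # population one by a known Marchenko-Pastur term, which eigen-inference
    # corrects for when testing and estimating the population eigenvalues.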

    Robust spiked random matrices and a robust G-MUSIC estimator

    A class of robust estimators of scatter applied to information-plus-impulsive noise samples is studied, where the sample information matrix is assumed to be of low rank; this generalizes the study of Couillet et al. (2013b) to spiked random matrix models. It is precisely shown that, as opposed to sample covariance matrices, which may have asymptotically unbounded (eigen-)spectrum due to the sample impulsiveness, the robust estimator of scatter has bounded spectrum and may contain isolated eigenvalues, which we fully characterize. We show that, if found beyond a certain detectability threshold, these eigenvalues allow one to perform statistical inference on the eigenvalues and eigenvectors of the information matrix. We use this result to derive new eigenvalue and eigenvector estimation procedures, which we apply in practice to the popular array processing problem of angle of arrival estimation. This gives rise to an improved algorithm based on the MUSIC method, which we refer to as robust G-MUSIC.
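
    A sketch of a fixed-point robust scatter estimator of the general family studied (a Tyler-type estimator, shown here for illustration; robust G-MUSIC itself is not implemented). The contrast below is the bounded spectrum of the robust estimate versus the spread of the sample covariance under impulsive samples.

    import numpy as np

    def tyler_scatter(X, n_iter=100):
        """Fixed-point iteration for a Tyler-type scatter estimate,
        normalized to trace p (so only the shape is estimated)."""
        n, p = X.shape
        C = np.eye(p)
        for _ in range(n_iter):
            Ci = np.linalg.inv(C)
            w = p / np.einsum('ij,jk,ik->i', X, Ci, X)   # per-sample weights
            C = (X.T * w) @ X / n
            C *= p / np.trace(C)
        return C

    rng = np.random.default_rng(4)
    n, p = 500, 20
    tau = 1.0 / rng.uniform(0.01, 1.0, size=n)     # impulsive amplitudes
    X = np.sqrt(tau)[:, None] * rng.standard_normal((n, p))

    # The sample covariance spectrum is distorted by the impulsive samples,
    # while the (trace-normalized) robust estimate stays close to the true
    # shape, here the identity.
    print("sample cov eig range:",
          np.linalg.eigvalsh(np.cov(X, rowvar=False))[[0, -1]].round(2))
    print("robust scatter eig range:",
          np.linalg.eigvalsh(tyler_scatter(X))[[0, -1]].round(2))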

    The merit of high-frequency data in portfolio allocation

    This paper addresses the open debate about the usefulness of high-frequency (HF) data in large-scale portfolio allocation. Daily covariances are estimated based on HF data of the S&P 500 universe employing a blocked realized kernel estimator. We propose forecasting covariance matrices using a multi-scale spectral decomposition where volatilities, correlation eigenvalues and eigenvectors evolve on different frequencies. In an extensive out-of-sample forecasting study, we show that the proposed approach yields less risky and more diversified portfolio allocations than prevailing methods employing daily data. These performance gains hold over longer horizons than previous studies have shown.
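
    A minimal sketch of the building blocks, not the blocked realized kernel itself: a plain realized covariance computed from intraday returns, split into the volatility, correlation-eigenvalue and eigenvector components that the proposed forecaster evolves on different frequencies. All data below are toy inputs.

    import numpy as np

    rng = np.random.default_rng(5)
    m, p = 390, 4                           # 390 one-minute returns, 4 assets
    r = rng.standard_normal((m, p)) * 1e-3  # intraday return matrix (toy data)

    RC = r.T @ r                            # realized covariance for the day
    vol = np.sqrt(np.diag(RC))              # realized volatilities
    R = RC / np.outer(vol, vol)             # realized correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)    # correlation eigenvalues/vectors

    # A forecaster can smooth vol quickly, eigvals more slowly and eigvecs
    # most slowly, then recombine Sigma = D (V diag(eigvals) V') D, D = diag(vol).
    Sigma = np.diag(vol) @ (eigvecs @ np.diag(eigvals) @ eigvecs.T) @ np.diag(vol)
    print(np.allclose(Sigma, RC))           # True: exact recombination here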

    A nonparametric empirical Bayes approach to covariance matrix estimation

    We propose an empirical Bayes method to estimate high-dimensional covariance matrices. Our procedure centers on vectorizing the covariance matrix and treating matrix estimation as a vector estimation problem. Drawing from the compound decision theory literature, we introduce a new class of decision rules that generalizes several existing procedures. We then use a nonparametric empirical Bayes g-modeling approach to estimate the oracle optimal rule in that class. This allows us to let the data itself determine how best to shrink the estimator, rather than shrinking in a pre-determined direction such as toward a diagonal matrix. Simulation results and a gene expression network analysis show that our approach can outperform a number of state-of-the-art proposals in a wide range of settings, sometimes substantially.
    Comment: 20 pages, 4 figures
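
    A sketch of the vectorization viewpoint only: each off-diagonal entry of the sample covariance is treated as a noisy observation of the corresponding population entry, and one data-driven shrinkage factor is applied to the whole vector. The crude method-of-moments rule below stands in for the paper's nonparametric g-modeling step, and the assumed noise variance 1/n is an illustrative simplification.

    import numpy as np

    rng = np.random.default_rng(6)
    n, p = 50, 30
    X = rng.standard_normal((n, p))        # true covariance: identity
    S = np.cov(X, rowvar=False)

    iu = np.triu_indices(p, k=1)
    theta = S[iu]                          # vectorized off-diagonal entries
    noise_var = 1.0 / n                    # assumed sampling noise per entry
    signal_var = max(theta.var() - noise_var, 0.0)
    shrink = signal_var / (signal_var + noise_var)  # one factor for all entries

    S_eb = S.copy()
    S_eb[iu] *= shrink
    S_eb.T[iu] = S_eb[iu]                  # keep the estimate symmetric
    print("sample loss:", np.linalg.norm(S - np.eye(p)))
    print("shrunk loss:", np.linalg.norm(S_eb - np.eye(p)))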

    Appoximation-assisted [sic] estimation of eigenvectors under quadratic loss

    Improved estimation of an eigenvector of a covariance matrix is considered under uncertain prior information (UPI) regarding the parameter vector. Like the statistical models underlying the inferences to be made, the prior information is susceptible to uncertainty, and practitioners may be reluctant to impose additional information about the parameters on the estimation process. Yet a very large gain in precision may be achieved by judiciously exploiting information about the parameters that, in practice, is available in any realistic problem. Several estimators based on preliminary-test and Stein-type shrinkage rules are constructed. The expressions for the bias and risk of the proposed estimators are derived and compared with those of the usual estimators. We demonstrate how the classical large-sample theory of the conventional estimator can be extended to shrinkage and preliminary-test estimators for the eigenvector of a covariance matrix. It is established that the shrinkage estimators are asymptotically superior to the usual sample estimators. For illustration, the method is applied to three datasets.
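
    A generic sketch of the shrinkage idea for eigenvectors (illustrative; the paper's preliminary-test rules and risk-optimal weights are not reproduced): pull the sample leading eigenvector toward a direction specified by the uncertain prior information, with a fixed blending intensity standing in for a data-driven one.

    import numpy as np

    def shrink_eigvec(S, prior, intensity):
        """Blend the sample leading eigenvector with the prior direction."""
        lam, U = np.linalg.eigh(S)
        v = U[:, -1]                        # sample leading eigenvector
        v = np.sign(v @ prior) * v          # resolve the sign ambiguity
        w = (1 - intensity) * v + intensity * prior
        return w / np.linalg.norm(w)

    rng = np.random.default_rng(7)
    p, n = 10, 25
    true_dir = np.ones(p) / np.sqrt(p)
    Sigma = np.eye(p) + 4 * np.outer(true_dir, true_dir)  # one spiked direction
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    S = np.cov(X, rowvar=False)

    prior = true_dir                        # UPI: here the prior is correct
    for a in (0.0, 0.5):
        v = shrink_eigvec(S, prior, a)
        print(f"intensity {a}: alignment with truth = {abs(v @ true_dir):.3f}")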

    Signal Processing in Large Systems: a New Paradigm

    For a long time, detection and parameter estimation methods in signal processing have relied on asymptotics in which the number n of observations of a population grows large relative to the population size N, i.e. n/N → ∞. Modern technological and societal advances now demand the study of sometimes extremely large populations and simultaneously require fast signal processing due to accelerated system dynamics. This results in not-so-large practical ratios n/N, sometimes even smaller than one. A disruptive change in classical signal processing methods has therefore been initiated in the past ten years, mostly spurred by the field of large dimensional random matrix theory. The early works in random matrix theory for signal processing applications are, however, scarce and highly technical. This tutorial provides an accessible methodological introduction to the modern tools of random matrix theory and to the signal processing methods derived from them, with an emphasis on simple illustrative examples.
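
    A small experiment illustrating why the ratio n/N matters: for white data the sample eigenvalues concentrate at 1 only as n/N grows; at a fixed ratio c = N/n they spread over the Marchenko-Pastur support, which random-matrix-based methods account for.

    import numpy as np

    rng = np.random.default_rng(8)
    for n, N in [(10000, 100), (200, 100), (100, 100)]:
        X = rng.standard_normal((n, N))          # N-dimensional white samples
        lam = np.linalg.eigvalsh(X.T @ X / n)    # sample covariance spectrum
        c = N / n
        lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
        print(f"N/n={c:<5.2f} observed [{lam.min():.2f}, {lam.max():.2f}]"
              f"  MP support [{lo:.2f}, {hi:.2f}]")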