Verification Under Increasing Dimensionality
Verification decisions are often based on second-order statistics estimated from a set of samples. Ongoing growth of computational resources allows for considering more and more features, increasing the dimensionality of the samples. If the dimensionality is of the same order as the number of samples used in the estimation, or even higher, then the accuracy of the estimate decreases significantly. In particular, the eigenvalues of the covariance matrix are estimated with a bias, and the estimates of the eigenvectors differ considerably from the real eigenvectors. We show how a classical approach to verification in high dimensions is severely affected by these problems, and how bias correction methods can reduce them.
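The eigenvalue bias described in this abstract is easy to reproduce. A minimal illustrative sketch (not from the paper), assuming Gaussian data with identity population covariance, so every true eigenvalue equals 1:

```python
import numpy as np

# When the dimension p is of the same order as the sample size n, the sample
# covariance eigenvalues are biased: the largest are overestimated and the
# smallest underestimated, even though every population eigenvalue here is 1.
rng = np.random.default_rng(0)
p, n = 200, 400                      # dimension comparable to sample size
X = rng.standard_normal((n, p))      # i.i.d. samples from N(0, I_p)
S = X.T @ X / n                      # sample covariance matrix
eigvals = np.linalg.eigvalsh(S)
# largest well above 1, smallest well below 1, although the average stays ~1
print(round(eigvals.max(), 2), round(eigvals.min(), 2), round(eigvals.mean(), 2))
```

With p/n = 0.5 the sample spectrum spreads over roughly [0.09, 2.9] instead of collapsing to 1, which is exactly the distortion that the bias correction methods in the abstract target.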
Discriminant analysis under the common principal components model
For two or more populations of which the covariance matrices have a common set of eigenvectors, but different sets of eigenvalues, the common principal components (CPC) model is appropriate. Pepler et al. (2015) proposed a regularised CPC covariance matrix estimator and showed that this estimator outperforms the unbiased and pooled estimators in situations where the CPC model is applicable. This paper extends their work to the context of discriminant analysis for two groups, by plugging the regularised CPC estimator into the ordinary quadratic discriminant function. Monte Carlo simulation results show that CPC discriminant analysis offers significant improvements in misclassification error rates in certain situations, and at worst performs similarly to ordinary quadratic and linear discriminant analysis. Based on these results, CPC discriminant analysis is recommended for situations where the sample size is small compared to the number of variables, in particular for cases where there is uncertainty about the population covariance matrix structures.
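The plug-in idea is mechanical: the quadratic discriminant function scores a new observation against each group's mean and covariance estimate, and the paper's proposal simply swaps the per-group sample covariances for the regularised CPC estimates. A hedged sketch of the plug-in structure, using plain sample covariances and made-up Gaussian groups (the CPC estimator itself is not reproduced here):

```python
import numpy as np

def qda_score(x, mean, cov, prior):
    # log Gaussian discriminant, up to an additive constant shared by all groups
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff) + np.log(prior)

rng = np.random.default_rng(6)
n, p = 200, 5
X1 = rng.standard_normal((n, p))             # group 0: N(0, I)
X2 = rng.standard_normal((n, p)) + 2.0       # group 1: N(2*1, I)
means = [X1.mean(0), X2.mean(0)]
covs = [np.cov(X1.T), np.cov(X2.T)]          # <- swap in regularised CPC estimates here
x_new = np.full(p, 2.0)                      # lies at the centre of group 1
scores = [qda_score(x_new, m, C, 0.5) for m, C in zip(means, covs)]
print(int(np.argmax(scores)))                # assigns x_new to group 1
```

The covariance estimates enter only through `logdet` and the solve, which is why a better-conditioned estimator helps most when p is large relative to n.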
Spectrum Estimation: A Unified Framework for Covariance Matrix Estimation and PCA in Large Dimensions
Covariance matrix estimation and principal component analysis (PCA) are two
cornerstones of multivariate analysis. Classic textbook solutions perform
poorly when the dimension of the data is of a magnitude similar to the sample
size, or even larger. In such settings, there is a common remedy for both
statistical problems: nonlinear shrinkage of the eigenvalues of the sample
covariance matrix. The optimal nonlinear shrinkage formula depends on unknown
population quantities and is thus not available. It is, however, possible to
consistently estimate an oracle nonlinear shrinkage, which is motivated on
asymptotic grounds. A key tool to this end is consistent estimation of the set
of eigenvalues of the population covariance matrix (also known as the
spectrum), an interesting and challenging problem in its own right. Extensive
Monte Carlo simulations demonstrate that our methods have desirable
finite-sample properties and outperform previous proposals. Comment: 40 pages, 8 figures, 5 tables. University of Zurich, Department of Economics, Working Paper No. 105, revised version, July 201
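The shrinkage idea in this abstract has a simple linear special case that shows the shape of the remedy. A sketch (an assumption-laden illustration, not the paper's method): pull each sample eigenvalue toward the grand mean while keeping the sample eigenvectors; the paper's nonlinear shrinkage replaces the single intensity `alpha` below with an eigenvalue-specific transformation estimated from the spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 100, 200
X = rng.standard_normal((n, p))          # true covariance is I_p
S = X.T @ X / n
lam, U = np.linalg.eigh(S)               # sample eigenvalues and eigenvectors
alpha = 0.5                              # illustrative shrinkage intensity, not an oracle choice
lam_shrunk = (1 - alpha) * lam + alpha * lam.mean()
S_shrunk = U @ np.diag(lam_shrunk) @ U.T # keep eigenvectors, shrink eigenvalues only
err_sample = np.linalg.norm(S - np.eye(p))
err_shrunk = np.linalg.norm(S_shrunk - np.eye(p))
print(err_shrunk < err_sample)           # shrinkage reduces Frobenius error here
```

Because S - I is diagonal in the sample eigenbasis, shrinking the eigenvalues toward their mean (which is close to the true value 1) roughly halves every deviation, and the estimation error drops accordingly.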
Statistical eigen-inference from large Wishart matrices
We consider settings where the observations are drawn from a zero-mean
multivariate (real or complex) normal distribution with the population
covariance matrix having eigenvalues of arbitrary multiplicity. We assume that
the eigenvectors of the population covariance matrix are unknown and focus on
inferential procedures that are based on the sample eigenvalues alone (i.e.,
"eigen-inference"). Results found in the literature establish the asymptotic
normality of the fluctuation in the trace of powers of the sample covariance
matrix. We develop concrete algorithms for analytically computing the limiting
quantities and the covariance of the fluctuations. We exploit the asymptotic
normality of the trace of powers of the sample covariance matrix to develop
eigenvalue-based procedures for testing and estimation. Specifically, we
formulate a simple test of hypotheses for the population eigenvalues and a
technique for estimating the population eigenvalues in settings where the
cumulative distribution function of the (nonrandom) population eigenvalues has
a staircase structure. Monte Carlo simulations are used to demonstrate the
superiority of the proposed methodologies over classical techniques and the
robustness of the proposed techniques in high-dimensional, (relatively) small
sample size settings. The improved performance results from the fact that the
proposed inference procedures are "global" (in a sense that we describe) and
exploit "global" information thereby overcoming the inherent biases that
cripple classical inference procedures which are "local" and rely on "local"
information. Comment: Published at http://dx.doi.org/10.1214/07-AOS583 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
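The raw ingredients of eigen-inference are easy to exhibit: traces of powers of the sample covariance matrix are moments of the sample eigenvalue distribution and are computable from the eigenvalues alone. A small sketch (illustrative, not the paper's algorithms), assuming identity population covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 150, 300
X = rng.standard_normal((n, p))
S = X.T @ X / n
lam = np.linalg.eigvalsh(S)
m1 = np.trace(S) / p                 # first moment: equals lam.mean()
m2 = np.trace(S @ S) / p             # second moment: equals (lam**2).mean()
# With c = p/n and population covariance I, m2 converges to 1 + c, not to 1:
# the naive plug-in moment is biased upward in high dimensions, which is the
# kind of systematic distortion eigen-inference procedures account for.
print(round(m1, 2), round(m2, 2))
```

Here c = 0.5, so the second sample moment sits near 1.5 even though every population eigenvalue is 1; a "local" method that trusts m2 at face value inherits this bias.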
Robust spiked random matrices and a robust G-MUSIC estimator
A class of robust estimators of scatter applied to information-plus-impulsive
noise samples is studied, where the sample information matrix is assumed of low
rank; this generalizes the study of (Couillet et al., 2013b) to spiked random
matrix models. It is precisely shown that, as opposed to sample covariance
matrices which may have asymptotically unbounded (eigen-)spectrum due to the
sample impulsiveness, the robust estimator of scatter has bounded spectrum and
may contain isolated eigenvalues which we fully characterize. We show that, if
found beyond a certain detectability threshold, these eigenvalues allow one to
perform statistical inference on the eigenvalues and eigenvectors of the
information matrix. We use this result to derive new eigenvalue and eigenvector
estimation procedures, which we apply in practice to the popular array
processing problem of angle of arrival estimation. This gives birth to an
improved algorithm based on the MUSIC method, which we refer to as robust
G-MUSIC.
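For orientation, here is the classical (non-robust) MUSIC baseline that G-MUSIC and robust G-MUSIC improve on in the regime where the number of snapshots is comparable to the array size. All parameters (array geometry, angles, noise level) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, K = 10, 200, 2                          # sensors, snapshots, sources
true_deg = np.array([-10.0, 20.0])            # illustrative angles of arrival

def steering(theta_deg):
    # uniform linear array, half-wavelength spacing
    theta = np.deg2rad(theta_deg)
    return np.exp(1j * np.pi * np.arange(N)[:, None] * np.sin(theta))

A = steering(true_deg)                        # N x K steering matrix
s = rng.standard_normal((K, n)) + 1j * rng.standard_normal((K, n))
noise = 0.1 * (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n)))
Y = A @ s + noise
R = Y @ Y.conj().T / n                        # sample covariance of the snapshots
w, V = np.linalg.eigh(R)
En = V[:, : N - K]                            # noise subspace (smallest eigenvalues)
grid = np.linspace(-90, 90, 3601)
spec = 1.0 / np.linalg.norm(En.conj().T @ steering(grid), axis=0) ** 2
# pick the two largest well-separated peaks of the pseudospectrum
i1 = int(np.argmax(spec))
est1 = grid[i1]
spec2 = spec.copy()
spec2[np.abs(grid - est1) < 5.0] = 0.0        # suppress a window around the first peak
est2 = grid[int(np.argmax(spec2))]
est = np.sort([est1, est2])
print(np.round(est, 1))
```

The robust variants replace the sample covariance `R` with a robust estimator of scatter, whose bounded spectrum and characterised isolated eigenvalues make the subspace step reliable under impulsive noise.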
The merit of high-frequency data in portfolio allocation
This paper addresses the open debate about the usefulness of high-frequency (HF) data in large-scale portfolio allocation. Daily covariances are estimated based on HF data of the S&P 500 universe employing a blocked realized kernel estimator. We propose forecasting covariance matrices using a multi-scale spectral decomposition where volatilities, correlation eigenvalues and eigenvectors evolve on different frequencies. In an extensive out-of-sample forecasting study, we show that the proposed approach yields less risky and more diversified portfolio allocations than prevailing methods employing daily data. These performance gains hold over longer horizons than previous studies have shown.
A nonparametric empirical Bayes approach to covariance matrix estimation
We propose an empirical Bayes method to estimate high-dimensional covariance
matrices. Our procedure centers on vectorizing the covariance matrix and
treating matrix estimation as a vector estimation problem. Drawing from the
compound decision theory literature, we introduce a new class of decision rules
that generalizes several existing procedures. We then use a nonparametric
empirical Bayes g-modeling approach to estimate the oracle optimal rule in that
class. This allows us to let the data itself determine how best to shrink the
estimator, rather than shrinking in a pre-determined direction such as toward a
diagonal matrix. Simulation results and a gene expression network analysis show that our approach can outperform a number of state-of-the-art proposals in a wide range of settings, sometimes substantially. Comment: 20 pages, 4 figures
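The vectorize-and-shrink idea can be shown in its simplest form. The sketch below uses plain entrywise soft-thresholding of the off-diagonal entries, which is far cruder than the paper's nonparametric empirical Bayes g-modeling but has the same overall shape: treat the covariance entries as a vector of noisy observations and apply an entrywise decision rule. The threshold level is an illustrative choice, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 50, 100
Sigma = np.eye(p)                        # sparse truth: identity covariance
X = rng.standard_normal((n, p))
S = X.T @ X / n
t = 2 * np.sqrt(np.log(p) / n)           # illustrative threshold level
off = S - np.diag(np.diag(S))
off_shrunk = np.sign(off) * np.maximum(np.abs(off) - t, 0.0)  # soft-threshold
S_shrunk = np.diag(np.diag(S)) + off_shrunk
err_raw = np.linalg.norm(S - Sigma)
err_shrunk = np.linalg.norm(S_shrunk - Sigma)
print(err_shrunk < err_raw)              # entrywise shrinkage reduces error here
```

The appeal of the empirical Bayes version is precisely that the data, rather than a fixed rule like this threshold, determines the shape and direction of the shrinkage.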
Approximation-assisted estimation of eigenvectors under quadratic loss
Improved estimation of an eigenvector of a covariance matrix is considered under uncertain prior information (UPI) regarding the parameter vector. Like the statistical models underlying the inferences to be made, the prior information is susceptible to uncertainty, and practitioners may be reluctant to impose additional information about the parameters in the estimation process. Nevertheless, a very large gain in precision may be achieved by judiciously exploiting information about the parameters, which in practice will be available in any realistic problem. Several estimators based on preliminary-test and Stein-type shrinkage rules are constructed. Expressions for the bias and risk of the proposed estimators are derived and compared with those of the usual estimators. We demonstrate how the classical large-sample theory of the conventional estimator can be extended to shrinkage and preliminary-test estimators for the eigenvector of a covariance matrix. It is established that shrinkage estimators are asymptotically superior to the usual sample estimators. For illustration purposes, the method is applied to three datasets.
Signal Processing in Large Systems: a New Paradigm
For a long time, detection and parameter estimation methods for signal processing have relied on asymptotic statistics as the number n of observations of a population grows large comparatively to the population size N, i.e. n/N → ∞. Modern technological and societal advances now demand the study of sometimes extremely large populations and simultaneously require fast signal processing due to accelerated system dynamics. This results in not-so-large practical ratios n/N, sometimes even smaller than one. A disruptive change in classical signal processing methods has therefore been initiated in the past ten years, mostly spurred by the field of large dimensional random matrix theory. The early works in random matrix theory for signal processing applications are however scarce and highly technical. This tutorial provides an accessible methodological introduction to the modern tools of random matrix theory and to the signal processing methods derived from them, with an emphasis on simple illustrative examples.
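The contrast between the two regimes can be seen directly. A small sketch (illustrative parameters, identity population covariance): when n/N is large the sample eigenvalues concentrate near the truth, but at a fixed small ratio the spread persists no matter how large the system grows.

```python
import numpy as np

rng = np.random.default_rng(4)

def eig_spread(N, n):
    # spread of sample covariance eigenvalues when the truth is I_N
    X = rng.standard_normal((n, N))
    lam = np.linalg.eigvalsh(X.T @ X / n)
    return lam.max() - lam.min()

small = eig_spread(10, 1000)    # n/N = 100: classical regime, spread nearly 0
large = eig_spread(500, 1000)   # n/N = 2: spread stays of order 1
print(round(small, 2), round(large, 2))
```

In the second case the eigenvalues fill out an interval of width close to 3 around the true value 1, which is why methods designed for the n/N → ∞ regime break down and random-matrix-based corrections are needed.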