Verification Under Increasing Dimensionality
Verification decisions are often based on second-order statistics estimated from a set of samples. Ongoing growth of computational resources allows for considering more and more features, increasing the dimensionality of the samples. If the dimensionality is of the same order as the number of samples used in the estimation, or even higher, then the accuracy of the estimate decreases significantly. In particular, the eigenvalues of the covariance matrix are estimated with a bias, and the estimates of the eigenvectors differ considerably from the real eigenvectors. We show how a classical approach to verification in high dimensions is severely affected by these problems, and how bias correction methods can reduce them.
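The eigenvalue bias described in this abstract is easy to reproduce. A minimal illustrative sketch (not from the paper), assuming Gaussian data with identity population covariance, so every true eigenvalue equals 1:

```python
import numpy as np

# When the dimension p is of the same order as the sample size n, the sample
# covariance eigenvalues are biased: the largest are overestimated and the
# smallest underestimated, even though every population eigenvalue here is 1.
rng = np.random.default_rng(0)
p, n = 200, 400                      # dimension comparable to sample size
X = rng.standard_normal((n, p))      # i.i.d. samples from N(0, I_p)
S = X.T @ X / n                      # sample covariance matrix
eigvals = np.linalg.eigvalsh(S)
# largest well above 1, smallest well below 1, although the average stays ~1
print(round(eigvals.max(), 2), round(eigvals.min(), 2), round(eigvals.mean(), 2))
```

With p/n = 0.5 the sample spectrum spreads over roughly [0.09, 2.9] instead of collapsing to 1, which is exactly the distortion that the bias correction methods in the abstract target.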
Discriminant analysis under the common principal components model
For two or more populations of which the covariance matrices have a common set of eigenvectors, but different sets of eigenvalues, the common principal components (CPC) model is appropriate. Pepler et al. (2015) proposed a regularised CPC covariance matrix estimator and showed that this estimator outperforms the unbiased and pooled estimators in situations where the CPC model is applicable. This paper extends their work to the context of discriminant analysis for two groups, by plugging the regularised CPC estimator into the ordinary quadratic discriminant function. Monte Carlo simulation results show that CPC discriminant analysis offers significant improvements in misclassification error rates in certain situations, and at worst performs similarly to ordinary quadratic and linear discriminant analysis. Based on these results, CPC discriminant analysis is recommended for situations where the sample size is small compared to the number of variables, in particular for cases where there is uncertainty about the population covariance matrix structures.
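The plug-in idea is mechanical: the quadratic discriminant function scores a new observation against each group's mean and covariance estimate, and the paper's proposal simply swaps the per-group sample covariances for the regularised CPC estimates. A hedged sketch of the plug-in structure, using plain sample covariances and made-up Gaussian groups (the CPC estimator itself is not reproduced here):

```python
import numpy as np

def qda_score(x, mean, cov, prior):
    # log Gaussian discriminant, up to an additive constant shared by all groups
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff) + np.log(prior)

rng = np.random.default_rng(6)
n, p = 200, 5
X1 = rng.standard_normal((n, p))             # group 0: N(0, I)
X2 = rng.standard_normal((n, p)) + 2.0       # group 1: N(2*1, I)
means = [X1.mean(0), X2.mean(0)]
covs = [np.cov(X1.T), np.cov(X2.T)]          # <- swap in regularised CPC estimates here
x_new = np.full(p, 2.0)                      # lies at the centre of group 1
scores = [qda_score(x_new, m, C, 0.5) for m, C in zip(means, covs)]
print(int(np.argmax(scores)))                # assigns x_new to group 1
```

The covariance estimates enter only through `logdet` and the solve, which is why a better-conditioned estimator helps most when p is large relative to n.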
Spectrum Estimation: A Unified Framework for Covariance Matrix Estimation and PCA in Large Dimensions
Covariance matrix estimation and principal component analysis (PCA) are two
cornerstones of multivariate analysis. Classic textbook solutions perform
poorly when the dimension of the data is of a magnitude similar to the sample
size, or even larger. In such settings, there is a common remedy for both
statistical problems: nonlinear shrinkage of the eigenvalues of the sample
covariance matrix. The optimal nonlinear shrinkage formula depends on unknown
population quantities and is thus not available. It is, however, possible to
consistently estimate an oracle nonlinear shrinkage, which is motivated on
asymptotic grounds. A key tool to this end is consistent estimation of the set
of eigenvalues of the population covariance matrix (also known as the
spectrum), an interesting and challenging problem in its own right. Extensive
Monte Carlo simulations demonstrate that our methods have desirable
finite-sample properties and outperform previous proposals. Comment: 40 pages, 8 figures, 5 tables. University of Zurich, Department of Economics, Working Paper No. 105, revised version, July 201
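The shrinkage idea in this abstract has a simple linear special case that shows the shape of the remedy. A sketch (an assumption-laden illustration, not the paper's method): pull each sample eigenvalue toward the grand mean while keeping the sample eigenvectors; the paper's nonlinear shrinkage replaces the single intensity `alpha` below with an eigenvalue-specific transformation estimated from the spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 100, 200
X = rng.standard_normal((n, p))          # true covariance is I_p
S = X.T @ X / n
lam, U = np.linalg.eigh(S)               # sample eigenvalues and eigenvectors
alpha = 0.5                              # illustrative shrinkage intensity, not an oracle choice
lam_shrunk = (1 - alpha) * lam + alpha * lam.mean()
S_shrunk = U @ np.diag(lam_shrunk) @ U.T # keep eigenvectors, shrink eigenvalues only
err_sample = np.linalg.norm(S - np.eye(p))
err_shrunk = np.linalg.norm(S_shrunk - np.eye(p))
print(err_shrunk < err_sample)           # shrinkage reduces Frobenius error here
```

Because S - I is diagonal in the sample eigenbasis, shrinking the eigenvalues toward their mean (which is close to the true value 1) roughly halves every deviation, and the estimation error drops accordingly.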
Statistical eigen-inference from large Wishart matrices
We consider settings where the observations are drawn from a zero-mean
multivariate (real or complex) normal distribution with the population
covariance matrix having eigenvalues of arbitrary multiplicity. We assume that
the eigenvectors of the population covariance matrix are unknown and focus on
inferential procedures that are based on the sample eigenvalues alone (i.e.,
"eigen-inference"). Results found in the literature establish the asymptotic
normality of the fluctuation in the trace of powers of the sample covariance
matrix. We develop concrete algorithms for analytically computing the limiting
quantities and the covariance of the fluctuations. We exploit the asymptotic
normality of the trace of powers of the sample covariance matrix to develop
eigenvalue-based procedures for testing and estimation. Specifically, we
formulate a simple test of hypotheses for the population eigenvalues and a
technique for estimating the population eigenvalues in settings where the
cumulative distribution function of the (nonrandom) population eigenvalues has
a staircase structure. Monte Carlo simulations are used to demonstrate the
superiority of the proposed methodologies over classical techniques and the
robustness of the proposed techniques in high-dimensional, (relatively) small
sample size settings. The improved performance results from the fact that the
proposed inference procedures are "global" (in a sense that we describe) and
exploit "global" information thereby overcoming the inherent biases that
cripple classical inference procedures which are "local" and rely on "local"
information. Comment: Published at http://dx.doi.org/10.1214/07-AOS583 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
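The raw ingredients of eigen-inference are easy to exhibit: traces of powers of the sample covariance matrix are moments of the sample eigenvalue distribution and are computable from the eigenvalues alone. A small sketch (illustrative, not the paper's algorithms), assuming identity population covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 150, 300
X = rng.standard_normal((n, p))
S = X.T @ X / n
lam = np.linalg.eigvalsh(S)
m1 = np.trace(S) / p                 # first moment: equals lam.mean()
m2 = np.trace(S @ S) / p             # second moment: equals (lam**2).mean()
# With c = p/n and population covariance I, m2 converges to 1 + c, not to 1:
# the naive plug-in moment is biased upward in high dimensions, which is the
# kind of systematic distortion eigen-inference procedures account for.
print(round(m1, 2), round(m2, 2))
```

Here c = 0.5, so the second sample moment sits near 1.5 even though every population eigenvalue is 1; a "local" method that trusts m2 at face value inherits this bias.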
Robust spiked random matrices and a robust G-MUSIC estimator
A class of robust estimators of scatter applied to information-plus-impulsive
noise samples is studied, where the sample information matrix is assumed of low
rank; this generalizes the study of (Couillet et al., 2013b) to spiked random
matrix models. It is precisely shown that, as opposed to sample covariance
matrices which may have asymptotically unbounded (eigen-)spectrum due to the
sample impulsiveness, the robust estimator of scatter has bounded spectrum and
may contain isolated eigenvalues which we fully characterize. We show that, if
found beyond a certain detectability threshold, these eigenvalues allow one to
perform statistical inference on the eigenvalues and eigenvectors of the
information matrix. We use this result to derive new eigenvalue and eigenvector
estimation procedures, which we apply in practice to the popular array
processing problem of angle of arrival estimation. This gives birth to an
improved algorithm based on the MUSIC method, which we refer to as robust
G-MUSIC.
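For orientation, here is the classical (non-robust) MUSIC baseline that G-MUSIC and robust G-MUSIC improve on in the regime where the number of snapshots is comparable to the array size. All parameters (array geometry, angles, noise level) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, K = 10, 200, 2                          # sensors, snapshots, sources
true_deg = np.array([-10.0, 20.0])            # illustrative angles of arrival

def steering(theta_deg):
    # uniform linear array, half-wavelength spacing
    theta = np.deg2rad(theta_deg)
    return np.exp(1j * np.pi * np.arange(N)[:, None] * np.sin(theta))

A = steering(true_deg)                        # N x K steering matrix
s = rng.standard_normal((K, n)) + 1j * rng.standard_normal((K, n))
noise = 0.1 * (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n)))
Y = A @ s + noise
R = Y @ Y.conj().T / n                        # sample covariance of the snapshots
w, V = np.linalg.eigh(R)
En = V[:, : N - K]                            # noise subspace (smallest eigenvalues)
grid = np.linspace(-90, 90, 3601)
spec = 1.0 / np.linalg.norm(En.conj().T @ steering(grid), axis=0) ** 2
# pick the two largest well-separated peaks of the pseudospectrum
i1 = int(np.argmax(spec))
est1 = grid[i1]
spec2 = spec.copy()
spec2[np.abs(grid - est1) < 5.0] = 0.0        # suppress a window around the first peak
est2 = grid[int(np.argmax(spec2))]
est = np.sort([est1, est2])
print(np.round(est, 1))
```

The robust variants replace the sample covariance `R` with a robust estimator of scatter, whose bounded spectrum and characterised isolated eigenvalues make the subspace step reliable under impulsive noise.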
The merit of high-frequency data in portfolio allocation
This paper addresses the open debate about the usefulness of high-frequency (HF) data in large-scale portfolio allocation. Daily covariances are estimated based on HF data of the S&P 500 universe employing a blocked realized kernel estimator. We propose forecasting covariance matrices using a multi-scale spectral decomposition where volatilities, correlation eigenvalues and eigenvectors evolve on different frequencies. In an extensive out-of-sample forecasting study, we show that the proposed approach yields less risky and more diversified portfolio allocations than prevailing methods employing daily data. These performance gains hold over longer horizons than previous studies have shown.
A nonparametric empirical Bayes approach to covariance matrix estimation
We propose an empirical Bayes method to estimate high-dimensional covariance
matrices. Our procedure centers on vectorizing the covariance matrix and
treating matrix estimation as a vector estimation problem. Drawing from the
compound decision theory literature, we introduce a new class of decision rules
that generalizes several existing procedures. We then use a nonparametric
empirical Bayes g-modeling approach to estimate the oracle optimal rule in that
class. This allows us to let the data itself determine how best to shrink the
estimator, rather than shrinking in a pre-determined direction such as toward a
diagonal matrix. Simulation results and a gene expression network analysis show that our approach can outperform a number of state-of-the-art proposals in a wide range of settings, sometimes substantially. Comment: 20 pages, 4 figures
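The vectorize-and-shrink idea can be shown in its simplest form. The sketch below uses plain entrywise soft-thresholding of the off-diagonal entries, which is far cruder than the paper's nonparametric empirical Bayes g-modeling but has the same overall shape: treat the covariance entries as a vector of noisy observations and apply an entrywise decision rule. The threshold level is an illustrative choice, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 50, 100
Sigma = np.eye(p)                        # sparse truth: identity covariance
X = rng.standard_normal((n, p))
S = X.T @ X / n
t = 2 * np.sqrt(np.log(p) / n)           # illustrative threshold level
off = S - np.diag(np.diag(S))
off_shrunk = np.sign(off) * np.maximum(np.abs(off) - t, 0.0)  # soft-threshold
S_shrunk = np.diag(np.diag(S)) + off_shrunk
err_raw = np.linalg.norm(S - Sigma)
err_shrunk = np.linalg.norm(S_shrunk - Sigma)
print(err_shrunk < err_raw)              # entrywise shrinkage reduces error here
```

The appeal of the empirical Bayes version is precisely that the data, rather than a fixed rule like this threshold, determines the shape and direction of the shrinkage.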
Approximation-assisted estimation of eigenvectors under quadratic loss
Improved estimation of an eigenvector of a covariance matrix is considered under uncertain prior information (UPI) regarding the parameter vector. Like the statistical models underlying the inferences to be made, the prior information is susceptible to uncertainty, and practitioners may be reluctant to impose additional information about the parameters in the estimation process. Nevertheless, a very large gain in precision may be achieved by judiciously exploiting information about the parameters, which in practice will be available in any realistic problem. Several estimators based on preliminary-test and Stein-type shrinkage rules are constructed. Expressions for the bias and risk of the proposed estimators are derived and compared with those of the usual estimators. We demonstrate how the classical large-sample theory of the conventional estimator can be extended to shrinkage and preliminary-test estimators for the eigenvector of a covariance matrix. It is established that shrinkage estimators are asymptotically superior to the usual sample estimators. For illustration purposes, the method is applied to three datasets.
Signal Processing in Large Systems: a New Paradigm
For a long time, detection and parameter estimation methods for signal processing have relied on asymptotic statistics as the number n of observations of a population grows large comparatively to the population size N, i.e. n/N → ∞. Modern technological and societal advances now demand the study of sometimes extremely large populations and simultaneously require fast signal processing due to accelerated system dynamics. This results in not-so-large practical ratios n/N, sometimes even smaller than one. A disruptive change in classical signal processing methods has therefore been initiated in the past ten years, mostly spurred by the field of large dimensional random matrix theory. The early works in random matrix theory for signal processing applications are however scarce and highly technical. This tutorial provides an accessible methodological introduction to the modern tools of random matrix theory and to the signal processing methods derived from them, with an emphasis on simple illustrative examples.
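The contrast between the two regimes can be seen directly. A small sketch (illustrative parameters, identity population covariance): when n/N is large the sample eigenvalues concentrate near the truth, but at a fixed small ratio the spread persists no matter how large the system grows.

```python
import numpy as np

rng = np.random.default_rng(4)

def eig_spread(N, n):
    # spread of sample covariance eigenvalues when the truth is I_N
    X = rng.standard_normal((n, N))
    lam = np.linalg.eigvalsh(X.T @ X / n)
    return lam.max() - lam.min()

small = eig_spread(10, 1000)    # n/N = 100: classical regime, spread nearly 0
large = eig_spread(500, 1000)   # n/N = 2: spread stays of order 1
print(round(small, 2), round(large, 2))
```

In the second case the eigenvalues fill out an interval of width close to 3 around the true value 1, which is why methods designed for the n/N → ∞ regime break down and random-matrix-based corrections are needed.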