Cleaning large correlation matrices: tools from random matrix theory
This review covers recent results concerning the estimation of large
covariance matrices using tools from Random Matrix Theory (RMT). We introduce
several RMT methods and analytical techniques, such as the Replica formalism
and Free Probability, with an emphasis on the Marchenko-Pastur equation that
provides information on the resolvent of multiplicatively corrupted noisy
matrices. Special care is devoted to the statistics of the eigenvectors of the
empirical correlation matrix, which turn out to be crucial for many
applications. We show in particular how these results can be used to build
consistent "Rotationally Invariant" estimators (RIE) for large correlation
matrices when there is no prior on the structure of the underlying process. The
last part of this review is dedicated to some real-world applications within
financial markets as a case in point. We establish empirically the efficacy of
the RIE framework, which is found to be superior in this case to all previously
proposed methods. The case of additively (rather than multiplicatively)
corrupted noisy matrices is also dealt with in a special Appendix. Several open
problems and interesting technical developments are discussed throughout the
paper.
Comment: 165 pages, article submitted to Physics Reports
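As a rough illustration of the Marchenko-Pastur picture the abstract refers to, the sketch below compares the extreme eigenvalues of a pure-noise sample covariance matrix with the Marchenko-Pastur bulk edges. It is not the review's RIE estimator: the identity population covariance, the Gaussian entries, the sizes `N` and `T`, and the helper names are all assumptions made for this example.

```python
import numpy as np


def marchenko_pastur_edges(q):
    """Bulk edges of the Marchenko-Pastur law for aspect ratio q = N/T <= 1
    (unit-variance entries, identity population covariance)."""
    return (1.0 - np.sqrt(q)) ** 2, (1.0 + np.sqrt(q)) ** 2


def sample_covariance_eigenvalues(n_vars, n_obs, seed=0):
    """Sorted eigenvalues of E = X X^T / T for i.i.d. standard normal X (N x T)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_vars, n_obs))
    e = x @ x.T / n_obs  # zero mean is known here, so no centering is applied
    return np.linalg.eigvalsh(e)  # ascending order


# Hypothetical sizes chosen for illustration only.
N, T = 200, 800
lam_minus, lam_plus = marchenko_pastur_edges(N / T)
eigs = sample_covariance_eigenvalues(N, T)

print("MP edges:            ", lam_minus, lam_plus)
print("empirical extremes:  ", eigs[0], eigs[-1])
```

Although the true covariance is the identity, the sample eigenvalues spread over roughly the interval between the two edges, which is the noise that cleaning schemes try to undo.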
Asymptotic power of sphericity tests for high-dimensional data
This paper studies the asymptotic power of tests of sphericity against
perturbations in a single unknown direction as both the dimensionality of the
data and the number of observations go to infinity. We establish the
convergence, under the null hypothesis and contiguous alternatives, of the log
ratio of the joint densities of the sample covariance eigenvalues to a Gaussian
process indexed by the norm of the perturbation. When the perturbation norm is
larger than the phase transition threshold studied in Baik, Ben Arous and Peche
[Ann. Probab. 33 (2005) 1643-1697] the limiting process is degenerate, and
discrimination between the null and the alternative is asymptotically certain.
When the norm is below the threshold, the limiting process is nondegenerate,
and the joint eigenvalue densities under the null and alternative hypotheses
are mutually contiguous. Using the asymptotic theory of statistical
experiments, we obtain asymptotic power envelopes and derive the asymptotic
power for various sphericity tests in the contiguity region. In particular, we
show that the asymptotic power of the Tracy-Widom-type tests is trivial (i.e.,
equals the asymptotic size), whereas that of the eigenvalue-based likelihood
ratio test is strictly larger than the size, and close to the power envelope.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1100 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Applied stochastic eigen-analysis
Submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy at the Massachusetts Institute of Technology and the
Woods Hole Oceanographic Institution, February 2007.
The first part of the dissertation investigates the application of the theory of large
random matrices to high-dimensional inference problems when the samples are drawn
from a multivariate normal distribution. A longstanding problem in sensor array processing
is addressed by designing an estimator for the number of signals in white noise
that dramatically outperforms that proposed by Wax and Kailath. This methodology is
extended to develop new parametric techniques for testing and estimation. Unlike techniques
found in the literature, these exhibit robustness to high dimensionality, sample
size constraints and eigenvector misspecification.
By interpreting the eigenvalues of the sample covariance matrix as an interacting
particle system, the existence of a phase transition phenomenon in the largest (“signal”)
eigenvalue is derived using heuristic arguments. This exposes a fundamental limit on
the identifiability of low-level signals due to sample size constraints when using the
sample eigenvalues alone.
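The phase transition described above can be seen numerically in a simple spiked covariance model. The sketch below is only an illustration under assumptions not taken from the dissertation: a single spike `spike` in a diagonal population covariance, Gaussian samples, and the standard large-dimension prediction that the top sample eigenvalue tends to `spike * (1 + c / (spike - 1))` when `spike > 1 + sqrt(c)` and to the bulk edge `(1 + sqrt(c))**2` otherwise, with `c = N/T`. The function names and sizes are hypothetical.

```python
import numpy as np


def top_sample_eigenvalue(spike, n_vars, n_obs, seed=0):
    """Largest eigenvalue of the sample covariance when the population
    covariance is diag(spike, 1, ..., 1) and samples are Gaussian."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_vars, n_obs))
    x[0, :] *= np.sqrt(spike)  # give the first coordinate variance `spike`
    return np.linalg.eigvalsh(x @ x.T / n_obs)[-1]


def predicted_top_eigenvalue(spike, c):
    """Asymptotic location of the top sample eigenvalue for c = N/T."""
    if spike > 1.0 + np.sqrt(c):
        return spike * (1.0 + c / (spike - 1.0))  # separates from the bulk
    return (1.0 + np.sqrt(c)) ** 2  # sticks to the bulk edge


# Hypothetical sizes chosen for illustration only.
N, T = 300, 1200
c = N / T
for spike in (1.2, 3.0):  # below and above the threshold 1 + sqrt(c) = 1.5
    print(spike, top_sample_eigenvalue(spike, N, T), predicted_top_eigenvalue(spike, c))
```

Below the threshold the top sample eigenvalue is indistinguishable from a noise eigenvalue, which is the identifiability limit the paragraph describes.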
The analysis is extended to address a problem in sensor array processing, posed by
Baggeroer and Cox, on the distribution of the outputs of the Capon-MVDR beamformer
when the sample covariance matrix is diagonally loaded.
The second part of the dissertation investigates the limiting distribution of the
eigenvalues and eigenvectors of a broader class of random matrices. A powerful method
is proposed that expands the reach of the theory beyond the special cases of matrices
with Gaussian entries; this simultaneously establishes a framework for computational
(non-commutative) “free probability” theory.
The class of “algebraic” random matrices is defined and the generators of this class
are specified. Algebraicity of a random matrix sequence is shown to act as a certificate
of the computability of the limiting eigenvalue distribution and, for a subclass, the limiting
conditional “eigenvector distribution.” The limiting moments of algebraic random
matrix sequences, when they exist, are shown to satisfy a finite depth linear recursion
so that they may often be efficiently enumerated in closed form. The method is applied
to predict the deterioration in the quality of the sample eigenvectors of large algebraic
empirical covariance matrices due to sample size constraints.
I am grateful to the National Science Foundation for supporting this work via grant
DMS-0411962 and the Office of Naval Research Graduate Traineeship awar
Fluctuations of the extreme eigenvalues of finite rank deformations of random matrices
Consider a deterministic self-adjoint matrix X_n with spectral measure
converging to a compactly supported probability measure, the largest and
smallest eigenvalues converging to the edges of the limiting measure. We
perturb this matrix by adding a random finite rank matrix with delocalized
eigenvectors and study the extreme eigenvalues of the deformed model. We give
necessary conditions on the deterministic matrix X_n so that the eigenvalues
converging out of the bulk exhibit Gaussian fluctuations, whereas the
eigenvalues sticking to the edges are very close to the eigenvalues of the
non-perturbed model and fluctuate in the same scale. We generalize these
results to the case when X_n is random and get similar behavior when we deform
some classical models such as Wigner or Wishart matrices with rather general
entries or the so-called matrix models.
Comment: 42 pages, Electron. J. Prob., Vol. 16 (2011), Paper no. 60, pages
1621-166
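A minimal numerical sketch of the kind of deformed model the last abstract discusses, under simplifying assumptions that are not the paper's general setting: a GOE-like Wigner matrix (limiting spectrum on [-2, 2]) perturbed by a rank-one term `theta * v v^T` with a uniformly random unit vector `v`. The well-known prediction in this special case is that the top eigenvalue converges to `theta + 1/theta` when `theta > 1` and stays at the edge 2 otherwise. Function names and the size `n` are hypothetical.

```python
import numpy as np


def top_eigenvalue_rank_one(theta, n, seed=0):
    """Largest eigenvalue of W + theta * v v^T, where W is a GOE-like matrix
    with off-diagonal variance 1/n and v is a random unit vector."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0 * n)  # symmetric, semicircle law on [-2, 2]
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)  # delocalized direction on the unit sphere
    return np.linalg.eigvalsh(w + theta * np.outer(v, v))[-1]


def predicted_top_eigenvalue(theta):
    """Limit of the top eigenvalue for the rank-one deformed Wigner model."""
    return theta + 1.0 / theta if theta > 1.0 else 2.0


n = 500  # hypothetical size chosen for illustration only
for theta in (0.5, 2.0):
    print(theta, top_eigenvalue_rank_one(theta, n), predicted_top_eigenvalue(theta))
```

For `theta = 2.0` the top eigenvalue leaves the bulk, while for `theta = 0.5` it sticks to the edge.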