    Cleaning large correlation matrices: tools from random matrix theory

    This review covers recent results concerning the estimation of large covariance matrices using tools from Random Matrix Theory (RMT). We introduce several RMT methods and analytical techniques, such as the Replica formalism and Free Probability, with an emphasis on the Marchenko-Pastur equation that provides information on the resolvent of multiplicatively corrupted noisy matrices. Special care is devoted to the statistics of the eigenvectors of the empirical correlation matrix, which turn out to be crucial for many applications. We show in particular how these results can be used to build consistent "Rotationally Invariant" estimators (RIE) for large correlation matrices when there is no prior on the structure of the underlying process. The last part of this review is dedicated to some real-world applications within financial markets as a case in point. We establish empirically the efficacy of the RIE framework, which is found to be superior in this case to all previously proposed methods. The case of additively (rather than multiplicatively) corrupted noisy matrices is also dealt with in a special Appendix. Several open problems and interesting technical developments are discussed throughout the paper.
    Comment: 165 pages, article submitted to Physics Report
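As a rough illustration of the Marchenko-Pastur law mentioned in the abstract above, the sketch below covers only the simplest null case: the true correlation matrix is assumed to be the identity, the entries have unit variance, and N variables are observed over T samples with q = N/T held fixed as both grow. The function names `mp_edges` and `mp_density` are hypothetical and are not taken from the review; this is not the RIE estimator itself.

```python
import math


def mp_edges(q):
    """Lower and upper edges of the Marchenko-Pastur bulk for q = N/T, 0 < q <= 1."""
    root = math.sqrt(q)
    return (1.0 - root) ** 2, (1.0 + root) ** 2


def mp_density(x, q):
    """Marchenko-Pastur density at x for unit-variance noise (assumed null case)."""
    lo, hi = mp_edges(q)
    if x <= lo or x >= hi:
        return 0.0  # no eigenvalue mass outside the bulk in the large-N limit
    return math.sqrt((hi - x) * (x - lo)) / (2.0 * math.pi * q * x)
```

For example, with N = 100 series and T = 400 observations, q = 0.25 and `mp_edges(0.25)` gives a bulk of roughly [0.25, 2.25]; under the stated assumptions, sample eigenvalues inside that interval are compatible with pure noise.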

    Asymptotic power of sphericity tests for high-dimensional data

    This paper studies the asymptotic power of tests of sphericity against perturbations in a single unknown direction as both the dimensionality of the data and the number of observations go to infinity. We establish the convergence, under the null hypothesis and contiguous alternatives, of the log ratio of the joint densities of the sample covariance eigenvalues to a Gaussian process indexed by the norm of the perturbation. When the perturbation norm is larger than the phase transition threshold studied in Baik, Ben Arous and Peche [Ann. Probab. 33 (2005) 1643-1697] the limiting process is degenerate, and discrimination between the null and the alternative is asymptotically certain. When the norm is below the threshold, the limiting process is nondegenerate, and the joint eigenvalue densities under the null and alternative hypotheses are mutually contiguous. Using the asymptotic theory of statistical experiments, we obtain asymptotic power envelopes and derive the asymptotic power for various sphericity tests in the contiguity region. In particular, we show that the asymptotic power of the Tracy-Widom-type tests is trivial (i.e., equals the asymptotic size), whereas that of the eigenvalue-based likelihood ratio test is strictly larger than the size, and close to the power envelope.
    Comment: Published at http://dx.doi.org/10.1214/13-AOS1100 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
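The phase transition referred to in the abstract above can be made concrete with a small sketch. It assumes a particular parametrization that the abstract does not spell out: a population covariance of the form I + h v vᵀ with unit vector v, and a dimension-to-sample-size ratio c = p/n held fixed as both grow. Under those assumptions the threshold is h = sqrt(c). The function name `top_eigenvalue_limit` is hypothetical.

```python
import math


def top_eigenvalue_limit(h, c):
    """Almost-sure limit of the largest sample covariance eigenvalue.

    Assumes population covariance I + h * v v^T (|v| = 1, h >= 0) and c = p/n.
    """
    if h > math.sqrt(c):
        # Above the threshold the top eigenvalue separates from the bulk.
        return (1.0 + h) * (1.0 + c / h)
    # At or below the threshold it sticks to the Marchenko-Pastur upper edge.
    return (1.0 + math.sqrt(c)) ** 2
```

With c = 0.25, any perturbation of size h <= 0.5 leaves the limit at the bulk edge 2.25, which is why the largest eigenvalue alone carries no first-order information about the perturbation there, while h = 1.0 moves the limit to 2.5.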

    Applied stochastic eigen-analysis

    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution February 2007
    The first part of the dissertation investigates the application of the theory of large random matrices to high-dimensional inference problems when the samples are drawn from a multivariate normal distribution. A longstanding problem in sensor array processing is addressed by designing an estimator for the number of signals in white noise that dramatically outperforms that proposed by Wax and Kailath. This methodology is extended to develop new parametric techniques for testing and estimation. Unlike techniques found in the literature, these exhibit robustness to high-dimensionality, sample size constraints and eigenvector misspecification. By interpreting the eigenvalues of the sample covariance matrix as an interacting particle system, the existence of a phase transition phenomenon in the largest (“signal”) eigenvalue is derived using heuristic arguments. This exposes a fundamental limit on the identifiability of low-level signals due to sample size constraints when using the sample eigenvalues alone. The analysis is extended to address a problem in sensor array processing, posed by Baggeroer and Cox, on the distribution of the outputs of the Capon-MVDR beamformer when the sample covariance matrix is diagonally loaded. The second part of the dissertation investigates the limiting distribution of the eigenvalues and eigenvectors of a broader class of random matrices. A powerful method is proposed that expands the reach of the theory beyond the special cases of matrices with Gaussian entries; this simultaneously establishes a framework for computational (non-commutative) “free probability” theory. The class of “algebraic” random matrices is defined and the generators of this class are specified. Algebraicity of a random matrix sequence is shown to act as a certificate of the computability of the limiting eigenvalue distribution and, for a subclass, the limiting conditional “eigenvector distribution.” The limiting moments of algebraic random matrix sequences, when they exist, are shown to satisfy a finite depth linear recursion so that they may often be efficiently enumerated in closed form. The method is applied to predict the deterioration in the quality of the sample eigenvectors of large algebraic empirical covariance matrices due to sample size constraints.
    I am grateful to the National Science Foundation for supporting this work via grant DMS-0411962 and the Office of Naval Research Graduate Traineeship awar

    Fluctuations of the extreme eigenvalues of finite rank deformations of random matrices

    Consider a deterministic self-adjoint matrix X_n with spectral measure converging to a compactly supported probability measure, the largest and smallest eigenvalues converging to the edges of the limiting measure. We perturb this matrix by adding a random finite rank matrix with delocalized eigenvectors and study the extreme eigenvalues of the deformed model. We give necessary conditions on the deterministic matrix X_n so that the eigenvalues converging out of the bulk exhibit Gaussian fluctuations, whereas the eigenvalues sticking to the edges are very close to the eigenvalues of the non-perturbed model and fluctuate in the same scale. We generalize these results to the case when X_n is random and get similar behavior when we deform some classical models such as Wigner or Wishart matrices with rather general entries or the so-called matrix models.
    Comment: 42 pages, Electron. J. Prob., Vol. 16 (2011), Paper no. 60, pages 1621-166