
    On information plus noise kernel random matrices

    Kernel random matrices have attracted a lot of interest in recent years, from both practical and theoretical standpoints. Most of the theoretical work so far has focused on the case where the data is sampled from a low-dimensional structure. Very recently, the first results concerning kernel random matrices with high-dimensional input data were obtained, in a setting where the data was sampled from a genuinely high-dimensional structure---similar to standard assumptions in random matrix theory. In this paper, we consider the case where the data is of the type "information + noise." In other words, each observation is the sum of two independent elements: one sampled from a "low-dimensional" structure, the signal part of the data, the other being high-dimensional noise, normalized to not overwhelm but still affect the signal. We consider two types of noise, spherical and elliptical. In the spherical setting, we show that the spectral properties of kernel random matrices can be understood from a new kernel matrix, computed only from the signal part of the data, but using (in general) a slightly different kernel. The Gaussian kernel has some special properties in this setting. The elliptical setting, which is important from a robustness standpoint, is less prone to easy interpretation.
    Comment: Published in at http://dx.doi.org/10.1214/10-AOS801 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
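    As a rough numerical companion to the abstract above, the sketch below (a minimal simulation under illustrative choices of n, p, signal dimension and bandwidth, not the paper's construction) generates "information + noise" data, builds a Gaussian kernel matrix from the observed data and from the signal part alone, and prints the top eigenvalues of each so the two spectra can be compared.

```python
# Minimal simulation sketch (not the paper's method): compare the spectrum of a
# Gaussian kernel matrix built from "information + noise" data with the spectrum
# of a kernel matrix built from the signal part alone. All parameter choices
# (n, p, signal dimension, bandwidth) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 300, 300, 2          # n observations, ambient dimension p, signal dimension k

# Signal: low-dimensional structure embedded in R^p.
signal = np.zeros((n, p))
signal[:, :k] = rng.normal(size=(n, k))

# Noise: high-dimensional, scaled so its norm stays O(1) per observation.
noise = rng.normal(size=(n, p)) / np.sqrt(p)

X = signal + noise             # observed data: information + noise

def gaussian_kernel(Y, bandwidth=1.0):
    """n x n Gaussian kernel matrix with entries exp(-||Y_i - Y_j||^2 / (2 * bandwidth))."""
    sq = np.sum(Y**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T
    return np.exp(-d2 / (2.0 * bandwidth))

K_obs = gaussian_kernel(X)
K_sig = gaussian_kernel(signal)

ev_obs = np.sort(np.linalg.eigvalsh(K_obs))[::-1]
ev_sig = np.sort(np.linalg.eigvalsh(K_sig))[::-1]
print("top eigenvalues, observed data :", np.round(ev_obs[:5], 3))
print("top eigenvalues, signal only   :", np.round(ev_sig[:5], 3))
```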

    Tracy--Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices

    We consider the asymptotic fluctuation behavior of the largest eigenvalue of certain sample covariance matrices in the asymptotic regime where both dimensions of the corresponding data matrix go to infinity. More precisely, let $X$ be an $n\times p$ matrix, and let its rows be i.i.d. complex normal vectors with mean 0 and covariance $\Sigma_p$. We show that for a large class of covariance matrices $\Sigma_p$, the largest eigenvalue of $X^*X$ is asymptotically distributed (after recentering and rescaling) as the Tracy--Widom distribution that appears in the study of the Gaussian unitary ensemble. We give explicit formulas for the centering and scaling sequences that are easy to implement and involve only the spectral distribution of the population covariance, $n$ and $p$. The main theorem applies to a number of covariance models found in applications. For example, well-behaved Toeplitz matrices as well as covariance matrices whose spectral distribution is a sum of atoms (under some conditions on the mass of the atoms) are among the models the theorem can handle. Generalizations of the theorem to certain spiked versions of our models and a.s. results about the largest eigenvalue are given. We also discuss a simple corollary that does not require normality of the entries of the data matrix and some consequences for applications in multivariate statistics.
    Comment: Published at http://dx.doi.org/10.1214/009117906000000917 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)
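    The following Monte Carlo sketch illustrates the object of study under one of the covariance models mentioned above, a well-behaved Toeplitz population covariance; all sizes and the Toeplitz parameter are assumptions made only for the demonstration, and the paper's explicit centering and scaling formulas are not reproduced here.

```python
# Illustrative Monte Carlo sketch (not the paper's explicit formulas): draw X with
# i.i.d. complex normal rows having a Toeplitz population covariance Sigma_p, and
# record the largest eigenvalue of X^* X over repeated draws.
import numpy as np
from scipy.linalg import toeplitz, cholesky

rng = np.random.default_rng(1)
n, p, reps = 200, 100, 50

# Well-behaved Toeplitz population covariance: Sigma_p[i, j] = 0.4 ** |i - j|.
Sigma = toeplitz(0.4 ** np.arange(p))
L = cholesky(Sigma, lower=True)

largest = np.empty(reps)
for r in range(reps):
    # Rows of X are i.i.d. complex normal, mean 0, covariance Sigma_p.
    Z = (rng.normal(size=(n, p)) + 1j * rng.normal(size=(n, p))) / np.sqrt(2)
    X = Z @ L.conj().T
    # Largest eigenvalue of X^* X (Hermitian, p x p).
    largest[r] = np.linalg.eigvalsh(X.conj().T @ X)[-1]

print(f"largest eigenvalue over {reps} draws: "
      f"mean {largest.mean():.2f}, std {largest.std():.2f}")
```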

    Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond

    We place ourselves in the setting of high-dimensional statistical inference, where the number of variables $p$ in a data set of interest is of the same order of magnitude as the number of observations $n$. More formally, we study the asymptotic properties of correlation and covariance matrices, in the setting where $p/n\to\rho\in(0,\infty)$, for general population covariance. We show that, for a large class of models studied in random matrix theory, spectral properties of large-dimensional correlation matrices are similar to those of large-dimensional covariance matrices. We also derive a Marčenko--Pastur-type system of equations for the limiting spectral distribution of covariance matrices computed from data with elliptical distributions and generalizations of this family. The motivation for this study comes partly from the possible relevance of such distributional assumptions to problems in econometrics and portfolio optimization, as well as robustness questions for certain classical random matrix results. A mathematical theme of the paper is the important use we make of concentration inequalities.
    Comment: Published in at http://dx.doi.org/10.1214/08-AAP548 the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
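    A small simulation can make the elliptical setting concrete. The sketch below (an illustrative experiment, not the paper's derivation) draws data of the form X_i = r_i Z_i with Gaussian Z_i and an independent random radius r_i, then compares a few quantiles of the spectra of the sample covariance and sample correlation matrices; the sizes and the radius law are assumptions chosen only for illustration.

```python
# Quick numerical illustration (an assumption-laden sketch, not the paper's proof):
# sample data from an elliptical model (a Gaussian vector multiplied by an
# independent random radius) and compare the empirical spectra of the sample
# covariance and sample correlation matrices.
import numpy as np

rng = np.random.default_rng(2)
n, p = 1000, 500                       # p/n -> rho = 0.5

# Elliptical data: X_i = r_i * Z_i with Z_i ~ N(0, I_p) and an independent radius r_i.
Z = rng.normal(size=(n, p))
r = np.sqrt(rng.chisquare(df=5, size=n) / 5)   # radius with unit mean square
X = Z * r[:, None]

cov = np.cov(X, rowvar=False)          # sample covariance matrix
corr = np.corrcoef(X, rowvar=False)    # sample correlation matrix

ev_cov = np.linalg.eigvalsh(cov)
ev_corr = np.linalg.eigvalsh(corr)

# Compare a few quantiles of the two spectra.
qs = [0.1, 0.5, 0.9]
print("covariance spectrum quantiles :", np.round(np.quantile(ev_cov, qs), 3))
print("correlation spectrum quantiles:", np.round(np.quantile(ev_corr, qs), 3))
```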

    The spectrum of kernel random matrices

    We place ourselves in the setting of high-dimensional statistical inference where the number of variables $p$ in a dataset of interest is of the same order of magnitude as the number of observations $n$. We consider the spectrum of certain kernel random matrices, in particular $n\times n$ matrices whose $(i,j)$th entry is $f(X_i'X_j/p)$ or $f(\Vert X_i-X_j\Vert^2/p)$, where $p$ is the dimension of the data and $X_i$ are independent data vectors. Here $f$ is assumed to be a locally smooth function. The study is motivated by questions arising in statistics and computer science, where these matrices are used to perform, among other things, nonlinear versions of principal component analysis. Surprisingly, we show that in high dimensions, and for the models we analyze, the problem becomes essentially linear, which is at odds with heuristics sometimes used to justify the usage of these methods. The analysis also highlights certain peculiarities of models widely studied in random matrix theory and raises some questions about their relevance as tools to model high-dimensional data encountered in practice.
    Comment: Published in at http://dx.doi.org/10.1214/08-AOS648 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
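    The "essentially linear" phenomenon can be eyeballed numerically. The sketch below (illustrative sizes and kernel choice; the paper's precise approximation is more refined than this crude comparison) builds the n x n matrix with entries f(X_i'X_j/p) for f = tanh and compares its top eigenvalues with those of a first-order surrogate f(0) + f'(0) X_i'X_j/p.

```python
# Minimal numerical check (an illustrative sketch under assumed sizes and kernel):
# build the n x n kernel matrix with entries f(X_i' X_j / p) for a smooth f and
# compare its top eigenvalues to those of a crude first-order (linear) surrogate.
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 400
X = rng.normal(size=(n, p))            # independent data vectors

G = X @ X.T / p                        # inner products X_i' X_j / p

f = np.tanh                            # a locally smooth kernel function (assumed choice)
K_nonlinear = f(G)
K_linear = f(0) + 1.0 * G              # first-order surrogate f(0) + f'(0) * G; f'(0) = 1 for tanh

ev_nl = np.sort(np.linalg.eigvalsh(K_nonlinear))[::-1]
ev_lin = np.sort(np.linalg.eigvalsh(K_linear))[::-1]
print("top eigenvalues, f(X_i'X_j/p) kernel:", np.round(ev_nl[:5], 3))
print("top eigenvalues, linear surrogate   :", np.round(ev_lin[:5], 3))
```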

    A rate of convergence result for the largest eigenvalue of complex white Wishart matrices

    It has been recently shown that if $X$ is an $n\times N$ matrix whose entries are i.i.d. standard complex Gaussian and $l_1$ is the largest eigenvalue of $X^*X$, there exist sequences $m_{n,N}$ and $s_{n,N}$ such that $(l_1-m_{n,N})/s_{n,N}$ converges in distribution to $W_2$, the Tracy--Widom law appearing in the study of the Gaussian unitary ensemble. This probability law has a density which is known and computable. The cumulative distribution function of $W_2$ is denoted $F_2$. In this paper we show that, under the assumption that $n/N\to\gamma\in(0,\infty)$, we can find a function $M$, continuous and nonincreasing, and sequences $\tilde{\mu}_{n,N}$ and $\tilde{\sigma}_{n,N}$ such that, for all real $s_0$, there exists an integer $N(s_0,\gamma)$ for which, if $(n\wedge N)\geq N(s_0,\gamma)$, we have, with $l_{n,N}=(l_1-\tilde{\mu}_{n,N})/\tilde{\sigma}_{n,N}$,
    $$\forall s\geq s_0\qquad (n\wedge N)^{2/3}|P(l_{n,N}\leq s)-F_2(s)|\leq M(s_0)\exp(-s).$$
    The surprisingly good $2/3$ rate and qualitative properties of the bounding function help explain the fact that the limiting distribution $W_2$ is a good approximation to the empirical distribution of $l_{n,N}$ in simulations, an important fact from the point of view of (e.g., statistical) applications.
    Comment: Published at http://dx.doi.org/10.1214/009117906000000502 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)
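    A quick Monte Carlo check of the normalization is sketched below, using the classical first-order centering and scaling for the complex white Wishart case rather than the refined sequences $\tilde{\mu}_{n,N}$, $\tilde{\sigma}_{n,N}$ of the paper; the sizes and the number of repetitions are arbitrary illustrative choices.

```python
# Monte Carlo sketch using the classical centering and scaling for complex white
# Wishart matrices (not the paper's refined sequences): simulate l_1, the largest
# eigenvalue of X^* X, and normalize it so that its distribution should be close
# to the Tracy--Widom law F_2.
import numpy as np

rng = np.random.default_rng(4)
n, N, reps = 200, 100, 200

# Classical centering/scaling for the complex white Wishart case (unit-variance entries).
mu = (np.sqrt(n) + np.sqrt(N)) ** 2
sigma = (np.sqrt(n) + np.sqrt(N)) * (1 / np.sqrt(n) + 1 / np.sqrt(N)) ** (1 / 3)

samples = np.empty(reps)
for r in range(reps):
    # X: n x N with i.i.d. standard complex Gaussian entries (E|X_ij|^2 = 1).
    X = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
    l1 = np.linalg.eigvalsh(X.conj().T @ X)[-1]
    samples[r] = (l1 - mu) / sigma

# For reference, the Tracy--Widom F_2 law has mean ~ -1.77 and standard deviation ~ 0.90.
print(f"normalized l_1: mean {samples.mean():.2f}, std {samples.std():.2f}")
```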

    Uniform approximation and explicit estimates for the prolate spheroidal wave functions

    For fixed $c$, Prolate Spheroidal Wave Functions (PSWFs), denoted by $\psi_{n,c}$, form an orthogonal basis with remarkable properties for the space of band-limited functions with bandwidth $c$. They have been largely studied and used after the seminal work of D. Slepian and his co-authors. In several applications, uniform estimates of the $\psi_{n,c}$ in $n$ and $c$ are needed. To progress in this direction, we push forward the uniform approximation error bounds and give an explicit approximation of their values at $1$ in terms of the Legendre complete elliptic integral of the first kind. Also, we give an explicit formula for the accurate approximation of the eigenvalues of the Sturm--Liouville operator associated with the PSWFs.
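    For readers who want to experiment numerically, the sketch below evaluates the prolate spheroidal angular functions of the first kind near $x = 1$ and prints the associated Sturm--Liouville eigenvalues, using SciPy's spheroidal-function routines; SciPy's normalization conventions differ from the $\psi_{n,c}$ above, so this is only a rough companion to the estimates described in the abstract.

```python
# Sketch relying on SciPy's spheroidal-function routines (normalization conventions
# differ from the psi_{n,c} of the paper): evaluate the prolate spheroidal angular
# functions of the first kind near x = 1 and print the Sturm--Liouville eigenvalues.
import numpy as np
from scipy.special import pro_ang1, pro_cv

c = 10.0        # bandwidth parameter (illustrative choice)
x = 0.999       # a point close to 1, where the paper's explicit approximation is of interest

for n in range(5):
    chi_n = pro_cv(0, n, c)                 # Sturm--Liouville eigenvalue (m = 0, SciPy convention)
    value, _derivative = pro_ang1(0, n, c, x)
    print(f"n={n}: chi_n(c) = {chi_n:10.4f},  S_0n(c, x) near 1 = {value: .6f}")
```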