
    On high-dimensional sign tests

    Sign tests are among the most successful procedures in multivariate nonparametric statistics. In this paper, we consider several testing problems in multivariate analysis, directional statistics and multivariate time series analysis, and we show that, under appropriate symmetry assumptions, the fixed-$p$ multivariate sign tests remain valid in the high-dimensional case. Remarkably, our asymptotic results are universal, in the sense that, unlike in most previous works in high-dimensional statistics, $p$ may go to infinity in an arbitrary way as $n$ does. We conduct simulations that (i) confirm our asymptotic results, (ii) reveal that, even for relatively large $p$, chi-square critical values are to be favoured over the (asymptotically equivalent) Gaussian ones and (iii) show that, for testing i.i.d.-ness against serial dependence in the high-dimensional case, Portmanteau sign tests outperform their competitors in terms of validity-robustness. Comment: Published at http://dx.doi.org/10.3150/15-BEJ710 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm

    Convergence and Fluctuations of Regularized Tyler Estimators

    This article studies the behavior of regularized Tyler estimators (RTEs) of scatter matrices. The key advantages of these estimators are twofold. First, they guarantee by construction a good conditioning of the estimate and second, being derived from robust Tyler estimators, they inherit their robustness properties, notably their resilience to the presence of outliers. Nevertheless, one major problem posed by the use of RTEs in practice is how to set the regularization parameter $\rho$. While a high value of $\rho$ is likely to push all the eigenvalues away from zero, it comes at the cost of a larger bias with respect to the population covariance matrix. A deep understanding of the statistics of RTEs is essential to come up with appropriate choices for the regularization parameter. This is not an easy task and might be out of reach, unless one considers asymptotic regimes wherein the number of observations $n$ and/or their size $N$ increase together. The first asymptotic results have recently been obtained under the assumption that $N$ and $n$ are large and commensurable. Interestingly, no results exist for the regime of $n$ going to infinity with $N$ fixed, even though the investigation of this regime has usually predated the analysis of the more difficult case where both $N$ and $n$ are large. This motivates our work. In particular, we prove in the present paper that the RTEs converge to a deterministic matrix when $n\to\infty$ with $N$ fixed, which is expressed as a function of the theoretical covariance matrix. We also derive the fluctuations of the RTEs around this deterministic matrix and establish that these fluctuations converge in distribution to a multivariate Gaussian distribution with zero mean and a covariance depending on the population covariance and the parameter $\rho$.
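
    The abstract above does not state the estimator's defining equation. As a rough illustration only, the sketch below assumes one commonly used form of a regularized Tyler estimator, namely the fixed point of $\Sigma = (1-\rho)\frac{N}{n}\sum_i \frac{x_i x_i^T}{x_i^T \Sigma^{-1} x_i} + \rho I$, and solves it by plain iteration. The function name `regularized_tyler`, the stopping rule, and the defaults are hypothetical choices, not taken from the paper. Under this assumed form, every eigenvalue of the estimate is at least $\rho$, which is one way to read the "good conditioning by construction" remark.

```python
import numpy as np


def regularized_tyler(X, rho, n_iter=200, tol=1e-9):
    """Fixed-point iteration for an ASSUMED regularized Tyler form.

    X   : (n, N) array, n observations of size N (rows are samples).
    rho : regularization parameter in (0, 1].
    """
    n, N = X.shape
    sigma = np.eye(N)  # start from the identity
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        # Quadratic forms q_i = x_i^T sigma^{-1} x_i, one per sample.
        q = np.einsum("ij,jk,ik->i", X, inv, X)
        # (X / q).T @ X equals sum_i x_i x_i^T / q_i.
        new = (1 - rho) * (N / n) * (X / q[:, None]).T @ X + rho * np.eye(N)
        converged = np.linalg.norm(new - sigma, "fro") < tol
        sigma = new
        if converged:
            break
    return sigma
```

    A larger `rho` raises the floor on the eigenvalues but pulls the estimate toward the identity, which mirrors the conditioning versus bias trade-off the abstract describes.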

    Random geometric graphs in high dimension

    Many machine learning algorithms used for dimensional reduction and manifold learning rely on computing the nearest neighbours of each point of a dataset to perform their tasks. These proximity relations define a so-called geometric graph, where two nodes are linked if they are sufficiently close to each other. Random geometric graphs, where the positions of nodes are randomly generated in a subset of $\mathbb{R}^{d}$, offer a null model to study typical properties of datasets and of machine learning algorithms. Up to now, most of the literature has focused on the characterization of low-dimensional random geometric graphs, whereas typical datasets of interest in machine learning live in high-dimensional spaces ($d \gg 10^{2}$). In this work, we consider the infinite-dimensional limit of hard and soft random geometric graphs and we show how to compute the average number of subgraphs of given finite size $k$, e.g. the average number of $k$-cliques. This analysis highlights that local observables display different behaviors depending on the chosen ensemble: soft random geometric graphs with continuous activation functions converge to the naive infinite-dimensional limit provided by Erdős-Rényi graphs, whereas hard random geometric graphs can show systematic deviations from it. We present numerical evidence that our analytical insights, exact in infinite dimensions, also provide a good approximation for dimension $d\gtrsim10$.
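
    To make the objects in this abstract concrete, here is a minimal sketch of a hard random geometric graph and a 3-clique (triangle) count, next to the Erdős-Rényi expectation for the same edge probability. It assumes points drawn uniformly in the unit hypercube with Euclidean distance and no periodic boundaries; the paper may use a different domain or metric. The names `hard_rgg`, `count_triangles`, and `er_expected_triangles` are hypothetical helpers.

```python
from math import comb

import numpy as np


def hard_rgg(n, d, r, rng):
    """Adjacency matrix of a hard RGG: link i and j iff distance < r.

    ASSUMPTION: nodes are uniform in [0, 1]^d with Euclidean distance.
    """
    pts = rng.random((n, d))
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = dist < r
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def count_triangles(adj):
    """Number of 3-cliques: trace(A^3) counts each triangle 6 times."""
    a = adj.astype(np.int64)
    return int(np.trace(a @ a @ a)) // 6


def er_expected_triangles(n, p):
    """Expected triangle count in an Erdős-Rényi graph G(n, p)."""
    return comb(n, 3) * p ** 3
```

    One way to use this is to estimate the empirical edge density of `hard_rgg(n, d, r, rng)` as `p`, then compare `count_triangles` averaged over many samples with `er_expected_triangles(n, p)` as `d` grows.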

    A Deterministic Equivalent for the Analysis of Non-Gaussian Correlated MIMO Multiple Access Channels

    Large dimensional random matrix theory (RMT) has provided an efficient analytical tool to understand multiple-input multiple-output (MIMO) channels and to aid the design of MIMO wireless communication systems. However, previous studies based on large dimensional RMT rely on the assumption that the transmit correlation matrix is diagonal or the propagation channel matrix is Gaussian. There is an increasing interest in channels where the transmit correlation matrices are generally nonnegative definite and the channel entries are non-Gaussian. This class of channel models appears in several applications in MIMO multiple access systems, such as small cell networks (SCNs). To address these problems, we use the generalized Lindeberg principle to show that the Stieltjes transforms of this class of random matrices with Gaussian or non-Gaussian independent entries coincide in the large dimensional regime. This result makes it possible to derive the deterministic equivalents (e.g., the Stieltjes transform and the ergodic mutual information) for non-Gaussian MIMO channels from the known results developed for Gaussian MIMO channels, and is of great importance in characterizing the spectral efficiency of SCNs. Comment: This paper is the revision of the original manuscript titled "A Deterministic Equivalent for the Analysis of Small Cell Networks". We have revised the original manuscript and reworked the organization to improve the presentation as well as readability.