126,492 research outputs found

    Brownian distance covariance

    Full text link
    Distance correlation is a new class of multivariate dependence coefficients applicable to random vectors of arbitrary and not necessarily equal dimension. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but generalize and extend these classical bivariate measures of dependence. Distance correlation characterizes independence: it is zero if and only if the random vectors are independent. The notion of covariance with respect to a stochastic process is introduced, and it is shown that population distance covariance coincides with the covariance with respect to Brownian motion; thus, both can be called Brownian distance covariance. In the bivariate case, Brownian covariance is the natural extension of product-moment covariance, as we obtain Pearson product-moment covariance by replacing the Brownian motion in the definition with identity. The corresponding statistic has an elegantly simple computing formula. Advantages of applying Brownian covariance and correlation vs the classical Pearson covariance and correlation are discussed and illustrated.Comment: This paper discussed in: [arXiv:0912.3295], [arXiv:1010.0822], [arXiv:1010.0825], [arXiv:1010.0828], [arXiv:1010.0836], [arXiv:1010.0838], [arXiv:1010.0839]. Rejoinder at [arXiv:1010.0844]. Published in at http://dx.doi.org/10.1214/09-AOAS312 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Independence test for high dimensional data based on regularized canonical correlation coefficients

    Full text link
    This paper proposes a new statistic to test independence between two high dimensional random vectors X:p1×1{\mathbf{X}}:p_1\times1 and Y:p2×1{\mathbf{Y}}:p_2\times1. The proposed statistic is based on the sum of regularized sample canonical correlation coefficients of X{\mathbf{X}} and Y{\mathbf{Y}}. The asymptotic distribution of the statistic under the null hypothesis is established as a corollary of general central limit theorems (CLT) for the linear statistics of classical and regularized sample canonical correlation coefficients when p1p_1 and p2p_2 are both comparable to the sample size nn. As applications of the developed independence test, various types of dependent structures, such as factor models, ARCH models and a general uncorrelated but dependent case, etc., are investigated by simulations. As an empirical application, cross-sectional dependence of daily stock returns of companies between different sections in the New York Stock Exchange (NYSE) is detected by the proposed test.Comment: Published in at http://dx.doi.org/10.1214/14-AOS1284 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Approximating Subadditive Hadamard Functions on Implicit Matrices

    Get PDF
    An important challenge in the streaming model is to maintain small-space approximations of entrywise functions performed on a matrix that is generated by the outer product of two vectors given as a stream. In other works, streams typically define matrices in a standard way via a sequence of updates, as in the work of Woodruff (2014) and others. We describe the matrix formed by the outer product, and other matrices that do not fall into this category, as implicit matrices. As such, we consider the general problem of computing over such implicit matrices with Hadamard functions, which are functions applied entrywise on a matrix. In this paper, we apply this generalization to provide new techniques for identifying independence between two vectors in the streaming model. The previous state of the art algorithm of Braverman and Ostrovsky (2010) gave a (1±ϵ)(1 \pm \epsilon)-approximation for the L1L_1 distance between the product and joint distributions, using space O(log1024(nm)ϵ1024)O(\log^{1024}(nm) \epsilon^{-1024}), where mm is the length of the stream and nn denotes the size of the universe from which stream elements are drawn. Our general techniques include the L1L_1 distance as a special case, and we give an improved space bound of O(log12(n)log2(nmϵ)ϵ7)O(\log^{12}(n) \log^{2}({nm \over \epsilon})\epsilon^{-7})

    Progress on Polynomial Identity Testing - II

    Full text link
    We survey the area of algebraic complexity theory; with the focus being on the problem of polynomial identity testing (PIT). We discuss the key ideas that have gone into the results of the last few years.Comment: 17 pages, 1 figure, surve
    corecore