
    A nonparametric test for a constant correlation matrix

    We propose a nonparametric procedure to test for changes in correlation matrices at an unknown point in time. The new test requires only mild assumptions on the serial dependence structure and has considerable power in finite samples. We derive the asymptotic distribution under the null hypothesis of no change as well as local power results and apply the test to stock returns.

    Testing independence in high dimensions with sums of rank correlations

    We treat the problem of testing independence between m continuous variables when m can be larger than the available sample size n. We consider three types of test statistics that are constructed as sums or sums of squares of pairwise rank correlations. In the asymptotic regime where both m and n tend to infinity, a martingale central limit theorem is applied to show that the null distributions of these statistics converge to Gaussian limits, which are valid with no specific distributional or moment assumptions on the data. Using the framework of U-statistics, our result covers a variety of rank correlations including Kendall's tau and a dominating term of Spearman's rank correlation coefficient (rho), but also degenerate U-statistics such as Hoeffding's $D$, or the $\tau^*$ of Bergsma and Dassios (2014). As in the classical theory for U-statistics, the test statistics need to be scaled differently when the rank correlations used to construct them are degenerate U-statistics. The power of the considered tests is explored in rate-optimality theory under Gaussian equicorrelation alternatives as well as in numerical experiments for specific cases of more general alternatives.
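As a rough illustration of the "sums of squares of pairwise rank correlations" idea in the abstract above, the sketch below computes Kendall's tau as an order-2 U-statistic and adds up its squares over all pairs of variables. The function names are hypothetical, ties are assumed absent, and the centering and scaling needed to reach the Gaussian limit are deliberately left out because the abstract does not state them.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for samples without ties, written as a U-statistic of order 2."""
    n = len(x)
    total = 0
    for i, j in combinations(range(n), 2):
        # Kernel: +1 for a concordant pair of observations, -1 for a discordant pair.
        s = (x[i] - x[j]) * (y[i] - y[j])
        total += (s > 0) - (s < 0)
    return total / (n * (n - 1) / 2)


def sum_of_squared_taus(columns):
    """Unscaled sum of squared pairwise taus over all pairs of variables.

    Assumption: this is only the raw sum; no centering or normalization is applied.
    """
    return sum(kendall_tau(a, b) ** 2 for a, b in combinations(columns, 2))


# Toy data: m = 3 variables observed n = 5 times (illustrative values only).
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.1, 9.7],  # increasing in the first variable
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
stat = sum_of_squared_taus(columns)
```

The double loop costs O(n^2) per pair of variables, which is fine for a sketch but would need a faster tau algorithm when m and n are both large.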

    Multivariate Dependency Measure based on Copula and Gaussian Kernel

    We propose a new multivariate dependency measure. It is obtained by considering a Gaussian kernel based distance between the copula transform of the given d-dimensional distribution and the uniform copula and then appropriately normalizing it. The resulting measure is shown to satisfy a number of desirable properties. A nonparametric estimate is proposed for this dependency measure and its properties (finite sample as well as asymptotic) are derived. Some comparative studies of the proposed dependency measure estimate with some widely used dependency measure estimates on artificial datasets are included. A non-parametric test of independence between two or more random variables based on this measure is proposed. A comparison of the proposed test with some existing nonparametric multivariate tests for independence is presented. Comment: This work is postponed.

    From Distance Correlation to Multiscale Graph Correlation

    Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age. In this paper, we establish a new framework that generalizes distance correlation (a correlation measure that was recently proposed and shown to be universally consistent for dependence testing against all joint distributions of finite moments) to the Multiscale Graph Correlation (MGC). By utilizing the characteristic functions and incorporating the nearest neighbor machinery, we formalize the population version of local distance correlations, define the optimal scale in a given dependency, and name the optimal local correlation as MGC. The new theoretical framework motivates a theoretically sound Sample MGC and allows a number of desirable properties to be proved, including the universal consistency, convergence and almost unbiasedness of the sample version. The advantages of MGC are illustrated via a comprehensive set of simulations with linear, nonlinear, univariate, multivariate, and noisy dependencies, where it loses almost no power in monotone dependencies while achieving better performance in general dependencies, compared to distance correlation and other popular methods. Comment: 39 pages + Appendix 22 pages, 6 figures.
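For orientation, here is a minimal sketch of the plain sample distance correlation that the abstract above takes as its starting point. It is not MGC: there is no nearest-neighbor truncation or scale selection. It uses the biased (V-statistic) estimator for univariate samples, and the function names are hypothetical.

```python
import math


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand mean."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Biased sample distance correlation of two univariate samples."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n**2   # squared distance covariance
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n**2       # squared distance variance of x
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n**2       # squared distance variance of y
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# A symmetric quadratic relation: Pearson's correlation is exactly zero here,
# while the distance correlation is strictly positive.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
dcor_quadratic = distance_correlation(x, y)
dcor_linear = distance_correlation(x, [2.0 * v + 1.0 for v in x])
```

The quadratic example shows the kind of non-monotone dependence that linear correlation misses; the linear example returns a value of one up to rounding.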

    High dimensional consistent independence testing with maxima of rank correlations

    Testing mutual independence for high-dimensional observations is a fundamental statistical challenge. Popular tests based on linear and simple rank correlations are known to be incapable of detecting non-linear, non-monotone relationships, calling for methods that can account for such dependences. To address this challenge, we propose a family of tests that are constructed using maxima of pairwise rank correlations that permit consistent assessment of pairwise independence. Built upon a newly developed Cramér-type moderate deviation theorem for degenerate U-statistics, our results cover a variety of rank correlations including Hoeffding's $D$, Blum-Kiefer-Rosenblatt's $R$, and Bergsma-Dassios-Yanagimoto's $\tau^*$. The proposed tests are distribution-free in the class of multivariate distributions with continuous margins, implementable without the need for permutation, and are shown to be rate-optimal against sparse alternatives under the Gaussian copula model. As a by-product of the study, we reveal an identity between the aforementioned three rank correlation statistics, and hence make a step towards proving a conjecture of Bergsma and Dassios. Comment: to appear in the Annals of Statistics.

    Some New Copula Based Distribution-free Tests of Independence among Several Random Variables

    Over the last couple of decades, several copula based methods have been proposed in the literature to test for the independence among several random variables. But these existing tests are not invariant under monotone transformations of the variables, and they often perform poorly if the dependence among the variables is highly non-monotone in nature. In this article, we propose a copula based measure of dependency and use it to construct some new distribution-free tests of independence. The proposed measure and the resulting tests are all invariant under permutations and monotone transformations of the variables. Our dependency measure involves a kernel function, and we use the Gaussian kernel for that purpose. We adopt a multi-scale approach, where we look at the results obtained for several choices of the bandwidth parameter associated with the Gaussian kernel and aggregate them judiciously. Large sample properties of the dependency measure and the resulting tests are derived under appropriate regularity conditions. Several simulated and real data sets are analyzed to compare the performance of the proposed tests with some popular tests available in the literature. Comment: arXiv admin note: text overlap with arXiv:1708.0748

    A testing-based approach to the discovery of differentially correlated variable sets

    Given data obtained under two sampling conditions, it is often of interest to identify variables that behave differently in one condition than in the other. We introduce a method for differential analysis of second-order behavior called Differential Correlation Mining (DCM). The DCM method identifies differentially correlated sets of variables, with the property that the average pairwise correlation between variables in a set is higher under one sample condition than the other. DCM is based on an iterative search procedure that adaptively updates the size and elements of a candidate variable set. Updates are performed via hypothesis testing of individual variables, based on the asymptotic distribution of their average differential correlation. We investigate the performance of DCM by applying it to simulated data as well as recent experimental datasets in genomics and brain imaging.

    PC algorithm for Gaussian copula graphical models

    The PC algorithm uses conditional independence tests for model selection in graphical modeling with acyclic directed graphs. In Gaussian models, tests of conditional independence are typically based on Pearson correlations, and high-dimensional consistency results have been obtained for the PC algorithm in this setting. We prove that high-dimensional consistency carries over to the broader class of Gaussian copula or nonparanormal models when using rank-based measures of correlation. For graphs with bounded degree, our result is as strong as prior Gaussian results. In simulations, the 'Rank PC' algorithm works as well as the 'Pearson PC' algorithm for normal data and considerably better for non-normal Gaussian copula data, all the while incurring a negligible increase of computation time. Simulations with contaminated data show that rank correlations can also perform better than other robust estimates considered in previous work when the underlying distribution does not belong to the nonparanormal family.
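The abstract above does not say which rank-based measure or transform is used, so the following is only a sketch under an assumption: Kendall's tau is mapped through sin(pi * tau / 2), a standard identity that links tau to the latent Pearson correlation under a Gaussian copula. The function names are hypothetical and ties are assumed absent. The point of the example is that the estimate is unchanged by strictly increasing transformations of each margin, which is what makes it suitable for nonparanormal data.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for samples without ties."""
    n = len(x)
    total = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        total += (s > 0) - (s < 0)  # +1 concordant, -1 discordant
    return total / (n * (n - 1) / 2)


def latent_correlation(x, y):
    """Rank-based estimate of the latent Gaussian correlation.

    Assumption: uses the Gaussian-copula identity rho = sin(pi * tau / 2).
    """
    return math.sin(math.pi * kendall_tau(x, y) / 2.0)


# Illustrative values only.
x = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1]
y = [0.5, -0.9, 0.2, 1.1, 0.0, -0.3]

r_raw = latent_correlation(x, y)
# Strictly increasing marginal transformations leave all ranks, and hence the estimate, unchanged.
r_transformed = latent_correlation([math.exp(v) for v in x], [v**3 for v in y])
```

In a PC-style procedure, a matrix of such estimates would stand in for the sample Pearson correlation matrix when computing partial correlations for the conditional independence tests.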

    Statistical dependence: Beyond Pearson's $\rho$

    Pearson's $\rho$ is the most used measure of statistical dependence. It gives a complete characterization of dependence in the Gaussian case, and it also works well in some non-Gaussian situations. It is well known, however, that it has a number of shortcomings; in particular for heavy tailed distributions and in nonlinear situations, where it may produce misleading, and even disastrous results. In recent years a number of alternatives have been proposed. In this paper, we will survey these developments, especially results obtained in the last couple of decades. Among measures discussed are the copula, distribution-based measures, the distance covariance, the HSIC measure popular in machine learning, and finally the local Gaussian correlation, which is a local version of Pearson's $\rho$. Throughout we put the emphasis on conceptual developments and a comparison of these. We point out relevant references to technical details as well as comparative empirical and simulated experiments. There is a broad selection of references under each topic treated.

    Compatibility and attainability of matrices of correlation-based measures of concordance

    Measures of concordance have been widely used in insurance and risk management to summarize non-linear dependence among risks modeled by random variables, which Pearson's correlation coefficient cannot capture. However, popular measures of concordance, such as Spearman's rho and Blomqvist's beta, appear as classical correlations of transformed random variables. We characterize a whole class of such concordance measures arising from correlations of transformed random variables, which includes Spearman's rho, Blomqvist's beta and van der Waerden's coefficient as special cases. Compatibility and attainability of square matrices with entries given by such measures are studied, that is, whether a given square matrix of such measures of concordance can be realized for some random vector and how such a random vector can be constructed. Compatibility and attainability of block matrices and hierarchical matrices are also studied due to their practical importance in insurance and risk management. In particular, a subclass of attainable block Spearman's rho matrices is proposed to compensate for the drawback that Spearman's rho matrices are in general not attainable for dimensions larger than four. Another result concerns a novel analytical form of the Cholesky factor of block matrices which allows one, for example, to construct random vectors with given block matrices of van der Waerden's coefficient.
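The statement above that Spearman's rho, Blomqvist's beta and van der Waerden's coefficient are "correlations of transformed random variables" can be illustrated with a small sample-level sketch. The estimator below is an assumption made for illustration, not the paper's construction: ranks are rescaled to pseudo-observations in (0, 1), a transform g is applied, and an ordinary Pearson correlation is taken. All names are hypothetical; ties are assumed absent, and an even sample size keeps the sign transform away from zero.

```python
import math
from statistics import NormalDist


def pseudo_observations(x):
    """Ranks scaled into (0, 1) as rank / (n + 1); assumes no ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    u = [0.0] * len(x)
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (len(x) + 1)
    return u


def pearson(a, b):
    """Ordinary Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def sign(d):
    return (d > 0) - (d < 0)


# One transform g per concordance measure (sample-level analogues).
TRANSFORMS = {
    "spearman_rho": lambda u: u,                 # identity on the uniform scale
    "blomqvist_beta": lambda u: sign(u - 0.5),   # above or below the median
    "van_der_waerden": NormalDist().inv_cdf,     # standard normal quantile
}


def transformed_correlation(x, y, g):
    """Pearson correlation of g applied to the pseudo-observations of x and y."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    return pearson([g(p) for p in u], [g(q) for q in v])


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
measures = {name: transformed_correlation(x, y, g) for name, g in TRANSFORMS.items()}
```

Swapping the transform g is the only change between the three measures, which is the structural point the abstract makes.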