A nonparametric test for a constant correlation matrix
We propose a nonparametric procedure to test for changes in correlation
matrices at an unknown point in time. The new test requires only mild
assumptions on the serial dependence structure and has considerable power in
finite samples. We derive the asymptotic distribution under the null hypothesis
of no change as well as local power results and apply the test to stock
returns.
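One way to make "a change in the correlation matrix at an unknown time" concrete is a CUSUM-type scan: compare the correlations of every prefix of the sample with those of the full sample and take the largest weighted discrepancy. The Python sketch below is an illustrative stand-in, not the authors' statistic. The function names, the Euclidean norm, the weight k/sqrt(n) and the minimum segment length are assumptions, and the long-run covariance normalization that yields a usable null distribution is omitted.

```python
import math
from itertools import combinations

# Hypothetical helper names; illustrative sketch only, not the paper's test.


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def offdiag_corr(rows):
    """Upper off-diagonal entries of the sample correlation matrix of `rows` (list of p-vectors)."""
    cols = list(zip(*rows))
    return [pearson(cols[i], cols[j]) for i, j in combinations(range(len(cols)), 2)]


def cusum_corr_scan(rows, min_seg=10):
    """Return (max_k (k / sqrt(n)) * ||R_prefix(k) - R_full||_2, argmax k).

    Unnormalized: no long-run covariance scaling is applied (assumption of this sketch).
    """
    n = len(rows)
    full = offdiag_corr(rows)
    best, best_k = -1.0, None
    for k in range(min_seg, n - min_seg + 1):
        prefix = offdiag_corr(rows[:k])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(prefix, full)))
        value = k / math.sqrt(n) * dist
        if value > best:
            best, best_k = value, k
    return best, best_k
```

The maximizing k is a natural estimate of the change point; without the normalization, the raw maximum is only meaningful relative to a reference, such as a sample known to have no change.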
Testing independence in high dimensions with sums of rank correlations
We treat the problem of testing independence between m continuous variables
when m can be larger than the available sample size n. We consider three types
of test statistics that are constructed as sums or sums of squares of pairwise
rank correlations. In the asymptotic regime where both m and n tend to
infinity, a martingale central limit theorem is applied to show that the null
distributions of these statistics converge to Gaussian limits, which are valid
with no specific distributional or moment assumptions on the data. Using the
framework of U-statistics, our result covers a variety of rank correlations
including Kendall's tau and a dominating term of Spearman's rank correlation
coefficient (rho), but also degenerate U-statistics such as Hoeffding's D, or
the τ* of Bergsma and Dassios (2014). As in the classical theory for
U-statistics, the test statistics need to be scaled differently when the rank
correlations used to construct them are degenerate U-statistics. The power of
the considered tests is explored in rate-optimality theory under Gaussian
equicorrelation alternatives as well as in numerical experiments for specific
cases of more general alternatives.
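The building block of the sum-of-squares statistics is easy to write down: compute Kendall's tau for every pair of variables, square it, and subtract its null mean. The Python sketch below assumes continuous data without ties and uses the standard null variance of Kendall's tau under independence, 2(2n+5)/(9n(n-1)), as the null mean of its square. The function names are hypothetical, and the final scaling needed for the Gaussian limit is not reproduced.

```python
from itertools import combinations

# Hypothetical helper names; a sketch of the centering step only.


def kendall_tau(x, y):
    """Kendall's tau-a for continuous data (no ties), O(n^2)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def centered_sum_sq_tau(cols):
    """Sum over all variable pairs of tau^2 minus its null mean.

    Under independence with continuous margins, E[tau^2] = Var(tau) = 2(2n+5) / (9n(n-1)).
    Works for any number m of columns, including m > n.
    """
    n = len(cols[0])
    null_mean = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    return sum(kendall_tau(a, b) ** 2 - null_mean for a, b in combinations(cols, 2))
```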
Multivariate Dependency Measure based on Copula and Gaussian Kernel
We propose a new multivariate dependency measure. It is obtained by
considering a Gaussian kernel based distance between the copula transform of
the given d-dimensional distribution and the uniform copula and then
appropriately normalizing it. The resulting measure is shown to satisfy a
number of desirable properties. A nonparametric estimate is proposed for this
dependency measure and its properties (finite sample as well as asymptotic) are
derived. Some comparative studies of the proposed dependency measure estimate
with some widely used dependency measure estimates on artificial datasets are
included. A nonparametric test of independence between two or more random
variables based on this measure is proposed. A comparison of the proposed test
with some existing nonparametric multivariate tests for independence is
presented. Comment: This work is postponed
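A sketch of the unnormalized quantity behind this measure, assuming a product Gaussian kernel with bandwidth s, pseudo-observations rank/(n+1) as the copula transform, and the independence copula (uniform on [0,1]^d) as the reference. Because the kernel factorizes over coordinates, the expectations under the uniform distribution have closed forms in terms of erf, so no Monte Carlo integration is needed. The function names and the default s = 0.5 are assumptions, and the paper's normalization step is omitted.

```python
import math

# Hypothetical helper names; unnormalized sketch, not the paper's estimator.


def pseudo_obs(col):
    """Rank-based copula transform: rank / (n + 1), assuming no ties."""
    n = len(col)
    order = sorted(range(n), key=lambda i: col[i])
    u = [0.0] * n
    for r, i in enumerate(order, start=1):
        u[i] = r / (n + 1)
    return u


def _mean_k_uniform_1d(x, s):
    """E exp(-(x - U)^2 / (2 s^2)) for U ~ Uniform(0, 1)."""
    c = s * math.sqrt(2.0)
    return s * math.sqrt(math.pi / 2.0) * (math.erf((1.0 - x) / c) + math.erf(x / c))


def _mean_k_uu_1d(s):
    """E exp(-(U - V)^2 / (2 s^2)) for independent U, V ~ Uniform(0, 1)."""
    return 2.0 * (s * math.sqrt(math.pi / 2.0) * math.erf(1.0 / (s * math.sqrt(2.0)))
                  - s * s * (1.0 - math.exp(-1.0 / (2.0 * s * s))))


def copula_mmd2(cols, s=0.5):
    """Squared Gaussian-kernel distance between the empirical copula and the independence copula."""
    us = [pseudo_obs(c) for c in cols]   # d lists of length n
    pts = list(zip(*us))                 # n points in [0, 1]^d
    n = len(pts)
    kxx = 0.0
    for a in pts:
        for b in pts:
            kxx += math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2 * s * s))
    kxu = sum(math.prod(_mean_k_uniform_1d(x, s) for x in a) for a in pts)
    kuu = _mean_k_uu_1d(s) ** len(cols)
    return kxx / n ** 2 - 2.0 * kxu / n + kuu
```

Since only ranks enter, the value is unchanged by strictly increasing transformations of any coordinate.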
From Distance Correlation to Multiscale Graph Correlation
Understanding and developing a correlation measure that can detect general
dependencies is not only imperative to statistics and machine learning, but
also crucial to general scientific discovery in the big data age. In this
paper, we establish a new framework that generalizes distance correlation (a
correlation measure that was recently proposed and shown to be universally
consistent for dependence testing against all joint distributions of finite
moments) to the Multiscale Graph Correlation (MGC). By utilizing the
characteristic functions and incorporating the nearest neighbor machinery, we
formalize the population version of local distance correlations, define the
optimal scale in a given dependency, and call the optimal local correlation
MGC. The new theoretical framework motivates a theoretically sound Sample MGC
and allows a number of desirable properties to be proved, including the
universal consistency, convergence, and near-unbiasedness of the sample
version. The advantages of MGC are illustrated via a comprehensive set of
simulations with linear, nonlinear, univariate, multivariate, and noisy
dependencies, where it loses almost no power in monotone dependencies while
achieving better performance in general dependencies, compared to distance
correlation and other popular methods. Comment: 39 pages + Appendix 22 pages, 6 figures
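MGC is built from distance correlation, so a sketch of the global (non-local) sample distance correlation helps fix ideas. The version below is the V-statistic form for univariate samples, based on double-centered distance matrices. It does not implement the nearest-neighbor localization or the optimal-scale search that define MGC, and the function names are hypothetical.

```python
import math

# Hypothetical helper names; global distance correlation only, not MGC itself.


def _double_centered(z):
    """Double-centered pairwise distance matrix |z_i - z_j| for a univariate sample."""
    n = len(z)
    d = [[abs(a - b) for b in z] for a in z]
    row = [sum(r) / n for r in d]   # row means (= column means by symmetry)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version) of two univariate samples."""
    a, b = _double_centered(x), _double_centered(y)
    n = len(x)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvx = sum(v * v for r in a for v in r) / n ** 2
    dvy = sum(v * v for r in b for v in r) / n ** 2
    if dvx * dvy == 0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvx * dvy))
```

Unlike Pearson's correlation, this quantity is positive for y = x² on a symmetric grid, which is the kind of non-monotone dependence MGC is designed to detect with little loss of power.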
High dimensional consistent independence testing with maxima of rank correlations
Testing mutual independence for high-dimensional observations is a
fundamental statistical challenge. Popular tests based on linear and simple
rank correlations are known to be incapable of detecting non-linear,
non-monotone relationships, calling for methods that can account for such
dependences. To address this challenge, we propose a family of tests that are
constructed using maxima of pairwise rank correlations that permit consistent
assessment of pairwise independence. Built upon a newly developed
Cramér-type moderate deviation theorem for degenerate U-statistics, our
results cover a variety of rank correlations including Hoeffding's D,
Blum-Kiefer-Rosenblatt's R, and Bergsma-Dassios-Yanagimoto's τ*. The
proposed tests are distribution-free in the class of multivariate distributions
with continuous margins, implementable without the need for permutation, and
are shown to be rate-optimal against sparse alternatives under the Gaussian
copula model. As a by-product of the study, we reveal an identity between the
aforementioned three rank correlation statistics, and hence make a step towards
proving a conjecture of Bergsma and Dassios. Comment: to appear in the Annals of Statistics
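The ingredients of a max-type statistic can be sketched with Hoeffding's D: compute it for every pair of coordinates and keep the largest. The Python sketch below uses the classical rank formula for Hoeffding's D (scaled so that perfect monotone dependence gives 1) and assumes no ties and n ≥ 5. The scaling of the maximum and the critical values used by the proposed tests are not reproduced, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical helper names; the test's scaling and threshold are not included.


def ranks(col):
    """Ranks 1..n, assuming no ties."""
    order = sorted(range(len(col)), key=lambda i: col[i])
    r = [0] * len(col)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r


def hoeffding_d(x, y):
    """Hoeffding's D via the classical rank formula; requires n >= 5 and no ties."""
    n = len(x)
    r, s = ranks(x), ranks(y)
    # q[i] = 1 + number of points strictly below and to the left of point i
    q = [1 + sum(1 for j in range(n) if r[j] < r[i] and s[j] < s[i]) for i in range(n)]
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    num = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    return 30.0 * num / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


def max_pairwise_d(cols):
    """Largest Hoeffding's D over all variable pairs, together with the maximizing pair."""
    return max((hoeffding_d(cols[i], cols[j]), (i, j))
               for i, j in combinations(range(len(cols)), 2))
```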
Some New Copula Based Distribution-free Tests of Independence among Several Random Variables
Over the last couple of decades, several copula based methods have been
proposed in the literature to test for the independence among several random
variables. But these existing tests are not invariant under monotone
transformations of the variables, and they often perform poorly if the
dependence among the variables is highly non-monotone in nature. In this
article, we propose a copula based measure of dependency and use it to
construct some new distribution-free tests of independence. The proposed
measure and the resulting tests are all invariant under permutations and
monotone transformations of the variables. Our dependency measure involves a
kernel function, and we use the Gaussian kernel for that purpose. We adopt a
multi-scale approach, where we look at the results obtained for several choices
of the bandwidth parameter associated with the Gaussian kernel and aggregate
them judiciously. Large sample properties of the dependency measure and the
resulting tests are derived under appropriate regularity conditions. Several
simulated and real data sets are analyzed to compare the performance of the
proposed tests with some popular tests available in the literature. Comment: arXiv admin note: text overlap with arXiv:1708.0748
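Two ideas in this abstract can be combined in a short sketch: a rank-based kernel statistic computed at several bandwidths, and the fact that, because the statistic depends only on ranks, its null distribution under independence can be simulated with random permutations regardless of the margins. Here the per-bandwidth statistic is a squared Gaussian-kernel distance between the empirical copula and the independence copula, and the bandwidths are aggregated by taking the maximum of standardized values. Both choices, the bandwidth grid, the Monte Carlo size, and all names are assumptions for illustration, not the authors' aggregation rule.

```python
import math
import random

# Hypothetical names and aggregation rule; illustrative sketch only.


def pseudo_obs(col):
    """Copula transform rank / (n + 1), assuming no ties."""
    n = len(col)
    order = sorted(range(n), key=lambda i: col[i])
    u = [0.0] * n
    for r, i in enumerate(order, start=1):
        u[i] = r / (n + 1)
    return u


def mmd2_uniform(us, s):
    """Squared Gaussian-kernel MMD between the points zip(*us) and Uniform([0, 1]^d)."""
    pts = list(zip(*us))
    n, d, c = len(pts), len(us), s * math.sqrt(2.0)

    def mean_k_u(x):
        # E exp(-(x - U)^2 / (2 s^2)) for U ~ Uniform(0, 1)
        return s * math.sqrt(math.pi / 2) * (math.erf((1 - x) / c) + math.erf(x / c))

    kxx = sum(math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * s * s))
              for p in pts for q in pts)
    kxu = sum(math.prod(mean_k_u(x) for x in p) for p in pts)
    kuu_1d = 2 * (s * math.sqrt(math.pi / 2) * math.erf(1 / c)
                  - s * s * (1 - math.exp(-1 / (2 * s * s))))
    return kxx / n ** 2 - 2 * kxu / n + kuu_1d ** d


def multiscale_test(cols, scales=(0.1, 0.3, 1.0), n_null=200, seed=0):
    """Max of standardized per-bandwidth statistics and its Monte Carlo p-value."""
    rng = random.Random(seed)
    n, d = len(cols[0]), len(cols)
    grid = [r / (n + 1) for r in range(1, n + 1)]
    # Null draws: independent random permutations of the rank grid, one per variable.
    null = []
    for _ in range(n_null):
        us = [rng.sample(grid, n) for _ in range(d)]
        null.append([mmd2_uniform(us, s) for s in scales])
    by_scale = list(zip(*null))
    mean = [sum(v) / n_null for v in by_scale]
    sd = [math.sqrt(sum((x - m) ** 2 for x in v) / (n_null - 1)) for v, m in zip(by_scale, mean)]

    def aggregate(values):
        return max((v - m) / sdv for v, m, sdv in zip(values, mean, sd))

    observed = aggregate([mmd2_uniform([pseudo_obs(c) for c in cols], s) for s in scales])
    exceed = sum(aggregate(row) >= observed for row in null)
    return observed, (1 + exceed) / (1 + n_null)
```

The null simulation depends only on n, d and the seed, so it could be computed once and reused for any data set of that size.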
A testing-based approach to the discovery of differentially correlated variable sets
Given data obtained under two sampling conditions, it is often of interest to
identify variables that behave differently in one condition than in the other.
We introduce a method for differential analysis of second-order behavior called
Differential Correlation Mining (DCM). The DCM method identifies differentially
correlated sets of variables, with the property that the average pairwise
correlation between variables in a set is higher under one sample condition
than the other. DCM is based on an iterative search procedure that adaptively
updates the size and elements of a candidate variable set. Updates are
performed via hypothesis testing of individual variables, based on the
asymptotic distribution of their average differential correlation. We
investigate the performance of DCM by applying it to simulated data as well as
recent experimental datasets in genomics and brain imaging.
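The iterative search can be sketched as follows: given a candidate set, compute for every variable its average differential correlation with the set (the mean of r1[i][j] - r2[i][j] over the other members j), keep the variables above a cutoff, and repeat until the set stops changing. In this Python sketch, a fixed threshold replaces the paper's hypothesis test based on the asymptotic distribution; the threshold value and the function names are assumptions.

```python
import math

# Hypothetical names; a fixed threshold stands in for DCM's asymptotic test.


def corr_matrix(rows):
    """Sample correlation matrix of a list of observations (each a p-vector)."""
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    cen = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cen]
    return [[sum(a * b for a, b in zip(cen[i], cen[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def avg_diff_corr(r1, r2, i, members):
    """Average of r1[i][j] - r2[i][j] over j in the set, excluding i itself."""
    others = [j for j in members if j != i]
    return sum(r1[i][j] - r2[i][j] for j in others) / len(others) if others else 0.0


def dcm_search(rows1, rows2, start, threshold=0.3, max_iter=50):
    """Replace the set by all variables whose average differential correlation exceeds `threshold`."""
    r1, r2 = corr_matrix(rows1), corr_matrix(rows2)
    current = frozenset(start)
    for _ in range(max_iter):
        updated = frozenset(i for i in range(len(r1))
                            if avg_diff_corr(r1, r2, i, current) > threshold)
        if updated == current or not updated:
            return sorted(updated)  # fixed point reached (or set emptied)
        current = updated
    return sorted(current)
```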
PC algorithm for Gaussian copula graphical models
The PC algorithm uses conditional independence tests for model selection in
graphical modeling with acyclic directed graphs. In Gaussian models, tests of
conditional independence are typically based on Pearson correlations, and
high-dimensional consistency results have been obtained for the PC algorithm in
this setting. We prove that high-dimensional consistency carries over to the
broader class of Gaussian copula or nonparanormal models when using
rank-based measures of correlation. For graphs with bounded degree, our result
is as strong as prior Gaussian results. In simulations, the 'Rank PC' algorithm
works as well as the 'Pearson PC' algorithm for normal data and considerably
better for non-normal Gaussian copula data, all the while incurring a
negligible increase of computation time. Simulations with contaminated data
show that rank correlations can also perform better than other robust estimates
considered in previous work when the underlying distribution does not belong to
the nonparanormal family.
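Under a Gaussian copula, Kendall's tau and the latent correlation ρ satisfy ρ = sin(πτ/2), which is one standard way to obtain rank-based correlations that can replace Pearson correlations in the conditional independence tests. The sketch below plugs this estimate into the recursive partial-correlation formula and a Fisher z cutoff. The Fisher z calibration is borrowed from the Gaussian case as an assumption (the paper's rank-based analysis may use different thresholds), and the function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist

# Hypothetical names; Gaussian Fisher-z calibration assumed for the rank-based estimate.


def kendall_tau(x, y):
    """Kendall's tau-a for continuous data (no ties)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        p = (x[i] - x[j]) * (y[i] - y[j])
        s += (p > 0) - (p < 0)
    return 2.0 * s / (n * (n - 1))


def rank_corr_matrix(cols):
    """Latent Gaussian-copula correlation estimate sin(pi/2 * tau) for each variable pair."""
    p = len(cols)
    r = [[1.0] * p for _ in range(p)]
    for i, j in combinations(range(p), 2):
        r[i][j] = r[j][i] = math.sin(math.pi / 2 * kendall_tau(cols[i], cols[j]))
    return r


def partial_corr(r, i, j, cond):
    """Partial correlation of i and j given the variables in `cond` (recursive formula)."""
    if not cond:
        return r[i][j]
    k, rest = cond[0], cond[1:]
    rij = partial_corr(r, i, j, rest)
    rik = partial_corr(r, i, k, rest)
    rjk = partial_corr(r, j, k, rest)
    return (rij - rik * rjk) / math.sqrt((1 - rik ** 2) * (1 - rjk ** 2))


def ci_test(r, n, i, j, cond, alpha=0.05):
    """True if 'i independent of j given cond' is NOT rejected at level alpha."""
    rho = max(min(partial_corr(r, i, j, list(cond)), 0.999999), -0.999999)
    z = 0.5 * math.log((1 + rho) / (1 - rho))  # Fisher z-transform
    return math.sqrt(n - len(cond) - 3) * abs(z) <= NormalDist().inv_cdf(1 - alpha / 2)
```

Because Kendall's tau is invariant under monotone transformations of each margin, the estimated matrix is the same for Gaussian data and for any nonparanormal data with the same copula.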
Statistical dependence: Beyond Pearson's ρ
Pearson's ρ is the most widely used measure of statistical dependence. It gives
a complete characterization of dependence in the Gaussian case, and it also
works well in some non-Gaussian situations. It is well known, however, that it
has a number of shortcomings, in particular for heavy-tailed distributions and
in nonlinear situations, where it may produce misleading and even disastrous
results. In recent years a number of alternatives have been proposed. In this
paper, we will survey these developments, especially results obtained in the
last couple of decades. Among measures discussed are the copula,
distribution-based measures, the distance covariance, the HSIC measure popular
in machine learning, and finally the local Gaussian correlation, which is a
local version of Pearson's ρ. Throughout, we put the emphasis on conceptual
developments and a comparison of these. We point out relevant references to
technical details as well as comparative empirical and simulated experiments.
There is a broad selection of references under each topic treated.
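A small worked example of the nonlinear failure mode mentioned above: if x is symmetric about zero and y = x², then y is a deterministic function of x, yet the sample Pearson correlation is zero up to rounding. The helper name is hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation (hypothetical helper for this illustration)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


x = [i / 10 for i in range(-20, 21)]  # symmetric grid on [-2, 2]
y = [v * v for v in x]                # perfect, but non-monotone, dependence
r = pearson(x, y)                     # zero up to floating-point rounding
```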
Compatibility and attainability of matrices of correlation-based measures of concordance
Measures of concordance have been widely used in insurance and risk
management to summarize non-linear dependence among risks modeled by random
variables, which Pearson's correlation coefficient cannot capture. However,
popular measures of concordance, such as Spearman's rho and Blomqvist's beta,
appear as classical correlations of transformed random variables. We
characterize a whole class of such concordance measures arising from
correlations of transformed random variables, which includes Spearman's rho,
Blomqvist's beta and van der Waerden's coefficient as special cases.
Compatibility and attainability of square matrices with entries given by such
measures are studied, that is, whether a given square matrix of such measures
of concordance can be realized for some random vector and how such a random
vector can be constructed. Compatibility and attainability of block matrices
and hierarchical matrices are also studied due to their practical importance in
insurance and risk management. In particular, a subclass of attainable block
Spearman's rho matrices is proposed to compensate for the drawback that
Spearman's rho matrices are in general not attainable for dimensions larger
than four. Another result concerns a novel analytical form of the Cholesky
factor of block matrices which allows one, for example, to construct random
vectors with given block matrices of van der Waerden's coefficient.
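The statement that these measures are classical correlations of transformed variables can be checked numerically. After the copula transform U = F(X), Spearman's rho is the correlation of U itself, Blomqvist's beta is the correlation of sign(U - 1/2), and van der Waerden's coefficient is the correlation of Φ^{-1}(U). The Python sketch below computes sample analogues with pseudo-observations rank/(n+1); the transform table and function names are assumptions, and no ties are allowed.

```python
import math
from statistics import NormalDist

# Hypothetical names; sample analogues of correlation-based concordance measures.


def pseudo_obs(col):
    """Rank / (n + 1), assuming no ties."""
    n = len(col)
    order = sorted(range(n), key=lambda i: col[i])
    u = [0.0] * n
    for r, i in enumerate(order, start=1):
        u[i] = r / (n + 1)
    return u


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Transform g applied to U before taking the ordinary correlation.
TRANSFORMS = {
    "spearman_rho": lambda u: u,
    "blomqvist_beta": lambda u: float((u > 0.5) - (u < 0.5)),  # sign(u - 1/2)
    "van_der_waerden": NormalDist().inv_cdf,                   # standard normal quantile
}


def concordance(x, y, measure):
    g = TRANSFORMS[measure]
    return pearson([g(u) for u in pseudo_obs(x)], [g(u) for u in pseudo_obs(y)])
```

With no ties, the Spearman entry reproduces the textbook value 1 - 6Σd²/(n(n²-1)).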
- …