
    A maximum-mean-discrepancy goodness-of-fit test for censored data

    We introduce a kernel-based goodness-of-fit test for censored data, where observations may be missing in random time intervals: a common occurrence in clinical trials and industrial life-testing. The test statistic is straightforward to compute, as is the test threshold, and we establish consistency under the null. Unlike earlier approaches such as the log-rank test, we make no assumptions as to how the data distribution might differ from the null, and our test has power against a very rich class of alternatives. In experiments, our test outperforms competing approaches for periodic and Weibull hazard functions (where risks are time dependent), and does not show the failure modes of tests that rely on user-defined features. Moreover, in cases where classical tests are provably most powerful, our test performs almost as well, while being more general.
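
    As a rough illustration of the kernel testing machinery behind this line of work, the sketch below computes a biased quadratic-time estimate of the squared maximum mean discrepancy (MMD) between an observed sample and a sample drawn from a null model, assuming a Gaussian kernel. It does not reproduce the paper's censoring adjustment or test threshold; the function names, bandwidth, and Weibull toy data are illustrative choices.

        import numpy as np

        def gaussian_kernel(x, y, sigma=1.0):
            """Gaussian kernel matrix between two 1-D samples."""
            d = x[:, None] - y[None, :]
            return np.exp(-d**2 / (2 * sigma**2))

        def mmd2_biased(x, y, sigma=1.0):
            """Biased quadratic-time estimate of squared MMD."""
            kxx = gaussian_kernel(x, x, sigma)
            kyy = gaussian_kernel(y, y, sigma)
            kxy = gaussian_kernel(x, y, sigma)
            return kxx.mean() + kyy.mean() - 2 * kxy.mean()

        # Toy usage: observed lifetimes vs. draws from a hypothesised Weibull null.
        rng = np.random.default_rng(0)
        observed = rng.weibull(1.5, size=200)
        null_sample = rng.weibull(1.0, size=200)
        print(mmd2_biased(observed, null_sample))

    Large values of the statistic suggest a departure from the null; in practice the rejection threshold would come from the paper's analysis rather than this uncalibrated sketch.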

    A Kernel Independence Test for Random Processes

    A new nonparametric approach to the problem of testing the independence of two random processes is developed. The test statistic is the Hilbert-Schmidt Independence Criterion (HSIC), which was used previously in testing independence for i.i.d. pairs of variables. The asymptotic behaviour of HSIC is established when computed from samples drawn from random processes. It is shown that earlier bootstrap procedures which worked in the i.i.d. case will fail for random processes, and an alternative consistent estimate of the p-values is proposed. Tests on artificial data and real-world Forex data indicate that the new test procedure discovers dependence which is missed by linear approaches, while the earlier bootstrap procedure returns an elevated number of false positives. The code is available online: https://github.com/kacperChwialkowski/HSIC. Comment: In Proceedings of the 31st International Conference on Machine Learning.
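
    For concreteness, the sketch below computes the standard biased (V-statistic) estimate of HSIC with Gaussian kernels. It shows the statistic only; the paper's consistent p-value procedure for dependent observations is not reproduced, and the toy AR(1) data and bandwidth are illustrative.

        import numpy as np

        def gaussian_gram(x, sigma=1.0):
            """Gaussian Gram matrix of a 1-D sample."""
            d = x[:, None] - x[None, :]
            return np.exp(-d**2 / (2 * sigma**2))

        def hsic_biased(x, y, sigma=1.0):
            """Biased (V-statistic) estimate of HSIC."""
            n = len(x)
            K = gaussian_gram(x, sigma)
            L = gaussian_gram(y, sigma)
            H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
            return np.trace(K @ H @ L @ H) / n**2

        # Toy usage: two AR(1) processes coupled through shared innovations.
        rng = np.random.default_rng(0)
        e = rng.normal(size=500)
        x = np.zeros(500)
        y = np.zeros(500)
        for t in range(1, 500):
            x[t] = 0.5 * x[t - 1] + e[t]
            y[t] = 0.5 * y[t - 1] + e[t] + 0.5 * rng.normal()
        print(hsic_biased(x, y))

    Calibrating this statistic by naively resampling one series ignores temporal dependence, which is exactly the failure mode of the earlier bootstrap procedures that the abstract describes.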

    A low variance consistent test of relative dependency

    We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. Such a test enables us to determine whether the source variable is significantly more dependent on the first target variable than on the second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of empirical dependence measures (source-target 1, source-target 2). We test whether the first dependence measure is significantly larger than the second. Modeling the covariance between these HSIC statistics leads to a provably more powerful test than the construction of independent HSIC statistics by sub-sampling. The resulting test is consistent and unbiased, and (being based on U-statistics) has favorable convergence properties. The test can be computed in quadratic time, matching the computational complexity of standard empirical HSIC estimators. The effectiveness of the test is demonstrated on several real-world problems: we identify language groups from a multilingual corpus, and we show that tumor location is more dependent on gene expression than on chromosomal imbalances. Source code is available for download at https://github.com/wbounliphone/reldep. Comment: International Conference on Machine Learning, Jul 2015, Lille, France.
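
    A minimal sketch of the quantity being tested: the difference between the two empirical HSIC measures. For brevity it uses the biased estimator rather than the paper's U-statistics, and it omits the joint covariance model and threshold that make the actual test valid; data and bandwidth are illustrative.

        import numpy as np

        def hsic_biased(x, y, sigma=1.0):
            """Biased estimate of HSIC between two 1-D samples."""
            n = len(x)
            K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * sigma**2))
            L = np.exp(-(y[:, None] - y[None, :])**2 / (2 * sigma**2))
            H = np.eye(n) - np.ones((n, n)) / n
            return np.trace(K @ H @ L @ H) / n**2

        # Toy usage: a source strongly tied to target t1, weakly to target t2.
        rng = np.random.default_rng(0)
        s = rng.normal(size=300)
        t1 = s + 0.3 * rng.normal(size=300)
        t2 = 0.3 * s + rng.normal(size=300)
        delta = hsic_biased(s, t1) - hsic_biased(s, t2)
        print(delta)  # positive: s appears more dependent on t1 than on t2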

    A Kernel Test for Three-Variable Interactions

    We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
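
    One commonly used empirical form of the Lancaster interaction statistic combines entrywise products of doubly centered Gram matrices, sketched below with Gaussian kernels. The calibration of the test (e.g., by permutation) is omitted, and the bandwidth and toy data are illustrative choices, not the paper's experimental setup.

        import numpy as np

        def centered_gram(x, sigma=1.0):
            """Doubly centered Gaussian Gram matrix of a 1-D sample."""
            n = len(x)
            K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * sigma**2))
            H = np.eye(n) - np.ones((n, n)) / n
            return H @ K @ H

        def lancaster_stat(x, y, z, sigma=1.0):
            """Empirical Lancaster three-variable interaction statistic."""
            n = len(x)
            Kc, Lc, Mc = (centered_gram(v, sigma) for v in (x, y, z))
            return (Kc * Lc * Mc).sum() / n**2

        # Toy V-structure: x and y are independent causes and z = x * y + noise,
        # so no single pair interacts strongly but the triple does.
        rng = np.random.default_rng(0)
        x = rng.normal(size=300)
        y = rng.normal(size=300)
        z = x * y + 0.1 * rng.normal(size=300)
        print(lancaster_stat(x, y, z))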

    Kernel Bayes' rule

    A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on representations of probabilities in reproducing kernel Hilbert spaces. Probabilities are uniquely characterized by the mean of the canonical map to the RKHS. The prior and conditional probabilities are expressed in terms of RKHS functions of an empirical sample: no explicit parametric model is needed for these quantities. The posterior is likewise an RKHS mean of a weighted sample. The estimator for the expectation of a function of the posterior is derived, and rates of consistency are shown. Some representative applications of the kernel Bayes' rule are presented, including Bayesian computation without likelihood and filtering with a nonparametric state-space model. Comment: 27 pages, 5 figures.
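
    A key building block here is the empirical conditional mean embedding: conditional expectations become weighted sums over the sample, with weights obtained from a regularized Gram matrix. The sketch below shows that building block only, not the full kernel Bayes' rule posterior with prior reweighting; the regularization constant, bandwidth, and toy data are illustrative.

        import numpy as np

        def gram(a, b, sigma=1.0):
            """Gaussian kernel matrix between two 1-D samples."""
            return np.exp(-(a[:, None] - b[None, :])**2 / (2 * sigma**2))

        # Paired sample from the joint distribution of (X, Y).
        rng = np.random.default_rng(0)
        x = rng.normal(size=200)
        y = x**2 + 0.1 * rng.normal(size=200)

        n, lam = len(x), 1e-3
        K = gram(x, x)

        # Weights of the empirical conditional mean embedding at a query point:
        # an expectation conditional on X = x_star becomes a weighted sample mean.
        x_star = np.array([1.0])
        w = np.linalg.solve(K + n * lam * np.eye(n), gram(x, x_star)).ravel()
        print(w @ y)  # approximates E[Y | X = 1], which is close to 1 here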

    Discussion of: Brownian distance covariance

    Discussion on "Brownian distance covariance" by G\'{a}bor J. Sz\'{e}kely and Maria L. Rizzo [arXiv:1010.0297]Comment: Published in at http://dx.doi.org/10.1214/09-AOAS312E the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    MERLiN: Mixture Effect Recovery in Linear Networks

    Causal inference concerns the identification of cause-effect relationships between variables, e.g. establishing whether a stimulus affects activity in a certain brain region. The observed variables themselves often do not constitute meaningful causal variables, however, and linear combinations need to be considered. In electroencephalographic studies, for example, one is not interested in establishing cause-effect relationships between electrode signals (the observed variables), but rather between cortical signals (the causal variables), which can be recovered as linear combinations of electrode signals. We introduce MERLiN (Mixture Effect Recovery in Linear Networks), a family of causal inference algorithms that implement a novel means of constructing causal variables from non-causal variables. We demonstrate through application to EEG data how the basic MERLiN algorithm can be extended for application to different (neuroimaging) data modalities. Given an observed linear mixture, the algorithms can recover a causal variable that is a linear effect of another given variable. That is, MERLiN allows us to recover a cortical signal that is affected by activity in a certain brain region, while not being a direct effect of the stimulus. The Python/Matlab implementation for all presented algorithms is available at https://github.com/sweichwald/MERLiN.
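
    To make the problem setting concrete, the toy sketch below generates a linear mixture of hidden signals in which one signal is a linear effect of a given variable, then uses the true unmixing (unavailable in practice) to show what a successful recovery looks like. The actual MERLiN optimization lives in the linked repository and is not reproduced here; all names and constants are illustrative.

        import numpy as np

        # Toy instance of the setting: hidden signals, one of them a linear
        # effect of a given variable, observed only through an unknown mixture.
        rng = np.random.default_rng(0)
        n = 1000
        s1 = rng.normal(size=n)                    # given causal variable
        s2 = 0.8 * s1 + 0.2 * rng.normal(size=n)   # hidden linear effect of s1
        s3 = rng.normal(size=n)                    # unrelated background signal
        S = np.vstack([s1, s2, s3])

        A = rng.normal(size=(3, 3))  # unknown mixing (e.g., volume conduction)
        X = A @ S                    # observed channels, as at EEG electrodes

        # MERLiN searches for a weight vector w such that w @ X recovers an
        # effect of s1; here we cheat with the true unmixing row to show the
        # target of that search.
        w_oracle = np.linalg.inv(A)[1]
        print(np.corrcoef(w_oracle @ X, s2)[0, 1])  # ≈ 1.0 for exact recovery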