250 research outputs found

A maximum-mean-discrepancy goodness-of-fit test for censored data

Authors: Fernández, Tamara; Gretton, Arthur
Publication date: 09/10/2018

We introduce a kernel-based goodness-of-fit test for censored data, where observations may be missing in random time intervals: a common occurrence in clinical trials and industrial life-testing. The test statistic is straightforward to compute, as is the test threshold, and we establish consistency under the null. Unlike earlier approaches such as the log-rank test, we make no assumptions as to how the data distribution might differ from the null, and our test has power against a very rich class of alternatives. In experiments, our test outperforms competing approaches for periodic and Weibull hazard functions (where risks are time dependent), and does not show the failure modes of tests that rely on user-defined features. Moreover, in cases where classical tests are provably most powerful, our test performs almost as well, while being more general.

arXiv.org e-Print Archive

UCL Discovery

A Kernel Independence Test for Random Processes

Authors: Chwialkowski, Kacper; Gretton, Arthur
Publication date: 17/06/2014

A new nonparametric approach to the problem of testing the independence of two random processes is developed. The test statistic is the Hilbert-Schmidt Independence Criterion (HSIC), which was used previously in testing independence for i.i.d. pairs of variables. The asymptotic behaviour of HSIC is established when computed from samples drawn from random processes. It is shown that earlier bootstrap procedures which worked in the i.i.d. case will fail for random processes, and an alternative consistent estimate of the p-values is proposed. Tests on artificial data and real-world Forex data indicate that the new test procedure discovers dependence which is missed by linear approaches, while the earlier bootstrap procedure returns an elevated number of false positives. The code is available online: https://github.com/kacperChwialkowski/HSIC. Comment: In Proceedings of The 31st International Conference on Machine Learning
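For readers unfamiliar with the statistic named in this abstract, the following is a minimal sketch of the standard biased empirical HSIC estimator for i.i.d. pairs, (1/n^2) trace(K H L H). It is not taken from the paper or its repository: the function names, the Gaussian kernel and the bandwidth of 1.0 are illustrative assumptions, and the sketch does not implement the p-value procedure for random processes that the abstract proposes.

```python
import math
import random


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on a list of real numbers (assumed kernel)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_biased(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC, (1/n^2) * trace(K H L H), for paired samples."""
    n = len(xs)
    assert n == len(ys) and n >= 2
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(K H L H) = trace((H K H) L) = sum_ij (H K H)_ij * L_ij for symmetric L.
    total = sum(k_centered[i][j] * l_gram[i][j]
                for i in range(n) for j in range(n))
    return total / (n * n)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(100)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(100)]
    squared = [x * x for x in xs]  # nonlinear dependence, zero linear correlation
    print("HSIC(x, x^2):  ", hsic_biased(xs, squared))
    print("HSIC(x, noise):", hsic_biased(xs, noise))
```

The statistic alone does not give a test. In the i.i.d. setting a threshold is usually obtained by permuting one sample, which is the kind of procedure the abstract reports as failing for random processes.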

arXiv.org e-Print Archive

CiteSeerX

A low variance consistent test of relative dependency

Authors: Blaschko, Matthew; Bounliphone, Wacha; Gretton, Arthur; Tenenhaus, Arthur
Publication date: 27/05/2015

We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of empirical dependence measures (source-target 1, source-target 2). We test whether the first dependence measure is significantly larger than the second. Modeling the covariance between these HSIC statistics leads to a provably more powerful test than the construction of independent HSIC statistics by sub-sampling. The resulting test is consistent and unbiased, and (being based on U-statistics) has favorable convergence properties. The test can be computed in quadratic time, matching the computational complexity of standard empirical HSIC estimators. The effectiveness of the test is demonstrated on several real-world problems: we identify language groups from a multilingual corpus, and we prove that tumor location is more dependent on gene expression than on chromosomal imbalances. Source code is available for download at https://github.com/wbounliphone/reldep. Comment: International Conference on Machine Learning, Jul 2015, Lille, France

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

A Kernel Test for Three-Variable Interactions

Authors: Bergsma, Wicher; Gretton, Arthur; Sejdinovic, Dino
Publication date: 01/01/2013

We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.

arXiv.org e-Print Archive

CiteSeerX

LSE Research Online

Oxford University Research Archive

B-tests: Low Variance Kernel Two-Sample Tests

Authors: Blaschko, Matthew; Gretton, Arthur; Zaremba, Wojciech
Publication date: 01/01/2013

A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test power and computation time. In this respect, the B-test family combines favorable properties of previously proposed MMD two-sample tests: B-tests are more powerful than a linear time test where blocks are just pairs of samples, yet they are more computationally efficient than a quadratic time test where a single large block incorporating all the samples is used to compute a U-statistic. A further important advantage of the B-tests is their asymptotically Normal null distribution: this is by contrast with the U-statistic, which is degenerate under the null hypothesis, and for which estimates of the null distribution are computationally demanding. Recent results on kernel selection for hypothesis testing transfer seamlessly to the B-tests, yielding a means to optimize test power via kernel choice. Comment: Neural Information Processing Systems (2013)
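The block construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes equal-size one-dimensional samples, a Gaussian kernel with bandwidth 1.0, disjoint consecutive blocks, and the usual unbiased quadratic-time estimate of squared MMD within each block. All function names are hypothetical, and the test threshold is not implemented.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on real numbers (assumed kernel choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased quadratic-time estimate of squared MMD for equal-size samples."""
    n = len(xs)
    assert n == len(ys) and n >= 2
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += (kernel(xs[i], xs[j]) + kernel(ys[i], ys[j])
                          - kernel(xs[i], ys[j]) - kernel(xs[j], ys[i]))
    return total / (n * (n - 1))


def b_test_statistic(xs, ys, block_size, kernel=gaussian_kernel):
    """Average the per-block MMD estimates over disjoint consecutive blocks.

    Returns the averaged statistic and the list of per-block values.
    """
    n_blocks = min(len(xs), len(ys)) // block_size
    assert block_size >= 2 and n_blocks >= 1
    block_values = []
    for b in range(n_blocks):
        lo, hi = b * block_size, (b + 1) * block_size
        block_values.append(mmd2_unbiased(xs[lo:hi], ys[lo:hi], kernel))
    return sum(block_values) / n_blocks, block_values


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
    ys = [rng.gauss(1.0, 1.0) for _ in range(200)]
    statistic, blocks = b_test_statistic(xs, ys, block_size=20)
    print("B-test statistic:", statistic, "from", len(blocks), "blocks")
```

With a block size equal to the full sample size, the sketch reduces to the single quadratic-time estimate. Smaller blocks lower the cost, since each block needs only block_size squared kernel evaluations, which is the power versus computation tradeoff the abstract refers to.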

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

UCL Discovery

Kernel Bayes' rule

Authors: Fukumizu, Kenji; Gretton, Arthur; Song, Le
Publication date: 01/01/2011

A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on representations of probabilities in reproducing kernel Hilbert spaces. Probabilities are uniquely characterized by the mean of the canonical map to the RKHS. The prior and conditional probabilities are expressed in terms of RKHS functions of an empirical sample: no explicit parametric model is needed for these quantities. The posterior is likewise an RKHS mean of a weighted sample. The estimator for the expectation of a function of the posterior is derived, and rates of consistency are shown. Some representative applications of the kernel Bayes' rule are presented, including Bayesian computation without likelihood and filtering with a nonparametric state-space model. Comment: 27 pages, 5 figures

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

MPG.PuRe

Discussion of: Brownian distance covariance

Authors: Fukumizu, Kenji; Gretton, Arthur; Sriperumbudur, Bharath K.
Publication venue: Institute of Mathematical Statistics
Publication date: 05/10/2010

Discussion on "Brownian distance covariance" by Gábor J. Székely and Maria L. Rizzo [arXiv:1010.0297]. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/09-AOAS312E by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref