142 research outputs found

    Large-Scale Kernel Methods for Independence Testing

    Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large-scale methods give comparable performance to existing methods whilst using significantly less computation time and memory. Comment: 29 pages, 6 figures
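    To illustrate the idea behind the random Fourier feature approach described above, the following is a minimal sketch of an HSIC-style independence test using explicit random features, so that the statistic costs O(nD²) instead of O(n²). This is not the authors' implementation; the function names, feature dimension, and the Gaussian-kernel lengthscale are illustrative assumptions.

    ```python
    import numpy as np

    def rff_features(x, n_features=50, lengthscale=1.0, seed=0):
        # Random Fourier features approximating a Gaussian kernel
        # (Rahimi & Recht construction): sqrt(2/D) * cos(x W + b).
        rng = np.random.default_rng(seed)
        x = np.atleast_2d(x).reshape(len(x), -1)
        w = rng.normal(scale=1.0 / lengthscale, size=(x.shape[1], n_features))
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)

    def rff_hsic(x, y, n_features=50):
        # Approximate HSIC: squared Frobenius norm of the empirical
        # cross-covariance between centred feature maps of x and y.
        fx = rff_features(x, n_features, seed=0)
        fy = rff_features(y, n_features, seed=1)
        fx -= fx.mean(axis=0)
        fy -= fy.mean(axis=0)
        n = len(x)
        cov = fx.T @ fy / n
        return n * np.sum(cov ** 2)

    def permutation_pvalue(x, y, n_perm=200, seed=0):
        # Calibrate via permutation: shuffling y destroys any dependence,
        # giving draws from the null distribution of the statistic.
        rng = np.random.default_rng(seed)
        stat = rff_hsic(x, y)
        null = [rff_hsic(x, rng.permutation(y)) for _ in range(n_perm)]
        return (1 + sum(s >= stat for s in null)) / (1 + n_perm)
    ```

    The feature maps never form an n-by-n kernel matrix, which is the source of the memory and time savings the abstract refers to.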

    Method for Enabling Causal Inference in Relational Domains

    The analysis of data from complex systems is quickly becoming a fundamental aspect of modern business, government, and science. The field of causal learning is concerned with developing a set of statistical methods that allow practitioners to make inferences about unseen interventions. This field has seen significant advances in recent years. However, the vast majority of this work assumes that data instances are independent, whereas many systems are best described in terms of interconnected instances, i.e., relational systems. This discrepancy prevents causal inference techniques from being reliably applied in many real-world settings. In this thesis, I will present three contributions to the field of causal inference that seek to enable the analysis of relational systems. First, I will present theory for consistently testing statistical dependence in relational domains. I then show how the significance of this test can be measured in practice using a novel bootstrap method for structured domains. Second, I show that statistical dependence in relational domains is inherently asymmetric, implying a simple test of causal direction from observational data. This test requires no assumptions on either the marginal distributions of variables or the functional form of dependence. Third, I describe relational causal adjustment, a procedure to identify the effects of arbitrary interventions from observational relational data via an extension of Pearl's backdoor criterion. A series of evaluations on synthetic domains shows that the estimates obtained by relational causal adjustment are close to those obtained from explicit experimentation.

    Feature-to-feature regression for a two-step conditional independence test

    The algorithms for causal discovery and more broadly for learning the structure of graphical models require well-calibrated and consistent conditional independence (CI) tests. We revisit the CI tests which are based on two-step procedures and involve regression with a subsequent (unconditional) independence test (RESIT) on regression residuals, and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I error control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.
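    The two-step RESIT scheme described above can be sketched as follows: regress X on Z and Y on Z, then run an unconditional independence test on the residual pairs. This minimal sketch uses linear least squares and a Pearson correlation test purely for illustration; it is exactly the additive-noise setting in which, per the abstract, such tests are valid, and it is not the RKHS-valued variant the paper proposes. Function names are hypothetical.

    ```python
    import numpy as np
    from scipy import stats

    def residuals(target, z):
        # Residuals of a least-squares regression of target on z
        # (design matrix with an intercept column).
        design = np.column_stack(
            [np.ones(len(z)), np.atleast_2d(z).reshape(len(z), -1)]
        )
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return target - design @ coef

    def resit_ci_test(x, y, z):
        # Step 1: regress out the conditioning variable z from x and y.
        rx = residuals(x, z)
        ry = residuals(y, z)
        # Step 2: unconditional independence test on the residuals.
        return stats.pearsonr(rx, ry)  # (statistic, p-value)
    ```

    Under non-additive dependence on Z, the residuals can remain dependent even when X and Y are conditionally independent, which is the failure mode (inflated false discoveries) the abstract highlights.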