Large-Scale Kernel Methods for Independence Testing
Representations of probability measures in reproducing kernel Hilbert spaces
provide a flexible framework for fully nonparametric hypothesis tests of
independence, which can capture any type of departure from independence,
including nonlinear associations and multivariate interactions. However, these
approaches come with an at least quadratic computational cost in the number of
observations, which can be prohibitive in many applications. Arguably, it is
exactly in such large-scale datasets that capturing any type of dependence is
of interest, so striking a favourable tradeoff between computational efficiency
and test performance for kernel independence tests would have a direct impact
on their applicability in practice. In this contribution, we provide an
extensive study of the use of large-scale kernel approximations in the context
of independence testing, contrasting block-based, Nyström and random Fourier
feature approaches. Through a variety of synthetic data experiments, it is
demonstrated that our novel large-scale methods give comparable performance
with existing methods whilst using significantly less computation time and
memory.
Comment: 29 pages, 6 figures
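The random Fourier feature idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`rff`, `rff_hsic`, `rff_independence_test`), the Gaussian kernel, the fixed bandwidth, and the use of a permutation null are all assumptions made for illustration. The point is that the statistic is computed from a small feature covariance matrix rather than from full n-by-n kernel matrices.

```python
import numpy as np

def rff(x, n_features, bandwidth, rng):
    """Random Fourier features approximating a Gaussian kernel (assumed kernel choice)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    w = rng.normal(scale=1.0 / bandwidth, size=(x.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)

def rff_hsic(zx, zy):
    """Squared Frobenius norm of the empirical cross-covariance of the two feature maps."""
    zx = zx - zx.mean(axis=0)
    zy = zy - zy.mean(axis=0)
    cov = zx.T @ zy / zx.shape[0]
    return float(np.sum(cov ** 2))

def rff_independence_test(x, y, n_features=50, bandwidth=1.0, n_perm=200, seed=0):
    """Return (statistic, permutation p-value); cost is linear in the sample size."""
    rng = np.random.default_rng(seed)
    zx = rff(x, n_features, bandwidth, rng)
    zy = rff(y, n_features, bandwidth, rng)
    stat = rff_hsic(zx, zy)
    # Permuting the rows of zy breaks any dependence and simulates the null.
    null = np.array([rff_hsic(zx, zy[rng.permutation(len(zy))]) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, p_value
```

Calling `rff_independence_test(x, y)` on two sample arrays of equal length returns the statistic and a p-value; the features are computed once and only their rows are permuted, so each permutation costs a single small matrix product.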
Learning new physics efficiently with nonparametric methods
We present a machine learning approach for model-independent new physics
searches. The corresponding algorithm is powered by recent large-scale
implementations of kernel methods, nonparametric learning algorithms that can
approximate any continuous function given enough data. Based on the original
proposal by D'Agnolo and Wulzer (arXiv:1806.02350), the model evaluates the
compatibility between experimental data and a reference model, by implementing
a hypothesis testing procedure based on the likelihood ratio.
Model-independence is enforced by avoiding any prior assumption about the
presence or shape of new physics components in the measurements. We show that
our approach has dramatic advantages compared to neural network implementations
in terms of training times and computational resources, while maintaining
comparable performance. In particular, we conduct our tests on
higher-dimensional datasets, a step forward with respect to previous studies.
Comment: 22 pages, 13 figures
Kernel-based Conditional Independence Test and Application in Causal Discovery
Conditional independence testing is an important problem, especially in
Bayesian network learning and causal discovery. Due to the curse of
dimensionality, testing for conditional independence of continuous variables is
particularly challenging. We propose a Kernel-based Conditional Independence
test (KCI-test), by constructing an appropriate test statistic and deriving its
asymptotic distribution under the null hypothesis of conditional independence.
The proposed method is computationally efficient and easy to implement.
Experimental results show that it outperforms other methods, especially when
the conditioning set is large or the sample size is not very large, in which
case other methods encounter difficulties.
Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
Conditional independence testing is a fundamental problem underlying causal
discovery and a particularly challenging task in the presence of nonlinear and
high-dimensional dependencies. Here a fully non-parametric test for continuous
data based on conditional mutual information combined with a local permutation
scheme is presented. Through a nearest neighbor approach, the test also
adapts efficiently to non-smooth distributions due to strongly nonlinear dependencies.
Numerical experiments demonstrate that the test reliably simulates the null
distribution even for small sample sizes and with high-dimensional conditioning
sets. The test is better calibrated than kernel-based tests utilizing an
analytical approximation of the null distribution, especially for non-smooth
densities, and reaches the same or higher power levels. Combining the local
permutation scheme with the kernel tests leads to better calibration, but
suffers in power. For smaller sample sizes and lower dimensions, the test is
faster than random Fourier feature-based kernel tests if the permutation scheme
is (embarrassingly) parallelized, but the runtime increases more sharply with
sample size and dimensionality. Thus, more theoretical research to analytically
approximate the null distribution and speed up the estimation for larger sample
sizes is desirable.
Comment: 17 pages, 12 figures, 1 table
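The local permutation scheme described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: the function name `local_permutation`, the brute-force distance computation, and the fallback used when all neighbours are already taken are assumptions. The idea is that each sample of X is swapped only with samples whose conditioning values Z are close, so the dependence of X on Z is roughly preserved while any remaining link to Y is broken.

```python
import numpy as np

def local_permutation(z, k=5, seed=0):
    """Return indices that shuffle samples only among their k nearest neighbours in z."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    n = len(z)
    # Brute-force pairwise distances: O(n^2) memory, acceptable for a sketch.
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    neighbours = np.argsort(dist, axis=1)[:, :k]  # each point counts as its own neighbour
    used = set()
    perm = np.empty(n, dtype=int)
    for i in rng.permutation(n):
        candidates = rng.permutation(neighbours[i])
        # Prefer a neighbour not yet assigned; otherwise reuse one (assumed fallback).
        pick = next((int(j) for j in candidates if int(j) not in used), int(candidates[0]))
        perm[i] = pick
        used.add(pick)
    return perm
```

A surrogate sample under the null would then be `x[local_permutation(z)]`, with the test statistic recomputed on each surrogate. Because of the fallback, the result may repeat an index and is therefore not always a strict permutation.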