A maximum-mean-discrepancy goodness-of-fit test for censored data
We introduce a kernel-based goodness-of-fit test for censored data, where
observations may be missing in random time intervals: a common occurrence in
clinical trials and industrial life-testing. The test statistic is
straightforward to compute, as is the test threshold, and we establish
consistency under the null. Unlike earlier approaches such as the Log-rank
test, we make no assumptions as to how the data distribution might differ from
the null, and our test has power against a very rich class of alternatives. In
experiments, our test outperforms competing approaches for periodic and Weibull
hazard functions (where risks are time dependent), and does not show the
failure modes of tests that rely on user-defined features. Moreover, in cases
where classical tests are provably most powerful, our test performs almost as
well, while being more general.
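The statistic underlying this family of tests is easy to illustrate in the uncensored two-sample setting. Below is a minimal sketch of the standard unbiased quadratic-time MMD² estimator with a Gaussian kernel; the censored-data statistic in the paper adjusts this construction for censoring, which is omitted here, and the bandwidth and sample sizes are illustrative choices.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 h^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Drop diagonal terms so the within-sample averages are unbiased.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * kxy.mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)        # sample from the null model
y_null = rng.normal(0.0, 1.0, 200)   # same distribution: statistic near 0
y_alt = rng.normal(1.0, 1.0, 200)    # shifted alternative: clearly positive
print(mmd2_unbiased(x, y_null))
print(mmd2_unbiased(x, y_alt))
```

In a test, the statistic is compared against a threshold obtained, e.g., by permutation; the sketch only shows the discrepancy estimate itself.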
Kernelized Stein Discrepancy Tests of Goodness-of-fit for Time-to-Event Data
Survival Analysis and Reliability Theory are concerned with the analysis of
time-to-event data, in which observations correspond to waiting times until an
event of interest such as death from a particular disease or failure of a
component in a mechanical system. This type of data is unique due to the
presence of censoring, a type of missing data that occurs when we do not
observe the actual time of the event of interest but instead have access to an
approximation for it, given by a random interval in which the observation is
known to lie. Most traditional methods are not designed to deal with
censoring, and thus we need to adapt them to censored time-to-event data. In
this paper, we focus on non-parametric goodness-of-fit testing procedures based
on combining Stein's method with kernelized discrepancies. While for
uncensored data, there is a natural way of implementing a kernelized Stein
discrepancy test, for censored data there are several options, each of them
with different advantages and disadvantages. In this paper, we propose a
collection of kernelized Stein discrepancy tests for time-to-event data, and we
study each of them theoretically and empirically; our experimental results show
that our proposed methods perform better than existing tests, including
previous tests based on a kernelized maximum mean discrepancy.
Comment: Proceedings of the International Conference on Machine Learning, 202
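Right censoring, the most common special case of the interval form of censoring described above, is straightforward to simulate: one observes only the earlier of the event time and an independent censoring time, together with an indicator of which occurred. A minimal sketch follows; the distributions and parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Latent event times with a Weibull hazard (time-dependent risk,
# as in the experiments these papers discuss).
event_time = rng.weibull(1.5, n) * 2.0

# Independent censoring times, e.g. end of study or patient drop-out.
censor_time = rng.exponential(2.0, n)

# Under right censoring we observe min(event, censoring) plus an
# indicator of whether the event itself was seen.
observed = np.minimum(event_time, censor_time)
delta = event_time <= censor_time   # True = event observed, False = censored

print(f"censoring rate: {1 - delta.mean():.2f}")
```

The pair `(observed, delta)` is the input a censored goodness-of-fit test must work with: uncensored test statistics applied naively to `observed` alone would be biased, which is why the adapted kernel statistics are needed.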
Advances in Non-parametric Hypothesis Testing with Kernels
Non-parametric statistical hypothesis testing procedures aim to distinguish the null hypothesis from the alternative with minimal assumptions on the model distributions. In recent years, the maximum mean discrepancy (MMD) has been developed as a measure to compare two distributions, applicable to two-sample problems and independence tests. With the aid of reproducing kernel Hilbert spaces (RKHS) that are rich enough, the MMD enjoys desirable statistical properties, including the characteristic property, consistency, and maximal test power. Moreover, the MMD has seen empirical success in complex tasks such as training and comparing generative models. Stein's method also provides an elegant probabilistic tool to compare unnormalised distributions, which commonly appear in practical machine learning tasks. Combined with a rich enough RKHS, the kernel Stein discrepancy (KSD) has been developed as a proper discrepancy measure between distributions, which can be used to tackle one-sample problems (goodness-of-fit tests). However, existing developments of the KSD apply to a limited choice of domains, such as Euclidean space or finite discrete sets, and require complete data observations, while current MMD constructions are limited to simple kernels for which test power suffers, e.g. on high-dimensional image data.

The main focus of this thesis is the further advancement of kernel-based statistics for hypothesis testing. First, Stein operators are developed that are compatible with broader data domains, and the corresponding goodness-of-fit tests are constructed for general unnormalised densities on Riemannian manifolds, which have non-Euclidean topology. In addition, novel non-parametric goodness-of-fit tests for data with censoring are studied. Next, tests for observations subject to left truncation are studied: e.g. the time of entering a hospital always precedes the time of death in the hospital, and we say the death time is truncated by the entry time. We test a notion of independence beyond truncation by proposing a kernelised measure of quasi-independence. Finally, we study deep kernel architectures to improve two-sample testing performance.
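For a fully observed (uncensored) sample and a simple model, the kernel Stein discrepancy discussed above reduces to an average of a Stein-modified kernel over pairs of sample points, and notably requires only the score of the model density, so the normalising constant never enters. Below is a minimal sketch for the standard normal null with a Gaussian kernel; the bandwidth and sample sizes are illustrative choices, and the thesis's extensions (censoring, truncation, manifolds) are not reflected here.

```python
import numpy as np

def ksd2_vstat(x, bandwidth=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy
    against p = N(0, 1), whose score function is s(x) = -x."""
    h2 = bandwidth**2
    d = x[:, None] - x[None, :]          # pairwise differences x_i - x_j
    k = np.exp(-d**2 / (2 * h2))         # Gaussian kernel matrix
    s = -x                               # score of the standard normal
    # Stein kernel: u(x,y) = s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d2k/dxdy
    dk_dx = -d / h2 * k
    dk_dy = d / h2 * k
    d2k = (1.0 / h2 - d**2 / h2**2) * k
    u = (s[:, None] * s[None, :] * k
         + s[:, None] * dk_dy
         + s[None, :] * dk_dx
         + d2k)
    return u.mean()

rng = np.random.default_rng(2)
x_null = rng.normal(0.0, 1.0, 300)   # matches the null model: KSD small
x_alt = rng.normal(1.0, 1.0, 300)    # shifted alternative: KSD larger
print(ksd2_vstat(x_null))
print(ksd2_vstat(x_alt))
```

Because the Stein kernel `u` is positive semi-definite, the V-statistic is non-negative, and it concentrates near zero only when the sample matches the model; a practical test would calibrate a threshold, e.g. via a wild bootstrap.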