Large-Scale Kernel Methods for Independence Testing
Representations of probability measures in reproducing kernel Hilbert spaces
provide a flexible framework for fully nonparametric hypothesis tests of
independence, which can capture any type of departure from independence,
including nonlinear associations and multivariate interactions. However, these
approaches come with an at least quadratic computational cost in the number of
observations, which can be prohibitive in many applications. Arguably, it is
exactly in such large-scale datasets that capturing any type of dependence is
of interest, so striking a favourable tradeoff between computational efficiency
and test performance for kernel independence tests would have a direct impact
on their applicability in practice. In this contribution, we provide an
extensive study of the use of large-scale kernel approximations in the context
of independence testing, contrasting block-based, Nyström and random Fourier
feature approaches. Through a variety of synthetic data experiments, it is
demonstrated that our novel large-scale methods give performance comparable
to existing methods whilst using significantly less computation time and
memory.
Comment: 29 pages, 6 figures
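To illustrate the random Fourier feature idea mentioned in this abstract, the following is a minimal sketch rather than the authors' implementation. It assumes scalar observations and a Gaussian kernel, and the names `rff_features`, `rff_hsic`, `num_features` and `bandwidth` are hypothetical. The statistic is the squared Frobenius norm of the empirical cross-covariance between the two feature maps, which costs time linear in the number of observations instead of quadratic.

```python
import math
import random


def rff_features(xs, freqs, phases):
    """Map scalars to random Fourier features approximating a Gaussian kernel."""
    scale = math.sqrt(2.0 / len(freqs))
    return [[scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]
            for x in xs]


def centre(feats):
    """Subtract the column mean from every feature column."""
    n = len(feats)
    means = [sum(col) / n for col in zip(*feats)]
    return [[v - m for v, m in zip(row, means)] for row in feats]


def rff_hsic(xs, ys, num_features=20, bandwidth=1.0, seed=0):
    """Approximate HSIC-style dependence statistic (illustrative assumption).

    Cost is O(n * num_features^2), i.e. linear in the sample size n.
    """
    rng = random.Random(seed)
    n = len(xs)

    def draw():
        # Gaussian kernel with bandwidth s <-> frequencies with std 1 / s.
        freqs = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
        return freqs, phases

    fx = centre(rff_features(xs, *draw()))
    fy = centre(rff_features(ys, *draw()))

    # Squared Frobenius norm of the empirical cross-covariance matrix.
    stat = 0.0
    for col_x in zip(*fx):
        for col_y in zip(*fy):
            cov = sum(a * b for a, b in zip(col_x, col_y)) / n
            stat += cov * cov
    return stat


# Usage: a nonlinear dependence (y = x^2 + noise) versus independent noise.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(400)]
y_dep = [v * v + 0.1 * data_rng.gauss(0.0, 1.0) for v in x]
y_ind = [data_rng.gauss(0.0, 1.0) for _ in range(400)]
stat_dep = rff_hsic(x, y_dep)
stat_ind = rff_hsic(x, y_ind)
```

Turning the statistic into a test still requires a null distribution; permuting one sample and recomputing the statistic is one common choice, and it is an assumption here rather than something the abstract specifies.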
Scalable Bayesian nonparametric measures for exploring pairwise dependence via Dirichlet Process Mixtures
In this article we propose novel Bayesian nonparametric methods using
Dirichlet Process Mixture (DPM) models for detecting pairwise dependence
between random variables while accounting for uncertainty in the form of the
underlying distributions. A key criterion is that the procedures should scale to
large data sets. In this regard we find that the formal calculation of the
Bayes factor for a dependent-vs.-independent DPM joint probability measure is
not feasible computationally. To address this we present Bayesian diagnostic
measures for characterising evidence against a "null model" of pairwise
independence. In simulation studies, as well as for a real data analysis, we
show that our approach provides a useful tool for the exploratory nonparametric
Bayesian analysis of large multivariate data sets.
Considerate Approaches to Achieving Sufficiency for ABC model selection
For nearly any challenging scientific problem, evaluation of the likelihood is
problematic if not impossible. Approximate Bayesian computation (ABC) allows us
to apply the whole Bayesian formalism to problems where we can use simulations
from a model, but cannot evaluate the likelihood directly. When summary
statistics of real and simulated data are compared --- rather than the data
directly --- information is lost, unless the summary statistics are sufficient.
Here we employ an information-theoretical framework that can be used to
construct (approximately) sufficient statistics by combining different
statistics until the loss of information is minimized. Such sufficient sets of
statistics are constructed for both parameter estimation and model selection
problems. We apply our approach to a range of illustrative and real-world model
selection problems.
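The basic ABC mechanism described in this abstract can be sketched with a plain rejection sampler. This is an illustrative toy and not the information-theoretic construction of statistics that the paper proposes: it assumes a Gaussian model with known unit variance, a uniform prior on the mean, and the sample mean as summary statistic (which happens to be sufficient for this model). The names `abc_rejection`, `prior_sampler`, `simulator` and `tolerance` are hypothetical.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary,
                  tolerance, num_draws, seed=0):
    """Keep prior draws whose simulated summary lies close to the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(num_draws):
        theta = prior_sampler(rng)                       # draw from the prior
        simulated = simulator(theta, len(observed), rng)  # simulate a data set
        if abs(summary(simulated) - s_obs) <= tolerance:  # compare summaries
            accepted.append(theta)
    return accepted


def prior_sampler(rng):
    """Assumed prior: uniform on [-5, 5]."""
    return rng.uniform(-5.0, 5.0)


def simulator(theta, n, rng):
    """Assumed model: n independent draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


# Usage: recover the mean of synthetic data generated with theta = 2.
data_rng = random.Random(1)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
posterior = abc_rejection(observed, prior_sampler, simulator,
                          statistics.fmean, tolerance=0.1, num_draws=5000)
```

The likelihood is never evaluated; only simulations and a distance between summaries are used. If the summary were not sufficient, the accepted draws would approximate a posterior conditioned on less information, which is the loss the abstract refers to.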
Modelling phylogeny in 16S rRNA gene sequencing datasets using string kernels
Bacterial community composition is measured using 16S rRNA (ribosomal
ribonucleic acid) gene sequencing, for which one of the defining
characteristics is the phylogenetic relationships that exist between variables.
Here, we demonstrate the utility of modelling these relationships in two
statistical tasks (the two sample test and host trait prediction) by employing
string kernels originally proposed in natural language processing. We show via
simulation studies that a kernel two-sample test using the proposed kernels,
which explicitly model phylogenetic relationships, is powerful while also being
sensitive to the phylogenetic scale of the difference between the two
populations. We also demonstrate how the proposed kernels can be used with
Gaussian processes to improve predictive performance in host trait prediction.
Our method is implemented in the Python package StringPhylo (available at
github.com/jonathanishhorowicz/stringphylo).
Delayed Feedback in Generalised Linear Bandits Revisited
The stochastic generalised linear bandit is a well-understood model for
sequential decision-making problems, with many algorithms achieving
near-optimal regret guarantees under immediate feedback. However, in many
real-world settings, the requirement that the reward is observed immediately is not
applicable. In this setting, standard algorithms are no longer theoretically
understood. We study the phenomenon of delayed rewards in a theoretical manner
by introducing a delay between selecting an action and receiving the reward.
Subsequently, we show that an algorithm based on the optimistic principle
improves on existing approaches for this setting by eliminating the need for
prior knowledge of the delay distribution and relaxing assumptions on the
decision set and the delays. This also leads to improved regret guarantees,
stated in terms of the expected delay, the dimension and the time horizon, with
logarithmic terms suppressed. We verify our theoretical results through
experiments on simulated data.
Group Spike and Slab Variational Bayes
We introduce Group Spike-and-slab Variational Bayes (GSVB), a scalable method
for group sparse regression. A fast co-ordinate ascent variational inference
(CAVI) algorithm is developed for several common model families including
Gaussian, Binomial and Poisson. Theoretical guarantees for our proposed
approach are provided by deriving contraction rates for the variational
posterior in grouped linear regression. Through extensive numerical studies, we
demonstrate that GSVB provides state-of-the-art performance, offering a
computationally inexpensive substitute for MCMC, whilst performing comparably to or
better than existing MAP methods. Additionally, we analyze three real-world
datasets wherein we highlight the practical utility of our method,
demonstrating that GSVB provides parsimonious models with excellent predictive
performance, variable selection and uncertainty quantification.
Comment: 66 pages, 5 figures, 7 tables