Constraining Variational Inference with Geometric Jensen-Shannon Divergence
We examine the problem of controlling divergences for latent space regularisation in variational autoencoders, specifically when aiming to reconstruct an example via its latent representation while balancing this against the need for generalisable latent representations. We present a regularisation mechanism based on the skew-geometric Jensen-Shannon divergence. We find a variation of this divergence, motivated by limiting cases, which leads to an intuitive interpolation between forward and reverse KL in the space of both distributions and divergences. We motivate its potential benefits for VAEs through low-dimensional examples, before presenting quantitative and qualitative results. Our experiments demonstrate that skewing our variant of the divergence, in the context of VAEs, leads to better reconstruction and generation when compared to several baseline VAEs. Our approach is entirely unsupervised and utilises only one hyperparameter, which can be easily interpreted in latent space.
Comment: Camera-ready version, accepted at NeurIPS 202
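As an informal illustration (the univariate restriction and all function names are ours; the paper defines its variant precisely), the skew-geometric JS divergence and its limiting behaviour can be sketched for univariate Gaussians, where the weighted geometric mean of two Gaussians is again Gaussian:

```python
import numpy as np

def kl_gauss(m0, s0, m1, s1):
    # KL(N(m0, s0^2) || N(m1, s1^2)), the standard closed form
    return np.log(s1 / s0) + (s0**2 + (m0 - m1)**2) / (2 * s1**2) - 0.5

def geo_mean_gauss(m0, s0, m1, s1, w0, w1):
    # The normalised weighted geometric mean of two Gaussians is Gaussian:
    # precisions combine linearly with the weights.
    prec = w0 / s0**2 + w1 / s1**2
    var = 1.0 / prec
    mean = var * (w0 * m0 / s0**2 + w1 * m1 / s1**2)
    return mean, np.sqrt(var)

def js_galpha_dual(m0, s0, m1, s1, alpha):
    # Dual skew-geometric JS: the intermediate distribution G is the geometric
    # mean with weight alpha on p and (1 - alpha) on q, so that alpha -> 0
    # recovers KL(p || q) and alpha -> 1 recovers KL(q || p).
    mg, sg = geo_mean_gauss(m0, s0, m1, s1, alpha, 1 - alpha)
    return (1 - alpha) * kl_gauss(m0, s0, mg, sg) + alpha * kl_gauss(m1, s1, mg, sg)
```

With this weighting, the skew parameter traces out the interpolation between forward and reverse KL that the abstract describes, which is what makes the single hyperparameter easy to interpret.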
A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural Networks
Kullback-Leibler (KL) divergence is widely used for variational inference of
Bayesian Neural Networks (BNNs). However, the KL divergence has limitations
such as unboundedness and asymmetry. We examine the Jensen-Shannon (JS)
divergence that is more general, bounded, and symmetric. We formulate a novel
loss function for BNNs based on the geometric JS divergence and show that the
conventional KL divergence-based loss function is its special case. We evaluate
the divergence part of the proposed loss function in a closed form for a
Gaussian prior. For any other general prior, Monte Carlo approximations can be
used. We provide algorithms for implementing both of these cases. We
demonstrate that the proposed loss function offers an additional parameter that
can be tuned to control the degree of regularisation. We derive the conditions
under which the proposed loss function regularises better than the KL
divergence-based loss function for Gaussian priors and posteriors. We
demonstrate performance improvements over the state-of-the-art KL
divergence-based BNN on the classification of a noisy CIFAR data set and a
biased histopathology data set.
Comment: To be submitted for peer review in IEE
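For priors with no closed-form divergence, the Monte Carlo route mentioned in the abstract can be sketched as follows. This is a generic illustration using the arithmetic-mixture skewed JS divergence rather than the paper's geometric variant, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def laplace_pdf(x, m, b):
    return np.exp(-np.abs(x - m) / b) / (2 * b)

def mc_skew_js(q_rvs, q_pdf, p_rvs, p_pdf, alpha=0.5, n=200_000):
    # JS_alpha(q, p) = (1 - alpha) KL(q || m) + alpha KL(p || m), with the
    # mixture m = (1 - alpha) q + alpha p; each KL term is a plain Monte
    # Carlo average of the log density ratio under samples from the numerator.
    def kl_term(rvs, pdf):
        x = rvs(n)
        m = (1 - alpha) * q_pdf(x) + alpha * p_pdf(x)
        return np.mean(np.log(pdf(x) / m))
    return (1 - alpha) * kl_term(q_rvs, q_pdf) + alpha * kl_term(p_rvs, p_pdf)

# Gaussian "posterior" against a Laplace "prior": a case with no closed form.
js = mc_skew_js(
    lambda n: rng.normal(0.0, 1.0, n), lambda x: norm_pdf(x, 0.0, 1.0),
    lambda n: rng.laplace(0.0, 1.0, n), lambda x: laplace_pdf(x, 0.0, 1.0),
)
```

Unlike KL, the mixture-based estimate is bounded (by log 2 at alpha = 0.5), which reflects the boundedness property the abstract highlights.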
Quantile Propagation for Wasserstein-Approximate Gaussian Processes
We develop a new approximate Bayesian inference method for Gaussian process models with factorized non-Gaussian likelihoods. Our method, dubbed Quantile Propagation (QP), is similar to expectation propagation (EP) but minimizes the L_2 Wasserstein distance rather than the Kullback-Leibler (KL) divergence. We consider the case where likelihood factors are approximated by a Gaussian form. We show that QP matches quantile functions rather than moments as in EP and has the same mean update but a smaller variance update than EP, thereby alleviating the over-estimation of the posterior variance exhibited by EP. Crucially, QP has the same favorable locality property as EP, and thereby admits an efficient algorithm. Experiments on classification and Poisson regression tasks demonstrate that QP outperforms both EP and variational Bayes.
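The quantile-matching view can be checked numerically for univariate Gaussians, where the squared L_2 Wasserstein distance has a known closed form (a sketch with illustrative names, using SciPy's normal quantile function):

```python
import numpy as np
from scipy.stats import norm

def w2sq_quantile(mu1, s1, mu2, s2, n=100_000):
    # L2 Wasserstein distance via quantile functions:
    # W2^2(P, Q) = integral_0^1 (F_P^{-1}(u) - F_Q^{-1}(u))^2 du
    # approximated here with a midpoint rule on the unit interval.
    u = (np.arange(n) + 0.5) / n
    return np.mean((norm.ppf(u, mu1, s1) - norm.ppf(u, mu2, s2)) ** 2)

def w2sq_closed(mu1, s1, mu2, s2):
    # Closed form for univariate Gaussians.
    return (mu1 - mu2) ** 2 + (s1 - s2) ** 2
```

The closed form separates into a mean term and a scale term that penalises the difference of standard deviations directly, which is one way to see why matching under W2 behaves differently from EP's moment matching on the variance.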
Bringing Models to the Domain: Deploying Gaussian Processes in the Biological Sciences
Recent developments in single cell sequencing allow us to elucidate
processes of individual cells in unprecedented detail. This detail
provides new insights into the progress of cells during cell type
differentiation. Cell type heterogeneity shows the complexity of cells
working together to produce organ function on a macro level. The
understanding of single cell transcriptomics promises to lead to the
ultimate goal of understanding the function of individual cells and
their contribution to higher level function in their environment.
Characterizing the transcriptome of single cells requires us to
understand and be able to model the latent processes of cell functions
that explain biological variance and richness of gene expression
measurements. In this thesis, we describe ways of jointly modelling
biological function and unwanted technical and biological confounding
variation using Gaussian process latent variable models. In addition
to mathematical modelling of latent processes, we provide insights
into the understanding of research code and the significance of
computer science in development of techniques for single cell
experiments.
We describe the process of understanding complex
machine learning algorithms and translating them into usable
software, and then proceed to apply these algorithms. We show how
proper research software design underlying the implementation can lead
to a large user base in other areas of expertise, such as single cell gene
expression. To show the worth of properly designed software underlying
a research project, we show other software packages built upon the
software developed during this thesis and how they can be applied to
single cell gene expression experiments.
Understanding the underlying function of cells seems within reach
through these new techniques that allow us to unravel the
transcriptome of single cells. We describe probabilistic techniques of
identifying the latent functions of cells, while focusing on the
software and ease-of-use aspects of supplying proper research code to
be applied by other researchers.
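The core modelling idea, treating measurements as noisy readouts of smooth functions of a latent input, can be sketched in a few lines (purely illustrative names and settings; the models in the thesis are substantially richer):

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    # Squared-exponential covariance between two sets of 1-D inputs.
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(1)
z = np.linspace(-2.0, 2.0, 50)                 # latent input (e.g. a pseudotime axis)
K = rbf_kernel(z, z) + 1e-8 * np.eye(len(z))   # jitter for numerical stability
f = rng.multivariate_normal(np.zeros(len(z)), K, size=3)  # smooth latent functions
y = f + 0.1 * rng.standard_normal(f.shape)     # noisy expression-like observations
```

In a latent variable model the inputs z are themselves unobserved and inferred jointly with the kernel hyperparameters, which is what allows biological signal and confounding variation to be modelled together.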
Approximate Inference for Non-parametric Bayesian Hawkes Processes and Beyond
The Hawkes process has been widely applied to modeling self-exciting events, including neuron spikes, earthquakes and tweets. To avoid designing parametric triggering kernels, the non-parametric Hawkes process has been proposed, in which the triggering kernel is in a non-parametric form. However, inference in such models suffers from poor scalability to large-scale datasets and sensitivity to uncertainty in the random finite samples. To deal with these issues, we employ Bayesian non-parametric Hawkes processes and propose two kinds of efficient approximate inference methods based on existing inference techniques. Although they have served as the cornerstone of probabilistic methods based on Gaussian process priors, most existing inference techniques approximately optimize standard divergence measures such as the Kullback-Leibler (KL) divergence, which lacks the basic desiderata for the task at hand and chiefly offers technical convenience. To improve on them, we further propose a more advanced Bayesian inference approach based on the Wasserstein distance, which is applicable to a wide range of models. Apart from these works, we also explore a robust frequentist estimation method beyond the Bayesian field. Efficient inference techniques for the Hawkes process will help all the different applications that it already has, from earthquake forecasting and finance to social media. Furthermore, the approximate inference techniques proposed in this thesis have the potential to be applied to other models to improve robustness and account for uncertainty.
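For intuition about self-excitation, a Hawkes process with a parametric exponential triggering kernel can be simulated by Ogata's thinning algorithm (a minimal sketch with illustrative parameter values; the thesis concerns the non-parametric Bayesian setting):

```python
import numpy as np

def intensity(t, events, mu, alpha, beta):
    # lam(t) = mu + alpha * sum_i exp(-beta * (t - t_i)) over past events t_i <= t
    if not events:
        return mu
    return mu + alpha * np.exp(-beta * (t - np.asarray(events))).sum()

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    # Ogata's thinning: between events the intensity only decays, so its value
    # at the current time is a valid upper bound for proposing the next point.
    rng = np.random.default_rng(seed)
    t, events = 0.0, []
    while True:
        lam_bar = intensity(t, events, mu, alpha, beta)
        t += rng.exponential(1.0 / lam_bar)          # candidate from the bound
        if t >= T:
            return np.array(events)
        if rng.uniform() < intensity(t, events, mu, alpha, beta) / lam_bar:
            events.append(t)                          # accept: intensity jumps by alpha

ev = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=50.0)
```

The branching ratio alpha / beta must stay below 1 for the process to be stable; each accepted event raises the intensity, producing the clustering that makes Hawkes models suitable for spikes, earthquakes and tweets.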
Gaussian Process Based Approaches for Survival Analysis
Traditional machine learning focuses on the situation where a fixed number of features are available for each data-point. For medical applications, each individual patient will typically have a different set of clinical tests associated with them, resulting in a varying number of observed features per patient. An important indicator of interest in medical domains is survival information. Survival data presents its own particular challenges, such as censoring. The aim of this thesis is to explore how machine learning ideas can be transferred to the domain of clinical data analysis. We consider two primary challenges: first, how survival models can be made more flexible through non-linearisation, and second, methods for missing-data imputation to handle the varying number of observed features per patient. We use the framework of Gaussian process modelling to facilitate the combination of our approaches, allowing the dual challenges of survival data and missing data to be addressed. The results show promise, although challenges remain. In particular, when a large proportion of data is missing, inferences carry greater uncertainty. Principled handling of this uncertainty requires propagation through any Gaussian process model used for subsequent regression.
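Censoring can be illustrated with the simplest parametric case, an exponential survival model under right-censoring (illustrative data and names; the thesis works with far more flexible Gaussian process models):

```python
import numpy as np

def exp_censored_loglik(lam, times, event):
    # Right-censoring: an observed event contributes log f(t) = log(lam) - lam*t,
    # a censored time contributes log S(t) = -lam*t.  Summing over patients gives:
    return event.sum() * np.log(lam) - lam * times.sum()

times = np.array([2.0, 5.0, 3.5, 7.0, 1.2, 6.3])   # follow-up times (illustrative)
event = np.array([1, 0, 1, 0, 1, 1])               # 1 = event observed, 0 = censored
lam_hat = event.sum() / times.sum()                # closed-form MLE: events / total time
```

Censored patients still contribute information (they are known to have survived at least until their censoring time), which is why the denominator uses the total follow-up time rather than only the event times.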
Tilted Variational Bayes
We present a novel method for approximate inference. Using some of the constructs from expectation propagation (EP), we derive a lower bound of the marginal likelihood in a similar fashion to variational Bayes (VB). The method combines some of the benefits of VB and EP: it can be used with light-tailed likelihoods (where traditional VB fails), and it provides a lower bound on the marginal likelihood. We apply the method to Gaussian process classification, a situation where the Kullback-Leibler divergence minimized in traditional VB can be infinite, and to robust Gaussian process regression, where the inference process is dramatically simplified in comparison to EP. Code to reproduce all the experiments can be found at github.com/SheffieldML/TVB.
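For context, the kind of lower bound that TVB parallels can be written down exactly for a conjugate toy model, where the gap between the bound and the true log marginal likelihood equals the KL divergence from q to the posterior (a sketch with our own names; TVB's bound is constructed differently, via EP's tilted distributions):

```python
import numpy as np

def elbo(m, s, y):
    # Model: theta ~ N(0, 1), y | theta ~ N(theta, 1); variational q(theta) = N(m, s^2).
    exp_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((y - m) ** 2 + s**2)  # E_q[log p(y|theta)]
    exp_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m**2 + s**2)        # E_q[log p(theta)]
    entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)                      # H[q]
    return exp_loglik + exp_logprior + entropy

def log_evidence(y):
    # Marginally y ~ N(0, 2), so the exact log marginal likelihood is available.
    return -0.5 * np.log(2 * np.pi * 2) - y**2 / 4
```

Any choice of q gives a valid lower bound, and the bound is tight exactly when q equals the true posterior N(y/2, 1/2); with non-Gaussian or light-tailed likelihoods no such closed form exists, which is where constructions like TVB become useful.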