8 research outputs found

    Constraining Variational Inference with Geometric Jensen-Shannon Divergence.

    Get PDF
    We examine the problem of controlling divergences for latent space regularisation in variational autoencoders. Specifically, we aim to reconstruct an example $x \in \mathbb{R}^{m}$ via a latent space $z \in \mathbb{R}^{n}$ ($n \leq m$), while balancing this against the need for generalisable latent representations. We present a regularisation mechanism based on the skew-geometric Jensen-Shannon divergence $\mathrm{JS}^{\mathrm{G}_{\alpha}}$. We find a variation of $\mathrm{JS}^{\mathrm{G}_{\alpha}}$, motivated by its limiting cases, which leads to an intuitive interpolation between forward and reverse KL in the space of both distributions and divergences. We motivate its potential benefits for VAEs through low-dimensional examples, before presenting quantitative and qualitative results. Our experiments demonstrate that skewing our variant of $\mathrm{JS}^{\mathrm{G}_{\alpha}}$, in the context of $\mathrm{JS}^{\mathrm{G}_{\alpha}}$-VAEs, leads to better reconstruction and generation when compared to several baseline VAEs. Our approach is entirely unsupervised and utilises only one hyperparameter which can be easily interpreted in latent space. Comment: Camera-ready version, accepted at NeurIPS 202
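
    As a rough, hedged illustration of the quantity involved (not the paper's exact dual variant), the skew-geometric JS divergence between two diagonal Gaussians has a closed form, since the $\alpha$-weighted geometric mean of two Gaussians is again Gaussian. The sketch below assumes Nielsen's definition $\mathrm{JS}^{\mathrm{G}_{\alpha}}(p\,\|\,q) = (1-\alpha)\,\mathrm{KL}(p\,\|\,g_{\alpha}) + \alpha\,\mathrm{KL}(q\,\|\,g_{\alpha})$ with $g_{\alpha} \propto p^{1-\alpha} q^{\alpha}$; all function names and parameter values are illustrative.

        import numpy as np

        def kl_diag_gauss(mu0, var0, mu1, var1):
            # KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ) for diagonal Gaussians.
            return 0.5 * np.sum(np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

        def geometric_mean_gauss(mu0, var0, mu1, var1, alpha):
            # Normalised geometric mean p^(1-alpha) * q^alpha of two diagonal Gaussians:
            # precisions combine linearly, so the result is again a diagonal Gaussian.
            var_a = 1.0 / ((1.0 - alpha) / var0 + alpha / var1)
            mu_a = var_a * ((1.0 - alpha) * mu0 / var0 + alpha * mu1 / var1)
            return mu_a, var_a

        def js_geo(mu0, var0, mu1, var1, alpha=0.5):
            # Skew-geometric JS divergence in Nielsen's form (the paper's dual variant differs).
            mu_a, var_a = geometric_mean_gauss(mu0, var0, mu1, var1, alpha)
            return (1.0 - alpha) * kl_diag_gauss(mu0, var0, mu_a, var_a) \
                + alpha * kl_diag_gauss(mu1, var1, mu_a, var_a)

        # Example: divergence between an approximate posterior N(1, 0.5) and the prior N(0, 1).
        mu_q, var_q = np.array([1.0]), np.array([0.5])
        mu_p, var_p = np.array([0.0]), np.array([1.0])
        for a in (0.1, 0.5, 0.9):
            print(a, js_geo(mu_q, var_q, mu_p, var_p, alpha=a))

    Note that this form collapses to zero as $\alpha \to 0$ or $\alpha \to 1$, which is the kind of limiting behaviour the abstract's variant is motivated by.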

    A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural Networks

    Full text link
    Kullback-Leibler (KL) divergence is widely used for variational inference of Bayesian Neural Networks (BNNs). However, the KL divergence has limitations such as unboundedness and asymmetry. We examine the Jensen-Shannon (JS) divergence, which is more general, bounded, and symmetric. We formulate a novel loss function for BNNs based on the geometric JS divergence and show that the conventional KL divergence-based loss function is its special case. We evaluate the divergence part of the proposed loss function in closed form for a Gaussian prior. For any other general prior, Monte Carlo approximations can be used. We provide algorithms for implementing both of these cases. We demonstrate that the proposed loss function offers an additional parameter that can be tuned to control the degree of regularisation. We derive the conditions under which the proposed loss function regularises better than the KL divergence-based loss function for Gaussian priors and posteriors. We demonstrate performance improvements over the state-of-the-art KL divergence-based BNN on the classification of a noisy CIFAR data set and a biased histopathology data set. Comment: To be submitted for peer review in IEE
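
    A hedged sketch of the general shape of such an objective (not the paper's derivation or closed form): the usual variational loss for a BNN, with the KL regulariser between weight posterior $q(w)$ and prior $p(w)$ swapped for a geometric JS term. Here js_geo refers to the illustrative function in the previous sketch, and all names are assumptions.

        # Illustrative only: the usual variational BNN objective with the KL regulariser
        # swapped for a geometric JS term; js_geo is the illustrative function from the
        # sketch above. The paper derives its own closed form and shows the KL-based loss
        # as a special case; this merely shows where the term sits in the objective.
        def js_bnn_loss(expected_nll, mu_q, var_q, mu_prior, var_prior, alpha=0.5):
            # expected_nll: a (e.g. Monte Carlo) estimate of E_q[-log p(D | w)] under
            # the weight posterior q(w); the JS term pulls q(w) towards the prior p(w).
            return expected_nll + js_geo(mu_q, var_q, mu_prior, var_prior, alpha=alpha)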

    Quantile Propagation for Wasserstein-Approximate Gaussian Processes

    Full text link
    We develop a new approximate Bayesian inference method for Gaussian process models with factorized non-Gaussian likelihoods. Our method---dubbed Quantile Propagation (QP)---is similar to expectation propagation (EP) but minimizes the $L_2$ Wasserstein distance rather than the Kullback-Leibler (KL) divergence. We consider the case where likelihood factors are approximated by a Gaussian form. We show that QP matches quantile functions rather than moments as in EP, and has the same mean update but a smaller variance update than EP, thereby alleviating the over-estimation of the posterior variance exhibited by EP. Crucially, QP has the same favorable locality property as EP, and thereby admits an efficient algorithm. Experiments on classification and Poisson regression tasks demonstrate that QP outperforms both EP and variational Bayes.
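
    A small numerical sketch of the property described (not the paper's algorithm): for a 1-D target, the squared $L_2$ Wasserstein distance is the squared $L_2$ distance between quantile functions, so the best Gaussian approximation matches the target mean but has a smaller standard deviation than the moment-matching projection used by EP. The bimodal target and grid choices below are illustrative stand-ins for an EP/QP tilted distribution.

        import numpy as np
        from scipy.stats import norm

        # Target: a bimodal mixture standing in for a non-Gaussian EP/QP tilted distribution.
        xs = np.linspace(-12.0, 12.0, 200001)
        pdf = 0.5 * norm.pdf(xs, -2.0, 1.0) + 0.5 * norm.pdf(xs, 2.0, 1.0)
        cdf = np.cumsum(pdf)
        cdf /= cdf[-1]

        # Quantile function of the target, obtained by inverting the CDF on a grid.
        us = np.linspace(1e-6, 1.0 - 1e-6, 100001)
        F_inv = np.interp(us, cdf, xs)

        # Moment matching (the KL projection used by EP): match mean and variance.
        mean_mm = np.trapz(xs * pdf, xs)
        std_mm = np.sqrt(np.trapz((xs - mean_mm) ** 2 * pdf, xs))

        # W2-optimal Gaussian: the quantile of N(mu, s^2) is mu + s * Phi^{-1}(u), so
        # minimising the L2 distance between quantile functions gives
        #   mu = integral of F^{-1}(u) du,   s = integral of F^{-1}(u) * Phi^{-1}(u) du.
        mean_w2 = np.trapz(F_inv, us)
        std_w2 = np.trapz(F_inv * norm.ppf(us), us)

        print("moment matching (EP-style):", mean_mm, std_mm)   # std close to sqrt(5)
        print("Wasserstein / quantiles   :", mean_w2, std_w2)   # same mean, smaller std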

    Bringing Models to the Domain: Deploying Gaussian Processes in the Biological Sciences

    Get PDF
    Recent developments in single cell sequencing allow us to elucidate the processes of individual cells in unprecedented detail. This detail provides new insights into the progress of cells during cell type differentiation. Cell type heterogeneity shows the complexity of cells working together to produce organ function at a macro level. The understanding of single cell transcriptomics promises to lead to the ultimate goal of understanding the function of individual cells and their contribution to higher-level function in their environment. Characterizing the transcriptome of single cells requires us to understand and model the latent processes of cell function that explain the biological variance and richness of gene expression measurements. In this thesis, we describe ways of jointly modelling biological function and unwanted technical and biological confounding variation using Gaussian process latent variable models. In addition to the mathematical modelling of latent processes, we provide insights into the understanding of research code and the significance of computer science in the development of techniques for single cell experiments. We describe the process of understanding complex machine learning algorithms and translating them into usable software, and then proceed to apply these algorithms. We show how proper research software design underlying the implementation can lead to a large user base in other areas of expertise, such as single cell gene expression. To demonstrate the worth of properly designed software underlying a research project, we present other software packages built upon the software developed during this thesis and show how they can be applied to single cell gene expression experiments. Understanding the underlying function of cells seems within reach through these new techniques that allow us to unravel the transcriptome of single cells. We describe probabilistic techniques for identifying the latent functions of cells, while focusing on the software and ease-of-use aspects of supplying proper research code to be applied by other researchers.
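
    As a hedged, toy-scale sketch of the modelling idea referred to (not the thesis's software, which builds on GPy and richer covariance structure), a Gaussian process latent variable model learns latent coordinates X for an observation matrix Y by maximising the GP marginal likelihood with the kernel evaluated on X; the data, kernel, and parameter choices below are illustrative.

        import numpy as np
        from scipy.optimize import minimize

        # Toy GPLVM sketch: learn latent coordinates X for an observation matrix Y
        # (here "cells" x "genes") by maximising the GP marginal likelihood
        #   sum_d log N(y_d | 0, K(X, X) + noise * I)  over X.
        rng = np.random.default_rng(0)
        Y = rng.standard_normal((30, 5))        # toy stand-in for expression data
        n, d, q = Y.shape[0], Y.shape[1], 2     # q: latent dimensionality

        def rbf(X, lengthscale=1.0, variance=1.0):
            sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            return variance * np.exp(-0.5 * sq / lengthscale ** 2)

        def neg_log_marginal(x_flat, noise=0.1):
            X = x_flat.reshape(n, q)
            K = rbf(X) + noise * np.eye(n)
            _, logdet = np.linalg.slogdet(K)
            alpha = np.linalg.solve(K, Y)
            # d independent GP output dimensions sharing the same kernel matrix K.
            return 0.5 * (d * logdet + np.sum(Y * alpha) + n * d * np.log(2 * np.pi))

        res = minimize(neg_log_marginal, 0.1 * rng.standard_normal(n * q), method="L-BFGS-B")
        X_latent = res.x.reshape(n, q)          # learned 2-D latent embedding
        print(res.fun, X_latent[:3])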

    Approximate Inference for Non-parametric Bayesian Hawkes Processes and Beyond

    Get PDF
    The Hawkes process has been widely applied to modeling self-exciting events including neuron spikes, earthquakes and tweets. To avoid designing parametric triggering kernels, the non-parametric Hawkes process has been proposed, in which the triggering kernel takes a non-parametric form. However, inference in such models suffers from poor scalability to large-scale datasets and sensitivity to uncertainty in the random finite samples. To deal with these issues, we employ Bayesian non-parametric Hawkes processes and propose two kinds of efficient approximate inference methods based on existing inference techniques. Although they have served as the cornerstone of probabilistic methods based on Gaussian process priors, most existing inference techniques approximately optimize standard divergence measures such as the Kullback-Leibler (KL) divergence, which lack basic desiderata for the task at hand and chiefly offer technical convenience. In order to improve on them, we further propose a more advanced Bayesian inference approach based on the Wasserstein distance, which is applicable to a wide range of models. Apart from these works, we also explore a robust frequentist estimation method beyond the Bayesian setting. Efficient inference techniques for the Hawkes process will benefit all of its existing applications, from earthquake forecasting and finance to social media. Furthermore, the approximate inference techniques proposed in this thesis have the potential to be applied to other models to improve robustness and account for uncertainty.
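
    For context, a minimal sketch (illustrative, not the thesis's inference methods) of the model being referred to: the Hawkes conditional intensity $\lambda(t) = \mu + \sum_{t_i < t} \phi(t - t_i)$ and its log-likelihood, here with a simple parametric exponential kernel standing in for the non-parametric triggering kernels; all parameter values are illustrative.

        import numpy as np

        # Hawkes process sketch: conditional intensity
        #   lambda(t) = mu + sum_{t_i < t} phi(t - t_i),
        # with the exponential kernel phi(t) = a * b * exp(-b * t) as a simple
        # parametric stand-in for the non-parametric triggering kernels.
        def hawkes_loglik(times, T, mu, a, b):
            # Log-likelihood: sum_i log lambda(t_i) - integral_0^T lambda(t) dt.
            times = np.asarray(times, dtype=float)
            loglik = 0.0
            for i, t in enumerate(times):
                excitation = np.sum(a * b * np.exp(-b * (t - times[:i])))
                loglik += np.log(mu + excitation)
            # The compensator integral is available in closed form for this kernel.
            compensator = mu * T + np.sum(a * (1.0 - np.exp(-b * (T - times))))
            return loglik - compensator

        events = [0.5, 0.9, 1.0, 2.7, 3.1]      # toy event times on [0, 5]
        print(hawkes_loglik(events, T=5.0, mu=0.4, a=0.5, b=2.0))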

    Gaussian Process Based Approaches for Survival Analysis

    Get PDF
    Traditional machine learning focuses on the situation where a fixed number of features is available for each data point. In medical applications, each individual patient will typically have a different set of clinical tests associated with them, resulting in a varying number of observed features per patient. An important indicator of interest in medical domains is survival information. Survival data presents its own particular challenges, such as censoring. The aim of this thesis is to explore how machine learning ideas can be transferred to the domain of clinical data analysis. We consider two primary challenges: first, how survival models can be made more flexible through non-linearisation; and second, methods for missing-data imputation to handle the varying number of observed features per patient. We use the framework of Gaussian process modelling to combine our approaches, allowing the dual challenges of survival data and missing data to be addressed. The results show promise, although challenges remain. In particular, when a large proportion of the data is missing, the resulting inferences carry greater uncertainty. Principled handling of this uncertainty requires propagating it through any Gaussian process model used for subsequent regression.
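
    As a hedged, minimal sketch of how censoring enters a survival likelihood (with a simple parametric Weibull hazard standing in for the non-linear, GP-based models considered in the thesis): an observed event at time $t$ contributes $\log h(t) - H(t)$, while a right-censored observation contributes only $-H(t)$. Data and parameter values below are illustrative.

        import numpy as np

        # Right-censoring in a survival likelihood: an observed event at time t
        # contributes log h(t) - H(t); a censored observation contributes only -H(t)
        # (the patient is known to have survived at least to t). A Weibull hazard
        # h(t) = (k / s) * (t / s)^(k - 1) stands in for the GP-based hazards.
        def weibull_censored_loglik(times, events, shape, scale):
            t = np.asarray(times, dtype=float)
            d = np.asarray(events, dtype=float)      # 1 = event observed, 0 = censored
            log_hazard = np.log(shape / scale) + (shape - 1.0) * np.log(t / scale)
            cum_hazard = (t / scale) ** shape
            return np.sum(d * log_hazard - cum_hazard)

        times = [2.0, 5.0, 3.5, 7.0, 1.2]            # follow-up times (toy data)
        events = [1, 0, 1, 0, 1]                     # two patients right-censored
        print(weibull_censored_loglik(times, events, shape=1.5, scale=4.0))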

    Tilted Variational Bayes

    No full text
    We present a novel method for approximate inference. Using some of the constructs from expectation propagation (EP), we derive a lower bound of the marginal likelihood in a similar fashion to variational Bayes (VB). The method combines some of the benefits of VB and EP: it can be used with light-tailed likelihoods (where traditional VB fails), and it provides a lower bound on the marginal likelihood. We apply the method to Gaussian process classification, a situation where the Kullback-Leibler divergence minimized in traditional VB can be infinite, and to robust Gaussian process regression, where the inference process is dramatically simplified in comparison to EP. Code to reproduce all the experiments can be found at github.com/SheffieldML/TVB.
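
    A tiny numerical illustration (not from the paper) of the failure mode mentioned: with a sufficiently light-tailed likelihood factor, here taken to the extreme of compact support, the expected log-likelihood term in the standard VB bound is $-\infty$ for any Gaussian $q$, so the KL divergence being minimised is infinite. The indicator likelihood and sample size are illustrative choices.

        import numpy as np

        # Extreme example of a "light-tailed" likelihood factor: an indicator that is
        # zero outside |f| < 1. Under any Gaussian q, samples fall outside the support
        # with positive probability, so E_q[log p(y | f)] = -inf and the KL divergence
        # minimised by standard VB is infinite.
        rng = np.random.default_rng(1)
        f = rng.normal(0.0, 1.0, 10000)
        loglik = np.where(np.abs(f) < 1.0, 0.0, -np.inf)
        print(loglik.mean())                         # -inf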