    A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural Networks

    Kullback-Leibler (KL) divergence is widely used for variational inference of Bayesian Neural Networks (BNNs). However, the KL divergence has limitations such as unboundedness and asymmetry. We examine the Jensen-Shannon (JS) divergence, which is more general, bounded, and symmetric. We formulate a novel loss function for BNNs based on the geometric JS divergence and show that the conventional KL divergence-based loss function is a special case of it. We evaluate the divergence part of the proposed loss function in closed form for a Gaussian prior; for any other prior, Monte Carlo approximations can be used. We provide algorithms for implementing both cases. We demonstrate that the proposed loss function offers an additional parameter that can be tuned to control the degree of regularisation. We derive the conditions under which the proposed loss function regularises better than the KL divergence-based loss function for Gaussian priors and posteriors. We demonstrate performance improvements over the state-of-the-art KL divergence-based BNN on the classification of a noisy CIFAR data set and a biased histopathology data set. Comment: To be submitted for peer review in IEEE
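
    The abstract does not write the divergence out. As an illustration only, the sketch below assumes the skew geometric JS divergence in Nielsen's form, JS_α^G(p‖q) = (1 - α) KL(p‖G_α) + α KL(q‖G_α), where G_α ∝ p^(1-α) q^α is the normalised weighted geometric mean; for univariate Gaussians every term has a closed form. The function names are hypothetical, and the paper's own parameterisation (in particular how the KL-based loss arises as a special case) may differ.

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for univariate Gaussians."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def geometric_mean_gauss(mu1, var1, mu2, var2, alpha):
    """Normalised weighted geometric mean p^(1-alpha) q^alpha of two Gaussians.

    The result is again Gaussian: precisions combine with weights (1 - alpha)
    and alpha, and the mean is the correspondingly precision-weighted mean.
    """
    prec = (1.0 - alpha) / var1 + alpha / var2
    mu = ((1.0 - alpha) * mu1 / var1 + alpha * mu2 / var2) / prec
    return mu, 1.0 / prec

def geometric_js_gauss(mu1, var1, mu2, var2, alpha=0.5):
    """Skew geometric JS divergence between two Gaussians (assumed form, see lead-in)."""
    mu_g, var_g = geometric_mean_gauss(mu1, var1, mu2, var2, alpha)
    return ((1.0 - alpha) * kl_gauss(mu1, var1, mu_g, var_g)
            + alpha * kl_gauss(mu2, var2, mu_g, var_g))

# Hypothetical "posterior" N(0, 1) against "prior" N(1, 2) for a few skew values.
for a in (0.1, 0.5, 0.9):
    print(a, geometric_js_gauss(0.0, 1.0, 1.0, 2.0, alpha=a))
```

    At α = 0.5 the value is symmetric in the two Gaussians; moving α away from 0.5 shifts the weight between the two KL terms, which is one way an extra tunable parameter can enter such a loss.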

    On a generalization of the Jensen-Shannon divergence

    The Jensen-Shannon divergence is a renowned bounded symmetrization of the Kullback-Leibler divergence which does not require probability densities to have matching supports. In this paper, we introduce a vector-skew generalization of the scalar α-Jensen-Bregman divergences and derive from it the vector-skew α-Jensen-Shannon divergences. We study the properties of these novel divergences and show how to build parametric families of symmetric Jensen-Shannon-type divergences. Finally, we report an iterative algorithm to numerically compute the Jensen-Shannon-type centroids for a set of probability densities belonging to a mixture family: this includes the case of the Jensen-Shannon centroid of a set of categorical distributions or normalized histograms. Comment: 19 pages, 3 figures
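
    A minimal sketch of a vector-skew α-JS divergence for categorical distributions, assuming the definition JS^(α,w)(p:q) = Σ_i w_i KL((pq)_(α_i) : (pq)_(ᾱ)) with skewed mixtures (pq)_α = (1 - α)p + αq and ᾱ = Σ_i w_i α_i. This reading of the definition and all helper names are our assumptions, and the centroid algorithm is not covered. With α = (0, 1) and w = (½, ½) it reduces to the ordinary JS divergence, which stays finite for disjoint supports even where KL is infinite.

```python
import math

def kl(p, q):
    """KL(p || q) for categorical distributions, with 0 * log(0 / q) = 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total

def mix(p, q, a):
    """Skewed mixture (pq)_a = (1 - a) p + a q."""
    return [(1.0 - a) * pi + a * qi for pi, qi in zip(p, q)]

def vector_skew_js(p, q, alphas, weights):
    """Assumed vector-skew JS: sum_i w_i KL((pq)_{a_i} || (pq)_{a_bar})."""
    a_bar = sum(w * a for w, a in zip(weights, alphas))
    m = mix(p, q, a_bar)
    return sum(w * kl(mix(p, q, a), m) for w, a in zip(weights, alphas))

p, q = [1.0, 0.0], [0.0, 1.0]                          # disjoint supports
print(kl(p, q))                                        # inf
print(vector_skew_js(p, q, [0.0, 1.0], [0.5, 0.5]))    # ≈ 0.693 (= ln 2)
```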

    Beyond scalar quasi-arithmetic means: Quasi-arithmetic averages and quasi-arithmetic mixtures in information geometry

    We generalize quasi-arithmetic means beyond scalars by considering the gradient map of a Legendre-type real-valued function. The gradient map of a Legendre-type function is proven to be strictly comonotone and to have a global inverse. It thus yields a generalization of the strictly monotone and differentiable functions that generate scalar quasi-arithmetic means. Furthermore, the Legendre transformation gives rise to pairs of dual quasi-arithmetic averages via convex duality. We study the invariance and equivariance properties of quasi-arithmetic averages under affine transformations through the lens of the dually flat spaces of information geometry. We show how these quasi-arithmetic averages are used to express points on dual geodesics and sided barycenters in the dual affine coordinate systems. We then consider quasi-arithmetic mixtures and describe several parametric and non-parametric statistical models that are closed under the quasi-arithmetic mixture operation. Comment: 20 pages
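
    The construction can be made concrete with a small sketch, assuming points are coordinate lists and the caller supplies the gradient map ∇F and its inverse: the average is (∇F)^(-1)(Σ_i w_i ∇F(θ_i)). For a separable F this is the scalar quasi-arithmetic mean f^(-1)(Σ_i w_i f(x_i)) applied coordinate-wise. The function names and the three example generators are illustrative choices, not taken from the paper.

```python
import math

def qa_average(points, weights, grad_F, grad_F_inv):
    """Quasi-arithmetic average (grad F)^{-1}(sum_i w_i grad F(theta_i))."""
    dim = len(points[0])
    eta = [0.0] * dim                  # accumulate in gradient (dual) coordinates
    for w, theta in zip(weights, points):
        g = grad_F(theta)
        for k in range(dim):
            eta[k] += w * g[k]
    return grad_F_inv(eta)             # map back with the global inverse

def identity(v):
    # F(theta) = ||theta||^2 / 2: grad F is the identity -> arithmetic mean.
    return list(v)

def log_map(v):
    # F(theta) = sum_k (theta_k log theta_k - theta_k) on theta > 0: grad F = log.
    return [math.log(x) for x in v]

def exp_map(v):
    # Inverse of log_map -> coordinate-wise weighted geometric mean.
    return [math.exp(x) for x in v]

def neg_recip(v):
    # F(theta) = -sum_k log theta_k: grad F = -1/theta, which is its own inverse
    # -> coordinate-wise weighted harmonic mean.
    return [-1.0 / x for x in v]

pts = [[1.0, 4.0], [4.0, 1.0]]
w = [0.5, 0.5]
print(qa_average(pts, w, identity, identity))    # arithmetic: [2.5, 2.5]
print(qa_average(pts, w, log_map, exp_map))      # geometric: approximately [2.0, 2.0]
print(qa_average(pts, w, neg_recip, neg_recip))  # harmonic: approximately [1.6, 1.6]
```

    Because ∇F* = (∇F)^(-1) for a Legendre-type F, swapping the two maps in the call computes the dual average in the dual coordinates.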

    Generalized Multimodal ELBO

    Multiple data types naturally co-occur when describing real-world phenomena, and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between semantic coherence and the ability to learn the joint data distribution. We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks. Comment: ICLR 2021
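
    A rough sketch of one posterior that can contain two simpler schemes as special cases, under the assumption (ours, not stated in the abstract) that the joint posterior is a uniform mixture, over all non-empty subsets of modalities, of product-of-experts (PoE) posteriors built from univariate Gaussian unimodal posteriors. Keeping only singleton subsets gives a pure mixture of experts, and keeping only the full set gives a pure PoE. Prior experts, multivariate latents, and the ELBO itself are omitted; all names are hypothetical.

```python
import itertools
import random

def poe(experts):
    """Product of univariate Gaussian experts given as (mean, variance) pairs.

    Precisions add, and the mean is the precision-weighted average of the means.
    """
    prec = sum(1.0 / var for _, var in experts)
    mu = sum(m / var for m, var in experts) / prec
    return mu, 1.0 / prec

def subset_posteriors(unimodal):
    """PoE posterior for every non-empty subset of modalities (2^M - 1 of them)."""
    names = sorted(unimodal)
    out = {}
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(names, r):
            out[subset] = poe([unimodal[n] for n in subset])
    return out

def sample_joint(unimodal, rng):
    """Draw one latent sample from the uniform mixture of subset PoE posteriors."""
    mu, var = rng.choice(list(subset_posteriors(unimodal).values()))
    return rng.gauss(mu, var ** 0.5)

unimodal = {"image": (0.0, 1.0), "text": (2.0, 1.0)}
print(subset_posteriors(unimodal))
# {('image',): (0.0, 1.0), ('text',): (2.0, 1.0), ('image', 'text'): (1.0, 0.5)}
print(sample_joint(unimodal, random.Random(0)))
```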

    Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence

    Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks. Comment: Accepted at NeurIPS 2020, camera-ready version
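
    The weighted JS divergence of several distributions has a convenient entropy form: Σ_k π_k KL(p_k‖m) = H(m) - Σ_k π_k H(p_k), with mixture m = Σ_k π_k p_k. The sketch below computes it for categorical distributions only; the mmJSD objective itself, with Gaussian posteriors and a dynamic prior, is not modelled, and the names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in nats, using the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def js_multi(dists, weights):
    """Weighted JS divergence of several categorical distributions.

    Computes H(m) - sum_k w_k H(p_k), which equals sum_k w_k KL(p_k || m)
    for the mixture m = sum_k w_k p_k.
    """
    m = [sum(w * p[i] for w, p in zip(weights, dists)) for i in range(len(dists[0]))]
    return entropy(m) - sum(w * entropy(p) for w, p in zip(weights, dists))

point_masses = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform = [1 / 3] * 3
print(js_multi(point_masses, uniform), math.log(3))  # both approximately 1.0986
```

    Disjoint point masses attain the upper bound H(π), here ln 3, which illustrates the boundedness that distinguishes JS from KL.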

    Simulation of complex dynamics of mean-field p-spin models using measurement-based quantum feedback control

    We study the application of a new method for simulating nonlinear dynamics of many-body spin systems using quantum measurement and feedback [Muñoz-Arias et al., Phys. Rev. Lett. 124, 110503 (2020)] to a broad class of many-body models known as p-spin Hamiltonians, which describe Ising-like models on a completely connected graph with p-body interactions. The method simulates the desired mean-field dynamics in the thermodynamic limit by combining nonprojective measurements of a component of the collective spin with a global rotation conditioned on the measurement outcome. We apply this protocol to simulate the dynamics of the p-spin Hamiltonians and demonstrate how different aspects of criticality in the mean-field regime are readily accessible with it. We study applications including properties of dynamical phase transitions and the emergence of spontaneous symmetry breaking in the adiabatic dynamics of the collective spin for different values of the parameter p. We also demonstrate how this method can be employed to study the quantum-to-classical transition in the dynamics continuously as a function of system size. Comment: 16 pages, 7 figures
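
    For orientation, the target mean-field dynamics can be integrated classically. The sketch assumes a transverse-field p-spin model whose per-spin mean-field energy on the unit sphere is e(m) = -(J/p) Z^p - Γ X, with equation of motion dm/dt = ∇e(m) × m (the sign convention is an assumption; flipping it only reverses time). It does not implement the measurement-and-feedback protocol; J, Γ (gamma), and the function names are illustrative.

```python
import math

def grad_energy(m, p, J, gamma):
    """Gradient of the assumed e(m) = -(J/p) Z^p - gamma X with respect to (X, Y, Z)."""
    z = m[2]
    return (-gamma, 0.0, -J * z ** (p - 1))

def energy(m, p, J, gamma):
    """Per-spin mean-field energy (assumed form, see lead-in)."""
    return -(J / p) * m[2] ** p - gamma * m[0]

def rhs(m, p, J, gamma):
    """dm/dt = grad e(m) x m: a precession that conserves |m| and e(m)."""
    ax, ay, az = grad_energy(m, p, J, gamma)
    x, y, z = m
    return (ay * z - az * y, az * x - ax * z, ax * y - ay * x)

def rk4_step(m, dt, p, J, gamma):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(mi + h * ki for mi, ki in zip(m, k))
    k1 = rhs(m, p, J, gamma)
    k2 = rhs(shifted(k1, 0.5 * dt), p, J, gamma)
    k3 = rhs(shifted(k2, 0.5 * dt), p, J, gamma)
    k4 = rhs(shifted(k3, dt), p, J, gamma)
    return tuple(mi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for mi, a, b, c, d in zip(m, k1, k2, k3, k4))

def evolve(m0, t_final, dt, p, J, gamma):
    """Integrate from m0 up to t_final with a fixed step dt."""
    m = tuple(m0)
    for _ in range(int(round(t_final / dt))):
        m = rk4_step(m, dt, p, J, gamma)
    return m

theta = 1.0
m_start = (math.sin(theta), 0.0, math.cos(theta))
m_end = evolve(m_start, t_final=10.0, dt=0.01, p=3, J=1.0, gamma=0.5)
print("final m:", m_end)
print("|m| =", math.sqrt(sum(c * c for c in m_end)))
print("energy drift:", energy(m_end, 3, 1.0, 0.5) - energy(m_start, 3, 1.0, 0.5))
```

    Conservation of |m| and e(m) is a quick sanity check on the integrator; varying p, J, and gamma changes the fixed-point structure of these classical trajectories.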