A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural Networks
Kullback-Leibler (KL) divergence is widely used for variational inference of
Bayesian Neural Networks (BNNs). However, the KL divergence has limitations
such as unboundedness and asymmetry. We examine the Jensen-Shannon (JS)
divergence, which is more general, bounded, and symmetric. We formulate a novel
loss function for BNNs based on the geometric JS divergence and show that the
conventional KL divergence-based loss function is its special case. We evaluate
the divergence part of the proposed loss function in closed form for a
Gaussian prior; for any other prior, Monte Carlo approximations can be
used. We provide algorithms for implementing both cases. We
demonstrate that the proposed loss function offers an additional parameter that
can be tuned to control the degree of regularisation. We derive the conditions
under which the proposed loss function regularises better than the KL
divergence-based loss function for Gaussian priors and posteriors. We
demonstrate performance improvements over the state-of-the-art KL
divergence-based BNN on the classification of a noisy CIFAR data set and a
biased histopathology data set.
Comment: To be submitted for peer review in IEEE
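To make the closed-form Gaussian case concrete, here is a minimal sketch (not the paper's exact algorithm; the univariate setting and the skew convention G_alpha proportional to p^alpha q^(1-alpha) are assumptions of this sketch) showing that a geometric JS divergence between Gaussians is available in closed form and recovers the KL divergence at alpha = 0:

```python
import numpy as np

def kl_gauss(mu0, var0, mu1, var1):
    # KL(N(mu0, var0) || N(mu1, var1)) for univariate Gaussians.
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def geo_mean_gauss(mu_p, var_p, mu_q, var_q, alpha):
    # The normalized weighted geometric mean p^alpha q^(1-alpha) of two
    # Gaussians is again Gaussian: interpolate the natural parameters.
    var_g = 1.0 / (alpha / var_p + (1.0 - alpha) / var_q)
    mu_g = var_g * (alpha * mu_p / var_p + (1.0 - alpha) * mu_q / var_q)
    return mu_g, var_g

def geo_js_gauss(mu_p, var_p, mu_q, var_q, alpha):
    # Geometric JS divergence (1-alpha)*KL(p||G) + alpha*KL(q||G); under
    # this convention alpha = 0 gives G = q, i.e. plain KL(p||q).
    mu_g, var_g = geo_mean_gauss(mu_p, var_p, mu_q, var_q, alpha)
    return ((1.0 - alpha) * kl_gauss(mu_p, var_p, mu_g, var_g)
            + alpha * kl_gauss(mu_q, var_q, mu_g, var_g))

# Sanity check: alpha = 0 matches the KL divergence exactly.
print(geo_js_gauss(0.0, 1.0, 1.0, 2.0, 0.0), kl_gauss(0.0, 1.0, 1.0, 2.0))
```

The free skew parameter alpha plays the role of the extra regularisation knob the abstract mentions.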
On a generalization of the Jensen-Shannon divergence
The Jensen-Shannon divergence is a renowned bounded symmetrization of the
Kullback-Leibler divergence which does not require probability densities to
have matching supports. In this paper, we introduce a vector-skew
generalization of the scalar α-Jensen-Bregman divergences, from which we
derive the vector-skew α-Jensen-Shannon divergences. We study the
properties of these novel divergences and show how to build parametric families
of symmetric Jensen-Shannon-type divergences. Finally, we report an iterative
algorithm to numerically compute the Jensen-Shannon-type centroids for a set of
probability densities belonging to a mixture family: This includes the case of
the Jensen-Shannon centroid of a set of categorical distributions or normalized
histograms.
Comment: 19 pages, 3 figures
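As an illustration of the construction described above, here is a hedged sketch for categorical distributions, assuming the vector-skew JS divergence takes the form sum_i w_i KL((1-a_i)p + a_i q : (1-abar)p + abar q) with abar = sum_i w_i a_i; choosing a = (0, 1) and w = (1/2, 1/2) then recovers the ordinary Jensen-Shannon divergence, consistent with the symmetric families the abstract mentions:

```python
import numpy as np

def kl_cat(p, q, eps=1e-12):
    # KL divergence between two categorical distributions on the same support.
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def vector_skew_js(p, q, alphas, weights):
    # Weighted sum of KLs from each skewed mixture (1-a_i)p + a_i q
    # to the "average-skew" mixture (1-abar)p + abar q.
    alphas = np.asarray(alphas, float)
    weights = np.asarray(weights, float)  # assumed to sum to 1
    abar = float(np.dot(weights, alphas))
    m_bar = (1.0 - abar) * p + abar * q
    return sum(w * kl_cat((1.0 - a) * p + a * q, m_bar)
               for a, w in zip(alphas, weights))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.4, 0.5])
# a = (0, 1), w = (1/2, 1/2): ordinary Jensen-Shannon divergence.
print(vector_skew_js(p, q, [0.0, 1.0], [0.5, 0.5]))
```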
Beyond scalar quasi-arithmetic means: Quasi-arithmetic averages and quasi-arithmetic mixtures in information geometry
We generalize quasi-arithmetic means beyond scalars by considering the
gradient map of a Legendre type real-valued function. The gradient map of a
Legendre type function is proven to be strictly comonotone with a global
inverse. It thus yields a generalization of the strictly monotone and differentiable
functions generating scalar quasi-arithmetic means. Furthermore, the Legendre
transformation gives rise to pairs of dual quasi-arithmetic averages via the
convex duality. We study the invariance and equivariance properties under
affine transformations of quasi-arithmetic averages via the lens of dually flat
spaces of information geometry. We show how these quasi-arithmetic averages are
used to express points on dual geodesics and sided barycenters in the dual
affine coordinate systems. We then consider quasi-arithmetic mixtures and
describe several parametric and non-parametric statistical models which are
closed under the quasi-arithmetic mixture operation.
Comment: 20 pages
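A small sketch may help fix ideas: the scalar quasi-arithmetic mean f^{-1}(sum_i w_i f(x_i)) and its gradient-map generalization (grad F)^{-1}(sum_i w_i grad F(x_i)) are shown below, using the illustrative Legendre-type function F(x) = sum_i exp(x_i) (this particular F is an assumption of the sketch, not an example taken from the paper):

```python
import numpy as np

def quasi_arithmetic_mean(xs, weights, f, f_inv):
    # Scalar quasi-arithmetic mean: f_inv(sum_i w_i f(x_i)).
    return f_inv(sum(w * f(x) for x, w in zip(xs, weights)))

# f = log yields the geometric mean; f = identity, the arithmetic mean.
print(quasi_arithmetic_mean([1.0, 4.0], [0.5, 0.5], np.log, np.exp))  # 2.0

def grad_map_average(points, weights, grad_F, grad_F_inv):
    # Vector generalization: push points through the gradient map of a
    # Legendre-type F, average, then pull back with its global inverse.
    pooled = sum(w * grad_F(x) for x, w in zip(points, weights))
    return grad_F_inv(pooled)

# For F(x) = sum_i exp(x_i): grad F = exp and its inverse is log
# (both componentwise), giving a coordinate-wise log-sum-exp style average.
a, b = np.array([0.0, 1.0]), np.array([2.0, -1.0])
print(grad_map_average([a, b], [0.5, 0.5], np.exp, np.log))
```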
Generalized Multimodal ELBO
Multiple data types naturally co-occur when describing real-world phenomena
and learning from them is a long-standing goal in machine learning research.
However, existing self-supervised generative models approximating an ELBO are
not able to fulfill all desired requirements of multimodal models: their
posterior approximation functions lead to a trade-off between the semantic
coherence and the ability to learn the joint data distribution. We propose a
new, generalized ELBO formulation for multimodal data that overcomes these
limitations. The new objective encompasses two previous methods as special
cases and combines their benefits without compromises. In extensive
experiments, we demonstrate the advantage of the proposed method compared to
state-of-the-art models in self-supervised, generative learning tasks.
Comment: 2021 ICLR
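The abstract does not spell out the new objective; as one hedged reading, a generalized multimodal posterior can mix products of experts over all modality subsets, which subsumes both a pure product-of-experts and a pure mixture-of-experts as special cases. The sketch below illustrates that construction for scalar Gaussian experts (the uniform mixture weights and the Gaussian expert form are assumptions of this sketch):

```python
import itertools
import numpy as np

def poe_gauss(mus, vars_):
    # Product of Gaussian experts: precisions add,
    # means combine precision-weighted.
    var = 1.0 / sum(1.0 / v for v in vars_)
    mu = var * sum(m / v for m, v in zip(mus, vars_))
    return mu, var

def mixture_of_poe_subsets(mus, vars_):
    # Uniform mixture over the product-of-experts posterior of every
    # nonempty subset of modalities.
    M = len(mus)
    components = []
    for r in range(1, M + 1):
        for subset in itertools.combinations(range(M), r):
            components.append(poe_gauss([mus[i] for i in subset],
                                        [vars_[i] for i in subset]))
    return components  # mixture weights: 1 / len(components) each

# Two unimodal experts -> 3 components: {1}, {2}, {1,2}.
print(mixture_of_poe_subsets([0.0, 1.0], [1.0, 2.0]))
```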
Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence
Learning from different data types is a long-standing goal in machine
learning research, as multiple information sources co-occur when describing
natural phenomena. However, existing generative models that approximate a
multimodal ELBO rely on difficult or inefficient training schemes to learn a
joint distribution and the dependencies between modalities. In this work, we
propose a novel, efficient objective function that utilizes the Jensen-Shannon
divergence for multiple distributions. It simultaneously approximates the
unimodal and joint multimodal posteriors directly via a dynamic prior. In
addition, we theoretically prove that the new multimodal JS-divergence (mmJSD)
objective optimizes an ELBO. In extensive experiments, we demonstrate the
advantage of the proposed mmJSD model compared to previous work in
unsupervised, generative learning tasks.
Comment: Accepted at NeurIPS 2020, camera-ready version
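As a rough sketch of a JS-type objective for several distributions, the snippet below computes a weighted sum of KL divergences from unimodal Gaussian posteriors to a dynamic prior, taken here to be their weighted geometric mean, which stays Gaussian and hence keeps everything in closed form (this particular prior choice is an assumption of the sketch; the paper's exact construction may differ):

```python
import numpy as np

def kl_gauss(mu0, var0, mu1, var1):
    # KL(N(mu0, var0) || N(mu1, var1)) for univariate Gaussians.
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def multi_js_gauss(mus, vars_, weights):
    # JS-type divergence for several Gaussians: weighted sum of KLs to a
    # "dynamic prior", here their weighted geometric mean (precisions and
    # precision-weighted means combine linearly).
    var_g = 1.0 / sum(w / v for w, v in zip(weights, vars_))
    mu_g = var_g * sum(w * m / v for w, m, v in zip(weights, mus, vars_))
    return sum(w * kl_gauss(m, v, mu_g, var_g)
               for w, m, v in zip(weights, mus, vars_))

# Three unimodal posteriors with uniform weights.
print(multi_js_gauss([0.0, 0.5, 1.0], [1.0, 0.5, 2.0], [1/3, 1/3, 1/3]))
```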
Simulation of complex dynamics of mean-field p-spin models using measurement-based quantum feedback control
We study the application of a new method for simulating nonlinear dynamics of
many-body spin systems using quantum measurement and feedback [Muñoz-Arias et
al., Phys. Rev. Lett. 124, 110503 (2020)] to a broad class of many-body models
known as p-spin Hamiltonians, which describe Ising-like models on a
completely connected graph with p-body interactions. The method simulates the
desired mean field dynamics in the thermodynamic limit by combining
nonprojective measurements of a component of the collective spin with a global
rotation conditioned on the measurement outcome. We apply this protocol to
simulate the dynamics of the p-spin Hamiltonians and demonstrate how
different aspects of criticality in the mean-field regime are readily
accessible with our protocol. We study applications including properties of
dynamical phase transitions and the emergence of spontaneous symmetry breaking
in the adiabatic dynamics of the collective spin for different values of the
parameter p. We also demonstrate how this method can be employed to study the
quantum-to-classical transition in the dynamics continuously as a function of
system size.
Comment: 16 pages, 7 figures
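For intuition about the target dynamics only (not the measurement-feedback protocol itself), here is a minimal sketch that integrates mean-field precession of the normalized collective spin, m' = m x B(m), under an assumed effective field B = (h, 0, J m_z^(p-1)); the signs, normalization, and parameter values are illustrative conventions, not taken from the paper:

```python
import numpy as np

def mean_field_rhs(m, p=2, h=0.5, J=1.0):
    # Assumed mean-field effective field for a p-spin model with a
    # transverse field h: the z-component grows as m_z^(p-1).
    B = np.array([h, 0.0, J * m[2] ** (p - 1)])
    return np.cross(m, B)

def rk4_step(m, dt, **kw):
    # Classic fourth-order Runge-Kutta step; the cross-product flow
    # conserves |m| up to integration error.
    k1 = mean_field_rhs(m, **kw)
    k2 = mean_field_rhs(m + 0.5 * dt * k1, **kw)
    k3 = mean_field_rhs(m + 0.5 * dt * k2, **kw)
    k4 = mean_field_rhs(m + dt * k3, **kw)
    return m + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

m = np.array([1.0, 0.0, 1e-3])  # start near the x-polarized point
for _ in range(1000):
    m = rk4_step(m, 0.01, p=3)
print(m, np.linalg.norm(m))  # the norm stays close to 1 under the flow
```

Sweeping p and h in a sketch like this is one way to visualize the mean-field criticality the abstract refers to; the paper's contribution is realizing such dynamics physically via weak collective-spin measurements and conditional rotations.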