Discrepancy-based Inference for Intractable Generative Models using Quasi-Monte Carlo
Intractable generative models are models for which the likelihood is
unavailable but sampling is possible. Most approaches to parameter inference in
this setting require the computation of some discrepancy between the data and
the generative model. This is, for example, the case for minimum distance
estimation and approximate Bayesian computation. These approaches require
sampling a large number of realisations from the model for different parameter
values, which can be a significant challenge when simulation is an expensive
operation. In this paper, we propose to enhance this approach by enforcing
"sample diversity" in simulations of our models. This will be implemented
through the use of quasi-Monte Carlo (QMC) point sets. Our key results are
sample complexity bounds which demonstrate that, under smoothness conditions on
the generator, QMC can significantly reduce the number of samples required to
obtain a given level of accuracy when using three of the most common
discrepancies: the maximum mean discrepancy, the Wasserstein distance, and the
Sinkhorn divergence. This is complemented by a simulation study which
highlights that improved accuracy is sometimes also possible in settings
not covered by the theory.
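To make the QMC idea concrete, here is a minimal sketch (not the paper's code) comparing i.i.d. Monte Carlo base points with a scrambled Sobol' point set when estimating an MMD between observations and a simple parametric generator; the toy generator g_theta and the Gaussian kernel bandwidth are illustrative assumptions.

```python
# Minimal sketch: discrepancy estimation with MC vs. QMC base points.
# Assumptions (not from the paper): toy generator g_theta, Gaussian kernel.
import numpy as np
from scipy.stats import qmc, norm

def g_theta(u, theta):
    # Smooth toy generator: push uniform base points through the inverse
    # Gaussian CDF, then scale and shift by theta = (mu, sigma).
    mu, sigma = theta
    u = np.clip(u, 1e-12, 1 - 1e-12)  # guard against boundary values
    return mu + sigma * norm.ppf(u)

def mmd2(x, y, bw=1.0):
    # Biased squared-MMD estimate with a Gaussian kernel of bandwidth bw.
    k = lambda a, b: np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * bw**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
data = rng.normal(1.0, 2.0, size=512)   # observed data
theta = (1.0, 2.0)                      # candidate parameter value
n = 256                                 # number of model simulations

u_mc = rng.uniform(size=n)                                       # i.i.d. base points
u_qmc = qmc.Sobol(d=1, scramble=True, seed=0).random(n).ravel()  # QMC base points

print("MC  MMD^2:", mmd2(data, g_theta(u_mc, theta)))
print("QMC MMD^2:", mmd2(data, g_theta(u_qmc, theta)))
```

The low-discrepancy base points cover the unit interval more evenly, which is the "sample diversity" the abstract refers to; under the smoothness conditions in the paper, this evenness translates into faster-decaying estimation error.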
BoMb-OT: On Batch of Mini-batches Optimal Transport
Mini-batch optimal transport (m-OT) has been successfully used in practical
applications that involve probability measures with intractable densities, or
probability measures with a very large number of supports. The m-OT solves
several sparser optimal transport problems and then returns the average of
their costs and transportation plans. Despite its scalability advantage, the
m-OT does not consider the relationship between mini-batches, which leads to
undesirable estimates. Moreover, the m-OT is not a proper metric
between probability measures, since the identity property is not satisfied. To
address these problems, we propose a novel mini-batching scheme for optimal
transport, named Batch of Mini-batches Optimal Transport (BoMb-OT), that finds
the optimal coupling between mini-batches and can be seen as an
approximation to a well-defined distance on the space of probability measures.
Furthermore, we show that the m-OT is a limit of the entropic regularized
version of the BoMb-OT as the regularization parameter goes to infinity.
Finally, we carry out extensive experiments to show that the BoMb-OT can
estimate a better transportation plan between two original measures than the
m-OT, which leads to favorable performance in the matching and
color transfer tasks. Furthermore, we observe that the BoMb-OT also provides a
better objective loss than the m-OT for approximate Bayesian computation,
estimating parameters of interest in parametric generative models, and learning
non-parametric generative models with gradient flow.
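The contrast between m-OT and BoMb-OT can be sketched in a few lines with the POT library; the mini-batch sampling and sizes below are illustrative, and the outer OT problem over the k x k matrix of mini-batch costs is a schematic rendering of the scheme described above, not the authors' full algorithm.

```python
# Schematic comparison of m-OT and the BoMb-OT idea using POT (pip install pot).
import numpy as np
import ot

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))   # source samples
Y = rng.normal(3.0, 1.0, size=(200, 2))   # target samples
k, m = 5, 40                              # number and size of mini-batches

Xb = [X[rng.choice(len(X), m, replace=False)] for _ in range(k)]
Yb = [Y[rng.choice(len(Y), m, replace=False)] for _ in range(k)]
w = np.full(m, 1.0 / m)                   # uniform weights within a batch

def batch_cost(a, b):
    # Exact OT cost between two mini-batches (squared Euclidean ground cost).
    return ot.emd2(w, w, ot.dist(a, b))

# m-OT: average cost over independently paired mini-batches.
m_ot = np.mean([batch_cost(Xb[i], Yb[i]) for i in range(k)])

# BoMb-OT: view each measure as uniform over its mini-batches and solve an
# outer OT problem whose ground cost is the k x k matrix of batch OT costs.
C = np.array([[batch_cost(Xb[i], Yb[j]) for j in range(k)] for i in range(k)])
wk = np.full(k, 1.0 / k)
bomb_ot = ot.emd2(wk, wk, C)

print("m-OT:", m_ot, "BoMb-OT:", bomb_ot)
```

The outer coupling lets well-matched mini-batches be paired together rather than averaged blindly, which is why BoMb-OT can recover a better global transportation plan.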
Hierarchical Sliced Wasserstein Distance
Sliced Wasserstein (SW) distance has been widely used in different
application scenarios since it can be scaled to a large number of supports
without suffering from the curse of dimensionality. The value of the sliced
Wasserstein distance is the average transportation cost between
one-dimensional representations (projections) of the original measures,
obtained via the Radon Transform (RT). Despite its efficiency in the number of
supports, estimating the sliced Wasserstein distance requires a relatively large number
of projections in high-dimensional settings. Therefore, for applications where
the number of supports is relatively small compared with the dimension, e.g.,
several deep learning applications where mini-batch approaches are
utilized, the matrix multiplication in the Radon Transform becomes
the main computational bottleneck. To address this issue, we propose to derive
projections by linearly and randomly combining a smaller number of projections,
named bottleneck projections. We explain the use of these
projections by introducing the Hierarchical Radon Transform (HRT), which is
constructed by applying Radon Transform variants recursively. We then formulate
the approach as a new metric between measures, named the Hierarchical Sliced
Wasserstein (HSW) distance. By proving the injectivity of HRT, we derive the
metricity of HSW. Moreover, we investigate the theoretical properties of HSW
including its connection to SW variants and its computational and sample
complexities. Finally, we compare the computational cost and generative quality
of HSW with the conventional SW on the task of deep generative modeling using
various benchmark datasets including CIFAR10, CelebA, and Tiny ImageNet.
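A rough sketch of the bottleneck-projection mechanism (not the authors' implementation): the expensive d-dimensional projection is carried out only for a few bottleneck directions, and the many final slices are cheap random linear combinations of those. All names and sizes below are illustrative.

```python
# Sketch of HSW-style slicing via bottleneck projections.
import numpy as np

def hsw_estimate(X, Y, k=8, L=128, seed=0):
    # k bottleneck directions in R^d, then L slices as random combinations.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.normal(size=(k, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # bottleneck directions
    W = rng.normal(size=(L, k))
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # mixing weights per slice
    # Project through the bottleneck: one (n, d) x (d, k) multiply, then a
    # cheap (n, k) x (k, L) multiply instead of a full (n, d) x (d, L) one.
    Xp, Yp = X @ U.T @ W.T, Y @ U.T @ W.T
    # 1D Wasserstein-2 per slice via sorting (equal sample sizes assumed).
    cost = np.mean((np.sort(Xp, axis=0) - np.sort(Yp, axis=0)) ** 2)
    return np.sqrt(cost)

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(64, 512))    # few supports, high dimension
Y = rng.normal(1.0, 1.0, size=(64, 512))
print("HSW-style estimate:", hsw_estimate(X, Y))
```

Projecting this way costs O(ndk + nkL) rather than O(ndL), which is the saving the abstract targets when the dimension is large relative to the number of supports.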
Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability
Reinforcement learning and optimal control can provide effective reasoning for
sequential decision-making problems with variable dynamics. In practical
implementations, however, interpreting the reward function and the corresponding
optimal policy remains a persistent challenge. Consequently, formalizing
sequential decision-making problems as inference has considerable value,
as probabilistic inference in principle offers diverse and powerful
mathematical tools to infer the stochastic dynamics whilst suggesting a
probabilistic interpretation of the reward design and policy convergence. In
this study, we propose a novel Adaptive Wasserstein Variational Optimization
(AWaVO) to tackle these challenges in sequential decision-making. Our approach
utilizes formal methods to provide interpretations of reward design,
transparency of training convergence, and probabilistic interpretation of
sequential decisions. To demonstrate practicality, we show convergent training
with guaranteed global convergence rates not only in simulation but also in
real robot tasks, and empirically verify a reasonable tradeoff between high
performance and conservative interpretability.
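For background, the "decision-making as inference" framing the abstract invokes is usually written as the control-as-inference bound below; this is the generic formulation with exponentiated-reward optimality variables, not AWaVO's specific Wasserstein construction.

```latex
% Generic control-as-inference objective (background sketch, not AWaVO itself).
% Optimality variables O_t have likelihood p(O_t = 1 | s_t, a_t) \propto \exp(r(s_t, a_t)),
% so maximising the trajectory evidence is lower-bounded by an ELBO:
\log p(\mathcal{O}_{1:T})
  \;\geq\; \mathbb{E}_{\tau \sim q_\phi}\!\left[\sum_{t=1}^{T} r(s_t, a_t)\right]
  - \mathrm{KL}\!\left(q_\phi(\tau) \,\middle\|\, p(\tau)\right),
```

where $q_\phi$ is the trajectory distribution induced by the policy and $p(\tau)$ is the prior over dynamics. Per the abstract, AWaVO carries out this variational step with Wasserstein-based tools; the bound above is only the shared starting point for the inference view.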
Generative Sliced MMD Flows with Riesz Kernels
Maximum mean discrepancy (MMD) flows suffer from high computational costs in
large-scale computations. In this paper, we show that MMD flows with Riesz
kernels $K(x,y) = -\|x-y\|^r$, $r \in (0,2)$, have exceptional properties which
allow for their efficient computation. First, the MMD of Riesz kernels
coincides with the MMD of their sliced version. As a consequence, the
computation of gradients of MMDs can be performed in the one-dimensional
setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce
the complexity from $O(MN + N^2)$ to $O((M+N)\log(M+N))$ for two empirical
measures with $M$ and $N$ support points. For the implementation, we
approximate the gradient of the sliced MMD by using only a finite number $P$ of
slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where
$d$ is the data dimension. These results enable us to train generative models
by approximating MMD gradient flows with neural networks even for large-scale
applications. We demonstrate the efficiency of our model by image generation on
MNIST, FashionMNIST and CIFAR10.
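The $r=1$ sorting trick can be sketched directly: for the negative-distance Riesz kernel, every pairwise-distance sum in the 1D MMD collapses to a sorted cumulative sum. The function names below are illustrative, not the authors' code.

```python
# Sketch of the O(n log n) sorting trick for the 1D MMD with the
# Riesz kernel K(x, y) = -|x - y| (the r = 1 case): no kernel matrices.
import numpy as np

def pair_sum(z):
    # sum_{i<j} |z_i - z_j| in O(n log n): after sorting, the i-th order
    # statistic contributes with coefficient (2i - n + 1).
    z = np.sort(z)
    n = len(z)
    return np.dot(2 * np.arange(n) - n + 1, z)

def sliced_mmd2_riesz(x, y):
    # MMD^2 with K(x,y) = -|x-y| between 1D empirical measures:
    # (2/MN) sum|x-y| - (1/M^2) sum|x-x'| - (1/N^2) sum|y-y'|.
    # Cross term via: pairs(x u y) = pairs(x) + pairs(y) + cross pairs.
    M, N = len(x), len(y)
    cross = pair_sum(np.concatenate([x, y])) - pair_sum(x) - pair_sum(y)
    return 2 * cross / (M * N) - 2 * pair_sum(x) / M**2 - 2 * pair_sum(y) / N**2

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)   # projected samples of one measure
y = rng.normal(0.5, 1.0, size=800)    # projected samples of the other
print("sliced MMD^2 (Riesz, r = 1):", sliced_mmd2_riesz(x, y))
```

Each call to pair_sum costs O(n log n), so the whole estimate avoids forming any kernel matrix, which is exactly the complexity reduction quoted above.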