Generative modeling using the sliced Wasserstein distance
Generative adversarial nets (GANs) are very successful at modeling distributions from given samples, even in the high-dimensional case. However, their formulation is also known to be hard to optimize and often unstable. While these problems are particularly pronounced for early GAN formulations, there has been significant empirically motivated and theoretically founded progress toward improving stability, for instance, by using the Wasserstein distance rather than the Jensen-Shannon divergence. Here, we consider an alternative formulation for generative modeling based on random projections which, in its simplest form, results in a single objective rather than a saddle-point formulation. By augmenting this approach with a discriminator, we improve its accuracy. We found our approach to be significantly more stable than even the improved Wasserstein GAN. Further, unlike the traditional GAN loss, the loss formulated in our method is a good measure of the actual distance between the distributions and, for the first time for GAN training, we are able to report estimates of this distance.
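As background for the random-projection objective the abstract describes, a minimal NumPy sketch of the sliced Wasserstein distance between two equal-sized sample sets may help. The function name and Monte Carlo estimator are illustrative, not the paper's implementation; in 1D, sorting both samples gives the optimal coupling, which is what makes the sliced approach a single tractable objective:

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, p=2, rng=None):
    """Monte Carlo estimate of SW_p between equal-sized empirical measures X, Y of shape (n, d)."""
    rng = np.random.default_rng(rng)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions on the unit sphere
    xp = np.sort(X @ theta.T, axis=0)  # sorting the 1D projections yields the 1D optimal coupling
    yp = np.sort(Y @ theta.T, axis=0)
    return (np.abs(xp - yp) ** p).mean() ** (1.0 / p)
```

Because the estimator is a plain average of differentiable (almost everywhere) 1D costs, it can be minimized directly with stochastic gradients, with no inner discriminator maximization required.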
Hierarchical Sliced Wasserstein Distance
Sliced Wasserstein (SW) distance has been widely used in different
application scenarios since it can be scaled to a large number of supports
without suffering from the curse of dimensionality. The value of sliced
Wasserstein distance is the average of transportation cost between
one-dimensional representations (projections) of original measures that are
obtained by Radon Transform (RT). Despite its efficiency in the number of
supports, estimating the sliced Wasserstein distance requires a relatively large number
of projections in high-dimensional settings. Therefore, for applications where
the number of supports is relatively small compared with the dimension, e.g.,
several deep learning applications where the mini-batch approaches are
utilized, the complexities from matrix multiplication of Radon Transform become
the main computational bottleneck. To address this issue, we propose to derive
projections by linearly and randomly combining a smaller number of projections
which are named bottleneck projections. We explain the usage of these
projections by introducing Hierarchical Radon Transform (HRT) which is
constructed by applying Radon Transform variants recursively. We then formulate
the approach into a new metric between measures, named Hierarchical Sliced
Wasserstein (HSW) distance. By proving the injectivity of HRT, we derive the
metricity of HSW. Moreover, we investigate the theoretical properties of HSW
including its connection to SW variants and its computational and sample
complexities. Finally, we compare the computational cost and generative quality
of HSW with the conventional SW on the task of deep generative modeling using
various benchmark datasets including CIFAR10, CelebA, and Tiny ImageNet.
Comment: 28 pages, 7 figures, 3 tables
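The bottleneck-projection idea above can be sketched in a few lines of NumPy. This is a simplified illustration under assumed shapes, not the paper's HRT construction: k bottleneck directions are mixed linearly into n_proj final directions, and projecting the data through the bottleneck first replaces the O(n·n_proj·d) matrix product with O(n·k·d + n·n_proj·k) work, which is cheaper when k is much smaller than both d and n_proj:

```python
import numpy as np

def hierarchical_sw(X, Y, n_proj=200, k=10, p=2, rng=None):
    """Sliced Wasserstein estimate using n_proj directions built from k bottleneck projections."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    B = rng.normal(size=(k, d))        # bottleneck projections
    C = rng.normal(size=(n_proj, k))   # random linear combinations of the bottleneck
    norms = np.linalg.norm(C @ B, axis=1)        # normalize the combined directions
    # project through the bottleneck first, keeping the cheap cost structure
    xp = np.sort((X @ B.T) @ C.T / norms, axis=0)
    yp = np.sort((Y @ B.T) @ C.T / norms, axis=0)
    return (np.abs(xp - yp) ** p).mean() ** (1.0 / p)
```

Note that the combined directions span at most a k-dimensional subspace, which is exactly the trade-off the paper analyzes when relating HSW to the conventional SW distance.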
Sliced Wasserstein Generative Models
In generative modeling, the Wasserstein distance (WD) has emerged as a useful
metric to measure the discrepancy between generated and real data
distributions. Unfortunately, it is challenging to approximate the WD of
high-dimensional distributions. In contrast, the sliced Wasserstein distance
(SWD) factorizes high-dimensional distributions into their multiple
one-dimensional marginal distributions and is thus easier to approximate. In
this paper, we introduce novel approximations of the primal and dual SWD.
Instead of using a large number of random projections, as it is done by
conventional SWD approximation methods, we propose to approximate SWDs with a
small number of parameterized orthogonal projections in an end-to-end deep
learning fashion. As concrete applications of our SWD approximations, we design
two types of differentiable SWD blocks to equip modern generative
frameworks---Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In
the experiments, we not only show the superiority of the proposed generative
models on standard image synthesis benchmarks, but also demonstrate the
state-of-the-art performance on challenging high resolution image and video
generation in an unsupervised manner.
Comment: This paper is accepted by CVPR 2019; it was accidentally uploaded as a new submission (arXiv:1904.05408, which has been withdrawn). The code is available at https://github.com/musikisomorphie/swd.git
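A rough sketch of the "small number of orthogonal projections" idea may clarify the contrast with conventional random slicing. The sketch below draws one set of orthonormal directions via a QR decomposition of a random Gaussian matrix; the paper instead learns such projections end-to-end inside the network, so this is only a stand-in for the projection structure, not the method itself:

```python
import numpy as np

def orthogonal_sw(X, Y, k=16, rng=None):
    """SW_2 estimate over k orthonormal directions (k <= d), drawn once at random."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    Q, _ = np.linalg.qr(rng.normal(size=(d, k)))  # (d, k) with orthonormal columns
    xp = np.sort(X @ Q, axis=0)
    yp = np.sort(Y @ Q, axis=0)
    return np.sqrt(((xp - yp) ** 2).mean())
```

Orthogonality keeps the k slices non-redundant, which is part of why far fewer projections can suffice than in the i.i.d. random-direction estimator.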
Max-Sliced Wasserstein Distance and its use for GANs
Generative adversarial nets (GANs) and variational auto-encoders have
significantly improved our distribution modeling capabilities, showing promise
for dataset augmentation, image-to-image translation and feature learning.
However, to model high-dimensional distributions, sequential training and
stacked architectures are common, increasing the number of tunable
hyper-parameters as well as the training time. Nonetheless, the sample
complexity of the distance metrics remains one of the factors affecting GAN
training. We first show that the recently proposed sliced Wasserstein distance
has compelling sample complexity properties when compared to the Wasserstein
distance. To further improve the sliced Wasserstein distance we then analyze
its 'projection complexity' and develop the max-sliced Wasserstein distance
which enjoys compelling sample complexity while reducing projection complexity,
albeit necessitating a max estimation. We finally illustrate that the proposed
distance trains GANs on high-dimensional images up to a resolution of 256x256
easily.
Comment: Accepted to CVPR 2019
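The max estimation the abstract mentions can be illustrated with a crude stand-in: instead of averaging the 1D transport cost over random directions, take the maximum. The paper finds the worst-case direction by optimization (which is what a GAN critic can amortize); the random search below is only a sketch of the quantity being estimated:

```python
import numpy as np

def max_sliced_wasserstein(X, Y, n_candidates=500, p=2, rng=None):
    """Approximate max-SW_p by taking the max 1D cost over many random unit directions."""
    rng = np.random.default_rng(rng)
    theta = rng.normal(size=(n_candidates, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    xp = np.sort(X @ theta.T, axis=0)
    yp = np.sort(Y @ theta.T, axis=0)
    per_dir = (np.abs(xp - yp) ** p).mean(axis=0) ** (1.0 / p)
    return per_dir.max()  # worst-case slice instead of the average slice
```

Since only the single worst direction contributes, the projection complexity drops from many slices to one, at the price of solving (or approximating) an inner maximization.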