Generalized Sliced Wasserstein Distances
The Wasserstein distance and its variations, e.g., the sliced-Wasserstein
(SW) distance, have recently drawn attention from the machine learning
community. The SW distance, specifically, was shown to have similar properties
to the Wasserstein distance, while being much simpler to compute, and is
therefore used in various applications including generative modeling and
general supervised/unsupervised learning. In this paper, we first clarify the
mathematical connection between the SW distance and the Radon transform. We
then utilize the generalized Radon transform to define a new family of
distances for probability measures, which we call generalized
sliced-Wasserstein (GSW) distances. We also show that, similar to the SW
distance, the GSW distance can be extended to a maximum GSW (max-GSW) distance.
We then provide the conditions under which GSW and max-GSW distances are indeed
distances. Finally, we compare the numerical performance of the proposed
distances on several generative modeling tasks, including SW flows and SW
auto-encoders.
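As a rough illustration of the slicing idea these distances share, here is a minimal NumPy sketch of the (linear) sliced Wasserstein estimate between two equally sized samples; the GSW variant would replace the linear projections with the nonlinear defining functions of a generalized Radon transform. Function and parameter names below are illustrative, not taken from the paper.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=50, p=2, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-p distance between two
    empirical distributions X and Y, each of shape (n_samples, dim) with the
    same number of samples."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    # Sample random directions uniformly on the unit sphere (the "slices").
    theta = rng.normal(size=(n_projections, dim))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both point clouds onto each direction (1D Radon-style projections).
    X_proj = X @ theta.T
    Y_proj = Y @ theta.T
    # For equally weighted 1D empirical measures, the Wasserstein-p distance is
    # the L_p distance between sorted projections.
    X_sorted = np.sort(X_proj, axis=0)
    Y_sorted = np.sort(Y_proj, axis=0)
    return np.mean(np.abs(X_sorted - Y_sorted) ** p) ** (1.0 / p)

# Usage: two random point clouds of equal size.
X = np.random.randn(500, 10)
Y = np.random.randn(500, 10) + 1.0
print(sliced_wasserstein(X, Y))
```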
Sliced Wasserstein Distance for Learning Gaussian Mixture Models
Gaussian mixture models (GMM) are powerful parametric tools with many
applications in machine learning and computer vision. Expectation maximization
(EM) is the most popular algorithm for estimating the GMM parameters. However,
EM guarantees only convergence to a stationary point of the log-likelihood
function, which could be arbitrarily worse than the optimal solution. Inspired
by the relationship between the negative log-likelihood function and the
Kullback-Leibler (KL) divergence, we propose an alternative formulation for
estimating the GMM parameters using the sliced Wasserstein distance, which
gives rise to a new algorithm. Specifically, we propose minimizing the
sliced-Wasserstein distance between the mixture model and the data distribution
with respect to the GMM parameters. In contrast to the KL-divergence, the
energy landscape for the sliced-Wasserstein distance is more well-behaved and
therefore more suitable for a stochastic gradient descent scheme to obtain the
optimal GMM parameters. We show that our formulation results in parameter
estimates that are more robust to random initializations and demonstrate that
it can estimate high-dimensional data distributions more faithfully than the EM
algorithm.
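A hedged sketch of the kind of procedure the abstract describes, assuming PyTorch and a simplified mixture (uniform weights, diagonal covariances): samples are drawn from the model with the reparameterization trick, and the sliced Wasserstein loss between model samples and data mini-batches is minimized by stochastic gradient descent. This is an illustration of the idea, not the authors' algorithm.

```python
import torch

def sliced_wasserstein(x, y, n_projections=50):
    """Squared SW_2 distance between two equally sized point clouds (n, d)."""
    theta = torch.randn(n_projections, x.shape[1], device=x.device)
    theta = theta / theta.norm(dim=1, keepdim=True)
    x_proj = torch.sort(x @ theta.T, dim=0).values
    y_proj = torch.sort(y @ theta.T, dim=0).values
    return ((x_proj - y_proj) ** 2).mean()

def fit_gmm_sw(data, n_components=3, n_iters=2000, lr=1e-2, batch=256):
    """Fit means and diagonal scales of a uniform-weight GMM by minimizing the
    sliced Wasserstein distance between model samples and data mini-batches."""
    n, d = data.shape
    mu = torch.randn(n_components, d, requires_grad=True)
    log_sigma = torch.zeros(n_components, d, requires_grad=True)
    opt = torch.optim.Adam([mu, log_sigma], lr=lr)
    for _ in range(n_iters):
        x = data[torch.randint(0, n, (batch,))]
        # Sample from the mixture: pick a component uniformly at random, then
        # draw a Gaussian sample via the reparameterization trick.
        comp = torch.randint(0, n_components, (batch,))
        y = mu[comp] + torch.randn(batch, d) * log_sigma[comp].exp()
        loss = sliced_wasserstein(x, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach(), log_sigma.exp().detach()

# Usage: fit a 2-component GMM to a simple bimodal sample.
data = torch.cat([torch.randn(1000, 2) - 2.0, torch.randn(1000, 2) + 2.0])
means, scales = fit_gmm_sw(data, n_components=2)
```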
Max-Sliced Wasserstein Distance and its use for GANs
Generative adversarial nets (GANs) and variational auto-encoders have
significantly improved our distribution modeling capabilities, showing promise
for dataset augmentation, image-to-image translation and feature learning.
However, to model high-dimensional distributions, sequential training and
stacked architectures are common, increasing the number of tunable
hyper-parameters as well as the training time. Nonetheless, the sample
complexity of the distance metrics remains one of the factors affecting GAN
training. We first show that the recently proposed sliced Wasserstein distance
has compelling sample complexity properties when compared to the Wasserstein
distance. To further improve the sliced Wasserstein distance we then analyze
its 'projection complexity' and develop the max-sliced Wasserstein distance
which enjoys compelling sample complexity while reducing projection complexity,
albeit necessitating a max estimation. We finally illustrate that the proposed
distance trains GANs on high-dimensional images up to a resolution of 256x256
easily.
Comment: Accepted to CVPR 2019
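A minimal sketch of the max-slice idea, assuming PyTorch: rather than averaging over random projections, a single unit direction is optimized by gradient ascent to maximize the 1D Wasserstein distance between the projected samples. The GAN training procedure in the paper is more involved; the names here are illustrative.

```python
import torch

def wasserstein_1d(u, v):
    """Squared W_2 between two equally sized 1D samples: L2 on sorted values."""
    return ((torch.sort(u).values - torch.sort(v).values) ** 2).mean()

def max_sliced_wasserstein(x, y, n_steps=200, lr=0.1):
    """Approximate the max-sliced W_2 distance by gradient ascent over one
    unit projection direction (instead of averaging many random slices)."""
    d = x.shape[1]
    w = torch.randn(d, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(n_steps):
        theta = w / w.norm()                           # keep slice on the sphere
        loss = -wasserstein_1d(x @ theta, y @ theta)   # ascend the 1D distance
        opt.zero_grad()
        loss.backward()
        opt.step()
    theta = (w / w.norm()).detach()
    return wasserstein_1d(x @ theta, y @ theta).sqrt(), theta

# Usage: two point clouds that differ along a single direction.
x = torch.randn(512, 20)
y = torch.randn(512, 20)
y[:, 0] += 3.0
dist, direction = max_sliced_wasserstein(x, y)
```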
A Smoothed Dual Approach for Variational Wasserstein Problems
Variational problems that involve Wasserstein distances have been recently
proposed to summarize and learn from probability measures. Despite being
conceptually simple, such problems are computationally challenging because they
involve minimizing over quantities (Wasserstein distances) that are themselves
hard to compute. We show that the dual formulation of Wasserstein variational
problems introduced recently by Carlier et al. (2014) can be regularized using
an entropic smoothing, which leads to smooth, differentiable, convex
optimization problems that are simpler to implement and numerically more
stable. We illustrate the versatility of this approach by applying it to the
computation of Wasserstein barycenters and gradient flows of spatial
regularization functionals.
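The paper works with a smoothed dual formulation; as a rough illustration of what entropic smoothing buys, here is a NumPy sketch of the standard iterative Bregman projection scheme for entropy-regularized Wasserstein barycenters of histograms on a common grid. Names and parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def sinkhorn_barycenter(B, C, weights=None, eps=0.01, n_iters=500):
    """Entropy-regularized Wasserstein barycenter of the histograms stored as
    columns of B (shape (n, k)), supported on a common grid with pairwise cost
    matrix C (shape (n, n)), via iterative Bregman projections."""
    n, k = B.shape
    if weights is None:
        weights = np.full(k, 1.0 / k)
    K = np.exp(-C / eps)                  # Gibbs kernel from the entropic smoothing
    U = np.ones((n, k))
    for _ in range(n_iters):
        V = B / (K.T @ U)                 # enforce the fixed marginals b_k
        # Barycenter estimate: weighted geometric mean of the row marginals.
        p = np.exp((weights * np.log(U * (K @ V))).sum(axis=1))
        U = p[:, None] / (K @ V)          # enforce the shared barycenter marginal
    return p

# Usage: barycenter of two 1D Gaussians on a regular grid.
grid = np.linspace(0, 1, 100)
C = (grid[:, None] - grid[None, :]) ** 2
b1 = np.exp(-((grid - 0.2) ** 2) / 0.005); b1 /= b1.sum()
b2 = np.exp(-((grid - 0.8) ** 2) / 0.005); b2 /= b2.sum()
bary = sinkhorn_barycenter(np.stack([b1, b2], axis=1), C)
```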
Geometric Hydrodynamics via Madelung Transform
We introduce a geometric framework to study Newton's equations on
infinite-dimensional configuration spaces of diffeomorphisms and smooth
probability densities. It turns out that several important PDEs of
hydrodynamical origin can be described in this framework in a natural way. In
particular, the Madelung transform between the Schrödinger equation and
Newton's equations is a symplectomorphism of the corresponding phase spaces.
Furthermore, the Madelung transform turns out to be a Kähler map when the
space of densities is equipped with the Fisher-Rao information metric. We
describe several dynamical applications of these results.
Comment: 17 pages, 2 figures
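For reference, the Madelung transform mentioned in the abstract, written in the usual textbook unit-mass convention (the sign and scaling conventions here are the standard ones, not necessarily those of the paper):

```latex
% Madelung transform: wave function from a density \rho and a phase \theta
\psi = \sqrt{\rho}\, e^{i\theta/\hbar}
% Substituting into  i\hbar\,\partial_t\psi = -\tfrac{\hbar^2}{2}\Delta\psi + V\psi
% gives a continuity equation and a Hamilton--Jacobi equation with a quantum term:
\partial_t \rho + \nabla\!\cdot(\rho\,\nabla\theta) = 0, \qquad
\partial_t \theta + \tfrac{1}{2}|\nabla\theta|^2 + V
  - \frac{\hbar^2}{2}\,\frac{\Delta\sqrt{\rho}}{\sqrt{\rho}} = 0 .
```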