Learning algorithms for implicit generative models can optimize a variety of
criteria that measure how the data distribution differs from the implicit model
distribution, including the Wasserstein distance, the Energy distance, and the
Maximum Mean Discrepancy criterion. A careful look at the geometries induced by
these distances on the space of probability measures reveals interesting
differences. In particular, we can establish surprising approximate global
convergence guarantees for the 1-Wasserstein distance, even when the
parametric generator has a nonconvex parametrization.
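For reference, the three criteria named above can be written in their standard forms (common notation; the paper's own definitions and normalizations, e.g. whether a square root is taken, may differ in detail):

$$ W_1(P, Q) \;=\; \inf_{\pi \in \Pi(P, Q)} \mathbb{E}_{(x, y) \sim \pi}\big[\, \|x - y\| \,\big], $$
$$ \mathcal{E}(P, Q) \;=\; 2\,\mathbb{E}\|X - Y\| \;-\; \mathbb{E}\|X - X'\| \;-\; \mathbb{E}\|Y - Y'\|, $$
$$ \mathrm{MMD}_k^2(P, Q) \;=\; \mathbb{E}\big[k(X, X')\big] \;+\; \mathbb{E}\big[k(Y, Y')\big] \;-\; 2\,\mathbb{E}\big[k(X, Y)\big], $$

where $X, X' \sim P$ and $Y, Y' \sim Q$ are independent, $\Pi(P, Q)$ denotes the set of couplings with marginals $P$ and $Q$, and $k$ is a characteristic kernel. The energy distance coincides, up to a constant factor, with the squared MMD for the distance-induced kernel $k(x, y) = \|x\| + \|y\| - \|x - y\|$.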