A Stein variational Newton method
Stein variational gradient descent (SVGD) was recently proposed as a general
purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]:
it minimizes the Kullback-Leibler divergence between the target distribution
and its approximation by implementing a form of functional gradient descent on
a reproducing kernel Hilbert space. In this paper, we accelerate and generalize
the SVGD algorithm by including second-order information, thereby approximating
a Newton-like iteration in function space. We also show how second-order
information can lead to more effective choices of kernel. We observe
significant computational gains over the original SVGD algorithm in multiple
test cases.
Comment: 18 pages, 7 figures
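For orientation, the first-order SVGD update that this abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the original SVGD step [Liu & Wang, NIPS 2016] with a fixed-bandwidth RBF kernel, not the Newton-like method proposed in the paper. The function name `svgd_step`, the bandwidth `h`, and the step size `step` are assumptions made for this example.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One plain SVGD update on particles x of shape (n, d).

    grad_log_p maps an (n, d) array to the (n, d) array of score values
    grad log p(x_j). The RBF bandwidth h is fixed here (an assumption).
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2))
    # Driving term: sum_j k(x_j, x_i) * grad log p(x_j)
    drive = k @ grad_log_p(x)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = sum_j k_ij (x_i - x_j) / h^2
    repulse = np.sum(k[:, :, None] * diff, axis=1) / h ** 2
    return x + step * (drive + repulse) / n

# Usage: move particles toward a Gaussian target N(mu, I), whose score is mu - x.
rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])
particles = rng.normal(size=(50, 2))
for _ in range(500):
    particles = svgd_step(particles, lambda z: mu - z)
```

The driving term pulls particles toward high-density regions, while the repulsive term keeps them from collapsing onto a single mode. A second-order variant would additionally use curvature information, which this sketch does not attempt.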
A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization
We show that a large class of Estimation of Distribution Algorithms,
including, but not limited to, Covariance Matrix Adaptation, can be written as a
Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of
infinite samples. Because EM sits on a rigorous statistical foundation and has
been thoroughly analyzed, this connection provides a new coherent framework
with which to reason about EDAs.
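To make the analogy concrete, a simple Gaussian EDA can be read as alternating a sample-weighting step with a weighted maximum-likelihood refit. The sketch below is a loose illustration under stated assumptions, not the construction used in the paper: it uses truncation selection (uniform weights on the best samples), and the name `gaussian_eda` and all of its parameters are hypothetical.

```python
import numpy as np

def gaussian_eda(objective, mean, cov, n_samples=200, elite_frac=0.25,
                 n_iters=30, seed=0):
    """Minimize `objective` with a Gaussian search distribution."""
    rng = np.random.default_rng(seed)
    n_elite = max(2, int(elite_frac * n_samples))
    dim = mean.shape[0]
    for _ in range(n_iters):
        # Monte Carlo step: draw candidates from the current distribution.
        samples = rng.multivariate_normal(mean, cov, size=n_samples)
        scores = np.array([objective(s) for s in samples])
        # "E-like" step (assumption): weight samples by fitness, here
        # uniform weight on the elite set and zero elsewhere.
        weights = np.zeros(n_samples)
        weights[np.argsort(scores)[:n_elite]] = 1.0 / n_elite
        # "M-like" step: weighted maximum-likelihood Gaussian refit.
        mean = weights @ samples
        centered = samples - mean
        cov = (weights[:, None] * centered).T @ centered
        cov += 1e-10 * np.eye(dim)  # small jitter keeps cov positive definite
    return mean, cov

# Usage: minimize a shifted sphere function with optimum at (3, 3).
best_mean, best_cov = gaussian_eda(
    lambda x: np.sum((x - 3.0) ** 2), np.zeros(2), 9.0 * np.eye(2))
```

With finitely many samples the weighting step is only a Monte Carlo estimate, which matches the abstract's description of exactness holding in the infinite-sample limit.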
Max-Sliced Wasserstein Distance and its use for GANs
Generative adversarial nets (GANs) and variational auto-encoders have
significantly improved our distribution modeling capabilities, showing promise
for dataset augmentation, image-to-image translation and feature learning.
However, to model high-dimensional distributions, sequential training and
stacked architectures are common, increasing the number of tunable
hyper-parameters as well as the training time. Nonetheless, the sample
complexity of the distance metrics remains one of the factors affecting GAN
training. We first show that the recently proposed sliced Wasserstein distance
has compelling sample complexity properties when compared to the Wasserstein
distance. To further improve the sliced Wasserstein distance we then analyze
its `projection complexity' and develop the max-sliced Wasserstein distance
which enjoys compelling sample complexity while reducing projection complexity,
albeit necessitating a max estimation. We finally illustrate that the proposed
distance trains GANs on high-dimensional images up to a resolution of 256x256
easily.
Comment: Accepted to CVPR 201
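The difference between averaging over projections and taking the worst-case projection can be sketched for two equal-size samples. In one dimension the squared 2-Wasserstein distance between equal-size empirical distributions reduces to sorting. The code below is an illustrative approximation only: it takes the maximum over a finite set of random candidate directions, which is an assumption of this sketch and not the max estimation procedure used in the paper. All function names are hypothetical.

```python
import numpy as np

def random_directions(m, d, rng):
    """Draw m unit vectors in R^d."""
    v = rng.normal(size=(m, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def projected_w2_sq(x, y, directions):
    """Squared 1-D W2 distance per direction for samples x, y of shape (n, d)."""
    px = np.sort(x @ directions.T, axis=0)   # sorted projections, shape (n, m)
    py = np.sort(y @ directions.T, axis=0)
    return np.mean((px - py) ** 2, axis=0)   # one value per direction

def sliced_w2_sq(x, y, directions):
    """Average over projection directions (sliced Wasserstein)."""
    return projected_w2_sq(x, y, directions).mean()

def max_sliced_w2_sq(x, y, directions):
    """Worst case over the candidate directions (crude max-sliced estimate)."""
    return projected_w2_sq(x, y, directions).max()

# Usage: y is x shifted by 2 along the first coordinate axis.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
y = x.copy()
y[:, 0] += 2.0
dirs = random_directions(64, 8, rng)
avg_dist = sliced_w2_sq(x, y, dirs)
max_dist = max_sliced_w2_sq(x, y, dirs)
```

For a pure shift, the most informative direction is the shift direction itself, so a single well-chosen projection carries more signal than the average over many random ones. That is the intuition behind reducing projection complexity at the price of a max estimation.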