A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization
We show that a large class of Estimation of Distribution Algorithms,
including, but not limited to, Covariance Matrix Adaptation, can be written as a
Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of
infinite samples. Because EM sits on a rigorous statistical foundation and has
been thoroughly analyzed, this connection provides a new coherent framework
with which to reason about EDAs.
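The correspondence can be illustrated with a toy sketch (not the paper's exact formulation): a simple Gaussian EDA in the spirit of EMNA/CEM, where weighting the sampled population by fitness plays the role of a Monte Carlo E-step and refitting the Gaussian to the weighted samples is the M-step. The function names, the elite-weighting scheme, and all parameter values below are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact formulation): a Gaussian EDA whose
# per-generation update mirrors a Monte Carlo EM step. Sampling and weighting
# the population acts as the E-step; refitting the Gaussian is the M-step.
import numpy as np

def gaussian_eda_step(mean, cov, fitness, n_samples=200, elite_frac=0.25, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = rng.multivariate_normal(mean, cov, size=n_samples)  # sample population
    f = np.apply_along_axis(fitness, 1, x)
    # "E-step": responsibilities derived from fitness; here a simple elite
    # indicator (uniform weight on the best k samples, minimizing fitness).
    k = max(1, int(elite_frac * n_samples))
    w = np.zeros(n_samples)
    w[np.argsort(f)[:k]] = 1.0 / k
    # "M-step": maximum-likelihood refit of the Gaussian to the weighted samples.
    new_mean = w @ x
    centered = x - new_mean
    new_cov = (w[:, None] * centered).T @ centered + 1e-8 * np.eye(len(mean))
    return new_mean, new_cov

# Usage: a few generations on a quadratic objective.
mean, cov = np.zeros(2), np.eye(2)
for _ in range(50):
    mean, cov = gaussian_eda_step(mean, cov, lambda z: np.sum((z - 3.0) ** 2))
print(mean)  # approaches the optimum [3, 3]
```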
Entropic Wasserstein Gradient Flows
This article details a novel numerical scheme to approximate gradient flows
for optimal transport (i.e. Wasserstein) metrics. These flows have proved
useful for tackling, both theoretically and numerically, non-linear diffusion
equations that model, for instance, porous media or crowd evolutions. The
gradient flows define a suitable notion of weak solutions for these evolutions,
and they can be approximated in a stable way by discrete flows that perform
implicit Euler time stepping with respect to the Wasserstein metric. A
bottleneck of these approaches is the high computational cost of each step,
which requires solving a convex optimization problem involving a Wasserstein
distance to the previous iterate. Following
several recent works on the approximation of Wasserstein distances, we consider
a discrete flow induced by an entropic regularization of the transportation
coupling. This entropic regularization allows one to trade the initial
Wasserstein fidelity term for a Kullback-Leibler divergence, which is easier to
deal with numerically. We show how KL proximal schemes, and in particular
Dykstra's algorithm, can be used to compute each step of the regularized flow.
The resulting algorithm is fast, parallelizable, and versatile, because it
only requires multiplications by a Gibbs kernel. On Euclidean domains
discretized on a uniform grid, this corresponds to linear filtering (for
instance, Gaussian filtering when the cost is the squared Euclidean distance), which
can be computed in nearly linear time. On more general domains, such as
(possibly non-convex) shapes or on manifolds discretized by a triangular mesh,
following a recently proposed numerical scheme for optimal transport, this
Gibbs kernel multiplication is approximated by a short-time heat diffusion.
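As a rough illustration of the scheme (a minimal sketch under stated assumptions, not the paper's implementation), the following computes entropically regularized JKO steps for the gradient flow of the entropy functional, i.e. the heat equation, on a 1D grid. The KL proximal iterations reduce here to Sinkhorn-like scalings because the prox of the entropy has a closed form, and each iteration only multiplies by the Gibbs kernel, as the abstract describes. Grid size, regularization, time step, and iteration counts are illustrative.

```python
# Sketch: entropic JKO steps for the heat flow, i.e. the gradient flow of
# F(p) = <p, log p>, on a 1D grid. Each implicit Euler step is solved by
# Sinkhorn-like scalings using only Gibbs-kernel multiplications.
import numpy as np

n, gamma, tau = 200, 2e-3, 1e-3                        # grid size, entropic reg., time step
x = np.linspace(0.0, 1.0, n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / gamma)    # Gibbs kernel for cost |x - y|^2

def prox_entropy(q, sigma):
    # Closed-form KL prox of sigma * sum(p log p): argmin_p KL(p|q) + sigma * F(p).
    return np.exp((np.log(q) - sigma) / (1.0 + sigma))

def jko_step(p, n_iters=200):
    b = np.ones(n)
    for _ in range(n_iters):                           # Sinkhorn-like scaling iterations
        a = p / (K @ b)                                # enforce first marginal = p
        Kta = K.T @ a
        b = prox_entropy(Kta, tau / gamma) / Kta       # KL prox of tau*F on second marginal
    return b * (K.T @ a)                               # new density p_{t+1}

p = np.exp(-((x - 0.3) ** 2) / 0.001)                  # initial bump
p /= p.sum()
for _ in range(10):                                    # ten implicit Euler (JKO) steps
    p = jko_step(p)
print(p.max())  # peak decreases as the bump diffuses
```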
Bethe Projections for Non-Local Inference
Many inference problems in structured prediction are naturally solved by
augmenting a tractable dependency structure with complex, non-local auxiliary
objectives. This includes the mean field family of variational inference
algorithms, soft- or hard-constrained inference using Lagrangian relaxation or
linear programming, collective graphical models, and forms of semi-supervised
learning such as posterior regularization. We present a method to
discriminatively learn broad families of inference objectives, capturing
powerful non-local statistics of the latent variables, while maintaining
tractable and provably fast inference using non-Euclidean projected gradient
descent with a distance-generating function given by the Bethe entropy. We
demonstrate the performance and flexibility of our method by (1) extracting
structured citations from research papers by learning soft global constraints,
(2) achieving state-of-the-art results on a widely-used handwriting recognition
task using a novel learned non-convex inference procedure, and (3) providing a
fast and highly scalable algorithm for the challenging problem of inference in
a collective graphical model applied to bird migration.
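The optimization primitive the abstract invokes can be sketched in its simplest instance: mirror descent with an entropic distance-generating function, which on the probability simplex yields exponentiated-gradient updates. The paper's setting replaces the simplex with a marginal polytope and the entropy with the Bethe entropy, with each projection computed by message passing; the quadratic objective, function names, and step size below are illustrative assumptions.

```python
# Sketch: entropic mirror descent on the probability simplex, the simplest
# instance of non-Euclidean projected gradient descent. The paper's method
# uses the Bethe entropy over a marginal polytope instead.
import numpy as np

def entropic_mirror_descent(grad, x0, step=0.5, n_iters=100):
    """Minimize f over the simplex via multiplicative (entropic) updates."""
    x = x0 / x0.sum()
    for _ in range(n_iters):
        x = x * np.exp(-step * grad(x))  # mirror step in the dual (log) space
        x /= x.sum()                     # Bregman projection back onto the simplex
    return x

# Example: a "non-local" objective coupling all coordinates, f(x) = x^T A x.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
A = A + A.T                              # symmetric PSD-ish coupling matrix
x = entropic_mirror_descent(lambda x: 2 * A @ x, np.ones(5))
print(x)
```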