5,446 research outputs found

    A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization

    Full text link
    We show that a large class of Estimation of Distribution Algorithms, including, but not limited to, Covariance Matrix Adaption, can be written as a Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new coherent framework with which to reason about EDAs

    Entropic Wasserstein Gradient Flows

    Full text link
    This article details a novel numerical scheme to approximate gradient flows for optimal transport (i.e. Wasserstein) metrics. These flows have proved useful to tackle theoretically and numerically non-linear diffusion equations that model for instance porous media or crowd evolutions. These gradient flows define a suitable notion of weak solutions for these evolutions and they can be approximated in a stable way using discrete flows. These discrete flows are implicit Euler time stepping according to the Wasserstein metric. A bottleneck of these approaches is the high computational load induced by the resolution of each step. Indeed, this corresponds to the resolution of a convex optimization problem involving a Wasserstein distance to the previous iterate. Following several recent works on the approximation of Wasserstein distances, we consider a discrete flow induced by an entropic regularization of the transportation coupling. This entropic regularization allows one to trade the initial Wasserstein fidelity term for a Kulback-Leibler divergence, which is easier to deal with numerically. We show how KL proximal schemes, and in particular Dykstra's algorithm, can be used to compute each step of the regularized flow. The resulting algorithm is both fast, parallelizable and versatile, because it only requires multiplications by a Gibbs kernel. On Euclidean domains discretized on an uniform grid, this corresponds to a linear filtering (for instance a Gaussian filtering when cc is the squared Euclidean distance) which can be computed in nearly linear time. On more general domains, such as (possibly non-convex) shapes or on manifolds discretized by a triangular mesh, following a recently proposed numerical scheme for optimal transport, this Gibbs kernel multiplication is approximated by a short-time heat diffusion

    Bethe Projections for Non-Local Inference

    Full text link
    Many inference problems in structured prediction are naturally solved by augmenting a tractable dependency structure with complex, non-local auxiliary objectives. This includes the mean field family of variational inference algorithms, soft- or hard-constrained inference using Lagrangian relaxation or linear programming, collective graphical models, and forms of semi-supervised learning such as posterior regularization. We present a method to discriminatively learn broad families of inference objectives, capturing powerful non-local statistics of the latent variables, while maintaining tractable and provably fast inference using non-Euclidean projected gradient descent with a distance-generating function given by the Bethe entropy. We demonstrate the performance and flexibility of our method by (1) extracting structured citations from research papers by learning soft global constraints, (2) achieving state-of-the-art results on a widely-used handwriting recognition task using a novel learned non-convex inference procedure, and (3) providing a fast and highly scalable algorithm for the challenging problem of inference in a collective graphical model applied to bird migration.Comment: minor bug fix to appendix. appeared in UAI 201
    • …
    corecore