754 research outputs found

    Kullback-Leibler Proximal Variational Inference

    We propose a new variational inference method based on a proximal framework that uses the Kullback-Leibler (KL) divergence as the proximal term. We make two contributions towards exploiting the geometry and structure of the variational bound. First, we propose a KL proximal-point algorithm and show its equivalence to variational inference with natural gradients (e.g. stochastic variational inference). Second, we use the proximal framework to derive efficient variational algorithms for non-conjugate models. We propose a splitting procedure to separate non-conjugate terms from conjugate ones. We linearize the non-conjugate terms to obtain subproblems that admit a closed-form solution. Overall, our approach converts inference in a non-conjugate model to subproblems that involve inference in well-known conjugate models. We show that our method is applicable to a wide variety of models and can result in computationally efficient algorithms. Applications to real-world datasets show comparable performance to existing methods.

    Gradient Flows in Filtering and Fisher-Rao Geometry

    Uncertainty propagation and filtering can be interpreted as gradient flows with respect to suitable metrics in the infinite-dimensional manifold of probability density functions. Such a viewpoint has been put forth in recent literature, and a systematic way to formulate and solve the same for linear Gaussian systems has appeared in our previous work, where the gradient flows were realized via proximal operators with respect to the Wasserstein metric arising in optimal mass transport. In this paper, we derive the evolution equations as proximal operators with respect to the Fisher-Rao metric arising in information geometry. We develop the linear Gaussian case in detail and show that a template two-step optimization procedure proposed earlier by the authors still applies. Our objective is to provide new geometric interpretations of known equations in filtering, and to clarify the implications of different choices of metric.

    On the Minimization of Convex Functionals of Probability Distributions Under Band Constraints

    The problem of minimizing convex functionals of probability distributions is solved under the assumption that the density of every distribution is bounded from above and below. A system of sufficient and necessary first-order optimality conditions as well as a bound on the optimality gap of feasible candidate solutions are derived. Based on these results, two numerical algorithms are proposed that iteratively solve the system of optimality conditions on a grid of discrete points. Both algorithms use a block coordinate descent strategy and terminate once the optimality gap falls below the desired tolerance. While the first algorithm is conceptually simpler and more efficient, it is not guaranteed to converge for objective functions that are not strictly convex. This shortcoming is overcome in the second algorithm, which uses an additional outer proximal iteration and is proven to converge under mild assumptions. Two examples are given to demonstrate the theoretical usefulness of the optimality conditions as well as the high efficiency and accuracy of the proposed numerical algorithms.
    Comment: 13 pages, 5 figures, 2 tables, published in the IEEE Transactions on Signal Processing. In previous versions, the example in Section VI.B contained some mistakes and inaccuracies, which have been fixed in this version.

    Proximity Operators of Discrete Information Divergences

    Information divergences allow one to assess how close two distributions are to each other. Among the large panel of available measures, special attention has been paid to convex $\varphi$-divergences, such as Kullback-Leibler, Jeffreys-Kullback, Hellinger, Chi-Square, Renyi, and $I_{\alpha}$ divergences. While $\varphi$-divergences have been extensively studied in convex analysis, their use in optimization problems often remains challenging. In this regard, one of the main shortcomings of existing methods is that the minimization of $\varphi$-divergences is usually performed with respect to one of their arguments, possibly within alternating optimization techniques. In this paper, we overcome this limitation by deriving new closed-form expressions for the proximity operator of such two-variable functions. This makes it possible to employ standard proximal methods for efficiently solving a wide range of convex optimization problems involving $\varphi$-divergences. In addition, we show that these proximity operators are useful to compute the epigraphical projection of several functions of practical interest. The proposed proximal tools are numerically validated in the context of optimal query execution within database management systems, where the problem of selectivity estimation plays a central role. Experiments are carried out on small- to large-scale scenarios.
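To make the divergences named in this abstract concrete, here is a minimal sketch of four of them for discrete distributions. It is not taken from the paper and does not implement its proximity operators. The function names are hypothetical, normalization conventions for the Hellinger and Chi-Square divergences vary between authors, and the code assumes `q` is strictly positive wherever it is used as a denominator.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i * log(p_i / q_i).

    Assumes q_i > 0 wherever p_i > 0; terms with p_i == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jeffreys(p, q):
    """Jeffreys-Kullback divergence: the symmetrized KL divergence."""
    return kl(p, q) + kl(q, p)

def hellinger(p, q):
    """Hellinger divergence in the unnormalized convention sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def chi_square(p, q):
    """Pearson Chi-Square divergence sum_i (p_i - q_i)^2 / q_i (assumes q_i > 0)."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))
```

Each function treats both arguments as plain lists of probabilities. The KL and Chi-Square divergences are asymmetric in `p` and `q`, which is one reason jointly minimizing over both arguments, as the abstract describes, is harder than minimizing over one.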

    Analysis of Langevin Monte Carlo via convex optimization

    In this paper, we provide new insights on the Unadjusted Langevin Algorithm. We show that this method can be formulated as a first-order optimization algorithm of an objective functional defined on the Wasserstein space of order $2$. Using this interpretation and techniques borrowed from convex optimization, we give a non-asymptotic analysis of this method to sample from a smooth log-concave target distribution on $\mathbb{R}^d$. Based on this interpretation, we propose two new methods for sampling from a non-smooth target distribution, which we analyze as well. Moreover, these new algorithms are natural extensions of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm, which is a popular extension of the Unadjusted Langevin Algorithm. Similar to SGLD, they rely only on approximations of the gradient of the target log density and can be used for large-scale Bayesian inference.
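For readers unfamiliar with the Unadjusted Langevin Algorithm, the following is a minimal one-dimensional sketch of its standard update, x ← x − h U'(x) + sqrt(2h) ξ with ξ ~ N(0, 1), where U is the negative log density of the target. It is not the paper's new methods or its analysis. The function names, step size, chain count, and the standard Gaussian target (U(x) = x²/2) are all assumptions chosen for illustration.

```python
import math
import random

def ula_chain(grad_potential, x0, step, n_steps, rng):
    """Run one Unadjusted Langevin chain and return its final state.

    grad_potential is U'(x), the derivative of the negative log target density.
    No Metropolis correction is applied, hence "unadjusted".
    """
    x = x0
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x - step * grad_potential(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x

def sample_standard_gaussian(n_chains=2000, step=0.05, n_steps=300, seed=0):
    """Draw approximate N(0, 1) samples, one per independent chain.

    Illustrative target: U(x) = x^2 / 2, so U'(x) = x.
    """
    rng = random.Random(seed)
    return [ula_chain(lambda x: x, 0.0, step, n_steps, rng) for _ in range(n_chains)]
```

Because the discretization is never corrected, the chain's stationary distribution is only close to the target: for this Gaussian example the stationary variance works out to 1 / (1 − h/2) rather than exactly 1, a bias that shrinks with the step size h.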