Kullback-Leibler Proximal Variational Inference
We propose a new variational inference method based on a proximal framework that uses the Kullback-Leibler (KL) divergence as the proximal term. We make two contributions towards exploiting the geometry and structure of the variational bound. First, we propose a KL proximal-point algorithm and show its equivalence to variational inference with natural gradients (e.g. stochastic variational inference). Second, we use the proximal framework to derive efficient variational algorithms for non-conjugate models. We propose a splitting procedure to separate non-conjugate terms from conjugate ones. We linearize the non-conjugate terms to obtain subproblems that admit a closed-form solution. Overall, our approach converts inference in a non-conjugate model to subproblems that involve inference in well-known conjugate models. We show that our method is applicable to a wide variety of models and can result in computationally efficient algorithms. Applications to real-world datasets show comparable performance to existing methods.
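The link between a KL proximal step and a natural-gradient step can be checked numerically in a toy setting. The sketch below is not the paper's algorithm: it assumes a Bernoulli variational distribution with mean parameter `m`, replaces the variational bound by a linear function with slope `grad` (a hypothetical gradient value chosen for illustration), and compares a brute-force maximizer of the proximal objective with the update "natural parameter plus `beta` times gradient".

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(m):
    return math.log(m / (1.0 - m))

def kl_bernoulli(m, m_k):
    """KL( Bernoulli(m) || Bernoulli(m_k) ) for m, m_k in (0, 1)."""
    return m * math.log(m / m_k) + (1.0 - m) * math.log((1.0 - m) / (1.0 - m_k))

def prox_objective(m, m_k, grad, beta):
    # Linearized bound minus the KL proximal term (to be maximized over m).
    return grad * m - kl_bernoulli(m, m_k) / beta

def prox_step_numeric(m_k, grad, beta, iters=200):
    # Ternary search; valid because the objective is strictly concave in m.
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if prox_objective(a, m_k, grad, beta) < prox_objective(b, m_k, grad, beta):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)

def prox_step_closed_form(m_k, grad, beta):
    # Update in the natural parameter: lambda_next = lambda_k + beta * grad.
    return sigmoid(logit(m_k) + beta * grad)

# Illustrative values only (assumptions of this sketch).
m_k, grad, beta = 0.3, 1.7, 0.5
numeric = prox_step_numeric(m_k, grad, beta)
closed = prox_step_closed_form(m_k, grad, beta)
print(numeric, closed)
```

Setting the derivative of the proximal objective to zero gives logit(m) = logit(m_k) + beta * grad, which is why the two results should agree up to the tolerance of the search.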
Gradient Flows in Filtering and Fisher-Rao Geometry
Uncertainty propagation and filtering can be interpreted as gradient flows
with respect to suitable metrics in the infinite dimensional manifold of
probability density functions. Such a viewpoint has been put forth in recent
literature, and a systematic way to formulate and solve the same for linear
Gaussian systems has appeared in our previous work where the gradient flows
were realized via proximal operators with respect to the Wasserstein metric arising
in optimal mass transport. In this paper, we derive the evolution equations as
proximal operators with respect to the Fisher-Rao metric arising in information
geometry. We develop the linear Gaussian case in detail and show that a
template two-step optimization procedure proposed earlier by the authors still
applies. Our objective is to provide new geometric interpretations of known
equations in filtering, and to clarify the implication of different choices of
metric.
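For orientation, a proximal recursion of the kind referred to above is commonly written in the following generic form. All symbols are assumptions introduced here, since the abstract does not fix any notation: $\varrho_k$ is the density at step $k$, $h > 0$ is a step size, $d$ is the chosen distance between densities (Wasserstein or Fisher-Rao), and $\Phi$ is a functional that depends on the system being propagated or filtered.

```latex
\varrho_k \;=\; \operatorname*{arg\,inf}_{\varrho}\;
  \frac{1}{2}\, d\bigl(\varrho, \varrho_{k-1}\bigr)^{2} \;+\; h\,\Phi(\varrho),
  \qquad k = 1, 2, \dots
```

Changing the metric changes only the distance term $d$, which is one way to read the abstract's remark about the implication of different choices of metric.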
On the Minimization of Convex Functionals of Probability Distributions Under Band Constraints
The problem of minimizing convex functionals of probability distributions is
solved under the assumption that the density of every distribution is bounded
from above and below. A system of sufficient and necessary first-order
optimality conditions as well as a bound on the optimality gap of feasible
candidate solutions are derived. Based on these results, two numerical
algorithms are proposed that iteratively solve the system of optimality
conditions on a grid of discrete points. Both algorithms use a block coordinate
descent strategy and terminate once the optimality gap falls below the desired
tolerance. While the first algorithm is conceptually simpler and more
efficient, it is not guaranteed to converge for objective functions that are
not strictly convex. This shortcoming is overcome in the second algorithm,
which uses an additional outer proximal iteration and which is proven to
converge under mild assumptions. Two examples are given to demonstrate the
theoretical usefulness of the optimality conditions as well as the high
efficiency and accuracy of the proposed numerical algorithms.
Comment: 13 pages, 5 figures, 2 tables, published in the IEEE Transactions on
Signal Processing. In previous versions, the example in Section VI.B
contained some mistakes and inaccuracies, which have been fixed in this
version.
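To make the role of band constraints concrete, here is a minimal sketch for one special case, and not one of the two algorithms from the paper: minimizing the KL divergence to a fixed reference vector `q` on a grid, subject to elementwise bounds. In this case the first-order conditions say the solution is a scaled copy of `q` clipped to the band, so a bisection on the scale factor is enough. The function name and the example numbers are hypothetical.

```python
def kl_min_under_band(q, lower, upper, iters=200):
    """Minimize sum_i p_i*log(p_i/q_i) s.t. lower_i <= p_i <= upper_i, sum_i p_i = 1.

    Assumes q_i > 0 for all i and that the band admits a probability vector.
    """
    assert sum(lower) <= 1.0 <= sum(upper), "band must admit a probability vector"

    def clipped(c):
        # Candidate solution: scale q by c, then clip to the band.
        return [min(max(c * qi, lo), hi) for qi, lo, hi in zip(q, lower, upper)]

    # Find an upper bracket for the scale factor; total mass is nondecreasing in c.
    c_lo, c_hi = 0.0, 1.0
    while sum(clipped(c_hi)) < 1.0:
        c_hi *= 2.0

    # Bisection on the scale factor until the clipped vector sums to one.
    for _ in range(iters):
        c = 0.5 * (c_lo + c_hi)
        if sum(clipped(c)) < 1.0:
            c_lo = c
        else:
            c_hi = c
    return clipped(c_hi)

# Illustrative data only.
q = [0.5, 0.3, 0.2]
lower = [0.1, 0.1, 0.1]
upper = [0.4, 0.4, 0.4]
p = kl_min_under_band(q, lower, upper)
print(p)
```

In this example the first entry hits its upper bound of 0.4 and the remaining mass of 0.6 is shared in the ratio 0.3 : 0.2.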
Proximity Operators of Discrete Information Divergences
Information divergences allow one to assess how close two distributions are
to each other. Among the large panel of available measures, special
attention has been paid to convex $\varphi$-divergences, such as
Kullback-Leibler, Jeffreys-Kullback, Hellinger, Chi-Square, Renyi, and
$I_\alpha$ divergences. While $\varphi$-divergences have been extensively
studied in convex analysis, their use in optimization problems often remains
challenging. In this regard, one of the main shortcomings of existing methods
is that the minimization of $\varphi$-divergences is usually performed with
respect to one of their arguments, possibly within alternating optimization
techniques. In this paper, we overcome this limitation by deriving new
closed-form expressions for the proximity operator of such two-variable
functions. This makes it possible to employ standard proximal methods for
efficiently solving a wide range of convex optimization problems involving
$\varphi$-divergences. In addition, we show that these proximity operators are
useful to compute the epigraphical projection of several functions of practical
interest. The proposed proximal tools are numerically validated in the context
of optimal query execution within database management systems, where the
problem of selectivity estimation plays a central role. Experiments are carried
out on small to large scale scenarios.
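As a reminder of what a proximity operator computes, the sketch below evaluates one numerically for the one-variable function f(x) = x log x. This is deliberately simpler than the two-variable divergences treated in the paper and does not use its closed-form expressions; the function name `prox_xlogx` and the bisection approach are choices made here for illustration.

```python
import math

def prox_xlogx(v, gamma, iters=200):
    """Numerically evaluate argmin_{x > 0} x*log(x) + (x - v)**2 / (2*gamma).

    Assumes gamma > 0 and that v is not so negative that the minimizer
    falls below the smallest bracket value used here.
    """
    def stationarity(x):
        # Derivative of the prox objective; strictly increasing in x on (0, inf).
        return math.log(x) + 1.0 + (x - v) / gamma

    # Bracket the unique root of the stationarity condition.
    lo, hi = 1e-300, 1.0
    while stationarity(hi) < 0.0:
        hi *= 2.0

    # Plain bisection on the increasing function.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if stationarity(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def prox_objective(x, v, gamma):
    return x * math.log(x) + (x - v) ** 2 / (2.0 * gamma)

# Illustrative values only.
v, gamma = 1.0, 0.5
x_star = prox_xlogx(v, gamma)
print(x_star, prox_objective(x_star, v, gamma))
```

Because the objective is strictly convex on the positive half-line, the root of its derivative is the unique minimizer, which is what the bisection returns.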
Analysis of Langevin Monte Carlo via convex optimization
In this paper, we provide new insights on the Unadjusted Langevin Algorithm.
We show that this method can be formulated as a first order optimization
algorithm of an objective functional defined on the Wasserstein space of order
2. Using this interpretation and techniques borrowed from convex
optimization, we give a non-asymptotic analysis of this method to sample from
a log-concave smooth target distribution on $\mathbb{R}^d$. Based on this
interpretation, we propose two new methods for sampling from a non-smooth
target distribution, which we analyze as well. Besides, these new algorithms
are natural extensions of the Stochastic Gradient Langevin Dynamics (SGLD)
algorithm, which is a popular extension of the Unadjusted Langevin Algorithm.
Similar to SGLD, they only rely on approximations of the gradient of the target
log density and can be used for large-scale Bayesian inference.
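For readers unfamiliar with the Unadjusted Langevin Algorithm, a minimal one-dimensional sketch follows. It assumes a standard Gaussian target, so the potential is x**2 / 2 and its gradient is x; the step size, chain length, burn-in, and seed are arbitrary illustrative choices, and none of the paper's new methods for non-smooth targets are implemented here.

```python
import math
import random

def ula_samples(grad_potential, x0, step, n_steps, rng):
    """Unadjusted Langevin Algorithm in one dimension.

    Iterates x <- x - step * grad_potential(x) + sqrt(2 * step) * N(0, 1),
    targeting the density proportional to exp(-potential(x)).
    There is no accept/reject correction, so the chain is biased for step > 0.
    """
    x = x0
    out = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x - step * grad_potential(x) + noise_scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

# Standard Gaussian target: potential(x) = x**2 / 2, so grad_potential(x) = x.
rng = random.Random(0)
chain = ula_samples(lambda x: x, x0=3.0, step=0.1, n_steps=60000, rng=rng)
samples = chain[10000:]  # discard burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

For this particular target the recursion is a linear autoregression, and its stationary variance works out to 1 / (1 - step / 2) rather than 1, which illustrates the discretization bias that a non-asymptotic analysis has to account for.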