Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
Combining discrete probability distributions and combinatorial optimization
problems with neural network components has numerous applications but poses
several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE),
a framework for end-to-end learning of models combining discrete exponential
family distributions and differentiable neural components. I-MLE is widely
applicable as it only requires the ability to compute the most probable states
and does not rely on smooth relaxations. The framework encompasses several
approaches such as perturbation-based implicit differentiation and recent
methods to differentiate through black-box combinatorial solvers. We introduce
a novel class of noise distributions for approximating marginals via
perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood
estimation when used in some recently studied learning settings that involve
combinatorial solvers. Experiments on several datasets suggest that I-MLE is
competitive with and often outperforms existing approaches which rely on
problem-specific relaxations.Comment: NeurIPS 2021 camera-ready; repo:
https://github.com/nec-research/tf-iml
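The perturb-and-MAP idea can be sketched as follows for a k-subset distribution, where the most probable state is simply the top-k indicator. This is a hedged illustration of the scheme described in the abstract, not the paper's exact recipe: the function names, the choice of Gumbel noise, and the value of lambda are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def map_topk(theta, k):
    """Most probable state of a k-subset distribution: indicator of top-k."""
    z = np.zeros_like(theta)
    z[np.argsort(theta)[-k:]] = 1.0
    return z

def imle_gradient(theta, k, grad_z, lam=10.0):
    """Illustrative I-MLE-style gradient estimate for d(loss)/d(theta):
    compare the perturbed MAP state with the MAP state of a target
    distribution obtained by nudging theta against the downstream
    gradient grad_z = d(loss)/dz."""
    eps = rng.gumbel(size=theta.shape)               # perturb-and-MAP noise
    z = map_topk(theta + eps, k)
    z_target = map_topk(theta - lam * grad_z + eps, k)
    return (z - z_target) / lam

theta = np.array([2.0, -1.0, 0.5, 0.1])
z = map_topk(theta, 2)                               # forward pass: MAP state
grad_z = 2.0 * (z - np.array([0.0, 1.0, 1.0, 0.0])) # e.g. squared-error loss
g = imle_gradient(theta, 2, grad_z)
```

Note that only MAP computations appear: no smooth relaxation of the top-k operation is needed, which is the property the abstract emphasizes.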
Discrete approximations of continuous probability distributions obtained by minimizing Cramér-von Mises-type distances
We consider the problem of approximating a continuous random variable, characterized by a cumulative distribution function (cdf) F(x), by means of k points, x1 < x2 < ⋯ < xk, with probabilities pi, i = 1, ⋯, k. For a given k, a criterion for determining the xi and pi of the approximating k-point discrete distribution can be the minimization of some distance to the original distribution. Here we consider the weighted Cramér-von Mises distance between the original cdf F(x) and the step-wise cdf F̂(x) of the approximating discrete distribution, characterized by a non-negative weighting function w(x). This problem has already been solved analytically when w(x) corresponds to the probability density function of the continuous random variable, w(x) = F′(x), and when w(x) is a piece-wise constant function, through a numerical iterative procedure based on a homotopy continuation approach. In this paper, we propose and implement a solution to the problem for different choices of the weighting function w(x), highlighting how the results are affected by w(x) itself and by the number of approximating points k, in addition to F(x); although an analytic solution is usually not available, the problem can be numerically solved through an iterative method, which alternately updates the two sub-sets of k unknowns, the xi's (or a transformation thereof) and the pi's, until convergence. The main apparent advantage of these discrete approximations is their universality, since they can be applied to most continuous distributions, whether or not they possess first moments. In order to shed some light on the proposed approaches, applications to several well-known continuous distributions (among them, the normal and the exponential) and to a practical problem where discretization is a useful tool are also illustrated.
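The alternating iteration described above can be sketched for the standard normal with the weight w(x) = F′(x). This is a minimal illustration under two standard first-order conditions (assumptions stated here, not quoted from the paper): given the xi, the optimal plateau ci of the step cdf is the w-weighted average of F over (xi, xi+1); given the plateaus, the optimal jump points satisfy F(xi) = (ci−1 + ci)/2. Function names are hypothetical.

```python
import math
import numpy as np

def F(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def w(x):  # weighting function w = F' (standard normal density)
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def F_inv(u, lo=-10.0, hi=10.0):
    for _ in range(80):  # quantile by bisection
        mid = 0.5 * (lo + hi)
        if F(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def trap(y, x):  # trapezoidal rule
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def cvm_discretize(k, iters=100):
    """Alternately update the two sub-sets of k unknowns until convergence."""
    x = np.array([F_inv((i + 0.5) / k) for i in range(k)])  # initial guess
    c = np.zeros(k + 1)
    c[k] = 1.0
    for _ in range(iters):
        for i in range(1, k):  # (1) interior plateaus: weighted average of F
            g = np.linspace(x[i - 1], x[i], 200)
            wt = np.array([w(t) for t in g])
            Ft = np.array([F(t) for t in g])
            c[i] = trap(wt * Ft, g) / trap(wt, g)
        # (2) jump points from the plateau midpoints: F(x_i) = (c_{i-1}+c_i)/2
        x = np.array([F_inv(0.5 * (c[i - 1] + c[i])) for i in range(1, k + 1)])
    return x, np.diff(c)  # support points and probabilities

xs, ps = cvm_discretize(3)
```

By symmetry of the normal, the resulting 3-point approximation is symmetric about zero with probabilities summing to one.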
Deterministic Sampling for Nonlinear Dynamic State Estimation
The goal of this work is to improve existing filtering algorithms and to propose novel ones for nonlinear dynamic state estimation. Nonlinearity is considered in two ways: first, propagation is improved by proposing novel methods for approximating continuous probability distributions by discrete distributions defined on the same continuous domain; second, nonlinear underlying domains are addressed by proposing novel filters that inherently take the underlying geometry of these domains into account.
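One classical instance of such a deterministic discrete approximation on a continuous domain is the set of unscented-transform sigma points, shown here only as an illustration of the general idea (it is not claimed to be this thesis's own method; names and the kappa parameter follow the usual UKF convention):

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Return 2n+1 deterministic samples and weights that reproduce the
    mean and covariance of the continuous distribution exactly."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)  # matrix square root
    pts = [mean]
    pts += [mean + L[:, i] for i in range(n)]
    pts += [mean - L[:, i] for i in range(n)]
    wts = [kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n)
    return pts, wts

m = np.array([1.0, -2.0])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
pts, wts = sigma_points(m, P)

# The weighted discrete distribution matches the first two moments exactly.
m_hat = sum(wt * p for wt, p in zip(wts, pts))
P_hat = sum(wt * np.outer(p - m_hat, p - m_hat) for wt, p in zip(wts, pts))
```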
Zero biasing and a discrete central limit theorem
We introduce a new family of distributions to approximate P(W ∈ A) for W = Σi Xi, a sum of independent integer-valued random variables X1, …, Xn with finite second moments, where, with large probability, W is not concentrated on a lattice of span greater than 1. The well-known Berry–Esseen theorem states that, for Z a normal random variable with mean E[W] and variance Var(W), P(Z ∈ A) provides a good approximation to P(W ∈ A) for A of the form (−∞, x]. However, for more general A, such as the set of all even numbers, the normal approximation becomes unsatisfactory and it is desirable to have an appropriate discrete, nonnormal distribution which approximates W in total variation, and a discrete version of the Berry–Esseen theorem to bound the error. In this paper, using the concept of zero biasing for discrete random variables (cf. Goldstein and Reinert [J. Theoret. Probab. 18 (2005) 237–260]), we introduce a new family of discrete distributions and provide a discrete version of the Berry–Esseen theorem showing how members of the family approximate the distribution of a sum of integer-valued variables in total variation.
Comment: Published at http://dx.doi.org/10.1214/009117906000000250 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)
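The contrast drawn in the abstract can be checked numerically. In this hedged illustration (binomial summands are an assumed example, not from the paper), the continuity-corrected normal cdf approximates a half-line probability well, while spreading normal density over the even integers badly misses the probability of the even-numbers set for a sum living on a lattice of span 2:

```python
import math

def binom_pmf(n, k):
    return math.comb(n, k) / 2.0 ** n  # Binomial(n, 1/2)

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n = 100

# A = (-inf, 55], W ~ Binomial(100, 1/2): the classical Berry-Esseen regime.
mu, sigma = n / 2.0, math.sqrt(n) / 2.0
p_true = sum(binom_pmf(n, k) for k in range(56))
p_norm = Phi((55.5 - mu) / sigma)  # continuity-corrected normal cdf

# A = even numbers, W' = 2 * Binomial(100, 1/2): P(W' in A) = 1 exactly, but
# assigning normal density mass to each even integer captures only about 1/2.
mu2, sigma2 = float(n), math.sqrt(n)
p_norm_even = sum(phi((k - mu2) / sigma2) / sigma2
                  for k in range(0, 2 * n + 1, 2))
```

Here `p_true` and `p_norm` agree to within about one percent, while `p_norm_even` is near 0.5 against a true probability of 1, which is exactly the failure mode motivating a discrete, nonnormal approximation.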
Asymptotic tail behavior of phase-type scale mixture distributions
We consider phase-type scale mixture distributions which correspond to
distributions of a product of two independent random variables: a phase-type
random variable and a nonnegative but otherwise arbitrary random variable
called the scaling random variable. We investigate conditions for such a
class of distributions to be either light- or heavy-tailed; we explore
subexponentiality and determine their maximum domains of attraction. Particular
focus is given to phase-type scale mixture distributions where the scaling
random variable has discrete support; such a class of distributions has
recently been used in risk applications to approximate heavy-tailed
distributions. Our results are complemented with several examples.
Comment: 18 pages, 0 figures
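A minimal sketch of such a mixture, with hypothetical parameters: take X ~ Exp(1), the simplest phase-type distribution, and a discrete scaling variable S on {1, 2, 4}. The survival function of Y = S·X is an explicit exponential mixture, and with finite discrete support the product remains light-tailed, decaying at the rate of the largest scale:

```python
import math
import random

# Hypothetical discrete scaling distribution.
support = [1.0, 2.0, 4.0]
probs = [0.5, 0.3, 0.2]

def survival(t):
    # P(Y > t) = sum_i p_i * exp(-t / s_i) for Y = S * X, X ~ Exp(1)
    return sum(p * math.exp(-t / s) for p, s in zip(probs, support))

# Finite support keeps the tail light: P(Y > t) ~ 0.2 * exp(-t / 4).
tail_ratio = survival(40.0) / (0.2 * math.exp(-10.0))

# Monte Carlo check of the mixture representation.
random.seed(0)
N = 200_000
hits = sum(random.choices(support, probs)[0] * random.expovariate(1.0) > 5.0
           for _ in range(N))
```

Heavy tails can only arise when the scaling variable has unbounded support, which is why the discrete-support mixtures studied here are useful for approximating heavy-tailed distributions.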
Discovering a junction tree behind a Markov network by a greedy algorithm
In an earlier paper we introduced a special kind of k-width junction tree,
called the k-th order t-cherry junction tree, in order to approximate a joint
probability distribution. The approximation is best when the Kullback-Leibler
divergence between the true joint probability distribution and the
approximating one is minimal. Finding the best approximating k-width junction
tree is NP-complete for k > 2. In our earlier paper we also proved that the best
approximating k-width junction tree can be embedded into a k-th order t-cherry
junction tree. We introduce a greedy algorithm that yields very good
approximations in reasonable computing time.
In this paper we prove that if the underlying Markov network fulfills certain
requirements, then our greedy algorithm is able to find the true probability
distribution or its best approximation in the family of k-th order t-cherry
tree probability distributions. Our algorithm uses only the k-th order marginal
probability distributions as input.
We compare the results of the greedy algorithm proposed in this paper with
the greedy algorithm proposed by Malvestuto in 1991.
Comment: The paper was presented at VOCAL 2010 in Veszprém, Hungary.
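The greedy construction can be made concrete in the simplest case k = 2, where the best tree-structured approximation in KL divergence is the classical Chow-Liu tree: a greedy maximum-weight spanning tree over mutual-information edge weights, needing only the second-order (pairwise) marginals as input. This is an illustrative reduction, not this paper's general algorithm, and the probability tables below are hypothetical:

```python
import math

def mutual_information(joint):
    """Mutual information from a 2-D joint probability table (nested lists)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (px[i] * py[j]))
    return mi

def chow_liu_edges(pairwise):
    """Greedy maximum spanning tree (Kruskal with union-find) over
    mutual-information weights, using only pairwise marginals as input."""
    nodes = {v for e in pairwise for v in e}
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = []
    for (u, v), joint in sorted(pairwise.items(),
                                key=lambda kv: -mutual_information(kv[1])):
        ru, rv = find(u), find(v)
        if ru != rv:            # greedily keep the edge unless it closes a cycle
            parent[ru] = rv
            edges.append((u, v))
    return edges

# Hypothetical pairwise marginals of three binary variables: A-B and B-C are
# strongly dependent, A-C are independent.
pairwise = {
    ('A', 'B'): [[0.40, 0.10], [0.10, 0.40]],
    ('B', 'C'): [[0.45, 0.05], [0.05, 0.45]],
    ('A', 'C'): [[0.25, 0.25], [0.25, 0.25]],
}
edges = chow_liu_edges(pairwise)
```

The greedy tree keeps the two high-information edges A-B and B-C and rejects A-C, mirroring how the paper's algorithm recovers the true structure from k-th order marginals when the underlying Markov network meets its requirements.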