
    Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

    Combining discrete probability distributions and combinatorial optimization problems with neural network components has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models that combine discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable, as it only requires the ability to compute the most probable states and does not rely on smooth relaxations. The framework encompasses several approaches, such as perturbation-based implicit differentiation and recent methods for differentiating through black-box combinatorial solvers. We introduce a novel class of noise distributions for approximating marginals via perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers. Experiments on several datasets suggest that I-MLE is competitive with, and often outperforms, existing approaches that rely on problem-specific relaxations.
    Comment: NeurIPS 2021 camera-ready; repo: https://github.com/nec-research/tf-iml
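
    To make the mechanics concrete, here is a minimal sketch of a perturb-and-MAP style gradient estimator in the spirit of I-MLE, under several assumptions: the MAP oracle is a toy top-k selector, plain Gumbel noise stands in for the paper's tailored noise family (it proposes a Sum-of-Gamma construction), and the sign and scaling conventions follow our reading of the method rather than the reference implementation in the linked repo.

```python
import numpy as np

def map_solver(theta, k=3):
    """Hypothetical MAP oracle: top-k selection, i.e. the most probable
    state of a k-subset exponential family with parameters theta."""
    z = np.zeros_like(theta)
    z[np.argsort(-theta)[:k]] = 1.0
    return z

def imle_step(theta, grad_loss_z, lam=10.0, rng=np.random.default_rng(0)):
    """Single I-MLE-style gradient estimate for d(loss)/d(theta)."""
    eps = rng.gumbel(size=theta.shape)        # perturb-and-MAP noise
    z = map_solver(theta + eps)               # forward discrete sample
    theta_target = theta - lam * grad_loss_z  # target distribution params
    z_target = map_solver(theta_target + eps)
    return z, (z - z_target) / lam            # sample and gradient estimate
```

    In a full model, `grad_loss_z` would be the gradient flowing back from the downstream neural components into the discrete sample `z`.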

    Discrete approximations of continuous probability distributions obtained by minimizing Cramér-von Mises-type distances

    We consider the problem of approximating a continuous random variable, characterized by a cumulative distribution function (cdf) F(x), by means of k points x_1 < x_2 < ⋯ < x_k with probabilities p_i, i = 1, …, k. For a given k, a criterion for determining the x_i and p_i of the approximating k-point discrete distribution is the minimization of some distance to the original distribution. Here we consider the weighted Cramér-von Mises distance between the original cdf F(x) and the step-wise cdf F̂(x) of the approximating discrete distribution, characterized by a non-negative weighting function w(x). This problem has already been solved analytically when w(x) corresponds to the probability density function of the continuous random variable, w(x) = F′(x), and, when w(x) is a piece-wise constant function, through a numerical iterative procedure based on a homotopy continuation approach. In this paper, we propose and implement a solution to the problem for different choices of the weighting function w(x), highlighting how the results are affected by w(x) itself and by the number of approximating points k, in addition to F(x). Although an analytic solution is usually not available, the problem can be solved numerically through an iterative method that alternately updates the two subsets of k unknowns, the x_i's (or a transformation thereof) and the p_i's, until convergence. The main apparent advantage of these discrete approximations is their universality, since they can be applied to most continuous distributions, whether or not they possess first moments. To shed some light on the proposed approaches, we also illustrate applications to several well-known continuous distributions (among them, the normal and the exponential) and to a practical problem where discretization is a useful tool.
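
    As an illustration, the following is a minimal sketch of one such alternating scheme, under assumed choices: a standard normal F and the constant weight w(x) = 1 (the paper studies several weighting functions). It uses the first-order condition F(x_i) = (P_{i-1} + P_i)/2 for the support points, where the P_i are cumulative probabilities, and a w-weighted average of F over each interval for the probabilities.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

F, Finv = stats.norm.cdf, stats.norm.ppf  # assumed target distribution
w = lambda t: 1.0                          # assumed weighting function

def discretize(k, iters=100):
    # Cumulative levels P_0 = 0 < P_1 < ... < P_k = 1, initialized uniformly.
    P = np.linspace(0.0, 1.0, k + 1)
    for _ in range(iters):
        # Update support points: F(x_i) = (P_{i-1} + P_i) / 2.
        x = Finv((P[:-1] + P[1:]) / 2.0)
        # Update interior levels: weighted average of F on [x_i, x_{i+1}].
        for i in range(1, k):
            num, _ = quad(lambda t: w(t) * F(t), x[i - 1], x[i])
            den, _ = quad(w, x[i - 1], x[i])
            P[i] = num / den
    return x, np.diff(P)  # support points and probabilities p_i

x, p = discretize(5)  # e.g. a 5-point approximation of N(0, 1)
```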

    Deterministic Sampling for Nonlinear Dynamic State Estimation

    The goal of this work is to improve existing filtering algorithms and to propose novel ones for nonlinear dynamic state estimation. Nonlinearity is considered in two ways: first, propagation is improved by proposing novel methods for approximating continuous probability distributions by discrete distributions defined on the same continuous domain; second, nonlinear underlying domains are handled by proposing novel filters that inherently take the geometry of these domains into account.
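
    The abstract does not name its schemes, but the classic example of this kind of deterministic discrete approximation is the unscented transform; the following is a minimal sketch, shown only as the familiar baseline such work builds on, with an assumed toy nonlinearity g.

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Standard unscented-transform sigma points: 2n+1 deterministically
    placed samples whose weighted moments match the Gaussian's."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)
    pts = np.vstack([mean, mean + L.T, mean - L.T])
    wts = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    wts[0] = kappa / (n + kappa)
    return pts, wts

# Propagate the discrete approximation through the nonlinearity g
# and read the mean of g(X) off the weighted samples.
g = lambda x: np.sin(x)
pts, wts = sigma_points(np.zeros(2), np.eye(2))
mean_g = wts @ g(pts)
```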

    Zero biasing and a discrete central limit theorem

    We introduce a new family of distributions to approximate P(W ∈ A) for A ⊂ {…, −2, −1, 0, 1, 2, …} and W a sum of independent integer-valued random variables ξ_1, ξ_2, …, ξ_n with finite second moments, where, with large probability, W is not concentrated on a lattice of span greater than 1. The well-known Berry-Esseen theorem states that, for Z a normal random variable with mean E(W) and variance Var(W), P(Z ∈ A) provides a good approximation to P(W ∈ A) for A of the form (−∞, x]. However, for more general A, such as the set of all even numbers, the normal approximation becomes unsatisfactory, and it is desirable to have an appropriate discrete, non-normal distribution which approximates W in total variation, and a discrete version of the Berry-Esseen theorem to bound the error. In this paper, using the concept of zero biasing for discrete random variables (cf. Goldstein and Reinert [J. Theoret. Probab. 18 (2005) 237-260]), we introduce a new family of discrete distributions and provide a discrete version of the Berry-Esseen theorem showing how members of the family approximate the distribution of a sum W of integer-valued variables in total variation.
    Comment: Published at http://dx.doi.org/10.1214/009117906000000250 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)
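
    To see why a discrete surrogate is the natural target, here is a small numerical sketch comparing the exact law of W with a discretized normal (the mass of N(E(W), Var(W)) on intervals (k − 1/2, k + 1/2]) in total variation. The three-point step distribution and the surrogate itself are assumptions for illustration; the paper's zero-bias family is a different, more refined construction.

```python
import numpy as np
from scipy.stats import norm

vals, probs = np.array([0, 1, 2]), np.array([0.3, 0.5, 0.2])  # law of xi_i
n = 30

pmf = np.array([1.0])            # exact pmf of W by repeated convolution
for _ in range(n):
    pmf = np.convolve(pmf, probs)
support = np.arange(len(pmf))    # W takes values 0, ..., 2n

mu = n * (vals * probs).sum()
var = n * ((vals**2 * probs).sum() - (vals * probs).sum() ** 2)

# Discretized-normal surrogate and total variation distance.
q = norm.cdf(support + 0.5, mu, np.sqrt(var)) - norm.cdf(support - 0.5, mu, np.sqrt(var))
tv = 0.5 * np.abs(pmf - q / q.sum()).sum()
print(f"d_TV(W, discretized normal) = {tv:.4f}")
```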

    Asymptotic tail behavior of phase-type scale mixture distributions

    We consider phase-type scale mixture distributions, which correspond to distributions of a product of two independent random variables: a phase-type random variable Y and a nonnegative but otherwise arbitrary random variable S called the scaling random variable. We investigate conditions for such a class of distributions to be either light- or heavy-tailed, we explore subexponentiality, and we determine their maximum domains of attraction. Particular focus is given to phase-type scale mixture distributions where the scaling random variable S has discrete support; such a class of distributions has recently been used in risk applications to approximate heavy-tailed distributions. Our results are complemented with several examples.
    Comment: 18 pages, 0 figures
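
    A quick way to get a feel for these objects is to simulate one. The sketch below takes an Erlang variable (the simplest phase-type law) for Y and an assumed scaling variable S supported on powers of 2 with geometric weights; both choices are purely illustrative, not the paper's constructions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_phase_type(size, shape=3, rate=1.0):
    # Erlang(shape, rate): absorption time of a pure-birth Markov chain
    # with `shape` transient phases, i.e. a simple phase-type distribution.
    return rng.gamma(shape, 1.0 / rate, size)

# Scaling variable S with discrete support (assumption for illustration).
s_vals = 2.0 ** np.arange(8)
s_probs = 0.5 ** np.arange(1, 9)
s_probs /= s_probs.sum()

def sample_scale_mixture(size):
    S = rng.choice(s_vals, p=s_probs, size=size)
    return S * sample_phase_type(size)  # X = S * Y

# The mixture's tail is visibly heavier than the light-tailed Erlang base:
x, y = sample_scale_mixture(100_000), sample_phase_type(100_000)
print(np.quantile(x, 0.999), np.quantile(y, 0.999))
```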

    Discovering a junction tree behind a Markov network by a greedy algorithm

    In an earlier paper we introduced a special kind of k-width junction tree, called a k-th order t-cherry junction tree, in order to approximate a joint probability distribution. The approximation is best when the Kullback-Leibler divergence between the true joint probability distribution and the approximating one is minimal. Finding the best approximating k-width junction tree is NP-complete for k > 2. In our earlier paper we also proved that the best approximating k-width junction tree can be embedded into a k-th order t-cherry junction tree. Here we introduce a greedy algorithm that yields very good approximations in reasonable computing time. In this paper we prove that, if the underlying Markov network fulfills certain requirements, our greedy algorithm finds the true probability distribution or its best approximation within the family of k-th order t-cherry tree probability distributions. Our algorithm uses only the k-th order marginal probability distributions as input. We compare the results of the greedy algorithm proposed in this paper with those of the greedy algorithm proposed by Malvestuto in 1991.
    Comment: The paper was presented at VOCAL 2010 in Veszprem, Hungary
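
    For orientation, in the k = 2 case the best approximating structure is the classical Chow-Liu tree, which likewise needs only low-order marginals as input. A minimal sketch for binary variables follows; the networkx dependency and the empirical-marginal estimates are assumptions, and the paper's greedy t-cherry construction for k > 2 is a more involved generalization.

```python
import numpy as np
import networkx as nx
from itertools import combinations

def chow_liu_tree(samples):
    """Maximum-weight spanning tree under pairwise mutual information,
    computed from second-order (pairwise) marginals only."""
    n_vars = samples.shape[1]
    G = nx.Graph()
    for i, j in combinations(range(n_vars), 2):
        # Empirical pairwise marginal -> mutual information I(X_i; X_j).
        joint = np.histogram2d(samples[:, i], samples[:, j], bins=2)[0]
        joint /= joint.sum()
        indep = np.outer(joint.sum(axis=1), joint.sum(axis=0))
        nz = joint > 0
        mi = (joint[nz] * np.log(joint[nz] / indep[nz])).sum()
        G.add_edge(i, j, weight=mi)
    return nx.maximum_spanning_tree(G)
```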