
    Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

    Combining discrete probability distributions and combinatorial optimization problems with neural network components has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models that combine discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable, as it only requires the ability to compute the most probable states and does not rely on smooth relaxations. The framework encompasses several approaches, such as perturbation-based implicit differentiation and recent methods for differentiating through black-box combinatorial solvers. We introduce a novel class of noise distributions for approximating marginals via perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers. Experiments on several datasets suggest that I-MLE is competitive with, and often outperforms, existing approaches that rely on problem-specific relaxations.
    Comment: NeurIPS 2021 camera-ready; repo: https://github.com/nec-research/tf-iml
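
    To make the mechanics concrete, here is a minimal sketch of a perturb-and-MAP style gradient estimator in the spirit of I-MLE, under several assumptions: the MAP oracle is a toy top-k selector, plain Gumbel noise stands in for the paper's tailored noise family (it proposes a Sum-of-Gamma construction), and the sign and scaling conventions follow our reading of the method rather than the reference implementation in the linked repo.

```python
import numpy as np

def map_solver(theta, k=3):
    """Hypothetical MAP oracle: top-k selection, i.e. the most probable
    state of a k-subset exponential family with parameters theta."""
    z = np.zeros_like(theta)
    z[np.argsort(-theta)[:k]] = 1.0
    return z

def imle_step(theta, grad_loss_z, lam=10.0, rng=np.random.default_rng(0)):
    """Single I-MLE-style gradient estimate for d(loss)/d(theta)."""
    eps = rng.gumbel(size=theta.shape)        # perturb-and-MAP noise
    z = map_solver(theta + eps)               # forward discrete sample
    theta_target = theta - lam * grad_loss_z  # target distribution params
    z_target = map_solver(theta_target + eps)
    return z, (z - z_target) / lam            # sample and gradient estimate
```

    In a full model, `grad_loss_z` would be the gradient flowing back from the downstream neural components into the discrete sample `z`.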

    Discrete approximations of continuous probability distributions obtained by minimizing Cramér-von Mises-type distances

    We consider the problem of approximating a continuous random variable, characterized by a cumulative distribution function (cdf) F(x), by means of k points x_1 < x_2 < ⋯ < x_k with probabilities p_i, i = 1, …, k. For a given k, a criterion for determining the x_i and p_i of the approximating k-point discrete distribution is the minimization of some distance to the original distribution. Here we consider the weighted Cramér-von Mises distance between the original cdf F(x) and the step-wise cdf F̂(x) of the approximating discrete distribution, characterized by a non-negative weighting function w(x). This problem has already been solved analytically when w(x) corresponds to the probability density function of the continuous random variable, w(x) = F′(x), and, when w(x) is a piece-wise constant function, through a numerical iterative procedure based on a homotopy continuation approach. In this paper, we propose and implement a solution to the problem for different choices of the weighting function w(x), highlighting how the results are affected by w(x) itself and by the number of approximating points k, in addition to F(x). Although an analytic solution is usually not available, the problem can be solved numerically through an iterative method that alternately updates the two subsets of k unknowns, the x_i's (or a transformation thereof) and the p_i's, until convergence. The main apparent advantage of these discrete approximations is their universality, since they can be applied to most continuous distributions, whether or not they possess first moments. To shed some light on the proposed approaches, we also illustrate applications to several well-known continuous distributions (among them, the normal and the exponential) and to a practical problem where discretization is a useful tool.
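
    As an illustration, the following is a minimal sketch of one such alternating scheme, under assumed choices: a standard normal F and the constant weight w(x) = 1 (the paper studies several weighting functions). It uses the first-order condition F(x_i) = (P_{i-1} + P_i)/2 for the support points, where the P_i are cumulative probabilities, and a w-weighted average of F over each interval for the probabilities.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

F, Finv = stats.norm.cdf, stats.norm.ppf  # assumed target distribution
w = lambda t: 1.0                          # assumed weighting function

def discretize(k, iters=100):
    # Cumulative levels P_0 = 0 < P_1 < ... < P_k = 1, initialized uniformly.
    P = np.linspace(0.0, 1.0, k + 1)
    for _ in range(iters):
        # Update support points: F(x_i) = (P_{i-1} + P_i) / 2.
        x = Finv((P[:-1] + P[1:]) / 2.0)
        # Update interior levels: weighted average of F on [x_i, x_{i+1}].
        for i in range(1, k):
            num, _ = quad(lambda t: w(t) * F(t), x[i - 1], x[i])
            den, _ = quad(w, x[i - 1], x[i])
            P[i] = num / den
    return x, np.diff(P)  # support points and probabilities p_i

x, p = discretize(5)  # e.g. a 5-point approximation of N(0, 1)
```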

    Deterministic Sampling for Nonlinear Dynamic State Estimation

    The goal of this work is to improve existing filtering algorithms and to propose novel ones for nonlinear dynamic state estimation. Nonlinearity is considered in two ways: first, propagation is improved by proposing novel methods for approximating continuous probability distributions by discrete distributions defined on the same continuous domain; second, nonlinear underlying domains are handled by proposing novel filters that inherently take the geometry of these domains into account.
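
    The abstract does not name its schemes, but the classic example of this kind of deterministic discrete approximation is the unscented transform; the following is a minimal sketch, shown only as the familiar baseline such work builds on, with an assumed toy nonlinearity g.

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Standard unscented-transform sigma points: 2n+1 deterministically
    placed samples whose weighted moments match the Gaussian's."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)
    pts = np.vstack([mean, mean + L.T, mean - L.T])
    wts = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    wts[0] = kappa / (n + kappa)
    return pts, wts

# Propagate the discrete approximation through the nonlinearity g
# and read the mean of g(X) off the weighted samples.
g = lambda x: np.sin(x)
pts, wts = sigma_points(np.zeros(2), np.eye(2))
mean_g = wts @ g(pts)
```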

    Zero biasing and a discrete central limit theorem

    We introduce a new family of distributions to approximate P(W ∈ A) for A ⊂ {…, −2, −1, 0, 1, 2, …} and W a sum of independent integer-valued random variables ξ_1, ξ_2, …, ξ_n with finite second moments, where, with large probability, W is not concentrated on a lattice of span greater than 1. The well-known Berry-Esseen theorem states that, for Z a normal random variable with mean E(W) and variance Var(W), P(Z ∈ A) provides a good approximation to P(W ∈ A) for A of the form (−∞, x]. However, for more general A, such as the set of all even numbers, the normal approximation becomes unsatisfactory, and it is desirable to have an appropriate discrete, non-normal distribution which approximates W in total variation, and a discrete version of the Berry-Esseen theorem to bound the error. In this paper, using the concept of zero biasing for discrete random variables (cf. Goldstein and Reinert [J. Theoret. Probab. 18 (2005) 237-260]), we introduce a new family of discrete distributions and provide a discrete version of the Berry-Esseen theorem showing how members of the family approximate the distribution of a sum W of integer-valued variables in total variation.
    Comment: Published at http://dx.doi.org/10.1214/009117906000000250 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)
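
    To see why a discrete surrogate is the natural target, here is a small numerical sketch comparing the exact law of W with a discretized normal (the mass of N(E(W), Var(W)) on intervals (k − 1/2, k + 1/2]) in total variation. The three-point step distribution and the surrogate itself are assumptions for illustration; the paper's zero-bias family is a different, more refined construction.

```python
import numpy as np
from scipy.stats import norm

vals, probs = np.array([0, 1, 2]), np.array([0.3, 0.5, 0.2])  # law of xi_i
n = 30

pmf = np.array([1.0])            # exact pmf of W by repeated convolution
for _ in range(n):
    pmf = np.convolve(pmf, probs)
support = np.arange(len(pmf))    # W takes values 0, ..., 2n

mu = n * (vals * probs).sum()
var = n * ((vals**2 * probs).sum() - (vals * probs).sum() ** 2)

# Discretized-normal surrogate and total variation distance.
q = norm.cdf(support + 0.5, mu, np.sqrt(var)) - norm.cdf(support - 0.5, mu, np.sqrt(var))
tv = 0.5 * np.abs(pmf - q / q.sum()).sum()
print(f"d_TV(W, discretized normal) = {tv:.4f}")
```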

    Asymptotic tail behavior of phase-type scale mixture distributions

    We consider phase-type scale mixture distributions, which correspond to distributions of a product of two independent random variables: a phase-type random variable Y and a nonnegative but otherwise arbitrary random variable S called the scaling random variable. We investigate conditions for such a class of distributions to be either light- or heavy-tailed, we explore subexponentiality, and we determine their maximum domains of attraction. Particular focus is given to phase-type scale mixture distributions where the scaling random variable S has discrete support; such a class of distributions has recently been used in risk applications to approximate heavy-tailed distributions. Our results are complemented with several examples.
    Comment: 18 pages, 0 figures
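
    A quick way to get a feel for these objects is to simulate one. The sketch below takes an Erlang variable (the simplest phase-type law) for Y and an assumed scaling variable S supported on powers of 2 with geometric weights; both choices are purely illustrative, not the paper's constructions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_phase_type(size, shape=3, rate=1.0):
    # Erlang(shape, rate): absorption time of a pure-birth Markov chain
    # with `shape` transient phases, i.e. a simple phase-type distribution.
    return rng.gamma(shape, 1.0 / rate, size)

# Scaling variable S with discrete support (assumption for illustration).
s_vals = 2.0 ** np.arange(8)
s_probs = 0.5 ** np.arange(1, 9)
s_probs /= s_probs.sum()

def sample_scale_mixture(size):
    S = rng.choice(s_vals, p=s_probs, size=size)
    return S * sample_phase_type(size)  # X = S * Y

# The mixture's tail is visibly heavier than the light-tailed Erlang base:
x, y = sample_scale_mixture(100_000), sample_phase_type(100_000)
print(np.quantile(x, 0.999), np.quantile(y, 0.999))
```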

    Discovering a junction tree behind a Markov network by a greedy algorithm

    In an earlier paper we introduced a special kind of k-width junction tree, called a k-th order t-cherry junction tree, in order to approximate a joint probability distribution. The approximation is best when the Kullback-Leibler divergence between the true joint probability distribution and the approximating one is minimal. Finding the best approximating k-width junction tree is NP-complete for k > 2. In our earlier paper we also proved that the best approximating k-width junction tree can be embedded into a k-th order t-cherry junction tree. Here we introduce a greedy algorithm that yields very good approximations in reasonable computing time. In this paper we prove that, if the underlying Markov network fulfills certain requirements, our greedy algorithm finds the true probability distribution or its best approximation within the family of k-th order t-cherry tree probability distributions. Our algorithm uses only the k-th order marginal probability distributions as input. We compare the results of the greedy algorithm proposed in this paper with those of the greedy algorithm proposed by Malvestuto in 1991.
    Comment: The paper was presented at VOCAL 2010 in Veszprem, Hungary
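
    For orientation, in the k = 2 case the best approximating structure is the classical Chow-Liu tree, which likewise needs only low-order marginals as input. A minimal sketch for binary variables follows; the networkx dependency and the empirical-marginal estimates are assumptions, and the paper's greedy t-cherry construction for k > 2 is a more involved generalization.

```python
import numpy as np
import networkx as nx
from itertools import combinations

def chow_liu_tree(samples):
    """Maximum-weight spanning tree under pairwise mutual information,
    computed from second-order (pairwise) marginals only."""
    n_vars = samples.shape[1]
    G = nx.Graph()
    for i, j in combinations(range(n_vars), 2):
        # Empirical pairwise marginal -> mutual information I(X_i; X_j).
        joint = np.histogram2d(samples[:, i], samples[:, j], bins=2)[0]
        joint /= joint.sum()
        indep = np.outer(joint.sum(axis=1), joint.sum(axis=0))
        nz = joint > 0
        mi = (joint[nz] * np.log(joint[nz] / indep[nz])).sum()
        G.add_edge(i, j, weight=mi)
    return nx.maximum_spanning_tree(G)
```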