ON THE JENSEN-SHANNON DIVERGENCE AND THE VARIATION DISTANCE FOR CATEGORICAL PROBABILITY DISTRIBUTIONS
We establish a decomposition of the Jensen-Shannon divergence into a linear combination of a scaled Jeffreys' divergence and a reversed Jensen-Shannon divergence. Upper and lower bounds for the Jensen-Shannon divergence are then found in terms of the squared (total) variation distance. The derivations rely upon the Pinsker inequality and the reverse Pinsker inequality. We use these bounds to prove the asymptotic equivalence of the maximum likelihood estimate and the minimum Jensen-Shannon divergence estimate, as well as the asymptotic consistency of the minimum Jensen-Shannon divergence estimate. These are key properties for likelihood-free simulator-based inference.
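A minimal numeric sketch of the quantities this abstract relates, assuming categorical distributions given as NumPy arrays; the function names and sample distributions are illustrative, and only the elementary Pinsker-type lower bound JSD >= TV^2/2 (in nats) is checked, not the paper's full decomposition or its reverse-Pinsker upper bound.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for categorical p, q."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence JSD(p, q) = 0.5*D(p||m) + 0.5*D(q||m), m = (p+q)/2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def jeffreys(p, q):
    """Jeffreys' divergence: the symmetrized KL divergence D(p||q) + D(q||p)."""
    return kl(p, q) + kl(q, p)

def total_variation(p, q):
    """Total variation distance 0.5 * sum_i |p_i - q_i|."""
    return 0.5 * float(np.sum(np.abs(p - q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

jsd, tv = jensen_shannon(p, q), total_variation(p, q)
# Pinsker's inequality applied to each KL term of JSD (with TV(p, m) = TV(p, q) / 2)
# yields the elementary lower bound JSD(p, q) >= TV(p, q)^2 / 2.
print(f"Jeffreys = {jeffreys(p, q):.5f}")
print(f"JSD = {jsd:.5f} >= TV^2 / 2 = {tv**2 / 2:.5f}")
```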
A Tight Uniform Continuity Bound for Equivocation
We prove a tight uniform continuity bound for the conditional Shannon entropy of discrete finitely supported random variables in terms of total variation distance.
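The bound itself is not reproduced here; the following sketch only computes the two quantities it relates, conditional Shannon entropy and total variation distance, for two toy joint distributions (the array layout and variable names are assumptions).

```python
import numpy as np

def conditional_entropy(joint):
    """Conditional Shannon entropy H(A|B) in bits from a joint pmf joint[a, b]."""
    p_b = joint.sum(axis=0)  # marginal distribution of B
    h = 0.0
    for a in range(joint.shape[0]):
        for b in range(joint.shape[1]):
            if joint[a, b] > 0:
                h -= joint[a, b] * np.log2(joint[a, b] / p_b[b])
    return h

def total_variation(p, q):
    """Total variation distance between two pmfs on the same finite alphabet."""
    return 0.5 * float(np.abs(p - q).sum())

# Two nearby joint distributions on a 2 x 2 alphabet.
p = np.array([[0.25, 0.25], [0.25, 0.25]])
q = np.array([[0.30, 0.20], [0.20, 0.30]])

print(f"TV(p, q) = {total_variation(p, q):.3f}")
print(f"|H(A|B)_p - H(A|B)_q| = {abs(conditional_entropy(p) - conditional_entropy(q)):.3f}")
```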
Properties of Classical and Quantum Jensen-Shannon Divergence
Jensen-Shannon divergence (JD) is a symmetrized and smoothed version of the most important divergence measure of information theory, the Kullback divergence. Unlike the Kullback divergence, it determines a metric in a very direct way; indeed, it is the square of a metric. We consider a family of divergence measures (JD_alpha for alpha > 0), the Jensen divergences of order alpha, which generalize JD as JD_1 = JD. Using a result of Schoenberg, we prove that JD_alpha is the square of a metric for alpha in the interval (0,2], and that the resulting metric space of probability distributions can be isometrically embedded in a real Hilbert space. Quantum Jensen-Shannon divergence (QJD) is a symmetrized and smoothed version of quantum relative entropy and can be extended to a family of quantum Jensen divergences of order alpha (QJD_alpha). We strengthen results by Lamberti et al. by proving that, for qubits and pure states, QJD_alpha^{1/2} is a metric for alpha in the interval (0,2], and the resulting metric space can be isometrically embedded in a real Hilbert space. In analogy with Burbea and Rao's generalization of JD, we also define a general QJD by associating a Jensen-type quantity to any weighted family of states. Appropriate interpretations of the quantities introduced are discussed, and bounds are derived in terms of the total variation and trace distance.
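A small sketch of the classical case only, assuming JD is written as the Jensen gap of the Shannon entropy; it spot-checks numerically that the square root of JD behaves as a metric on three example distributions. The order-alpha and quantum generalizations are not reproduced here.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in nats) of a categorical distribution p."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def jensen_shannon(p, q):
    """JD(p, q) as the Jensen gap of the Shannon entropy: H((p+q)/2) - (H(p) + H(q)) / 2."""
    return shannon_entropy(0.5 * (p + q)) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))

# sqrt(JD) is a metric; spot-check the triangle inequality on three distributions.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.2, 0.5, 0.3])
r = np.array([0.4, 0.4, 0.2])

d_pq = np.sqrt(jensen_shannon(p, q))
d_pr = np.sqrt(jensen_shannon(p, r))
d_rq = np.sqrt(jensen_shannon(r, q))
print(f"d(p, q) = {d_pq:.4f} <= d(p, r) + d(r, q) = {d_pr + d_rq:.4f}")
```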
Tighter Expected Generalization Error Bounds via Convexity of Information Measures
Generalization error bounds are essential to understanding machine learning algorithms. This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each input training sample. Multiple generalization error upper bounds based on different information measures are provided, including Wasserstein distance, total variation distance, KL divergence, and Jensen-Shannon divergence. Due to the convexity of the information measures, the proposed bounds in terms of Wasserstein distance and total variation distance are shown to be tighter than their counterparts based on individual samples in the literature. An example is provided to demonstrate the tightness of the proposed generalization error bounds.
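A toy illustration of the convexity argument mentioned above, using total variation distance: the distance evaluated at the averaged distribution is no larger than the average of the per-sample distances. The distributions below are stand-ins, not an actual learning setup, and the sketch does not reproduce the paper's bounds.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between categorical distributions."""
    return 0.5 * float(np.abs(p - q).sum())

# Toy stand-ins for per-sample distributions and a fixed reference distribution.
per_sample = [np.array([0.6, 0.3, 0.1]),
              np.array([0.3, 0.5, 0.2]),
              np.array([0.2, 0.2, 0.6])]
reference = np.array([0.4, 0.4, 0.2])

averaged = np.mean(per_sample, axis=0)

# Convexity of TV in its first argument gives
#   TV(averaged distribution, reference) <= mean of per-sample TV distances,
# mirroring why the averaged-distribution bounds can be tighter than per-sample ones.
lhs = total_variation(averaged, reference)
rhs = float(np.mean([total_variation(p, reference) for p in per_sample]))
print(f"{lhs:.4f} <= {rhs:.4f}")
```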