Direct Estimation of Information Divergence Using Nearest Neighbor Ratios
We propose a direct estimation method for Rényi and f-divergence measures
based on a new graph-theoretical interpretation. Suppose that we are given two
sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where
$\eta := M/N$ is a constant value. Considering the $k$-nearest neighbor ($k$-NN)
graph of $Y$ in the joint data set $(X, Y)$, we show that the average powered
ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN
points is proportional to the Rényi divergence of the $X$ and $Y$ densities. A
similar method can also be used to estimate f-divergence measures. We derive
bias and variance rates, and show that for the class of $\gamma$-Hölder
smooth functions, the estimator achieves the MSE rate of
$O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble
estimation technique, for density functions with continuous and bounded
derivatives of up to the order $d$, and some extra conditions at the support
set boundary, we derive an ensemble estimator that achieves the parametric MSE
rate of $O(1/N)$. Our estimators are more computationally tractable than other
competing estimators, which makes them appealing in many practical
applications.

Comment: 2017 IEEE International Symposium on Information Theory (ISIT)
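As a rough illustration of the estimator described above (not the authors' exact construction: the normalization, the `eps` floor, and the handling of all-$Y$ neighborhoods are simplifying assumptions of ours), a brute-force sketch might look like:

```python
import numpy as np

def knn_ratio_renyi(X, Y, k=10, alpha=2.0, eps=1e-10):
    """Sketch of a k-NN ratio estimate of a Renyi-type divergence.

    Builds the k-NN neighborhoods of the Y points inside the pooled set
    Z = X union Y and averages the powered ratio of X-neighbor counts to
    Y-neighbor counts, following the graph interpretation in the abstract.
    No bias correction is applied, so this is only the raw statistic.
    """
    N, M = len(X), len(Y)
    eta = M / N
    Z = np.vstack([X, Y])
    is_x = np.arange(len(Z)) < N              # True for X points in Z
    # brute-force k-NN of each Y point within Z (column 0 is the point itself)
    d2 = ((Y[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]
    n_x = is_x[idx].sum(axis=1)               # X points among the k neighbors
    n_y = k - n_x                             # Y points among the k neighbors
    ratios = (eta * n_x / np.maximum(n_y, 1)) ** alpha
    return np.log(np.mean(ratios) + eps) / (alpha - 1)
```

The brute-force distance matrix is for clarity only; a k-d tree would give the near-linear complexity the abstract emphasizes.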
Use of the geometric mean as a statistic for the scale of the coupled Gaussian distributions
The geometric mean is shown to be an appropriate statistic for the scale of a
heavy-tailed coupled Gaussian distribution or equivalently the Student's t
distribution. The coupled Gaussian is a member of a family of distributions
parameterized by the nonlinear statistical coupling which is the reciprocal of
the degree of freedom and is proportional to fluctuations in the inverse scale
of the Gaussian. Existing estimators of the scale of the coupled Gaussian have
relied on estimates of the full distribution, and they suffer from problems
related to outliers in heavy-tailed distributions. In this paper, the scale of
a coupled Gaussian is proven to be equal to the product of the generalized mean
and the square root of the coupling. Our numerical computations of the
scales of coupled Gaussians using the generalized mean of random samples
indicate that only samples from a Cauchy distribution (with coupling parameter
one) yield an unbiased estimate with diminishing variance for large samples.
Nevertheless, we also prove that the scale is a function of the geometric mean,
the coupling term and a harmonic number. Numerical experiments show that this
estimator is unbiased with diminishing variance for large samples for a broad
range of coupling values.

Comment: 17 pages, 5 figures
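For the Cauchy special case singled out above (coupling parameter one), the geometric-mean statistic reduces to exponentiating the average log-magnitude, since for a centered Cauchy with scale $\gamma$ one has $E[\log|X|] = \log\gamma$. A minimal sketch of that special case (the general coupled Gaussian needs the harmonic-number correction derived in the paper, which is omitted here):

```python
import numpy as np

def geometric_mean_scale(x):
    # Geometric mean of |x|. For samples from a centered Cauchy with
    # scale gamma, E[log|X|] = log(gamma), so exponentiating the mean
    # log-magnitude estimates the scale. This is only the coupling-one
    # case; other couplings require the paper's correction terms.
    return np.exp(np.mean(np.log(np.abs(x))))
```

Because it averages on the log axis, the estimator is insensitive to the extreme outliers that break moment-based scale estimates for heavy-tailed samples.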
The Sample Complexity of Approximate Rejection Sampling with Applications to Smoothed Online Learning
Suppose we are given access to $n$ independent samples from a distribution
$\mu$ and we wish to output one of them with the goal of making the output
distributed as close as possible to a target distribution $\nu$. In this work
we show that the optimal total variation distance as a function of $n$ is given
by $\tilde{\Theta}(D/f'(n))$ over the class of all pairs $(\nu, \mu)$ with a
bounded f-divergence $D_f(\nu \| \mu) \leq D$. Previously, this question was
studied only for the case when the Radon-Nikodym derivative of $\nu$ with
respect to $\mu$ is uniformly bounded. We then consider an application in the
seemingly very different field of smoothed online learning, where we show that
recent results on the minimax regret and the regret of oracle-efficient
algorithms still hold even under relaxed constraints on the adversary (bounded
f-divergence, as opposed to a bounded Radon-Nikodym derivative).
Finally, we also study the efficacy of importance sampling for mean estimates
uniform over a function class and compare importance sampling with rejection
sampling.
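As context for the selection problem above, here is a minimal sketch of the classic rejection scheme it generalizes. The bounded-ratio assumption `cap >= sup dnu/dmu` and the last-sample fallback are our illustrative choices, corresponding to the previously studied bounded Radon-Nikodym setting rather than the paper's relaxed f-divergence setting:

```python
import numpy as np

def select_one(samples, dnu_dmu, cap, rng):
    """Output one of n i.i.d. draws from mu, targeting nu.

    Accept each draw x with probability (dnu/dmu)(x) / cap and return
    the first accepted draw; if all n draws are rejected, fall back to
    the last draw. The paper characterizes how close the output law can
    be to nu in total variation as a function of n.
    """
    for x in samples:
        if rng.random() * cap <= dnu_dmu(x):
            return x
    return samples[-1]
```

Example: with `mu = Uniform(0, 1)` and a target `nu` of density `2x`, the ratio is `2x` and `cap = 2`, so accepted outputs concentrate near 1 as the target requires.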
Scalable Hash-Based Estimation of Divergence Measures
We propose a scalable divergence estimation method based on hashing. Consider
two continuous random variables $X$ and $Y$ whose densities have bounded
support. We consider a particular locality-sensitive random hashing, and
consider the ratio of samples in each hash bin having non-zero numbers of $Y$
samples. We prove that the weighted average of these ratios over all of the
hash bins converges to f-divergences between the two sample sets. We show that
the proposed estimator is optimal in terms of both MSE rate and computational
complexity. We derive the MSE rates for two families of smooth functions: the
Hölder smoothness class and differentiable functions. In particular, it is
proved that if the density functions have bounded derivatives up to the order
$d$, where $d$ is the dimension of the samples, the optimal parametric MSE rate
of $O(1/N)$ can be achieved. The computational complexity is shown to be
$O(N)$, which is optimal. To the best of our knowledge, this is the first
empirical divergence estimator that has optimal computational complexity and
achieves the optimal parametric MSE estimation rate.

Comment: 11 pages, Proceedings of the 21st International Conference on
Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain
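A toy one-dimensional version of the hashing idea, with a uniform grid standing in for the paper's locality-sensitive random hash and with its bias-correcting ensemble weights omitted, could read:

```python
import numpy as np
from collections import defaultdict

def hash_f_divergence(X, Y, g, eps=0.1):
    """Sketch: grid-hash both samples and average g of the bin-count
    ratios over bins containing Y samples, weighted by the Y mass of
    each bin. This approximates E_Y[g(f_X / f_Y)], the f-divergence
    with generator g. Counting is a single pass, hence O(N) time."""
    N, M = len(X), len(Y)
    cx, cy = defaultdict(int), defaultdict(int)
    for b in np.floor(np.asarray(X) / eps).astype(int):
        cx[b] += 1
    for b in np.floor(np.asarray(Y) / eps).astype(int):
        cy[b] += 1
    est = 0.0
    for b, my in cy.items():              # only bins with non-zero Y count
        ratio = (cx.get(b, 0) / N) / (my / M)
        est += (my / M) * g(ratio)
    return est
```

With identically distributed samples the bin-count ratios hover near 1, so a generator with `g(1) = 0` (e.g. the squared Hellinger generator) returns an estimate near zero.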
Ensemble estimation of multivariate f-divergence
f-divergence estimation is an important problem in the fields of information
theory, machine learning, and statistics. While several divergence estimators
exist, relatively few of their convergence rates are known. We derive the MSE
convergence rate for a density plug-in estimator of f-divergence. Then by
applying the theory of optimally weighted ensemble estimation, we derive a
divergence estimator with a convergence rate of O(1/T) that is simple to
implement and performs well in high dimensions. We validate our theoretical
results with experiments.

Comment: 14 pages, 6 figures; a condensed version of this paper was accepted
to ISIT 2014. Version 2: moved the proofs of the theorems from the main body
to appendices at the end
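The weighted-ensemble idea above can be sketched as a small linear system: pick weights over a grid of bandwidth-like parameters that sum to one while zeroing out the lower-order bias terms, leaving only the parametric rate. This least-squares version is a simplification (the paper's optimal weighting additionally controls the variance), and the exponent schedule `l**(i/d)` is an illustrative assumption:

```python
import numpy as np

def ensemble_weights(bandwidth_params, d):
    # Constraints: sum(w) = 1 and sum(w * l**(i/d)) = 0 for i = 1..d-1,
    # so the weighted combination of plug-in estimators cancels the
    # lower-order bias terms of order l**(i/d). With more parameters
    # than constraints, lstsq returns the minimum-norm exact solution.
    ls = np.asarray(bandwidth_params, dtype=float)
    A = np.vstack([np.ones_like(ls)] + [ls ** (i / d) for i in range(1, d)])
    b = np.zeros(d)
    b[0] = 1.0
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w
```

Choosing the minimum-norm solution keeps the weights small, which is what limits the variance inflation from combining the base estimators.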