Direct Estimation of Information Divergence Using Nearest Neighbor Ratios
We propose a direct estimation method for Rényi and f-divergence measures
based on a new graph-theoretical interpretation. Suppose that we are given two
sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where
$\eta := M/N$ is a constant value. Considering the $k$-nearest neighbor ($k$-NN)
graph of $Y$ in the joint data set $(X, Y)$, we show that the average powered
ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN
points is proportional to the Rényi divergence of the $X$ and $Y$ densities. A
similar method can also be used to estimate f-divergence measures. We derive
bias and variance rates, and show that for the class of $\gamma$-Hölder
smooth functions, the estimator achieves the MSE rate of
$O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble
estimation technique, for density functions with continuous and bounded
derivatives of up to the order $d$, and some extra conditions at the support
set boundary, we derive an ensemble estimator that achieves the parametric MSE
rate of $O(1/N)$. Our estimators are more computationally tractable than other
competing estimators, which makes them appealing in many practical
applications.

Comment: 2017 IEEE International Symposium on Information Theory (ISIT)
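As a rough illustration of the estimator described above (not the authors' exact construction: the normalization, the `eps` floor, and the handling of all-$Y$ neighborhoods are simplifying assumptions of ours), a brute-force sketch might look like:

```python
import numpy as np

def knn_ratio_renyi(X, Y, k=10, alpha=2.0, eps=1e-10):
    """Sketch of a k-NN ratio estimate of a Renyi-type divergence.

    Builds the k-NN neighborhoods of the Y points inside the pooled set
    Z = X union Y and averages the powered ratio of X-neighbor counts to
    Y-neighbor counts, following the graph interpretation in the abstract.
    No bias correction is applied, so this is only the raw statistic.
    """
    N, M = len(X), len(Y)
    eta = M / N
    Z = np.vstack([X, Y])
    is_x = np.arange(len(Z)) < N              # True for X points in Z
    # brute-force k-NN of each Y point within Z (column 0 is the point itself)
    d2 = ((Y[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]
    n_x = is_x[idx].sum(axis=1)               # X points among the k neighbors
    n_y = k - n_x                             # Y points among the k neighbors
    ratios = (eta * n_x / np.maximum(n_y, 1)) ** alpha
    return np.log(np.mean(ratios) + eps) / (alpha - 1)
```

The brute-force distance matrix is for clarity only; a k-d tree would give the near-linear complexity the abstract emphasizes.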
Use of the geometric mean as a statistic for the scale of the coupled Gaussian distributions
The geometric mean is shown to be an appropriate statistic for the scale of a
heavy-tailed coupled Gaussian distribution or equivalently the Student's t
distribution. The coupled Gaussian is a member of a family of distributions
parameterized by the nonlinear statistical coupling which is the reciprocal of
the degree of freedom and is proportional to fluctuations in the inverse scale
of the Gaussian. Existing estimators of the scale of the coupled Gaussian have
relied on estimates of the full distribution, and they suffer from problems
related to outliers in heavy-tailed distributions. In this paper, the scale of
a coupled Gaussian is proven to be equal to the product of the generalized mean
and the square root of the coupling. Our numerical computations of the
scales of coupled Gaussians using the generalized mean of random samples
indicate that only samples from a Cauchy distribution (with coupling parameter
one) yield an unbiased estimate with diminishing variance for large samples.
Nevertheless, we also prove that the scale is a function of the geometric mean,
the coupling term and a harmonic number. Numerical experiments show that this
estimator is unbiased with diminishing variance for large samples for a broad
range of coupling values.

Comment: 17 pages, 5 figures
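For the Cauchy special case singled out above (coupling parameter one), the geometric-mean statistic reduces to exponentiating the average log-magnitude, since for a centered Cauchy with scale $\gamma$ one has $E[\log|X|] = \log\gamma$. A minimal sketch of that special case (the general coupled Gaussian needs the harmonic-number correction derived in the paper, which is omitted here):

```python
import numpy as np

def geometric_mean_scale(x):
    # Geometric mean of |x|. For samples from a centered Cauchy with
    # scale gamma, E[log|X|] = log(gamma), so exponentiating the mean
    # log-magnitude estimates the scale. This is only the coupling-one
    # case; other couplings require the paper's correction terms.
    return np.exp(np.mean(np.log(np.abs(x))))
```

Because it averages on the log axis, the estimator is insensitive to the extreme outliers that break moment-based scale estimates for heavy-tailed samples.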
The Sample Complexity of Approximate Rejection Sampling with Applications to Smoothed Online Learning
Suppose we are given access to $n$ independent samples from a distribution
$\mu$ and we wish to output one of them with the goal of making the output
distributed as close as possible to a target distribution $\nu$. In this work
we show that the optimal total variation distance as a function of $n$ is given
by $\tilde{\Theta}(D/f'(n))$ over the class of all pairs $(\nu, \mu)$ with a
bounded f-divergence $D_f(\nu \| \mu) \leq D$. Previously, this question was
studied only for the case when the Radon-Nikodym derivative of $\nu$ with
respect to $\mu$ is uniformly bounded. We then consider an application in the
seemingly very different field of smoothed online learning, where we show that
recent results on the minimax regret and the regret of oracle-efficient
algorithms still hold even under relaxed constraints on the adversary (bounded
f-divergence, as opposed to a bounded Radon-Nikodym derivative).
Finally, we also study the efficacy of importance sampling for mean estimates
uniform over a function class and compare importance sampling with rejection
sampling.
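As context for the selection problem above, here is a minimal sketch of the classic rejection scheme it generalizes. The bounded-ratio assumption `cap >= sup dnu/dmu` and the last-sample fallback are our illustrative choices, corresponding to the previously studied bounded Radon-Nikodym setting rather than the paper's relaxed f-divergence setting:

```python
import numpy as np

def select_one(samples, dnu_dmu, cap, rng):
    """Output one of n i.i.d. draws from mu, targeting nu.

    Accept each draw x with probability (dnu/dmu)(x) / cap and return
    the first accepted draw; if all n draws are rejected, fall back to
    the last draw. The paper characterizes how close the output law can
    be to nu in total variation as a function of n.
    """
    for x in samples:
        if rng.random() * cap <= dnu_dmu(x):
            return x
    return samples[-1]
```

Example: with `mu = Uniform(0, 1)` and a target `nu` of density `2x`, the ratio is `2x` and `cap = 2`, so accepted outputs concentrate near 1 as the target requires.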
Scalable Hash-Based Estimation of Divergence Measures
We propose a scalable divergence estimation method based on hashing. Consider
two continuous random variables $X$ and $Y$ whose densities have bounded
support. We consider a particular locality-sensitive random hashing, and
consider the ratio of samples in each hash bin having non-zero numbers of $Y$
samples. We prove that the weighted average of these ratios over all of the
hash bins converges to f-divergences between the two sample sets. We show that
the proposed estimator is optimal in terms of both MSE rate and computational
complexity. We derive the MSE rates for two families of smooth functions: the
Hölder smoothness class and differentiable functions. In particular, it is
proved that if the density functions have bounded derivatives up to the order
$d$, where $d$ is the dimension of the samples, the optimal parametric MSE rate
of $O(1/N)$ can be achieved. The computational complexity is shown to be
$O(N)$, which is optimal. To the best of our knowledge, this is the first
empirical divergence estimator that has optimal computational complexity and
achieves the optimal parametric MSE estimation rate.

Comment: 11 pages, Proceedings of the 21st International Conference on
Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain
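A toy one-dimensional version of the hashing idea, with a uniform grid standing in for the paper's locality-sensitive random hash and with its bias-correcting ensemble weights omitted, could read:

```python
import numpy as np
from collections import defaultdict

def hash_f_divergence(X, Y, g, eps=0.1):
    """Sketch: grid-hash both samples and average g of the bin-count
    ratios over bins containing Y samples, weighted by the Y mass of
    each bin. This approximates E_Y[g(f_X / f_Y)], the f-divergence
    with generator g. Counting is a single pass, hence O(N) time."""
    N, M = len(X), len(Y)
    cx, cy = defaultdict(int), defaultdict(int)
    for b in np.floor(np.asarray(X) / eps).astype(int):
        cx[b] += 1
    for b in np.floor(np.asarray(Y) / eps).astype(int):
        cy[b] += 1
    est = 0.0
    for b, my in cy.items():              # only bins with non-zero Y count
        ratio = (cx.get(b, 0) / N) / (my / M)
        est += (my / M) * g(ratio)
    return est
```

With identically distributed samples the bin-count ratios hover near 1, so a generator with `g(1) = 0` (e.g. the squared Hellinger generator) returns an estimate near zero.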
Ensemble estimation of multivariate f-divergence
f-divergence estimation is an important problem in the fields of information
theory, machine learning, and statistics. While several divergence estimators
exist, relatively few of their convergence rates are known. We derive the MSE
convergence rate for a density plug-in estimator of f-divergence. Then by
applying the theory of optimally weighted ensemble estimation, we derive a
divergence estimator with a convergence rate of O(1/T) that is simple to
implement and performs well in high dimensions. We validate our theoretical
results with experiments.

Comment: 14 pages, 6 figures; a condensed version of this paper was accepted
to ISIT 2014. Version 2: moved the proofs of the theorems from the main body
to appendices at the end
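The weighted-ensemble idea above can be sketched as a small linear system: pick weights over a grid of bandwidth-like parameters that sum to one while zeroing out the lower-order bias terms, leaving only the parametric rate. This least-squares version is a simplification (the paper's optimal weighting additionally controls the variance), and the exponent schedule `l**(i/d)` is an illustrative assumption:

```python
import numpy as np

def ensemble_weights(bandwidth_params, d):
    # Constraints: sum(w) = 1 and sum(w * l**(i/d)) = 0 for i = 1..d-1,
    # so the weighted combination of plug-in estimators cancels the
    # lower-order bias terms of order l**(i/d). With more parameters
    # than constraints, lstsq returns the minimum-norm exact solution.
    ls = np.asarray(bandwidth_params, dtype=float)
    A = np.vstack([np.ones_like(ls)] + [ls ** (i / d) for i in range(1, d)])
    b = np.zeros(d)
    b[0] = 1.0
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w
```

Choosing the minimum-norm solution keeps the weights small, which is what limits the variance inflation from combining the base estimators.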