192 research outputs found
Demystifying Fixed k-Nearest Neighbor Information Estimators
Estimating mutual information from i.i.d. samples drawn from an unknown joint
density function is a basic statistical problem of broad interest with
multitudinous applications. The most popular estimator is one proposed by
Kraskov and St\"ogbauer and Grassberger (KSG) in 2004, and is nonparametric and
based on the distances of each sample to its nearest neighboring
sample, where is a fixed small integer. Despite its widespread use (part of
scientific software packages), theoretical properties of this estimator have
been largely unexplored. In this paper we demonstrate that the estimator is
consistent and also identify an upper bound on the rate of convergence of the
bias as a function of number of samples. We argue that the superior performance
benefits of the KSG estimator stems from a curious "correlation boosting"
effect and build on this intuition to modify the KSG estimator in novel ways to
construct a superior estimator. As a byproduct of our investigations, we obtain
nearly tight rates of convergence of the error of the well known fixed
nearest neighbor estimator of differential entropy by Kozachenko and
Leonenko.Comment: 55 pages, 8 figure
- β¦