Search CORE

2,713 research outputs found

Direct Ensemble Estimation of Density Functionals

Author: Berisha Visar
Moon Kevin
Wisler Alan
Publication venue
Publication date: 17/05/2017
Field of study

Estimating density functionals of analog sources is an important problem in statistical signal processing and information theory. Traditionally, estimating these quantities requires either making parametric assumptions about the underlying distributions or using non-parametric density estimation followed by integration. In this paper we introduce a direct nonparametric approach which bypasses the need for density estimation by using the error rates of k-NN classifiers asdata-driven basis functions that can be combined to estimate a range of density functionals. However, this method is subject to a non-trivial bias that dramatically slows the rate of convergence in higher dimensions. To overcome this limitation, we develop an ensemble method for estimating the value of the basis function which, under some minor constraints on the smoothness of the underlying distributions, achieves the parametric rate of convergence regardless of data dimension.Comment: 5 page

arXiv.org e-Print Archive

Crossref

Information Theoretic Structure Learning with Confidence

Author: Hero III Alfred O.
Moon Kevin R.
Noshad Morteza
Sekeh Salimeh Yasaei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/09/2016
Field of study

Information theoretic measures (e.g. the Kullback Liebler divergence and Shannon mutual information) have been used for exploring possibly nonlinear multivariate dependencies in high dimension. If these dependencies are assumed to follow a Markov factor graph model, this exploration process is called structure discovery. For discrete-valued samples, estimates of the information divergence over the parametric class of multinomial models lead to structure discovery methods whose mean squared error achieves parametric convergence rates as the sample size grows. However, a naive application of this method to continuous nonparametric multivariate models converges much more slowly. In this paper we introduce a new method for nonparametric structure discovery that uses weighted ensemble divergence estimators that achieve parametric convergence rates and obey an asymptotic central limit theorem that facilitates hypothesis testing and other types of statistical validation.Comment: 10 pages, 3 figure

arXiv.org e-Print Archive

Crossref

Scalable Hash-Based Estimation of Divergence Measures

Author: Hero III Alfred O.
Noshad Morteza
Publication venue
Publication date: 01/01/2018
Field of study

We propose a scalable divergence estimation method based on hashing. Consider two continuous random variables

X

and

Y

whose densities have bounded support. We consider a particular locality sensitive random hashing, and consider the ratio of samples in each hash bin having non-zero numbers of Y samples. We prove that the weighted average of these ratios over all of the hash bins converges to f-divergences between the two samples sets. We show that the proposed estimator is optimal in terms of both MSE rate and computational complexity. We derive the MSE rates for two families of smooth functions; the H\"{o}lder smoothness class and differentiable functions. In particular, it is proved that if the density functions have bounded derivatives up to the order

d/2

, where

d

is the dimension of samples, the optimal parametric MSE rate of

O(1/N)

can be achieved. The computational complexity is shown to be

O(N)

, which is optimal. To the best of our knowledge, this is the first empirical divergence estimator that has optimal computational complexity and achieves the optimal parametric MSE estimation rate.Comment: 11 pages, Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spai

arXiv.org e-Print Archive

Crossref

Direct Estimation of Information Divergence Using Nearest Neighbor Ratios

Author: Hero III Alfred O.
Moon Kevin R.
Noshad Morteza
Sekeh Salimeh Yasaei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/11/2017
Field of study

We propose a direct estimation method for R\'{e}nyi and f-divergence measures based on a new graph theoretical interpretation. Suppose that we are given two sample sets

X

and

Y

, respectively with

N

and

M

samples, where

\eta:=M/N

is a constant value. Considering the

k

-nearest neighbor (

k

-NN) graph of

Y

in the joint data set

(X,Y)

, we show that the average powered ratio of the number of

X

points to the number of

Y

points among all

k

-NN points is proportional to R\'{e}nyi divergence of

X

and

Y

densities. A similar method can also be used to estimate f-divergence measures. We derive bias and variance rates, and show that for the class of

\gamma

-H\"{o}lder smooth functions, the estimator achieves the MSE rate of

O(N^{-2\gamma/(\gamma+d)})

. Furthermore, by using a weighted ensemble estimation technique, for density functions with continuous and bounded derivatives of up to the order

d

, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the parametric MSE rate of

O(1/N)

. Our estimators are more computationally tractable than other competing estimators, which makes them appealing in many practical applications.Comment: 2017 IEEE International Symposium on Information Theory (ISIT

arXiv.org e-Print Archive

Crossref

Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

Author: Goldfeld Ziv
Greenewald Kristjan
Polyanskiy Yury
Weed Jonathan
Publication venue
Publication date: 01/05/2020
Field of study

This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating

P\ast\mathcal{N}_\sigma

, for

\mathcal{N}_\sigma\triangleq\mathcal{N}(0,\sigma^2 \mathrm{I}_d)

, by

\hat{P}_n\ast\mathcal{N}_\sigma

, where

\hat{P}_n

is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and

\chi^2

-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance (

\mathsf{W}_1

) converges at rate

e^{O(d)}n^{-\frac{1}{2}}

in remarkable contrast to a typical

n^{-\frac{1}{d}}

rate for unsmoothed

\mathsf{W}_1

(and

d\ge 3

). For the KL divergence, squared 2-Wasserstein distance (

\mathsf{W}_2^2

), and

\chi^2

-divergence, the convergence rate is

e^{O(d)}n^{-1}

, but only if

P

achieves finite input-output

\chi^2

mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to

\omega(n^{-1})

for the KL divergence and

\mathsf{W}_2^2

, while the

\chi^2

-divergence becomes infinite - a curious dichotomy. As a main application we consider estimating the differential entropy

h(P\ast\mathcal{N}_\sigma)

in the high-dimensional regime. The distribution

P

is unknown but

n

i.i.d samples from it are available. We first show that any good estimator of

h(P\ast\mathcal{N}_\sigma)

must have sample complexity that is exponential in

d

. Using the empirical approximation results we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate

e^{O(d)}n^{-\frac{1}{2}}

, thus establishing the minimax rate-optimality of the plug-in. Numerical results that demonstrate a significant empirical superiority of the plug-in approach to general-purpose differential entropy estimators are provided.Comment: arXiv admin note: substantial text overlap with arXiv:1810.1158

arXiv.org e-Print Archive

DSpace@MIT