2,713 research outputs found

    Direct Ensemble Estimation of Density Functionals

    Estimating density functionals of analog sources is an important problem in statistical signal processing and information theory. Traditionally, estimating these quantities requires either making parametric assumptions about the underlying distributions or using non-parametric density estimation followed by integration. In this paper we introduce a direct nonparametric approach which bypasses the need for density estimation by using the error rates of k-NN classifiers as data-driven basis functions that can be combined to estimate a range of density functionals. However, this method is subject to a non-trivial bias that dramatically slows the rate of convergence in higher dimensions. To overcome this limitation, we develop an ensemble method for estimating the value of the basis function which, under some minor constraints on the smoothness of the underlying distributions, achieves the parametric rate of convergence regardless of data dimension. Comment: 5 pages

    Information Theoretic Structure Learning with Confidence

    Information theoretic measures (e.g. the Kullback-Leibler divergence and Shannon mutual information) have been used for exploring possibly nonlinear multivariate dependencies in high dimension. If these dependencies are assumed to follow a Markov factor graph model, this exploration process is called structure discovery. For discrete-valued samples, estimates of the information divergence over the parametric class of multinomial models lead to structure discovery methods whose mean squared error achieves parametric convergence rates as the sample size grows. However, a naive application of this method to continuous nonparametric multivariate models converges much more slowly. In this paper we introduce a new method for nonparametric structure discovery that uses weighted ensemble divergence estimators that achieve parametric convergence rates and obey an asymptotic central limit theorem that facilitates hypothesis testing and other types of statistical validation. Comment: 10 pages, 3 figures

    Scalable Hash-Based Estimation of Divergence Measures

    We propose a scalable divergence estimation method based on hashing. Consider two continuous random variables $X$ and $Y$ whose densities have bounded support. We consider a particular locality-sensitive random hashing, and consider the ratio of samples in each hash bin having non-zero numbers of $Y$ samples. We prove that the weighted average of these ratios over all of the hash bins converges to f-divergences between the two sample sets. We show that the proposed estimator is optimal in terms of both MSE rate and computational complexity. We derive the MSE rates for two families of smooth functions: the Hölder smoothness class and differentiable functions. In particular, it is proved that if the density functions have bounded derivatives up to the order $d/2$, where $d$ is the dimension of samples, the optimal parametric MSE rate of $O(1/N)$ can be achieved. The computational complexity is shown to be $O(N)$, which is optimal. To the best of our knowledge, this is the first empirical divergence estimator that has optimal computational complexity and achieves the optimal parametric MSE estimation rate. Comment: 11 pages, Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain
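
The bin-ratio idea in this abstract can be illustrated with a minimal sketch. It is not the paper's estimator: the abstract does not specify the hash function, so the sketch assumes a randomly shifted grid quantization (bin width `eps`) whose cell indices are stored in a Python dictionary, weights each bin by its share of $Y$ samples, and clips the result at zero. The names `hash_divergence`, `kl_generator`, and `eps` are hypothetical. The generator `g(t) = t log t` corresponds to the KL divergence written as the expectation under $Y$ of `g` applied to the density ratio.

```python
import math
import random
from collections import defaultdict


def hash_divergence(xs, ys, g, eps, seed=0):
    """Sketch of a bin-ratio f-divergence estimate (assumed form, not the paper's).

    xs, ys: lists of equal-length tuples (samples of X and Y).
    g: generator function of the f-divergence, applied to the density ratio.
    eps: grid cell width used by the stand-in hash (an assumption).
    """
    rng = random.Random(seed)
    dim = len(xs[0])
    # Random shift of the grid, one offset per coordinate.
    offset = [rng.uniform(0.0, eps) for _ in range(dim)]

    def bin_key(point):
        # Grid cell index; the dict below hashes this tuple.
        return tuple(math.floor((c + b) / eps) for c, b in zip(point, offset))

    counts = defaultdict(lambda: [0, 0])  # bin -> [number of X, number of Y]
    for p in xs:
        counts[bin_key(p)][0] += 1
    for p in ys:
        counts[bin_key(p)][1] += 1

    n, m = len(xs), len(ys)
    eta = m / n
    total = 0.0
    for n_i, m_i in counts.values():
        if m_i > 0:  # only bins that contain at least one Y sample
            # eta * n_i / m_i approximates the density ratio f_X / f_Y in the bin.
            total += (m_i / m) * g(eta * n_i / m_i)
    return max(total, 0.0)  # clipping at zero is an assumption


def kl_generator(t):
    """g(t) = t log t, with the convention g(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0
```

Each sample is touched once when filling the bins, and each occupied bin once when summing, which is consistent with the linear-time behaviour the abstract describes. The choice of `eps` controls the bias-variance trade-off and is left to the caller here.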

    Direct Estimation of Information Divergence Using Nearest Neighbor Ratios

    We propose a direct estimation method for Rényi and f-divergence measures based on a new graph theoretical interpretation. Suppose that we are given two sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where $\eta := M/N$ is a constant value. Considering the $k$-nearest neighbor ($k$-NN) graph of $Y$ in the joint data set $(X,Y)$, we show that the average powered ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN points is proportional to the Rényi divergence of the $X$ and $Y$ densities. A similar method can also be used to estimate f-divergence measures. We derive bias and variance rates, and show that for the class of $\gamma$-Hölder smooth functions, the estimator achieves the MSE rate of $O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble estimation technique, for density functions with continuous and bounded derivatives of up to the order $d$, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the parametric MSE rate of $O(1/N)$. Our estimators are more computationally tractable than other competing estimators, which makes them appealing in many practical applications. Comment: 2017 IEEE International Symposium on Information Theory (ISIT)
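
As a rough illustration of the nearest-neighbor ratio idea, the sketch below estimates a Rényi divergence of order `alpha` between the $X$ and $Y$ densities. It rests on the identity that this divergence equals $\frac{1}{\alpha-1}\log$ of the expectation under $Y$ of the density ratio raised to the power $\alpha$, with the ratio approximated locally by $\eta N_i/M_i$, where $N_i$ and $M_i$ count the $X$ and $Y$ points among the $k$ nearest neighbors of the $i$-th $Y$ sample. The `+ 1` in the denominator (to avoid division by zero), the brute-force neighbor search, and the function name `knn_ratio_renyi` are assumptions of this sketch, not details taken from the paper.

```python
import math
import random


def knn_ratio_renyi(xs, ys, k, alpha):
    """Sketch of a k-NN ratio estimate of the Renyi-alpha divergence.

    xs, ys: lists of equal-length tuples (samples of X and Y).
    Assumes 0 < alpha < 1 and that at least one Y point has an X neighbor,
    otherwise the logarithm below is undefined.
    """
    n, m = len(xs), len(ys)
    eta = m / n
    # Pool both sample sets; label 0 marks an X point, label 1 a Y point.
    pooled = [(p, 0) for p in xs] + [(p, 1) for p in ys]
    total = 0.0
    for i, y in enumerate(ys):
        self_index = n + i
        # Brute-force distances from y to every other pooled point.
        dists = [
            (math.dist(p, y), label)
            for j, (p, label) in enumerate(pooled)
            if j != self_index
        ]
        dists.sort(key=lambda t: t[0])
        num_y = sum(label for _, label in dists[:k])
        num_x = k - num_y
        # eta * num_x / num_y approximates f_X / f_Y near y;
        # the + 1 is an assumed guard against empty Y counts.
        total += (eta * num_x / (num_y + 1)) ** alpha
    j_hat = total / m
    return math.log(j_hat) / (alpha - 1.0)
```

The brute-force search costs quadratic time and is used only to keep the example self-contained; a spatial index would normally replace it. The ensemble weighting that the abstract credits with the $O(1/N)$ rate is not reproduced here.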

    Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

    This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P\ast\mathcal{N}_\sigma$, for $\mathcal{N}_\sigma\triangleq\mathcal{N}(0,\sigma^2 \mathrm{I}_d)$, by $\hat{P}_n\ast\mathcal{N}_\sigma$, where $\hat{P}_n$ is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $\chi^2$-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)}n^{-\frac{1}{2}}$, in remarkable contrast to a typical $n^{-\frac{1}{d}}$ rate for unsmoothed $\mathsf{W}_1$ (and $d\ge 3$). For the KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and $\chi^2$-divergence, the convergence rate is $e^{O(d)}n^{-1}$, but only if $P$ achieves finite input-output $\chi^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $\omega(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $\chi^2$-divergence becomes infinite, a curious dichotomy. As a main application we consider estimating the differential entropy $h(P\ast\mathcal{N}_\sigma)$ in the high-dimensional regime. The distribution $P$ is unknown but $n$ i.i.d. samples from it are available. We first show that any good estimator of $h(P\ast\mathcal{N}_\sigma)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $e^{O(d)}n^{-\frac{1}{2}}$, thus establishing the minimax rate-optimality of the plug-in. Numerical results that demonstrate a significant empirical superiority of the plug-in approach to general-purpose differential entropy estimators are provided. Comment: arXiv admin note: substantial text overlap with arXiv:1810.1158
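
To make the plug-in estimator concrete: the smoothed empirical measure $\hat{P}_n\ast\mathcal{N}_\sigma$ is a Gaussian mixture with $n$ equally weighted components centred at the samples, so the plug-in estimate is the differential entropy of that mixture. The sketch below approximates this entropy by simple Monte Carlo integration. The Monte Carlo step, the function names, and the parameters `num_mc` and `seed` are assumptions of this illustration; the abstract does not say how the paper evaluates the plug-in.

```python
import math
import random


def log_mixture_density(y, samples, sigma):
    """Log density at y of the mixture (1/n) * sum_i N(x_i, sigma^2 I_d)."""
    dim = len(y)
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * sigma ** 2)
    # Exponent of each Gaussian component evaluated at y.
    exps = [
        -sum((a - b) ** 2 for a, b in zip(y, x)) / (2.0 * sigma ** 2)
        for x in samples
    ]
    # Log-sum-exp for numerical stability.
    top = max(exps)
    lse = top + math.log(sum(math.exp(e - top) for e in exps))
    return log_norm + lse - math.log(len(samples))


def plugin_entropy(samples, sigma, num_mc=2000, seed=0):
    """Monte Carlo estimate (in nats) of the entropy of the smoothed empirical measure."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_mc):
        # Draw from the mixture: pick a sample uniformly, then add Gaussian noise.
        center = rng.choice(samples)
        y = [c + rng.gauss(0.0, sigma) for c in center]
        total -= log_mixture_density(y, samples, sigma)
    return total / num_mc
```

A quick sanity check: if every sample is the same point, the mixture collapses to a single Gaussian, whose entropy is $\frac{d}{2}\log(2\pi e\sigma^2)$, and the Monte Carlo estimate should land close to that value. Each evaluation costs time proportional to $n$, so the total cost of this sketch grows with `num_mc` times $n$.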