
    Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators

    We provide finite-sample analysis of a general framework for using k-nearest neighbor statistics to estimate functionals of a nonparametric continuous probability density, including entropies and divergences. Rather than plugging a consistent density estimate (which requires $k \to \infty$ as the sample size $n \to \infty$) into the functional of interest, the estimators we consider fix $k$ and perform a bias correction. This is more efficient computationally, and, as we show in certain cases, statistically, leading to faster convergence rates. Our framework unifies several previous estimators, for most of which ours are the first finite-sample guarantees.
    Comment: 16 pages, 0 figures
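
    As a concrete instance of the fixed-k, bias-corrected estimators covered by this framework, the classical Kozachenko-Leonenko entropy estimator can be sketched in a few lines of NumPy/SciPy (a minimal illustration; the function name and the Gaussian test data are ours, not the paper's):

        import numpy as np
        from scipy.spatial import cKDTree
        from scipy.special import digamma, gammaln

        def kl_entropy(x, k=3):
            """Fixed-k Kozachenko-Leonenko estimate of differential entropy (nats).

            Uses the distance from each point to its k-th nearest neighbor plus a
            digamma-based bias correction, so k stays fixed as n grows.
            """
            n, d = x.shape
            tree = cKDTree(x)
            # k + 1 because each point is returned as its own nearest neighbor.
            dist, _ = tree.query(x, k=k + 1)
            eps = dist[:, -1]                       # distance to the k-th neighbor
            log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit d-ball
            return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(eps))

        # Example: standard 2-D Gaussian; the true entropy is log(2*pi*e), about 2.84 nats.
        rng = np.random.default_rng(0)
        print(kl_entropy(rng.standard_normal((5000, 2)), k=3))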

    Direct Estimation of Information Divergence Using Nearest Neighbor Ratios

    We propose a direct estimation method for Rényi and f-divergence measures based on a new graph-theoretical interpretation. Suppose that we are given two sample sets $X$ and $Y$, with $N$ and $M$ samples respectively, where $\eta := M/N$ is a constant value. Considering the $k$-nearest neighbor ($k$-NN) graph of $Y$ in the joint data set $(X, Y)$, we show that the average powered ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN points is proportional to the Rényi divergence of the $X$ and $Y$ densities. A similar method can also be used to estimate f-divergence measures. We derive bias and variance rates, and show that for the class of $\gamma$-Hölder smooth functions, the estimator achieves the MSE rate of $O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble estimation technique, for density functions with continuous and bounded derivatives of up to the order $d$, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the parametric MSE rate of $O(1/N)$. Our estimators are more computationally tractable than other competing estimators, which makes them appealing in many practical applications.
    Comment: 2017 IEEE International Symposium on Information Theory (ISIT)
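
    The core statistic described above can be sketched as follows (a rough illustration only: the function mapping this statistic to the Rényi or f-divergence value, and the bias and ensemble corrections analyzed in the paper, are omitted; all names and parameter values are ours):

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def avg_powered_knn_ratio(x, y, k=10, alpha=0.5):
            """Average powered ratio of X to Y points among the k nearest neighbors
            of each Y point in the pooled sample (X, Y)."""
            n, m = len(x), len(y)
            eta = m / n
            pooled = np.vstack([x, y])
            labels = np.concatenate([np.zeros(n), np.ones(m)])       # 0 = X, 1 = Y
            nn = NearestNeighbors(n_neighbors=k + 1).fit(pooled)
            # Query the Y points and drop the first neighbor (the point itself).
            _, idx = nn.kneighbors(y)
            neighbor_labels = labels[idx[:, 1:]]
            n_x = np.sum(neighbor_labels == 0, axis=1)               # X points among the k-NN
            n_y = np.maximum(np.sum(neighbor_labels == 1, axis=1), 1)  # avoid division by zero
            return np.mean((eta * n_x / n_y) ** alpha)

        # Illustrative example with two shifted Gaussian samples.
        rng = np.random.default_rng(1)
        x = rng.standard_normal((2000, 2))
        y = rng.standard_normal((2000, 2)) + 0.5
        print(avg_powered_knn_ratio(x, y, k=10, alpha=0.5))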

    Scalable Hash-Based Estimation of Divergence Measures

    We propose a scalable divergence estimation method based on hashing. Consider two continuous random variables $X$ and $Y$ whose densities have bounded support. We consider a particular locality-sensitive random hashing, and consider the ratio of samples in each hash bin having non-zero numbers of $Y$ samples. We prove that the weighted average of these ratios over all of the hash bins converges to f-divergences between the two sample sets. We show that the proposed estimator is optimal in terms of both MSE rate and computational complexity. We derive the MSE rates for two families of smooth functions: the Hölder smoothness class and differentiable functions. In particular, it is proved that if the density functions have bounded derivatives up to the order $d/2$, where $d$ is the dimension of the samples, the optimal parametric MSE rate of $O(1/N)$ can be achieved. The computational complexity is shown to be $O(N)$, which is optimal. To the best of our knowledge, this is the first empirical divergence estimator that has optimal computational complexity and achieves the optimal parametric MSE estimation rate.
    Comment: 11 pages, Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain
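
    A toy version of the idea can be written down directly: bucket both samples with a hash, then take a weighted average of a function of the per-bin X/Y count ratios over bins that contain Y samples. Here a fixed-width grid hash stands in for the paper's locality-sensitive random hashing, and the bias corrections and optimal bin width are omitted; all names are ours:

        import numpy as np
        from collections import defaultdict

        def hash_divergence(x, y, g, bin_width=0.5):
            """M_i-weighted average of g(eta * N_i / M_i) over hash bins with at
            least one Y sample, where N_i and M_i are the X and Y counts in bin i."""
            n, m = len(x), len(y)
            eta = m / n
            counts = defaultdict(lambda: [0, 0])                 # bin -> [X count, Y count]
            for row in x:
                counts[tuple(np.floor(row / bin_width).astype(int))][0] += 1
            for row in y:
                counts[tuple(np.floor(row / bin_width).astype(int))][1] += 1
            total = 0.0
            for n_i, m_i in counts.values():
                if m_i > 0:
                    total += (m_i / m) * g(eta * n_i / m_i)
            return total

        # Example: KL-type f-divergence, g(t) = t * log(t), with g(0) taken as 0.
        rng = np.random.default_rng(2)
        x = rng.standard_normal((5000, 2))
        y = rng.standard_normal((5000, 2)) + 0.5
        g = lambda t: t * np.log(t) if t > 0 else 0.0
        print(hash_divergence(x, y, g, bin_width=0.5))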

    Information Theoretic Structure Learning with Confidence

    Information-theoretic measures (e.g., the Kullback-Leibler divergence and Shannon mutual information) have been used for exploring possibly nonlinear multivariate dependencies in high dimension. If these dependencies are assumed to follow a Markov factor graph model, this exploration process is called structure discovery. For discrete-valued samples, estimates of the information divergence over the parametric class of multinomial models lead to structure discovery methods whose mean squared error achieves parametric convergence rates as the sample size grows. However, a naive application of this method to continuous nonparametric multivariate models converges much more slowly. In this paper we introduce a new method for nonparametric structure discovery that uses weighted ensemble divergence estimators that achieve parametric convergence rates and obey an asymptotic central limit theorem that facilitates hypothesis testing and other types of statistical validation.
    Comment: 10 pages, 3 figures
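
    A toy version of the testing step looks like the sketch below: declare an edge between two variables when an estimate of their dependence is significantly above what independence would produce. A permutation test stands in for the paper's CLT-based test, and scikit-learn's k-NN mutual information estimator stands in for the weighted ensemble divergence estimator; none of this is the paper's actual procedure:

        import numpy as np
        from sklearn.feature_selection import mutual_info_regression

        def dependence_graph(data, n_perm=100, alpha=0.05, seed=0):
            """Keep edge (i, j) when the MI estimate beats its permutation null."""
            rng = np.random.default_rng(seed)
            n, d = data.shape
            edges = []
            for i in range(d):
                for j in range(i + 1, d):
                    xi, xj = data[:, [i]], data[:, j]
                    mi = mutual_info_regression(xi, xj)[0]
                    null = np.array([
                        mutual_info_regression(xi, rng.permutation(xj))[0]
                        for _ in range(n_perm)
                    ])
                    if np.mean(null >= mi) < alpha:
                        edges.append((i, j))
            return edges

        # Example: x2 depends on x0; x1 is independent noise.
        rng = np.random.default_rng(3)
        x0 = rng.standard_normal(500)
        x1 = rng.standard_normal(500)
        x2 = x0 + 0.3 * rng.standard_normal(500)
        print(dependence_graph(np.column_stack([x0, x1, x2])))   # expect [(0, 2)]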

    Divergence Measures

    Data science, information theory, probability theory, statistical learning and other related disciplines greatly benefit from non-negative measures of dissimilarity between pairs of probability measures. These are known as divergence measures, and exploring their mathematical foundations and diverse applications is of significant interest. The present Special Issue, entitled “Divergence Measures: Mathematical Foundations and Applications in Information-Theoretic and Statistical Problems”, includes eight original contributions, and it is focused on the study of the mathematical properties and applications of classical and generalized divergence measures from an information-theoretic perspective. It mainly deals with two key generalizations of the relative entropy: namely, the Rényi divergence and the important class of f-divergences. It is our hope that the readers will find interest in this Special Issue, which will stimulate further research in the study of the mathematical foundations and applications of divergence measures.
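
    For reference, the two generalizations mentioned above can be written, for probability densities p and q, as follows (a standard formulation, not quoted from the Special Issue):

        % Renyi divergence of order alpha (alpha > 0, alpha != 1):
        D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1}
            \log \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx
        % f-divergence generated by a convex f with f(1) = 0:
        D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx
        % Both recover the relative entropy: let alpha -> 1, or take f(t) = t log t.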

    Ensemble estimation of multivariate f-divergence

    Full text link
    f-divergence estimation is an important problem in the fields of information theory, machine learning, and statistics. While several divergence estimators exist, relatively few of their convergence rates are known. We derive the MSE convergence rate for a density plug-in estimator of f-divergence. Then, by applying the theory of optimally weighted ensemble estimation, we derive a divergence estimator with a convergence rate of $O(1/T)$ that is simple to implement and performs well in high dimensions. We validate our theoretical results with experiments.
    Comment: 14 pages, 6 figures; a condensed version of this paper was accepted to ISIT 2014. Version 2: Moved the proofs of the theorems from the main body to appendices at the end
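
    The distinctive ingredient is the weighting step: combine plug-in estimates computed at several bandwidth parameters using weights that sum to one and cancel the lower-order bias terms. Below is a simplified sketch of that step, assuming bias terms proportional to l**(i/d); the exact term structure and the relaxed optimization in the paper differ, and all names and values here are ours:

        import numpy as np

        def ensemble_weights(bandwidth_params, d, orders=None):
            """Weights that sum to 1 and annihilate the assumed l**(i/d) bias terms
            of the base plug-in estimators, taking the minimum-L2-norm solution."""
            l = np.asarray(bandwidth_params, dtype=float)
            orders = range(1, d) if orders is None else orders
            # One row per constraint: sum-to-one, then one bias-cancellation row per order.
            a = np.vstack([np.ones_like(l)] + [l ** (i / d) for i in orders])
            b = np.zeros(a.shape[0])
            b[0] = 1.0
            return np.linalg.pinv(a) @ b          # minimum-norm solution of a @ w = b

        def weighted_estimate(base_estimates, weights):
            """Combine the base estimates (one per bandwidth parameter)."""
            return float(np.dot(weights, base_estimates))

        # Example: 10 bandwidth parameters in dimension d = 4 (illustrative values).
        w = ensemble_weights(np.linspace(1.0, 3.0, 10), d=4)
        print(w, w.sum())                         # the weights sum to 1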

    On generalized Cramér-Rao inequalities, generalized Fisher informations and characterizations of generalized q-Gaussian distributions

    This paper deals with Cramér-Rao inequalities in the context of nonextensive statistics and in estimation theory. It gives characterizations of generalized q-Gaussian distributions, and introduces generalized versions of Fisher information. The contributions of this paper are (i) the derivation of new extended Cramér-Rao inequalities for the estimation of a parameter, involving general q-moments of the estimation error, (ii) the derivation of Cramér-Rao inequalities saturated by generalized q-Gaussian distributions, (iii) the definition of generalized Fisher informations, (iv) the identification and interpretation of some prior results, and finally, (v) the suggestion of new estimation methods.
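
    As background for the generalizations discussed above, the classical Cramér-Rao bound and the standard (Tsallis) q-Gaussian density are recalled below; the paper's extended q-moment versions are not reproduced here:

        % Classical Cramer-Rao bound for an unbiased estimator \hat{\theta}:
        \operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{I(\theta)},
        \qquad
        I(\theta) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}
            \ln f(X;\theta)\right)^{\!2}\right]
        % Standard q-Gaussian density (normalizable for q < 3), the kind of
        % distribution that saturates the generalized bounds:
        G_q(x) \;\propto\; \bigl[\, 1 - (1-q)\,\beta x^{2} \,\bigr]_{+}^{\frac{1}{1-q}},
        \qquad \beta > 0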

    Ensemble Estimation of Information Divergence

    Recent work has focused on the problem of nonparametric estimation of information divergence functionals between two continuous random variables. Many existing approaches require either restrictive assumptions about the density support set or difficult calculations at the support set boundary, which must be known a priori. The mean squared error (MSE) convergence rate of a leave-one-out kernel density plug-in divergence functional estimator is derived for general bounded density support sets, where knowledge of the support boundary, and therefore boundary correction, is not required. The theory of optimally weighted ensemble estimation is generalized to derive a divergence estimator that achieves the parametric rate when the densities are sufficiently smooth. Guidelines for tuning parameter selection and the asymptotic distribution of this estimator are provided. Based on the theory, an empirical estimator of Rényi-α divergence is proposed that greatly outperforms the standard kernel density plug-in estimator in terms of mean squared error, especially in high dimensions. The estimator is shown to be robust to the choice of tuning parameters. We present extensive simulation results that verify the theoretical results of our paper. Finally, we apply the proposed estimator to estimate bounds on the Bayes error rate of a cell classification problem.
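
    A bare-bones version of a leave-one-out kernel density plug-in estimator for the Rényi-α divergence is sketched below; the ensemble weighting, boundary analysis, and tuning guidelines from the paper are omitted, and the bandwidth, α, and test data are illustrative choices of ours:

        import numpy as np
        from scipy.spatial.distance import cdist

        def gaussian_kde_at(points, data, h, exclude_self=False):
            """Gaussian KDE of `data` with bandwidth h, evaluated at `points`.
            With exclude_self=True (points identical to data), the estimate is
            leave-one-out."""
            d = data.shape[1]
            k = np.exp(-cdist(points, data) ** 2 / (2 * h ** 2))
            if exclude_self:
                np.fill_diagonal(k, 0.0)
            norm = (len(data) - exclude_self) * (2 * np.pi * h ** 2) ** (d / 2)
            return k.sum(axis=1) / norm

        def renyi_divergence_plugin(x, y, alpha=0.8, h=0.3):
            """Plug-in D_alpha(f_X || f_Y) = log E_X[(f_X/f_Y)**(alpha-1)] / (alpha-1),
            with a leave-one-out estimate of f_X at the X points."""
            f_x = gaussian_kde_at(x, x, h, exclude_self=True)
            f_y = gaussian_kde_at(x, y, h)
            return np.log(np.mean((f_x / f_y) ** (alpha - 1))) / (alpha - 1)

        # Example: two 2-D Gaussians with shifted means (illustrative only).
        rng = np.random.default_rng(4)
        x = rng.standard_normal((1000, 2))
        y = rng.standard_normal((1000, 2)) + 0.5
        print(renyi_divergence_plugin(x, y))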