
    Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators

    We provide finite-sample analysis of a general framework for using k-nearest neighbor statistics to estimate functionals of a nonparametric continuous probability density, including entropies and divergences. Rather than plugging a consistent density estimate (which requires $k \to \infty$ as the sample size $n \to \infty$) into the functional of interest, the estimators we consider fix k and perform a bias correction. This is more efficient computationally and, as we show in certain cases, statistically, leading to faster convergence rates. Our framework unifies several previous estimators, for most of which ours are the first finite-sample guarantees.
    Comment: 16 pages, 0 figures.
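    A minimal sketch of one such fixed-k estimator is the classical Kozachenko-Leonenko entropy estimator, in which digamma terms supply the bias correction for a fixed k. The implementation below is an illustrative assumption (the function name, the SciPy-based neighbor search, and the default k=3 are not from the paper), not the paper's general framework.

```python
# Illustrative sketch of a fixed-k estimator: the Kozachenko-Leonenko entropy
# estimator, where digamma terms supply the bias correction. Not the paper's
# general framework; assumes continuous data with no duplicate points.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(samples, k=3):
    """Estimate differential entropy (in nats) from an (n, d) sample array."""
    n, d = samples.shape
    tree = cKDTree(samples)
    # Query k+1 neighbors: the nearest "neighbor" of each point is itself,
    # so the last column holds the distance to the k-th true neighbor.
    eps = tree.query(samples, k=k + 1)[0][:, -1]
    # Log-volume of the d-dimensional unit Euclidean ball.
    log_c_d = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    # Fixed-k bias correction via digamma, instead of letting k grow with n.
    return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(eps))
```

    As a rough sanity check, kl_entropy(np.random.randn(2000, 2)) should come out near log(2πe) ≈ 2.84 nats, the entropy of a standard bivariate normal.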

    Direct Estimation of Information Divergence Using Nearest Neighbor Ratios

    We propose a direct estimation method for Rényi and f-divergence measures based on a new graph-theoretical interpretation. Suppose that we are given two sample sets X and Y, with N and M samples respectively, where $\eta := M/N$ is a constant. Considering the k-nearest neighbor (k-NN) graph of Y in the joint data set $(X, Y)$, we show that the average powered ratio of the number of X points to the number of Y points among all k-NN points is proportional to the Rényi divergence of the X and Y densities. A similar method can also be used to estimate f-divergence measures. We derive bias and variance rates, and show that for the class of $\gamma$-Hölder smooth functions, the estimator achieves the MSE rate of $O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble estimation technique, for density functions with continuous and bounded derivatives up to order d, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the parametric MSE rate of $O(1/N)$. Our estimators are more computationally tractable than other competing estimators, which makes them appealing in many practical applications.
    Comment: 2017 IEEE International Symposium on Information Theory (ISIT).
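    The ratio statistic described above can be sketched as follows: pool the two samples, take the k nearest neighbors of each Y point in the pooled set, and average a powered ratio of X-counts to Y-counts. The sketch below is illustrative only; the paper's proportionality constants, bias correction, and boundary handling are omitted, and the function name, the power parameter, and the guard against zero Y-counts are assumptions.

```python
# Illustrative sketch of the k-NN ratio statistic: count X versus Y points among
# the k nearest neighbors of each Y point in the pooled sample. The paper's
# proportionality constants and bias correction are omitted; `alpha` and the
# zero-count guard are assumptions for the sketch.
import numpy as np
from scipy.spatial import cKDTree

def knn_ratio_statistic(X, Y, k=5, alpha=2.0):
    """Average powered ratio of X-counts to Y-counts over the k-NN of Y points."""
    Z = np.vstack([X, Y])
    labels = np.concatenate([np.ones(len(X)), np.zeros(len(Y))])  # 1 = X, 0 = Y
    tree = cKDTree(Z)
    # Each Y point matches itself at distance zero, so query k+1 and drop column 0.
    _, idx = tree.query(Y, k=k + 1)
    idx = idx[:, 1:]
    n_x = labels[idx].sum(axis=1)        # X points among the k neighbors
    n_y = k - n_x                        # remaining neighbors come from Y
    ratios = n_x / np.maximum(n_y, 1.0)  # crude guard against division by zero
    return float(np.mean(ratios ** alpha))
```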

    Ensemble estimation of multivariate f-divergence

    f-divergence estimation is an important problem in the fields of information theory, machine learning, and statistics. While several divergence estimators exist, relatively few of their convergence rates are known. We derive the MSE convergence rate for a density plug-in estimator of f-divergence. Then, by applying the theory of optimally weighted ensemble estimation, we derive a divergence estimator with a convergence rate of $O(1/T)$ that is simple to implement and performs well in high dimensions. We validate our theoretical results with experiments.
    Comment: 14 pages, 6 figures; a condensed version of this paper was accepted to ISIT 2014. Version 2: moved the proofs of the theorems from the main body to appendices at the end.
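    The weighted ensemble idea can be sketched as a linear combination of base plug-in estimates computed at several scales (bandwidths or neighborhood sizes), with weights that sum to one and are chosen to cancel the lower-order bias terms. The constraint basis s**(i/d) and the minimum-norm solve below are generic stand-ins for the paper's weight optimization, not its exact formulation.

```python
# Illustrative sketch of weighted ensemble estimation: combine base estimates
# computed at several scales with weights that sum to one and annihilate the
# leading bias terms. The basis s**(i/d) and the minimum-norm solve are generic
# stand-ins for the paper's weight optimization.
import numpy as np

def ensemble_weights(scales, d):
    """Minimum-norm weights with sum(w) = 1 and sum(w * s**(i/d)) = 0 for 0 < i < d."""
    L = len(scales)
    rows, targets = [np.ones(L)], [1.0]
    for i in range(1, d):
        rows.append(np.array([s ** (i / d) for s in scales]))
        targets.append(0.0)
    # Requires len(scales) >= d so that the constraints can be satisfied.
    return np.linalg.pinv(np.vstack(rows)) @ np.array(targets)

def ensemble_estimate(base_estimates, weights):
    """Weighted combination of base divergence estimates (one per scale)."""
    return float(np.dot(weights, base_estimates))
```

    The base estimates themselves would come from any plug-in divergence estimator evaluated once per scale; only the combination step is shown here.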

    Information Theoretic Structure Learning with Confidence

    Information-theoretic measures (e.g. the Kullback-Leibler divergence and Shannon mutual information) have been used for exploring possibly nonlinear multivariate dependencies in high dimension. If these dependencies are assumed to follow a Markov factor graph model, this exploration process is called structure discovery. For discrete-valued samples, estimates of the information divergence over the parametric class of multinomial models lead to structure discovery methods whose mean squared error achieves parametric convergence rates as the sample size grows. However, a naive application of this method to continuous nonparametric multivariate models converges much more slowly. In this paper we introduce a new method for nonparametric structure discovery that uses weighted ensemble divergence estimators that achieve parametric convergence rates and obey an asymptotic central limit theorem, which facilitates hypothesis testing and other types of statistical validation.
    Comment: 10 pages, 3 figures.
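    The central limit theorem is what makes a simple edge test possible: compare each pairwise dependence estimate to zero under a normal approximation. The sketch below assumes a matrix of mutual-information estimates and their standard errors is already available (from any ensemble estimator); the test statistic and threshold rule are illustrative, not the paper's procedure.

```python
# Illustrative sketch of a CLT-based edge test for structure discovery: declare
# an edge when a pairwise dependence estimate is significantly above zero under
# a normal approximation. The estimator, its standard error, and the threshold
# rule are stand-ins, not the paper's procedure.
import numpy as np
from scipy.stats import norm

def edge_test(estimate, std_error, alpha=0.05):
    """One-sided z-test of H0: the dependence measure equals zero."""
    return bool(estimate / std_error > norm.ppf(1.0 - alpha))

def discover_structure(mi_matrix, se_matrix, alpha=0.05):
    """Apply the edge test to every variable pair; returns a boolean adjacency matrix."""
    p = mi_matrix.shape[0]
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        for j in range(i + 1, p):
            adj[i, j] = adj[j, i] = edge_test(mi_matrix[i, j], se_matrix[i, j], alpha)
    return adj
```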