
12,047 research outputs found

Ensemble estimation of multivariate f-divergence

Author: Hero III Alfred O.
Moon Kevin R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/06/2014

f-divergence estimation is an important problem in the fields of information theory, machine learning, and statistics. While several divergence estimators exist, relatively few of their convergence rates are known. We derive the MSE convergence rate for a density plug-in estimator of f-divergence. Then, by applying the theory of optimally weighted ensemble estimation, we derive a divergence estimator with a convergence rate of O(1/T) that is simple to implement and performs well in high dimensions. We validate our theoretical results with experiments. Comment: 14 pages, 6 figures, a condensed version of this paper was accepted to ISIT 2014, Version 2: Moved the proofs of the theorems from the main body to appendices at the en
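The abstract above does not spell out how the ensemble weights are chosen. As a rough illustration only, the sketch below assumes that each base estimator is indexed by a tuning parameter `l` and has bias terms proportional to `l**(i/d)` for `i = 1..d`; that bias form, the function names, and the parameter grid are assumptions made for this example, not details taken from the paper. Under that assumption, weights that sum to one and cancel each bias term can be found as a minimum-norm solution of a small linear system.

```python
import numpy as np

def ensemble_weights(params, exponents):
    """Minimum-norm weights w with sum(w) = 1 and
    sum_l w[l] * params[l]**e = 0 for every exponent e.
    The exponents describe the ASSUMED bias terms to cancel."""
    params = np.asarray(params, dtype=float)
    # Row 0 enforces sum(w) = 1; each later row zeroes one assumed bias term.
    A = np.vstack([np.ones_like(params)] + [params ** e for e in exponents])
    b = np.zeros(A.shape[0])
    b[0] = 1.0
    # The pseudoinverse returns the smallest 2-norm solution of A w = b,
    # which limits how much the weighted average inflates the variance.
    return np.linalg.pinv(A) @ b

def ensemble_estimate(base_estimates, weights):
    """Weighted combination of the base estimates."""
    return float(np.dot(weights, base_estimates))

# Hypothetical setup: dimension d = 3 and ten tuning-parameter values.
d = 3
params = np.linspace(1.0, 3.0, 10)
exponents = [i / d for i in range(1, d + 1)]
w = ensemble_weights(params, exponents)

# Synthetic base estimates: a true value of 2.0 plus bias of the assumed form.
truth = 2.0
base = truth + 0.5 * params ** (1 / d) - 0.2 * params ** (2 / d) + 0.1 * params
combined = ensemble_estimate(base, w)
```

In this synthetic setup the weighted combination recovers the true value up to floating-point error, because the bias has exactly the assumed form. With real estimators the bias expansion holds only approximately, so the cancellation is approximate as well.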

arXiv.org e-Print Archive

Crossref

Direct Ensemble Estimation of Density Functionals

Author: Berisha Visar
Moon Kevin
Wisler Alan
Publication date: 17/05/2017

Estimating density functionals of analog sources is an important problem in statistical signal processing and information theory. Traditionally, estimating these quantities requires either making parametric assumptions about the underlying distributions or using non-parametric density estimation followed by integration. In this paper we introduce a direct nonparametric approach which bypasses the need for density estimation by using the error rates of k-NN classifiers as data-driven basis functions that can be combined to estimate a range of density functionals. However, this method is subject to a non-trivial bias that dramatically slows the rate of convergence in higher dimensions. To overcome this limitation, we develop an ensemble method for estimating the value of the basis function which, under some minor constraints on the smoothness of the underlying distributions, achieves the parametric rate of convergence regardless of data dimension. Comment: 5 page

arXiv.org e-Print Archive

Crossref

Meta learning of bounds on the Bayes classifier error

Author: Delouille Veronique
Hero III Alfred O.
Moon Kevin R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/07/2015

Meta learning uses information from base learners (e.g. classifiers or estimators) as well as information about the learning problem to improve upon the performance of a single base learner. For example, the Bayes error rate of a given feature space, if known, can be used to aid in choosing a classifier, as well as in feature selection and model selection for the base classifiers and the meta classifier. Recent work in the field of f-divergence functional estimation has led to the development of simple and rapidly converging estimators that can be used to estimate various bounds on the Bayes error. We estimate multiple bounds on the Bayes error using an estimator that applies meta learning to slowly converging plug-in estimators to obtain the parametric convergence rate. We compare the estimated bounds empirically on simulated data and then estimate the tighter bounds on features extracted from an image patch analysis of sunspot continuum and magnetogram images. Comment: 6 pages, 3 figures, to appear in proceedings of 2015 IEEE Signal Processing and SP Education Worksho

arXiv.org e-Print Archive

Crossref

Information Theoretic Structure Learning with Confidence

Author: Hero III Alfred O.
Moon Kevin R.
Noshad Morteza
Sekeh Salimeh Yasaei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/09/2016

Information theoretic measures (e.g. the Kullback-Leibler divergence and Shannon mutual information) have been used for exploring possibly nonlinear multivariate dependencies in high dimension. If these dependencies are assumed to follow a Markov factor graph model, this exploration process is called structure discovery. For discrete-valued samples, estimates of the information divergence over the parametric class of multinomial models lead to structure discovery methods whose mean squared error achieves parametric convergence rates as the sample size grows. However, a naive application of this method to continuous nonparametric multivariate models converges much more slowly. In this paper we introduce a new method for nonparametric structure discovery that uses weighted ensemble divergence estimators that achieve parametric convergence rates and obey an asymptotic central limit theorem that facilitates hypothesis testing and other types of statistical validation. Comment: 10 pages, 3 figure

arXiv.org e-Print Archive

Crossref

Online estimation of discrete densities using classifier chains

Author: Frank Eibe
Geilke Michael
Kramer Stefan
Publication venue: ADReM
Publication date: 01/01/2012

We propose an approach to estimate a discrete joint density online, that is, the algorithm is only provided the current example, its current estimate, and a limited amount of memory. To design an online estimator for discrete densities, we use classifier chains to model dependencies among features. Each classifier in the chain estimates the probability of one particular feature. Because a single chain may not provide a reliable estimate, we also consider ensembles of classifier chains. Our experiments on synthetic data show that the approach is feasible and the estimated densities approach the true, known distribution with increasing amounts of data.
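The chain idea in this abstract rests on the chain rule, p(x_1, ..., x_n) = p(x_1) p(x_2 | x_1) ... p(x_n | x_1, ..., x_{n-1}), with one estimator per factor. The sketch below is a simplified stand-in, not the authors' method: it replaces each classifier with a Laplace-smoothed count table keyed on the full prefix, which does not respect the limited-memory requirement that a real online classifier would address. The class name `ChainDensity` and its parameters are hypothetical.

```python
from collections import defaultdict
from itertools import product

class ChainDensity:
    """Online estimate of a discrete joint density via the chain rule.
    Each conditional p(x_i | x_1..x_{i-1}) is a smoothed count table,
    used here as a stand-in for a real online classifier."""

    def __init__(self, cardinalities, smoothing=1.0):
        self.card = list(cardinalities)   # number of values per feature
        self.smoothing = smoothing        # Laplace smoothing constant
        self.counts = [defaultdict(float) for _ in self.card]
        self.totals = [defaultdict(float) for _ in self.card]

    def update(self, x):
        """Consume one example; only the counts are stored."""
        for i, value in enumerate(x):
            prefix = tuple(x[:i])
            self.counts[i][(prefix, value)] += 1.0
            self.totals[i][prefix] += 1.0

    def prob(self, x):
        """Product of the smoothed conditional probabilities along the chain."""
        p = 1.0
        for i, value in enumerate(x):
            prefix = tuple(x[:i])
            num = self.counts[i].get((prefix, value), 0.0) + self.smoothing
            den = self.totals[i].get(prefix, 0.0) + self.smoothing * self.card[i]
            p *= num / den
        return p

# Hypothetical stream over three binary features.
model = ChainDensity([2, 2, 2])
for example in [(0, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 1), (0, 0, 0)]:
    model.update(example)

total_mass = sum(model.prob(x) for x in product(range(2), repeat=3))
```

Because every smoothed conditional sums to one over its feature's values, the estimated joint probabilities sum to one over all configurations. An ensemble in the sense of the abstract could average several such chains built with different feature orderings.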

Research Commons@Waikato

Direct Estimation of Information Divergence Using Nearest Neighbor Ratios

Author: Hero III Alfred O.
Moon Kevin R.
Noshad Morteza
Sekeh Salimeh Yasaei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/11/2017

We propose a direct estimation method for Rényi and f-divergence measures based on a new graph theoretical interpretation. Suppose that we are given two sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where $\eta := M/N$ is a constant value. Considering the $k$-nearest neighbor ($k$-NN) graph of $Y$ in the joint data set $(X,Y)$, we show that the average powered ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN points is proportional to Rényi divergence of $X$ and $Y$ densities. A similar method can also be used to estimate f-divergence measures. We derive bias and variance rates, and show that for the class of $\gamma$-Hölder smooth functions, the estimator achieves the MSE rate of $O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble estimation technique, for density functions with continuous and bounded derivatives of up to the order $d$, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the parametric MSE rate of $O(1/N)$. Our estimators are more computationally tractable than other competing estimators, which makes them appealing in many practical applications. Comment: 2017 IEEE International Symposium on Information Theory (ISIT
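The nearest-neighbor ratio idea in this abstract can be sketched in a few lines. The code below is an illustration under stated assumptions, not the paper's reference implementation: for each point of Y it counts how many of its k nearest neighbors in the joint set come from X (`n_i`) and from Y (`m_i`), uses `eta * n_i / (m_i + 1)` as a local density-ratio estimate, and plugs the average of its `alpha`-th power into the Rényi divergence formula. The `+ 1` in the denominator, the brute-force neighbor search, and the function name are assumptions made for this example.

```python
import numpy as np

def nnr_renyi_divergence(x, y, k=20, alpha=0.5):
    """Sketch of a k-NN ratio estimate of the Renyi divergence of order
    alpha between the densities behind samples x (N x d) and y (M x d)."""
    n, m = len(x), len(y)
    eta = m / n
    z = np.vstack([x, y])  # joint data set: rows 0..n-1 are X, the rest are Y
    # Squared Euclidean distances from every Y point to every joint point.
    d2 = ((y[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    # A point must not count as its own neighbor.
    d2[np.arange(m), n + np.arange(m)] = np.inf
    # Indices of the k nearest joint points for each Y point (unordered).
    nn = np.argpartition(d2, k, axis=1)[:, :k]
    n_i = (nn < n).sum(axis=1)   # neighbors that come from X
    m_i = k - n_i                # neighbors that come from Y
    # Local density-ratio estimate; the +1 (an assumption) avoids division by zero.
    ratio = eta * n_i / (m_i + 1.0)
    j_hat = np.mean(ratio ** alpha)
    return float(np.log(j_hat) / (alpha - 1.0))

# Hypothetical check on 2-D Gaussians: same distribution vs. shifted mean.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
y_same = rng.normal(size=(500, 2))
y_shift = rng.normal(size=(500, 2)) + 2.0
d_same = nnr_renyi_divergence(x, y_same)
d_shift = nnr_renyi_divergence(x, y_shift)
```

When both samples come from the same distribution the estimate should sit near zero, and it should grow once the Y sample is shifted away from X. The brute-force distance matrix costs O(NM) memory, so a tree-based or approximate neighbor search would be the natural replacement for larger samples.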

arXiv.org e-Print Archive

Crossref