196 research outputs found
On accuracy of PDF divergence estimators and their applicability to representative data sampling
Generalisation error estimation is an important issue in machine learning. Cross-validation, traditionally used for this purpose, requires building multiple models and repeating the whole procedure many times to produce reliable error estimates. It is, however, possible to estimate the error accurately using only a single model, if the training and test data are chosen appropriately. This paper investigates the possibility of using various probability density function (PDF) divergence measures for the purpose of representative data sampling. As it turns out, the first difficulty one needs to deal with is estimating the divergence itself. In contrast to other publications on this subject, the experimental results provided in this study show that in many cases accurate estimation is not possible unless samples consisting of thousands of instances are used. Exhaustive experiments on divergence-guided representative data sampling have been performed using 26 publicly available benchmark datasets and 70 PDF divergence estimators, and their results have been analysed and discussed.
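The abstract does not name the individual estimators, but a k-nearest-neighbour KL divergence estimator in the style of Wang-Kulkarni-Verdú illustrates the kind of measure being benchmarked. A minimal sketch; the function name and the choice k=5 below are illustrative, not from the paper:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(x, y, k=5):
    """k-NN estimator of KL(P||Q) from samples x ~ P, y ~ Q
    (Wang-Kulkarni-Verdu style); illustrative only."""
    n, d = x.shape
    m = y.shape[0]
    # distance from each x_i to its k-th nearest neighbour in x (self excluded)
    r = cKDTree(x).query(x, k + 1)[0][:, -1]
    # distance from each x_i to its k-th nearest neighbour in y
    s = cKDTree(y).query(x, k)[0]
    if k > 1:
        s = s[:, -1]
    return d * np.mean(np.log(s / r)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, size=(2000, 2))  # sample from P = N(0, I)
q = rng.normal(0.5, 1.0, size=(2000, 2))  # sample from Q = N(0.5, I)
print(knn_kl_divergence(p, q))            # true KL here is 0.25
```

Rerunning this with a few hundred points instead of 2000 makes the estimate visibly unstable across seeds, consistent with the abstract's observation that reliable estimation can require thousands of instances.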
A note on Onicescu's informational energy and correlation coefficient in exponential families
The informational energy of Onicescu is a positive quantity that measures the amount of uncertainty of a random variable, much like Shannon's entropy. In this note, we report closed-form formulas for Onicescu's informational energy and correlation coefficient when the densities belong to an exponential family. We also report, as a byproduct, a closed-form formula for the Cauchy-Schwarz divergence between densities of an exponential family.
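For intuition, the informational energy is $E(p) = \int p(x)^2\,dx$, and for an exponential family with log-normalizer $F$ and zero carrier term it reduces to $\exp(F(2\theta) - 2F(\theta))$. The sketch below checks this identity numerically for a univariate Gaussian; the parameterisation is standard, but the script is mine, not the note's:

```python
import numpy as np
from scipy.integrate import quad

def F(th1, th2):
    # Log-normalizer of the univariate Gaussian in natural coordinates
    # (th1, th2) = (mu/sigma^2, -1/(2*sigma^2)).
    return -th1**2 / (4 * th2) + 0.5 * np.log(np.pi / -th2)

mu, sigma = 0.7, 1.3
th1, th2 = mu / sigma**2, -1.0 / (2 * sigma**2)

# Closed form: E(p) = exp(F(2*theta) - 2*F(theta)) when the carrier term is zero
closed = np.exp(F(2 * th1, 2 * th2) - 2 * F(th1, th2))

# Numerical check against the definition E(p) = integral of p(x)^2 dx
pdf = lambda x: np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
numeric, _ = quad(lambda x: pdf(x)**2, -np.inf, np.inf)

print(closed, numeric)  # both ~0.2170 = 1/(2*sigma*sqrt(pi))
```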
Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation
This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P \ast \mathcal{N}_\sigma$, for $\mathcal{N}_\sigma \triangleq \mathcal{N}(0, \sigma^2 \mathrm{I}_d)$, by $\hat{P}_n \ast \mathcal{N}_\sigma$, where $\hat{P}_n$ is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $\chi^2$-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)} n^{-1/2}$, in remarkable contrast to a typical $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_1$ (and $d \ge 3$). For the KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and $\chi^2$-divergence, the convergence rate is $e^{O(d)} n^{-1}$, but only if $P$ achieves finite input-output $\chi^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $\omega(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $\chi^2$-divergence becomes infinite, a curious dichotomy. As a main application we consider estimating the differential entropy $h(P \ast \mathcal{N}_\sigma)$ in the high-dimensional regime. The distribution $P$ is unknown but $n$ i.i.d. samples from it are available. We first show that any good estimator of $h(P \ast \mathcal{N}_\sigma)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $n^{-1/2}$, thus establishing the minimax rate-optimality of the plug-in. Numerical results that demonstrate a significant empirical superiority of the plug-in approach to general-purpose differential entropy estimators are provided.
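Concretely, the plug-in estimator is $h(\hat{P}_n \ast \mathcal{N}_\sigma)$: the differential entropy of an $n$-component Gaussian mixture centred at the samples. A Monte Carlo sketch of that computation, with function and parameter names of my own choosing:

```python
import numpy as np
from scipy.special import logsumexp

def plugin_smoothed_entropy(samples, sigma, n_mc=5000, seed=None):
    """Monte Carlo evaluation of h(P_hat_n * N(0, sigma^2 I)): the differential
    entropy of the Gaussian mixture (1/n) sum_i N(x_i, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    n, d = samples.shape
    # draw from the mixture: pick a centre uniformly, add Gaussian noise
    y = samples[rng.integers(0, n, n_mc)] + sigma * rng.standard_normal((n_mc, d))
    # mixture log-density at each draw, via logsumexp over the n components
    sq = ((y[:, None, :] - samples[None, :, :]) ** 2).sum(-1)   # (n_mc, n)
    log_p = (logsumexp(-sq / (2 * sigma**2), axis=1)
             - np.log(n) - 0.5 * d * np.log(2 * np.pi * sigma**2))
    return -log_p.mean()   # h = E[-log p]

rng = np.random.default_rng(1)
d, sigma = 2, 1.0
x = rng.standard_normal((500, d))   # P = N(0, I_d), so P * N_sigma is Gaussian
true = 0.5 * d * np.log(2 * np.pi * np.e * (1 + sigma**2))
print(plugin_smoothed_entropy(x, sigma, seed=2), true)
```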
Multi-modal filtering for non-linear estimation
Multi-modal densities appear frequently in time series and practical applications. However, they are not well represented by common state estimators, such as the Extended Kalman Filter and the Unscented Kalman Filter, which additionally suffer from the fact that uncertainty is often not captured sufficiently well. This can result in incoherent and divergent tracking performance. In this paper, we address these issues by devising a non-linear filtering algorithm where densities are represented by Gaussian mixture models, whose parameters are estimated in closed form. The resulting method exhibits superior performance on non-linear benchmarks. © 2014 IEEE
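The abstract leaves the filter's closed-form parameter estimates to the paper itself; as background, here is a minimal sketch of the Gaussian-sum measurement update that such approaches build on, where each mixture component receives a standard Kalman update and the weights are rescaled by component likelihoods (all names below are illustrative, not the paper's):

```python
import numpy as np

def gm_update(weights, means, covs, H, R, z):
    """One Gaussian-sum measurement update: a standard Kalman update per
    component, with weights rescaled by component likelihoods."""
    new_w, new_m, new_P = [], [], []
    for w, m, P in zip(weights, means, covs):
        S = H @ P @ H.T + R                      # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        r = z - H @ m                            # innovation
        new_m.append(m + K @ r)
        new_P.append((np.eye(len(m)) - K @ H) @ P)
        lik = np.exp(-0.5 * r @ np.linalg.solve(S, r)) \
              / np.sqrt(np.linalg.det(2 * np.pi * S))
        new_w.append(w * lik)                    # reweight by N(z; Hm, S)
    new_w = np.asarray(new_w)
    return new_w / new_w.sum(), new_m, new_P

# Bimodal prior over a scalar state, observed directly with noise
w = [0.5, 0.5]
m = [np.array([-2.0]), np.array([2.0])]
P = [np.eye(1), np.eye(1)]
H, R, z = np.eye(1), 0.5 * np.eye(1), np.array([1.5])
w2, m2, P2 = gm_update(w, m, P, H, R, z)
print(w2)  # posterior mass concentrates on the mode near z = 1.5
```

A single-Gaussian filter would collapse this bimodal prior into one moment-matched mode, which is precisely the failure the abstract attributes to the EKF and UKF.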
k-MLE: A fast algorithm for learning statistical mixture models
We describe k-MLE, a fast and efficient local search algorithm for learning finite statistical mixtures of exponential families, such as Gaussian mixture models. Mixture models are traditionally learned using the expectation-maximization (EM) soft clustering technique, which monotonically increases the incomplete (expected complete) likelihood. Given prescribed mixture weights, the hard clustering k-MLE algorithm iteratively assigns data to the most likely weighted component and updates the component models using Maximum Likelihood Estimators (MLEs). Using the duality between exponential families and Bregman divergences, we prove that the local convergence of the complete likelihood of k-MLE follows directly from the convergence of a dual additively weighted Bregman hard clustering. The inner loop of k-MLE can be implemented using any k-means heuristic, such as the celebrated Lloyd's batched or Hartigan's greedy swap updates. We then show how to update the mixture weights by minimizing a cross-entropy criterion, which amounts to setting each weight to the relative proportion of points in its cluster, and reiterate the component update and weight update until convergence. Hard EM is interpreted as a special case of k-MLE in which both the component update and the weight update are performed successively in the inner loop. To initialize k-MLE, we propose k-MLE++, a careful initialization of k-MLE that probabilistically guarantees a global bound on the best possible complete likelihood.
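A toy rendition of the k-MLE loop for Gaussian mixtures may help fix ideas: hard-assign each point to the most likely weighted component, refit each component by maximum likelihood, and set the weights to cluster proportions. This is a minimal sketch with a crude random initialisation (not k-MLE++) and none of the Bregman-duality machinery:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def k_mle(x, k, iters=50, seed=0):
    """Toy k-MLE for a Gaussian mixture: hard assignment to the most likely
    weighted component, per-cluster MLE refit, weights = cluster proportions."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    means = x[rng.choice(n, k, replace=False)].copy()   # crude init (not k-MLE++)
    covs = np.array([np.cov(x.T) for _ in range(k)])
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # hard assignment: argmax_j of log w_j + log N(x; mu_j, Sigma_j)
        ll = np.stack([np.log(w[j]) + mvn(means[j], covs[j]).logpdf(x)
                       for j in range(k)], axis=1)
        z = ll.argmax(axis=1)
        for j in range(k):                               # per-cluster MLE update
            pts = x[z == j]
            if len(pts) > d:
                means[j] = pts.mean(axis=0)
                covs[j] = np.cov(pts.T, bias=True) + 1e-6 * np.eye(d)
        w = np.bincount(z, minlength=k) / n              # cross-entropy weight step
        w = np.clip(w, 1e-12, None)
        w /= w.sum()
    return w, means, covs

rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
print(k_mle(x, k=2)[0])  # weights ~ [0.5, 0.5]
```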
Estimation and control of multi-object systems with high-fidelity sensor models: A labelled random finite set approach
Principled and novel multi-object tracking algorithms are proposed that can optimally process realistic sensor data by accommodating complex observational phenomena such as merged measurements and extended targets. Additionally, a sensor control scheme based on a tractable, information-theoretic objective is proposed, with the goal of optimising tracking performance in multi-object scenarios. The concept of labelled random finite sets is adopted in the development of these new techniques.
- …