Direct Ensemble Estimation of Density Functionals
Estimating density functionals of analog sources is an important problem in
statistical signal processing and information theory. Traditionally, estimating
these quantities requires either making parametric assumptions about the
underlying distributions or using non-parametric density estimation followed by
integration. In this paper we introduce a direct nonparametric approach which
bypasses the need for density estimation by using the error rates of k-NN
classifiers as data-driven basis functions that can be combined to estimate a
range of density functionals. However, this method is subject to a non-trivial
bias that dramatically slows the rate of convergence in higher dimensions. To
overcome this limitation, we develop an ensemble method for estimating the
value of the basis function which, under some minor constraints on the
smoothness of the underlying distributions, achieves the parametric rate of
convergence regardless of data dimension.
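The core idea above, that a classifier's error rate carries distributional information, can be illustrated with a small sketch. It is a classical fact that the Bayes error between two equiprobable densities equals (1 - TV)/2, where TV is the total variation distance, so the leave-one-out error of a k-NN classifier yields a plug-in estimate of TV. This is only an illustration of the basis-function idea, not the paper's ensemble estimator; the sample sizes, k, and Gaussian test distributions are arbitrary choices.

```python
import numpy as np

def knn_error_rate(x, y, k=5):
    """Leave-one-out error of a k-NN classifier separating samples x and y.

    For equiprobable classes the Bayes error is (1 - TV)/2, so
    1 - 2*error gives a plug-in estimate of total variation distance.
    """
    data = np.concatenate([x, y])
    labels = np.concatenate([np.zeros(len(x)), np.ones(len(y))])
    # pairwise squared distances; exclude self-matches for leave-one-out
    d2 = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]          # k nearest neighbours of each point
    predicted = labels[nn].mean(axis=1) > 0.5   # majority vote (k odd avoids ties)
    return np.mean(predicted != labels)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))   # samples from density f
y = rng.normal(1.5, 1.0, size=(500, 2))   # samples from density g
err = knn_error_rate(x, y, k=5)
tv_est = max(0.0, 1.0 - 2.0 * err)        # crude TV estimate from the error rate
```

The bias discussed in the abstract shows up here as the gap between the k-NN error and the Bayes error, which grows with dimension; the paper's ensemble method combines such estimates at several sample sizes to cancel that bias.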
Justification of Logarithmic Loss via the Benefit of Side Information
We consider a natural measure of relevance: the reduction in optimal
prediction risk in the presence of side information. For any given loss
function, this relevance measure captures the benefit of side information for
performing inference on a random variable under this loss function. When such a
measure satisfies a natural data processing property, and the random variable
of interest has alphabet size greater than two, we show that it is uniquely
characterized by the mutual information, and the corresponding loss function
coincides with logarithmic loss. In doing so, our work provides a new
characterization of mutual information, and justifies its use as a measure of
relevance. When the alphabet is binary, we characterize the only admissible
forms the measure of relevance can assume while obeying the specified data
processing property. Our results naturally extend to measuring causal influence
between stochastic processes, where we unify different causal-inference
measures in the literature as instantiations of directed information.
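The headline identity, that under logarithmic loss the benefit of side information is exactly the mutual information, can be checked numerically on a small joint pmf: the optimal log-loss prediction risk for X is H(X) without side information and H(X|Y) with it, so the reduction is I(X;Y). The joint pmf below is an arbitrary illustrative example.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; the optimal expected log loss for pmf p."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# illustrative joint pmf of (X, Y); rows index X, columns index Y
pxy = np.array([[0.3, 0.1],
                [0.1, 0.5]])
px = pxy.sum(axis=1)
py = pxy.sum(axis=0)

H_X = entropy(px)  # optimal log-loss risk predicting X without side information
H_X_given_Y = sum(py[j] * entropy(pxy[:, j] / py[j]) for j in range(len(py)))
reduction = H_X - H_X_given_Y  # benefit of observing Y under log loss

# mutual information computed directly from its definition
I_XY = sum(pxy[i, j] * np.log2(pxy[i, j] / (px[i] * py[j]))
           for i in range(2) for j in range(2) if pxy[i, j] > 0)
assert np.isclose(reduction, I_XY)
```

The abstract's contribution is the converse direction: among loss functions whose relevance measure obeys the data processing property, logarithmic loss is (for alphabets larger than two) the only one for which this identity can hold.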
Sharp Bounds for Generalized Uniformity Testing
We study the problem of generalized uniformity testing \cite{BC17} of a
discrete probability distribution: Given samples from a probability
distribution $p$ over an {\em unknown} discrete domain, we
want to distinguish, with probability at least $2/3$, between the case that
$p$ is uniform on some {\em subset} of its domain versus $\epsilon$-far, in
total variation distance, from any such uniform distribution.
We establish tight bounds on the sample complexity of generalized uniformity
testing. In more detail, we present a computationally efficient tester whose
sample complexity is optimal, up to constant factors, and a matching
information-theoretic lower bound. Specifically, we show that the sample
complexity of generalized uniformity testing is
- …
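A useful intuition behind testers for this problem is that power sums of the distribution pin down subset-uniformity: if $p$ is uniform on a subset of size $s$, then $\|p\|_2^2 = 1/s$ and $\|p\|_3^3 = 1/s^2$, so $\|p\|_3^3 = (\|p\|_2^2)^2$, and a deviation from this identity witnesses distance from every subset-uniform distribution. The sketch below estimates both norms from sample collisions; it is an illustration of that identity, not the optimal tester of the paper, and the domain size and sample count are arbitrary.

```python
import numpy as np
from collections import Counter

def norm_estimates(samples):
    """Unbiased collision estimates of ||p||_2^2 and ||p||_3^3.

    The fraction of sample pairs that collide estimates ||p||_2^2;
    the fraction of sample triples that collide estimates ||p||_3^3.
    """
    n = len(samples)
    counts = np.array(list(Counter(samples).values()))
    pair_collisions = np.sum(counts * (counts - 1) / 2)
    triple_collisions = np.sum(counts * (counts - 1) * (counts - 2) / 6)
    p2 = pair_collisions / (n * (n - 1) / 2)
    p3 = triple_collisions / (n * (n - 1) * (n - 2) / 6)
    return p2, p3

rng = np.random.default_rng(1)
# samples uniform on an (unknown to the tester) 50-element subset
samples = rng.integers(0, 50, size=20000).tolist()
p2, p3 = norm_estimates(samples)
# for a subset-uniform p: p2 ≈ 1/50 and p3 ≈ p2**2
```

Turning this identity into a tester with the optimal sample complexity, and proving a matching lower bound, is the technical content of the paper.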