Conditional Density Estimation with Dimensionality Reduction via Squared-Loss Conditional Entropy Minimization

Authors: Sugiyama Masashi, Tangkaratt Voot, Xie Ning
Publication date: 28/04/2014

Regression aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional density is multimodal, heteroscedastic, and asymmetric. In such a case, estimating the conditional density itself is preferable, but conditional density estimation (CDE) is challenging in high-dimensional spaces. A naive approach to coping with high dimensionality is to first perform dimensionality reduction (DR) and then execute CDE. However, such a two-step process does not perform well in practice because the error incurred in the first DR step can be magnified in the second CDE step. In this paper, we propose a novel single-shot procedure that performs CDE and DR simultaneously in an integrated way. Our key idea is to formulate DR as the problem of minimizing a squared-loss variant of conditional entropy, which is solved via CDE. Thus, an additional CDE step is not needed after DR. We demonstrate the usefulness of the proposed method through extensive experiments on various datasets including humanoid robot transition and computer art.
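The abstract's opening point, that a conditional mean can be uninformative when the conditional density is multimodal, can be illustrated with a small sketch. This is not the method proposed in the paper: it uses a plain Gaussian-kernel conditional density estimate as a stand-in, and the toy data generator, function names, and bandwidths (`h_x`, `h_y`) are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(u, h):
    """One-dimensional Gaussian kernel with bandwidth h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))


def make_bimodal_data(n=2000, seed=0):
    """Toy data (assumed): y is +x or -x with equal probability, plus small noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 3.0, size=n)
    sign = rng.choice([-1.0, 1.0], size=n)
    y = sign * x + 0.1 * rng.standard_normal(n)
    return x, y


def kernel_conditional_mean(x_query, x_train, y_train, h_x=0.2):
    """Kernel-weighted estimate of E[y | x = x_query]."""
    wx = gaussian_kernel(x_query - x_train, h_x)
    return float((wx * y_train).sum() / wx.sum())


def kernel_cde(x_query, y_query, x_train, y_train, h_x=0.2, h_y=0.2):
    """Kernel estimate of p(y_query | x_query): weighted average of y-kernels."""
    wx = gaussian_kernel(x_query - x_train, h_x)
    wy = gaussian_kernel(y_query - y_train, h_y)
    return float((wx * wy).sum() / wx.sum())


if __name__ == "__main__":
    x, y = make_bimodal_data()
    mean_at_2 = kernel_conditional_mean(2.0, x, y)
    print("estimated conditional mean at x=2:", mean_at_2)
    print("estimated density at the mean:", kernel_cde(2.0, mean_at_2, x, y))
    print("estimated density at y=+2:", kernel_cde(2.0, 2.0, x, y))
```

On this toy data the estimated conditional mean sits between the two modes, where the estimated density is close to zero, which is the situation the abstract describes. A kernel estimator of this kind degrades as the input dimension grows, which is the difficulty that motivates combining CDE with dimensionality reduction.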

arXiv.org e-Print Archive

CiteSeerX

Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

Authors: Goldfeld Ziv, Greenewald Kristjan, Polyanskiy Yury, Weed Jonathan
Publication date: 01/05/2020

This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P\ast\mathcal{N}_\sigma$, for $\mathcal{N}_\sigma\triangleq\mathcal{N}(0,\sigma^2 \mathrm{I}_d)$, by $\hat{P}_n\ast\mathcal{N}_\sigma$, where $\hat{P}_n$ is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $\chi^2$-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)}n^{-\frac{1}{2}}$, in remarkable contrast to a typical $n^{-\frac{1}{d}}$ rate for unsmoothed $\mathsf{W}_1$ (and $d\ge 3$). For the KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and $\chi^2$-divergence, the convergence rate is $e^{O(d)}n^{-1}$, but only if $P$ achieves finite input-output $\chi^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $\omega(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $\chi^2$-divergence becomes infinite, a curious dichotomy. As a main application we consider estimating the differential entropy $h(P\ast\mathcal{N}_\sigma)$ in the high-dimensional regime. The distribution $P$ is unknown but $n$ i.i.d. samples from it are available. We first show that any good estimator of $h(P\ast\mathcal{N}_\sigma)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $e^{O(d)}n^{-\frac{1}{2}}$, thus establishing the minimax rate-optimality of the plug-in. Numerical results that demonstrate a significant empirical superiority of the plug-in approach to general-purpose differential entropy estimators are provided.

Comment: arXiv admin note: substantial text overlap with arXiv:1810.1158
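A minimal sketch of the plug-in idea follows. The smoothed empirical measure $\hat{P}_n\ast\mathcal{N}_\sigma$ is an equal-weight Gaussian mixture with one component centred at each sample, so the plug-in estimate is the differential entropy of that mixture. That entropy has no closed form in general; approximating it by Monte Carlo integration is an assumption of this sketch, as are the function name and the parameters `num_mc` and `seed`.

```python
import numpy as np


def plugin_smoothed_entropy(samples, sigma, num_mc=20000, seed=0):
    """Monte Carlo estimate of h(P_hat_n * N_sigma) in nats.

    samples: array of shape (n, d) holding the i.i.d. draws from P.
    sigma:   standard deviation of the isotropic Gaussian smoothing.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape

    # Draw from the mixture: pick a sample uniformly, then add Gaussian noise.
    centers = samples[rng.integers(0, n, size=num_mc)]
    points = centers + sigma * rng.standard_normal((num_mc, d))

    # Squared distances between every Monte Carlo point and every sample.
    sq_dist = (
        (points ** 2).sum(axis=1)[:, None]
        + (samples ** 2).sum(axis=1)[None, :]
        - 2.0 * points @ samples.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against round-off negatives

    # Log-density of each mixture component at each point.
    log_comp = -sq_dist / (2.0 * sigma ** 2) - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)

    # Numerically stable log of the equal-weight mixture density.
    peak = log_comp.max(axis=1, keepdims=True)
    log_q = peak[:, 0] + np.log(np.exp(log_comp - peak).mean(axis=1))

    # Differential entropy is minus the expected log-density.
    return float(-log_q.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.standard_normal((200, 3))  # hypothetical samples from an unknown P
    print(plugin_smoothed_entropy(x, sigma=1.0))
```

A simple sanity check: if all samples coincide, the mixture collapses to a single Gaussian, whose entropy is $\frac{d}{2}\log(2\pi e\sigma^2)$, and the estimate should match that value up to Monte Carlo error. The distance matrix costs `num_mc * n` memory, so larger problems would need to process the Monte Carlo points in batches.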

arXiv.org e-Print Archive

DSpace@MIT