Conditional Density Estimation with Dimensionality Reduction via Squared-Loss Conditional Entropy Minimization
Regression aims at estimating the conditional mean of output given input.
However, regression is not informative enough if the conditional density is
multimodal, heteroscedastic, and asymmetric. In such a case, estimating the
conditional density itself is preferable, but conditional density estimation
(CDE) is challenging in high-dimensional space. A naive approach to coping with
high-dimensionality is to first perform dimensionality reduction (DR) and then
execute CDE. However, such a two-step process does not perform well in practice
because the error incurred in the first DR step can be magnified in the second
CDE step. In this paper, we propose a novel single-shot procedure that performs
CDE and DR simultaneously in an integrated way. Our key idea is to formulate DR
as the problem of minimizing a squared-loss variant of conditional entropy, and
this is solved via CDE. Thus, an additional CDE step is not needed after DR. We
demonstrate the usefulness of the proposed method through extensive experiments
on various datasets including humanoid robot transition and computer art.
Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation
This paper studies convergence of empirical measures smoothed by a Gaussian
kernel. Specifically, consider approximating $P\ast\mathcal{N}_\sigma$, for
$\mathcal{N}_\sigma\triangleq\mathcal{N}(0,\sigma^2\mathrm{I}_d)$, by
$\hat{P}_n\ast\mathcal{N}_\sigma$, where $\hat{P}_n$ is the empirical measure,
under different statistical distances. The convergence is examined in terms of
the Wasserstein distance, total variation (TV), Kullback-Leibler (KL)
divergence, and $\chi^2$-divergence. We show that the approximation error under
the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate
$e^{O(d)}n^{-\frac{1}{2}}$ in remarkable contrast to a typical
$n^{-\frac{1}{d}}$ rate for unsmoothed $\mathsf{W}_1$ (and $d\ge 3$). For the
KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and
$\chi^2$-divergence, the convergence rate is $e^{O(d)}n^{-1}$, but only if $P$
achieves finite input-output $\chi^2$ mutual information across the additive
white Gaussian noise channel. If the latter condition is not met, the rate
changes to $\omega(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while
the $\chi^2$-divergence becomes infinite, a curious dichotomy. As a main
application we consider estimating the differential entropy
$h(P\ast\mathcal{N}_\sigma)$ in the high-dimensional regime. The distribution
$P$ is unknown but $n$ i.i.d. samples from it are available. We first show that
any good estimator of $h(P\ast\mathcal{N}_\sigma)$ must have sample complexity
that is exponential in $d$. Using the empirical approximation results we then
show that the absolute-error risk of the plug-in estimator converges at the
parametric rate $e^{O(d)}n^{-\frac{1}{2}}$, thus establishing the minimax
rate-optimality of the plug-in. Numerical results that demonstrate a
significant empirical superiority of the plug-in approach to general-purpose
differential entropy estimators are provided.
Comment: arXiv admin note: substantial text overlap with arXiv:1810.1158
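
To make the plug-in idea concrete, here is a minimal Python sketch under stated assumptions. It treats the plug-in estimate as the differential entropy of the smoothed empirical measure, which is an equal-weight Gaussian mixture centered at the samples, and approximates that entropy by simple Monte Carlo integration. The function names (`log_mixture_density`, `plugin_entropy`), the parameter `num_mc`, and the Monte Carlo evaluation itself are illustrative choices, not details taken from the abstract.

```python
import math
import random


def log_mixture_density(x, centers, sigma):
    """Log-density at x of the mixture (1/n) * sum_i N(center_i, sigma^2 I_d)."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * sigma ** 2)
    exponents = [
        -sum((xj - cj) ** 2 for xj, cj in zip(x, c)) / (2.0 * sigma ** 2)
        for c in centers
    ]
    top = max(exponents)  # log-sum-exp trick for numerical stability
    log_sum = top + math.log(sum(math.exp(e - top) for e in exponents))
    return log_norm + log_sum - math.log(len(centers))


def plugin_entropy(samples, sigma, num_mc=2000, seed=0):
    """Monte Carlo estimate (in nats) of the entropy of the smoothed empirical measure.

    samples: list of d-dimensional points (lists of floats) drawn from the unknown P.
    num_mc:  number of Monte Carlo draws (hypothetical tuning parameter).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_mc):
        # Draw from the mixture: pick a sample uniformly, then add Gaussian noise.
        center = rng.choice(samples)
        x = [c + rng.gauss(0.0, sigma) for c in center]
        total -= log_mixture_density(x, samples, sigma)
    return total / num_mc


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [[data_rng.gauss(0.0, 1.0)] for _ in range(300)]  # P = N(0, 1), d = 1
    estimate = plugin_entropy(data, sigma=1.0, num_mc=1500)
    exact = 0.5 * math.log(2.0 * math.pi * math.e * 2.0)  # entropy of N(0, 1 + sigma^2)
    print(f"plug-in estimate: {estimate:.3f} nats, closed form: {exact:.3f} nats")
```

Each density evaluation costs time linear in the number of samples, so this sketch suits only small illustrative problems. It also does nothing to avoid the exponential dependence on dimension that the abstract describes.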