We study the problem of overcoming exponential sample complexity in
differential entropy estimation under Gaussian convolutions. Specifically, we
consider estimation of the differential entropy h(X+Z) from n independent and
identically distributed samples of X, where X and Z are independent
D-dimensional random variables, X is sub-Gaussian with bounded second moment,
and Z ∼ N(0, σ²I_D). Under the absolute-error loss, this problem has a
parametric estimation rate of c^D/√n, which is exponential in the data
dimension D and often
problematic for applications. We overcome this exponential sample complexity by
projecting X to a low-dimensional space via principal component analysis
(PCA) before the entropy estimation, and show that the asymptotic error
overhead vanishes as the unexplained variance of the PCA vanishes. This implies
near-optimal performance for inherently low-dimensional structures embedded in
high-dimensional spaces, including hidden-layer outputs of deep neural networks
(DNNs), which can be used to estimate mutual information (MI) in DNNs. We
provide numerical results verifying the performance of our PCA approach on
Gaussian and spiral data. We also apply our method to analysis of information
flow through neural network layers (c.f. information bottleneck), with results
measuring mutual information in a noisy fully connected network and a noisy
convolutional neural network (CNN) for MNIST classification.
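For illustration, a minimal sketch of the pipeline described above, assuming NumPy/SciPy; the function names and the Monte Carlo plug-in estimator below are our own illustration under those assumptions, not necessarily the paper's exact estimator:

    import numpy as np
    from scipy.special import logsumexp

    def pca_project(X, d):
        # Center the samples and project onto the top-d principal components.
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:d].T

    def entropy_of_X_plus_Z(X, sigma, num_mc=1000, rng=None):
        # Monte Carlo plug-in estimate of h(X + Z) with Z ~ N(0, sigma^2 I_d):
        # the law of X + Z is approximated by the Gaussian mixture
        # (1/n) sum_i N(x_i, sigma^2 I_d), and h is estimated as -E[log p_hat(Y)]
        # with Y drawn from that mixture. Memory scales as O(num_mc * n).
        rng = np.random.default_rng() if rng is None else rng
        n, d = X.shape
        idx = rng.integers(n, size=num_mc)
        Y = X[idx] + sigma * rng.standard_normal((num_mc, d))
        # log p_hat(y) = logsumexp_i[-||y - x_i||^2 / (2 sigma^2)]
        #                - log n - (d/2) log(2 pi sigma^2)
        sq_dists = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        log_p = (logsumexp(-sq_dists / (2 * sigma**2), axis=1)
                 - np.log(n) - 0.5 * d * np.log(2 * np.pi * sigma**2))
        return -log_p.mean()

    # Usage: project D-dimensional samples to d dimensions, then estimate entropy.
    # X = ...  # (n, D) array of samples of X
    # h_hat = entropy_of_X_plus_Z(pca_project(X, d=3), sigma=0.5)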