We propose a new density estimation algorithm. Given n i.i.d. samples from
a distribution belonging to a class of densities on $\mathbb{R}^d$, our
estimator outputs any density in the class whose ``perceptron discrepancy''
with the empirical distribution is at most $O(\sqrt{d/n})$. The perceptron
discrepancy between two distributions is defined as the largest difference in
mass that they place on any halfspace of $\mathbb{R}^d$. It is shown that this
estimator achieves expected total variation distance to the truth that is
almost minimax optimal over the class of densities with bounded Sobolev norm
and Gaussian mixtures. This suggests that regularity of the prior distribution
could be an explanation for the efficiency of the ubiquitous step in machine
learning that replaces optimization over large function spaces with simpler
parametric classes (e.g. in the discriminators of GANs).
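For concreteness, the halfspace formulation above can be written as follows (the notation $d_{\mathrm{P}}$, $w$, $b$ is ours and is introduced only for illustration): for distributions $\mu$ and $\nu$ on $\mathbb{R}^d$,
$$ d_{\mathrm{P}}(\mu,\nu) \;=\; \sup_{w \in \mathbb{R}^d,\ b \in \mathbb{R}} \bigl| \mu(\{x : \langle w, x \rangle \ge b\}) - \nu(\{x : \langle w, x \rangle \ge b\}) \bigr| . $$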
We generalize the above to show that replacing the ``perceptron discrepancy''
with the generalized energy distance of Székely-Rizzo further improves the total
variation loss. The generalized energy distance between empirical distributions
is easily computable and differentiable, thus making it especially useful for
fitting generative models. To the best of our knowledge, it is the first
example of a distance with such properties for which there are minimax
statistical guarantees.
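As a concrete illustration of the computability claim, here is a minimal Python/NumPy sketch of the classical (non-generalized) Székely-Rizzo energy distance between two empirical samples; the function name energy_distance and the Gaussian test data are our own hypothetical choices, not part of the paper.

import numpy as np

def energy_distance(x, y):
    # Squared energy distance between empirical samples x (n, d) and y (m, d):
    # 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||, with the expectations replaced
    # by averages over all pairs of sample points.
    def mean_pairwise(a, b):
        diff = a[:, None, :] - b[None, :, :]              # all pairwise differences
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()   # mean Euclidean distance
    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)

# Example: two Gaussian samples in R^3 with shifted means.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = rng.normal(loc=0.5, size=(300, 3))
print(energy_distance(x, y))

Since each step above is a simple function of the sample points (the Euclidean norm is differentiable away from zero), the same computation can be expressed in an automatic-differentiation framework and used directly as a training loss when fitting a generative model.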