Statistical Guarantees of Generative Adversarial Networks for Distribution Estimation
Generative Adversarial Networks (GANs) have achieved great success in
unsupervised learning. Despite this remarkable empirical performance, theoretical
understanding of the statistical properties of GANs remains limited. This
paper provides statistical guarantees of GANs for the estimation of data
distributions that have densities in a Hölder space. Our main result shows
that, if the generator and discriminator network architectures are properly
chosen (universally for all distributions with Hölder densities), GANs are
consistent estimators of the data distributions under strong discrepancy
metrics, such as the Wasserstein distance. To the best of our knowledge, this is the
first statistical theory of GANs for Hölder densities. In comparison with
existing works, our theory requires minimal assumptions on the data distributions.
Our generator and discriminator networks utilize general weight matrices and
the non-invertible ReLU activation function, while many existing works only
apply to invertible weight matrices and invertible activation functions. In our
analysis, we decompose the error into a statistical error and an approximation
error by a new oracle inequality, which may be of independent interest.
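For intuition, this kind of decomposition can be written schematically as an oracle-type bound under an integral probability metric d_F (a textbook-style sketch under that assumption, not the paper's exact inequality):

% Schematic only: \hat{g} minimizes the empirical IPM over the generator
% class \mathcal{G}, \rho is the latent distribution, \mu the data
% distribution, and \hat{\mu}_n the empirical distribution of n samples.
\[
  d_{\mathcal{F}}\bigl(\hat{g}_{\#}\rho,\ \mu\bigr)
  \;\le\;
  \underbrace{\inf_{g \in \mathcal{G}} d_{\mathcal{F}}\bigl(g_{\#}\rho,\ \mu\bigr)}_{\text{approximation error}}
  \;+\;
  \underbrace{2\, d_{\mathcal{F}}\bigl(\hat{\mu}_n,\ \mu\bigr)}_{\text{statistical error}}.
\]

Bounding the first term is then an approximation-theoretic question about the generator network, and bounding the second a concentration question about the discriminator class, which is the role such an oracle inequality plays in the analysis described above.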
Towards Understanding Hierarchical Learning: Benefits of Neural Representations
Deep neural networks can empirically perform efficient hierarchical learning,
in which the layers learn useful representations of the data. However, how they
make use of the intermediate representations is not explained by recent
theories that relate them to "shallow learners" such as kernels. In this work,
we demonstrate that intermediate neural representations add more flexibility to
neural networks and can be advantageous over raw inputs. We consider a fixed,
randomly initialized neural network as a representation function fed into
another trainable network. When the trainable network is the quadratic Taylor
model of a wide two-layer network, we show that neural representation can
achieve improved sample complexities compared with the raw input: for learning
a low-rank degree-p polynomial in dimension d, neural representation requires
polynomially fewer samples than the best-known sample complexity upper bound
for the raw input. We contrast our result with a lower bound showing that
neural representations do not improve over the raw input (in the infinite width
limit), when the trainable network is instead a neural tangent kernel. Our
results characterize when neural representations are beneficial, and may
provide a new perspective on why depth is important in deep learning.
Comment: 41 pages, published in NeurIPS 2020
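To make the setup concrete, below is a minimal PyTorch sketch of the general idea, with a fixed random ReLU layer as the representation and a small trainable head standing in for the paper's quadratic Taylor model; the dimensions, target, and training loop are illustrative assumptions, not the paper's construction.

# Toy sketch (illustrative only): a fixed, randomly initialized ReLU layer
# serves as the representation, and only the small network on top is trained.
import torch
import torch.nn as nn

d, m, n = 20, 512, 2048          # input dimension, representation width, sample size

# Fixed random representation phi(x) = ReLU(W0 x); W0 is never trained.
W0 = torch.randn(m, d) / d ** 0.5
def phi(x):                      # (batch, d) -> (batch, m)
    return torch.relu(x @ W0.T)

# Trainable network on top of the representation (a plain two-layer ReLU
# head here; the paper analyzes a quadratic Taylor model of a wide
# two-layer network, for which this is only a stand-in).
head = nn.Sequential(nn.Linear(m, 64), nn.ReLU(), nn.Linear(64, 1))

# Synthetic regression target: a simple low-degree polynomial of the input.
X = torch.randn(n, d)
y = (X[:, 0] * X[:, 1] * X[:, 2]).unsqueeze(1)

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(phi(X)), y)
    loss.backward()
    opt.step()

# To mimic the "raw input" baseline, train the same head with an
# nn.Linear(d, 64) first layer directly on X and compare sample efficiency.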