4 research outputs found
Sharp Representation Theorems for ReLU Networks with Precise Dependence on Depth
We prove sharp dimension-free representation results for neural networks with
$D$ ReLU layers under square loss for a class of functions $\mathcal{G}_D$
defined in the paper. These results capture the precise benefits of depth in
the following sense:
1. The rate for representing the class of functions $\mathcal{G}_D$ via $D$
ReLU layers is sharp up to constants, as shown by matching lower bounds.
2. For each $D$, $\mathcal{G}_D \subseteq \mathcal{G}_{D+1}$, and as $D$
grows the class of functions $\mathcal{G}_D$ contains progressively less
smooth functions.
3. If $D' < D$, then the approximation rate for the class $\mathcal{G}_D$
achieved by depth-$D'$ networks is strictly worse than
that achieved by depth-$D$ networks.
This constitutes a fine-grained characterization of the representation power
of feedforward networks of arbitrary depth $D$ and number of neurons $N$, in
contrast to existing representation results, which either require $D$ growing
quickly with $N$ or assume that the function being represented is highly
smooth. In the latter case similar rates can be obtained with a single
nonlinear layer. Our results confirm the prevailing hypothesis that deeper
networks are better at representing less smooth functions, and indeed, the main
technical novelty is to fully exploit the fact that deep networks can produce
highly oscillatory functions with few activation functions.
Comment: 12 pages, 1 figure (surprisingly short, isn't it?)
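As an illustration of that last point, the classical sawtooth construction (in the spirit of Telgarsky, 2016; not necessarily the construction used in this paper) shows how depth buys oscillation: composing a two-ReLU "tent" map with itself $D$ times produces $2^{D-1}$ oscillations on $[0,1]$ using only $2D$ ReLU units, whereas a one-hidden-layer network needs a number of ReLUs proportional to the number of oscillations. The sketch below is our own illustrative code.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tent(x):
    # Tent map on [0, 1] written with two ReLUs: 2*relu(x) - 4*relu(x - 1/2).
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, depth):
    # Composing the tent map `depth` times uses only 2*depth ReLUs in total,
    # yet produces 2^depth linear pieces (2^(depth-1) oscillations).
    for _ in range(depth):
        x = tent(x)
    return x

x = np.linspace(0.0, 1.0, 4097)
for D in (1, 3, 6):
    y = sawtooth(x, D)
    # Count monotone pieces by looking at sign changes of the discrete slope.
    pieces = np.count_nonzero(np.diff(np.sign(np.diff(y)))) + 1
    print(f"depth {D}: {2 * D} ReLUs, ~{pieces} monotone pieces")
```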
High-Order Approximation Rates for Shallow Neural Networks with Cosine and ReLU$^k$ Activation Functions
We study the approximation properties of shallow neural networks with an
activation function which is a power of the rectified linear unit.
Specifically, we consider the dependence of the approximation rate on the
dimension and the smoothness in the spectral Barron space of the underlying
function $f$ to be approximated. We show that as the smoothness index $s$ of
$f$ increases, shallow neural networks with ReLU$^k$ activation function obtain
an improved approximation rate up to a best possible rate of
$O(n^{-(k+1)}\log(n))$ in $L^2$, independent of the dimension $d$. The
significance of this result is that the activation function ReLU$^k$ is fixed
independently of the dimension, while for classical methods the degree of
polynomial approximation or the smoothness of the wavelets used would have to
increase in order to take advantage of the dimension-dependent smoothness of
$f$. In addition, we derive improved approximation rates for shallow neural
networks with cosine activation function on the spectral Barron space. Finally,
we prove lower bounds showing that the approximation rates attained are optimal
under the given assumptions.
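For concreteness (our notation, not the paper's): a shallow, i.e. one-hidden-layer, network with ReLU$^k$ activation is a function of the form $f_n(x) = \sum_{i=1}^{n} a_i \max(0, w_i \cdot x + b_i)^k$, and the results above concern how fast such an $f_n$ can approach a target $f$ in $L^2$ as the number of units $n$ grows. The sketch below simply evaluates such a network with randomly drawn parameters.

```python
import numpy as np

def relu_k(z, k):
    # Power of the rectified linear unit: ReLU^k(z) = max(0, z)^k.
    return np.maximum(z, 0.0) ** k

def shallow_relu_k_net(X, W, b, a, k):
    # One-hidden-layer network: sum_i a_i * ReLU^k(w_i . x + b_i).
    return relu_k(X @ W.T + b, k) @ a

rng = np.random.default_rng(0)
d, n = 8, 100                       # input dimension, number of hidden units
X = rng.standard_normal((5, d))     # a small batch of inputs
W = rng.standard_normal((n, d))     # inner weights w_i
b = rng.standard_normal(n)          # biases b_i
a = rng.standard_normal(n) / n      # outer coefficients a_i
for k in (1, 2, 3):
    print(f"k = {k}:", shallow_relu_k_net(X, W, b, a, k))
```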
Depth separation beyond radial functions
High-dimensional depth separation results for neural networks show that
certain functions can be efficiently approximated by two-hidden-layer networks
but not by one-hidden-layer ones in high dimension $d$. Existing results of
this type mainly focus on functions with an underlying radial or
one-dimensional structure, which are usually not encountered in practice. The
first contribution of this paper is to extend such results to a more general
class of functions, namely functions with piece-wise oscillatory structure, by
building on the proof strategy of (Eldan and Shamir, 2016). We complement these
results by showing that, if the domain radius and the rate of oscillation of
the objective function are constant, then approximation by one-hidden-layer
networks holds at a $\mathrm{poly}(d)$ rate for any fixed error threshold.
A common theme in the proof of such results is the fact that one-hidden-layer
networks fail to approximate high-energy functions whose Fourier representation
is spread in the domain. On the other hand, existing approximation results of a
function by one-hidden-layer neural networks rely on the function having a
sparse Fourier representation. The choice of the domain also represents a
source of gaps between upper and lower approximation bounds. Focusing on a
fixed approximation domain, namely the sphere $\mathbb{S}^{d-1}$ in dimension
$d$, we provide a characterization both of functions which are efficiently
approximable by one-hidden-layer networks and of functions which are provably
not, in terms of their Fourier expansion.
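For context, the classical result alluded to in the last paragraph is Barron's theorem (Barron, 1993), which makes the "sparse Fourier representation" condition quantitative: if the first Fourier moment of $f$ is finite, then one-hidden-layer networks approximate $f$ at a dimension-free rate on a ball of radius $r$. The statement below is a standard version with constants suppressed, not taken from this paper:
\[
C_f \;=\; \int_{\mathbb{R}^d} \lVert \omega \rVert \, \lvert \hat f(\omega) \rvert \, d\omega \;<\; \infty
\quad \Longrightarrow \quad
\inf_{f_n \in \mathcal{N}_1(n)} \lVert f - f_n \rVert_{L^2(B_r)}^2 \;\lesssim\; \frac{r^2 C_f^2}{n},
\]
where $\mathcal{N}_1(n)$ denotes one-hidden-layer networks with $n$ units. Highly oscillatory functions have their Fourier mass spread over high frequencies, so $C_f$ blows up and upper bounds of this type become vacuous, which is exactly the regime exploited by the depth separation results above.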
A Corrective View of Neural Networks: Representation, Memorization and Learning
We develop a corrective mechanism for neural network approximation: the total
available non-linear units are divided into multiple groups and the first group
approximates the function under consideration, the second group approximates
the error in approximation produced by the first group and corrects it, the
third group approximates the error produced by the first and second groups
together and so on. This technique yields several new representation and
learning results for neural networks. First, we show that two-layer neural
networks in the random features regime (RF) can memorize arbitrary labels for
$n$ arbitrary points under a Euclidean distance separation condition using
$\tilde{O}(n)$ ReLUs, which is optimal in $n$ up to logarithmic factors. Next,
we give a powerful representation result for two-layer neural networks with
ReLUs and smoothed ReLUs which can achieve a squared error of at most $\epsilon$
with $O(C(a,d)\,\epsilon^{-1/(a+1)})$ units for $a \in \mathbb{N} \cup \{0\}$
when the function is smooth enough (roughly when it has $\Theta(ad)$ bounded
derivatives). In certain cases $d$ can be replaced with the effective dimension $q \ll d$. Previous results of this type implement Taylor series approximation
using deep architectures. We also consider three-layer neural networks and show
that the corrective mechanism yields faster representation rates for smooth
radial functions. Lastly, we obtain the first $O(\mathrm{subpoly}(1/\epsilon))$
upper bound on the number of neurons required for a two-layer network to learn
low-degree polynomials up to squared error $\epsilon$ via gradient descent.
Even though deep networks can express these polynomials with
$O(\mathrm{polylog}(1/\epsilon))$ neurons, the best learning bounds on this
problem require $\mathrm{poly}(1/\epsilon)$ neurons.
Comment: Contains 2 figures (you heard that right!), V2 removes dimension dependence in the memorization bound
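To make the corrective mechanism concrete, here is a minimal sketch (our own illustrative code, not the authors' construction): a fixed budget of random-feature ReLU units is split into groups, the first group is fit to the target, and each subsequent group is fit to the residual error left by all previous groups combined.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(X):
    # Illustrative smooth target function; not from the paper.
    return np.sin(X.sum(axis=1))

def random_relu_features(X, W, b):
    # Random-features map with frozen weights: phi(x)_i = relu(w_i . x + b_i).
    return np.maximum(X @ W.T + b, 0.0)

def corrective_fit(X, y, n_groups, units_per_group, reg=1e-6):
    """Fit y with groups of random ReLU features, each group trained (by ridge
    least squares) on the residual left by all previous groups combined."""
    d = X.shape[1]
    residual = y.copy()
    groups = []
    for g in range(n_groups):
        W = rng.standard_normal((units_per_group, d))
        b = rng.standard_normal(units_per_group)
        Phi = random_relu_features(X, W, b)
        a = np.linalg.solve(Phi.T @ Phi + reg * np.eye(units_per_group),
                            Phi.T @ residual)
        residual = residual - Phi @ a          # error left after this group
        groups.append((W, b, a))
        print(f"after group {g + 1}: residual MSE = {np.mean(residual ** 2):.3e}")
    return groups

X = rng.standard_normal((512, 4))
corrective_fit(X, target(X), n_groups=4, units_per_group=64)
```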