Normal approximation of Random Gaussian Neural Networks
In this paper we provide explicit upper bounds on some distances between the
(law of the) output of a random Gaussian NN and (the law of) a random Gaussian
vector. Our results cover both shallow random Gaussian neural networks with
univariate output and deep, fully connected random Gaussian neural networks
with rather general activation functions. The upper bounds show how the widths
of the layers, the activation functions and other architecture parameters
affect the Gaussian approximation of the output. Our techniques, relying on
Stein's method and integration by parts formulas for the Gaussian law, yield
estimates on distances which are indeed integral probability metrics, and
include the total variation and the convex distances. These latter metrics are
defined by testing against indicator functions of suitable measurable sets, and
so allow for accurate estimates of the probability that the output is localized
in some region of the space. Such estimates are of significant interest from
both a practitioner's and a theorist's perspective.
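As a hedged numerical illustration of this setting (not the paper's Stein-method argument), the following sketch samples the univariate output of shallow random Gaussian networks of increasing width and measures its distance to a fitted Gaussian. The tanh activation, the 1/sqrt(width) scaling, and the Kolmogorov-Smirnov statistic are illustrative choices standing in for the total-variation and convex distances the paper actually bounds.

```python
import numpy as np
from scipy import stats

# Weights and biases are i.i.d. N(0, 1); the output is scaled by 1/sqrt(width),
# a standard normalization under which a Gaussian limit is expected.
rng = np.random.default_rng(0)

def shallow_gaussian_nn_outputs(x, width, n_samples, activation=np.tanh):
    """Sample the univariate output of n_samples random shallow Gaussian NNs at x."""
    d = x.shape[0]
    W1 = rng.standard_normal((n_samples, width, d))    # hidden-layer weights
    b1 = rng.standard_normal((n_samples, width))       # hidden-layer biases
    W2 = rng.standard_normal((n_samples, width))       # output-layer weights
    hidden = activation(W1 @ x + b1)                   # shape (n_samples, width)
    return (W2 * hidden).sum(axis=1) / np.sqrt(width)  # univariate outputs

x = np.ones(3)
for width in (4, 32, 256):
    out = shallow_gaussian_nn_outputs(x, width, n_samples=20_000)
    # Kolmogorov-Smirnov distance to a fitted Gaussian: a crude empirical
    # stand-in for the total-variation / convex distances bounded in the paper.
    ks = stats.kstest(out, "norm", args=(out.mean(), out.std())).statistic
    print(f"width={width:4d}  KS distance to Gaussian ~ {ks:.4f}")
```

As the widths grow, the empirical distance shrinks, which is the qualitative behavior the paper's explicit upper bounds quantify.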
Deep Sufficient Representation Learning via Mutual Information
We propose a mutual information-based sufficient representation learning
(MSRL) approach, which uses the variational formulation of the mutual
information and leverages the approximation power of deep neural networks. MSRL
learns a sufficient representation that has maximum mutual information with the
response while following a user-selected distribution. It can easily handle
multi-dimensional continuous or categorical response variables. MSRL is shown
to be consistent in the sense that the conditional probability density function
of the response variable given the learned representation converges to the
conditional probability density function of the response variable given the
predictor. Non-asymptotic error bounds for MSRL are also established under
suitable conditions. To establish the error bounds, we derive a generalized
Dudley's inequality for an order-two U-process indexed by deep neural networks,
which may be of independent interest. We discuss how to determine the intrinsic
dimension of the underlying data distribution. Moreover, we evaluate the
performance of MSRL via extensive numerical experiments and real data analysis
and demonstrate that MSRL outperforms some existing nonlinear sufficient
dimension reduction methods.
Comment: 43 pages, 6 figures and 5 tables
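For readers unfamiliar with the variational formulation the abstract refers to, here is a minimal sketch of the Donsker-Varadhan lower bound on mutual information, optimized with a small neural critic. The critic architecture, layer sizes, toy data, and training loop are assumptions for illustration, not the authors' MSRL code.

```python
import math
import torch
import torch.nn as nn

# Donsker-Varadhan bound: I(R;Y) >= E_{p(r,y)}[T(r,y)] - log E_{p(r)p(y)}[e^{T(r,y)}],
# with the critic T parameterized by a neural network.
class Critic(nn.Module):
    def __init__(self, dim_r, dim_y, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_r + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, r, y):
        return self.net(torch.cat([r, y], dim=-1)).squeeze(-1)

def dv_mi_lower_bound(critic, r, y):
    joint = critic(r, y).mean()             # samples from the joint p(r, y)
    y_perm = y[torch.randperm(y.shape[0])]  # shuffling approximates p(r)p(y)
    marginal = torch.logsumexp(critic(r, y_perm), dim=0) - math.log(y.shape[0])
    return joint - marginal

# Toy usage: maximizing the bound over the critic tightens the MI estimate.
torch.manual_seed(0)
r = torch.randn(512, 2)
y = r.sum(dim=1, keepdim=True) + 0.1 * torch.randn(512, 1)
critic = Critic(dim_r=2, dim_y=1)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = -dv_mi_lower_bound(critic, r, y)
    loss.backward()
    opt.step()
print("estimated MI lower bound:", -loss.item())
```

MSRL additionally constrains the learned representation's distribution and targets sufficiency; this sketch only shows the variational MI estimator such objectives build on.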
Empirical Bounds on Linear Regions of Deep Rectifier Networks
We can compare the expressiveness of neural networks that use rectified
linear units (ReLUs) by the number of linear regions, which reflect the number
of pieces of the piecewise linear functions modeled by such networks. However,
enumerating these regions is prohibitive and the known analytical bounds are
identical for networks with the same dimensions. In this work, we approximate the
number of linear regions through empirical bounds based on features of the
trained network and probabilistic inference. Our first contribution is a method
to sample the activation patterns defined by ReLUs using universal hash
functions. This method is based on a Mixed-Integer Linear Programming (MILP)
formulation of the network and an algorithm for probabilistic lower bounds of
MILP solution sets, which we call MIPBound; it is considerably faster than
exact counting and yields values of a similar order of magnitude. Our second
contribution is a tighter activation-based bound for the maximum number of
linear regions, which is particularly stronger in networks with narrow layers.
Combined, these bounds yield a fast proxy for the number of linear regions of a
deep neural network.
Comment: AAAI 202
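As a point of comparison with the MILP-based approach, the sketch below uses a far simpler Monte-Carlo baseline: sample inputs uniformly, record which ReLUs fire, and count distinct activation patterns. Every distinct pattern witnesses a distinct linear region, so the count is an empirical lower bound, though typically much looser than the bounds the paper develops. The toy architecture and sampling box are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_region_lower_bound(weights, biases, n_samples=20_000, box=2.0):
    """Count distinct ReLU activation patterns over uniformly sampled inputs."""
    d_in = weights[0].shape[1]
    X = rng.uniform(-box, box, size=(n_samples, d_in))
    patterns = set()
    for x in X:
        h, bits = x, []
        for W, b in zip(weights, biases):
            pre = W @ h + b
            bits.append((pre > 0).tobytes())  # which ReLUs fire at this input
            h = np.maximum(pre, 0.0)
        patterns.add(b"".join(bits))
    # Each distinct pattern corresponds to a distinct linear region, so this
    # count is a (typically loose) empirical lower bound on their number.
    return len(patterns)

# Toy network: 2 inputs, two hidden ReLU layers of width 8, random weights.
weights = [rng.standard_normal((8, 2)), rng.standard_normal((8, 8))]
biases = [rng.standard_normal(8), rng.standard_normal(8)]
print("distinct activation patterns found:",
      sampled_region_lower_bound(weights, biases))
```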
Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review
The paper characterizes classes of functions for which deep learning can be
exponentially better than shallow learning. Deep convolutional networks are a
special case of these conditions, though weight sharing is not the main reason
for their exponential advantage.
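To make the function classes concrete, here is an illustrative instance of the hierarchically compositional structure such reviews analyze; the constituent functions h_ij and the binary-tree arrangement are an assumed example, not a formula quoted from the paper.

```latex
% Hierarchically compositional function on a binary tree: each constituent
% depends on only two variables, so a deep network mirroring the tree can
% approximate f with complexity governed by the local (here, 2-dimensional)
% pieces rather than by the full input dimension.
f(x_1,\dots,x_8) =
  h_3\bigl(
    h_{21}\bigl(h_{11}(x_1,x_2),\, h_{12}(x_3,x_4)\bigr),\,
    h_{22}\bigl(h_{13}(x_5,x_6),\, h_{14}(x_7,x_8)\bigr)
  \bigr)
```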