Sub-Weibull distributions: generalizing sub-Gaussian and sub-Exponential properties to heavier-tailed distributions
We propose the notion of sub-Weibull distributions, which are characterised
by tails lighter than (or equally light as) the right tail of a Weibull
distribution. This novel class generalises the sub-Gaussian and sub-Exponential
families to potentially heavier-tailed distributions. Sub-Weibull distributions
are parameterised by a positive tail index θ and reduce to sub-Gaussian
distributions for θ = 1/2 and to sub-Exponential distributions for θ = 1.
A characterisation of the sub-Weibull property based on moments and
on the moment generating function is provided and properties of the class are
studied. An estimation procedure for the tail parameter is proposed and is
applied to an example stemming from Bayesian deep learning.
Comment: 10 pages, 3 figures
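The moment characterisation above suggests a simple way to illustrate the tail index: for a sub-Weibull variable with tail index θ, the moment norms (E|X|^k)^(1/k) grow like k^θ, so the slope of log moment norms against log k approximates θ. The following sketch is only an illustration of that characterisation under assumed sample sizes and moment orders, not the estimation procedure the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(0)

def moment_norms(x, ks):
    """Empirical moment norms (E|X|^k)^(1/k) for each order k."""
    return np.array([np.mean(np.abs(x) ** k) ** (1.0 / k) for k in ks])

def estimate_tail_index(x, ks=tuple(range(1, 11))):
    """Crude tail-index estimate: regress log ||X||_k on log k.
    The slope only approaches theta for large k, so with small
    orders this systematically understates theta a little."""
    ks = np.asarray(ks, dtype=float)
    log_norms = np.log(moment_norms(x, ks))
    slope, _ = np.polyfit(np.log(ks), log_norms, 1)
    return slope

# Gaussian samples are sub-Weibull with tail index 1/2,
# exponential samples with tail index 1.
theta_g = estimate_tail_index(rng.normal(size=500_000))
theta_e = estimate_tail_index(rng.exponential(size=500_000))
print(theta_g, theta_e)  # the Gaussian estimate should be the smaller one
```

Because only finitely many moment orders are used, both estimates are biased downward; the ordering of the two estimates still reflects the heavier exponential tail.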
Random deep neural networks are biased towards simple functions
We prove that the binary classifiers of bit strings generated by random wide
deep neural networks with ReLU activation function are biased towards simple
functions. The simplicity is captured by the following two properties. For any
given input bit string, the average Hamming distance of the closest input bit
string with a different classification is at least sqrt(n / (2π log n)),
where n is the length of the string. Moreover, if the bits of the initial
string are flipped randomly, the average number of flips required to change the
classification grows linearly with n. These results are confirmed by numerical
experiments on deep neural networks with two hidden layers, and settle the
conjecture stating that random deep neural networks are biased towards simple
functions. This conjecture was proposed and numerically explored in [Valle
Pérez et al., ICLR 2019] to explain the unreasonably good generalization
properties of deep learning algorithms. The probability distribution of the
functions generated by random deep neural networks is a good choice for the
prior probability distribution in the PAC-Bayesian generalization bounds. Our
results constitute a fundamental step forward in the characterization of this
distribution, therefore contributing to the understanding of the generalization
properties of deep learning algorithms.
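The bit-flipping experiment described above can be sketched numerically. In this sketch the network width, the ±1 encoding of bits, the He-style Gaussian initialisation, and flipping each bit at most once in a random order are all assumptions made for illustration, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_relu_net(n, width=256):
    """Sample a random two-hidden-layer ReLU network (He-style Gaussian
    init, an assumed choice) mapping a +/-1 bit vector of length n to a
    scalar logit; the binary classification is the sign of the logit."""
    W1 = rng.normal(0.0, np.sqrt(2.0 / n), (width, n))
    W2 = rng.normal(0.0, np.sqrt(2.0 / width), (width, width))
    w3 = rng.normal(0.0, np.sqrt(2.0 / width), width)
    def f(x):
        h = np.maximum(W1 @ x, 0.0)
        h = np.maximum(W2 @ h, 0.0)
        return float(w3 @ h)
    return f

def flips_to_change(f, x):
    """Flip bits of x in a random order (each at most once) until the
    classification changes; return n if it never changes."""
    y0 = np.sign(f(x))
    z = x.copy()
    for count, j in enumerate(rng.permutation(len(x)), start=1):
        z[j] = -z[j]
        if np.sign(f(z)) != y0:
            return count
    return len(x)

def average_flips(n, trials=150):
    """Average flips needed to change the classification of a random
    input under a freshly sampled random network."""
    total = 0
    for _ in range(trials):
        f = random_relu_net(n)
        x = rng.choice([-1.0, 1.0], size=n)
        total += flips_to_change(f, x)
    return total / trials

avg10, avg30 = average_flips(10), average_flips(30)
print(avg10, avg30)  # the average flip count should grow with n
```

Consistent with the linear-growth result, the average flip count increases with the string length n; this small-scale sketch does not by itself distinguish linear from sublinear growth.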