Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function
We demonstrate that in residual neural networks (ResNets) dynamical isometry
is achievable irrespective of the activation function used. We do so by
deriving, with the help of Free Probability and Random Matrix Theories, a
universal formula for the spectral density of the input-output Jacobian at
initialization, in the large network width and depth limit. The resulting
singular value spectrum depends on a single parameter, which we calculate for a
variety of popular activation functions, by analyzing the signal propagation in
the artificial neural network. We corroborate our results with numerical
simulations of both random matrices and ResNets applied to the CIFAR-10
classification problem. Moreover, we study the consequence of this universal
behavior for the initial and late phases of the learning processes. We conclude
by drawing attention to the simple fact that initialization acts as a
confounding factor between the choice of activation function and the rate of
learning. We propose that in ResNets this can be resolved based on our results,
by ensuring the same level of dynamical isometry at initialization.
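As a rough illustration of the object studied in this abstract, the singular values of a ResNet input-output Jacobian at initialization can be sampled numerically. The sketch below is not the authors' code. It assumes a Jacobian of the form J = prod_l (1 + sqrt(c/L) W_l D_l), with Gaussian weights W_l, a ReLU-like diagonal D_l whose entries are 0 or 1 with probability 1/2, and a hypothetical scale parameter `c`; all of these names and choices are assumptions made for illustration.

```python
import numpy as np


def resnet_jacobian_singular_values(width=200, depth=50, c=1.0, seed=0):
    """Sample the singular values of a toy ResNet Jacobian at initialization.

    Assumed model (not taken from the paper): each residual block contributes
    a factor (I + scale * W @ D), where W has i.i.d. N(0, 1/width) entries,
    D is a random 0/1 diagonal standing in for ReLU derivatives, and
    scale = sqrt(c / depth).
    """
    rng = np.random.default_rng(seed)
    scale = np.sqrt(c / depth)
    identity = np.eye(width)
    jacobian = np.eye(width)
    for _ in range(depth):
        weights = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        # Stand-in for the activation derivatives: 0 or 1 with probability 1/2.
        derivs = np.diag((rng.random(width) < 0.5).astype(float))
        jacobian = (identity + scale * weights @ derivs) @ jacobian
    # np.linalg.svd returns singular values in descending order.
    return np.linalg.svd(jacobian, compute_uv=False)


if __name__ == "__main__":
    s = resnet_jacobian_singular_values()
    print("min, max singular value:", s.min(), s.max())
```

In this toy model, setting `c = 0` gives the identity Jacobian, so every singular value equals 1. Larger values of `c` spread the spectrum, which is the kind of single-parameter dependence the abstract describes.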
Products of Complex Rectangular and Hermitian Random Matrices
Products and sums of random matrices have seen a rapid development in the
past decade due to various analytical techniques available. Two of these are
the harmonic analysis approach and the concept of polynomial ensembles. Very
recently, it has been shown for products of real matrices with anti-symmetric
matrices of even dimension that the traditional harmonic analysis on matrix
groups developed by Harish-Chandra et al. needs to be modified when considering
the group action on general symmetric spaces of matrices. In the present work,
we consider the product of complex random matrices with Hermitian matrices; in
particular, the former may also be rectangular, while the latter need not be
positive definite and is treated both as a fixed matrix and as a random
matrix. This generalises an approach for products involving the Gaussian
unitary ensemble (GUE) and circumvents the non-compact group integrals used
there. We derive the joint probability density function of the real
eigenvalues and, additionally, prove transformation formulas for the
bi-orthogonal functions and kernels.
Comment: 25 pages, v2: corrections of minor typos and an additional discussion
of Example IV.
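The eigenvalues in this abstract are real because a product of the form G H G^† is itself Hermitian whenever H is, even if G is rectangular and H is indefinite. The sketch below checks this numerically. It assumes the product has the form G H G^†, which the abstract does not state explicitly; the function name, the dimensions `n` and `m`, and the Gaussian sampling are likewise illustrative assumptions.

```python
import numpy as np


def product_eigenvalues(n=4, m=6, seed=0):
    """Build X = G H G^dagger and return X with its (real) eigenvalues.

    Assumptions for illustration: G is an n x m complex Gaussian matrix and
    H is an m x m Hermitian matrix obtained by symmetrising a complex
    Gaussian matrix, so H is in general not positive definite.
    """
    rng = np.random.default_rng(seed)
    g = (rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))) / np.sqrt(2)
    a = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    h = (a + a.conj().T) / 2  # Hermitian by construction
    x = g @ h @ g.conj().T  # n x n and Hermitian, since h is Hermitian
    # eigvalsh assumes a Hermitian input and returns real eigenvalues, ascending.
    return x, np.linalg.eigvalsh(x)


if __name__ == "__main__":
    x, eigenvalues = product_eigenvalues()
    print("Hermitian:", np.allclose(x, x.conj().T))
    print("eigenvalues:", eigenvalues)
```

Because H is indefinite here, the eigenvalues of X can take either sign, in contrast to a Wishart-type product G G^† whose eigenvalues are non-negative.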