    Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

    We demonstrate that in residual neural networks (ResNets) dynamical isometry is achievable irrespective of the activation function used. We do so by deriving, with the help of Free Probability and Random Matrix Theories, a universal formula for the spectral density of the input-output Jacobian at initialization, in the limit of large network width and depth. The resulting singular value spectrum depends on a single parameter, which we calculate for a variety of popular activation functions by analyzing signal propagation in the network. We corroborate our results with numerical simulations of both random matrices and ResNets applied to the CIFAR-10 classification problem. Moreover, we study the consequences of this universal behavior for the initial and late phases of the learning process. We conclude by drawing attention to the simple fact that initialization acts as a confounding factor between the choice of activation function and the rate of learning. We propose that in ResNets this can be resolved, based on our results, by ensuring the same level of dynamical isometry at initialization.
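    As a rough numerical illustration of the quantity studied here, the sketch below draws a toy ResNet at random initialization and computes the singular values of its input-output Jacobian. The update rule x_{l+1} = x_l + (a / sqrt(L)) W_l phi(x_l), the Gaussian weight variance 1/N, the 1/sqrt(L) branch scaling, and all function names are assumptions chosen for illustration, not the architecture or normalization used in the paper; the single parameter of the universal formula is not computed here.

```python
import numpy as np

# Hypothetical activation table: (phi, derivative of phi), both elementwise.
ACTIVATIONS = {
    "tanh": (np.tanh, lambda x: 1.0 - np.tanh(x) ** 2),
    "relu": (lambda x: np.maximum(x, 0.0), lambda x: (x > 0).astype(float)),
}


def resnet_jacobian_singular_values(width=200, depth=50, branch_scale=0.5,
                                    activation="tanh", seed=0):
    """Singular values of the input-output Jacobian of a random toy ResNet.

    Assumed toy model (not the paper's exact setup):
        x_{l+1} = x_l + (branch_scale / sqrt(depth)) * W_l @ phi(x_l),
    where W_l has i.i.d. N(0, 1/width) entries.
    """
    rng = np.random.default_rng(seed)
    phi, dphi = ACTIVATIONS[activation]
    eps = branch_scale / np.sqrt(depth)
    x = rng.standard_normal(width)   # random input vector
    jac = np.eye(width)              # accumulated Jacobian d x_l / d x_0
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        # Layer Jacobian I + eps * W diag(phi'(x)); multiplying by dphi(x)[None, :]
        # scales column j of W by phi'(x_j).
        layer_jac = np.eye(width) + eps * w * dphi(x)[None, :]
        jac = layer_jac @ jac        # chain rule: newest layer multiplies on the left
        x = x + eps * w @ phi(x)     # forward pass to the next layer
    # Singular values are returned in descending order.
    return np.linalg.svd(jac, compute_uv=False)


if __name__ == "__main__":
    for name in ACTIVATIONS:
        s = resnet_jacobian_singular_values(activation=name)
        print(f"{name}: min={s.min():.3f} max={s.max():.3f} ratio={s.max() / s.min():.3f}")
```

Comparing the spread of the spectrum (for example the ratio s.max() / s.min()) across activation functions at a fixed branch_scale gives a crude view of how close each toy network is to isometry; setting branch_scale to 0 reduces the network to the identity map, whose singular values are all exactly 1.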

    Products of Complex Rectangular and Hermitian Random Matrices

    Products and sums of random matrices have seen rapid development in the past decade owing to the various analytical techniques available. Two of these are the harmonic analysis approach and the concept of polynomial ensembles. Very recently, it has been shown for products of real matrices with anti-symmetric matrices of even dimension that the traditional harmonic analysis on matrix groups developed by Harish-Chandra et al. needs to be modified when considering the group action on general symmetric spaces of matrices. In the present work, we consider the product of complex random matrices with Hermitian matrices; in particular, the former can also be rectangular, while the latter need not be positive definite and is considered both as a fixed matrix and as a random matrix. This generalises an approach for products involving the Gaussian unitary ensemble (GUE) and circumvents the use of non-compact group integrals there. We derive the joint probability density function of the real eigenvalues and, additionally, prove transformation formulas for the bi-orthogonal functions and kernels.
    Comment: 25 pages, v2: corrections of minor typos and an additional discussion of Example IV.
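    To make the setting concrete, the sketch below samples eigenvalues numerically. It assumes the product takes the congruence form Y = X H X^dagger, with X an n-by-m complex Gaussian matrix (possibly rectangular) and H an m-by-m Hermitian matrix that may be indefinite; this form, the normalizations, and the helper names are assumptions for illustration, and the code is a Monte Carlo check rather than the analytic joint density derived in the paper. Because Y is Hermitian, its eigenvalues are real.

```python
import numpy as np


def complex_ginibre(n_rows, n_cols, rng):
    """Complex Gaussian matrix with i.i.d. entries of unit variance E|x_ij|^2 = 1."""
    re = rng.standard_normal((n_rows, n_cols))
    im = rng.standard_normal((n_rows, n_cols))
    return (re + 1j * im) / np.sqrt(2)


def sample_gue(m, rng):
    """Hermitian matrix from the Gaussian unitary ensemble (up to normalization)."""
    a = complex_ginibre(m, m, rng)
    return (a + a.conj().T) / 2


def product_eigenvalues(h, n_rows, rng):
    """Real eigenvalues of the assumed product Y = X H X^dagger, X of shape (n_rows, m)."""
    m = h.shape[0]
    x = complex_ginibre(n_rows, m, rng)
    y = x @ h @ x.conj().T          # Hermitian whenever h is Hermitian
    return np.linalg.eigvalsh(y)    # real eigenvalues in ascending order


rng = np.random.default_rng(1)
h_fixed = np.diag([2.0, 1.0, 0.5, -1.0, -3.0]).astype(complex)  # fixed, indefinite H
print(product_eigenvalues(h_fixed, 5, rng))             # square X
print(product_eigenvalues(h_fixed, 8, rng))             # rectangular X (8 x 5)
print(product_eigenvalues(sample_gue(5, rng), 5, rng))  # random (GUE-like) H
```

With square X and a fixed indefinite H, Sylvester's law of inertia implies that Y has as many positive and negative eigenvalues as H, since X is invertible with probability one. With more rows than columns, Y has rank at most m, so n - m of its eigenvalues vanish.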