Search CORE

5 research outputs found

ICML - On the Number of Linear Regions of Convolutional Neural Networks

Author: Huang Lei
Liu Li
Shao Ling
Xiong Huan
Yu Mengyang
Zhu Fan
Publication venue: ZU Scholars
Publication date: 01/06/2020
Field of study

One fundamental problem in deep learning is understanding the outstanding performance of deep Neural Networks (NNs) in practice. One explanation for the superiority of NNs is that they can realize a large class of complicated functions, i.e., they have powerful expressivity. The expressivity of a ReLU NN can be quantified by the maximal number of linear regions it can separate its input space into. In this paper, we provide several mathematical results needed for studying the linear regions of CNNs, and use them to derive the maximal and average numbers of linear regions for one-layer ReLU CNNs. Furthermore, we obtain upper and lower bounds for the number of linear regions of multi-layer ReLU CNNs. Our results suggest that deeper CNNs have more powerful expressivity than their shallow counterparts, while CNNs have more expressivity than fully-connected NNs per parameter.Comment: International Conference on Machine Learning (ICML) 202

arXiv.org e-Print Archive

ZU Scholars (Zayed University)

Theoretical Deep Learning

Author: He Fengxiang
Publication venue: Faculty of Engineering, School of Computer Science
Publication date: 01/01/2021
Field of study

Deep learning has long been criticised as a black-box model for lacking sound theoretical explanation. During the PhD course, I explore and establish theoretical foundations for deep learning. In this thesis, I present my contributions positioned upon existing literature: (1) analysing the generalizability of the neural networks with residual connections via complexity and capacity-based hypothesis complexity measures; (2) modeling stochastic gradient descent (SGD) by stochastic differential equations (SDEs) and their dynamics, and further characterizing the generalizability of deep learning; (3) understanding the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems, which sheds light in reconciling the over-representation and excellent generalizability of deep learning; and (4) discovering the interplay between generalization, privacy preservation, and adversarial robustness, which have seen rising concerns in deep learning deployment

Sydney eScholarship