Label-Based Diversity Measure Among Hidden Units of Deep Neural Networks: A Regularization Method
Although the deep structure guarantees the powerful expressivity of deep
networks (DNNs), it also triggers a serious overfitting problem. To improve the
generalization capacity of DNNs, many strategies have been developed to promote
diversity among hidden units. However, most of these strategies are empirical
and heuristic, lacking either a theoretical derivation of the diversity
measure or a clear connection from the diversity to the generalization
capacity. In this paper, from an information-theoretic perspective, we
introduce a new definition of redundancy to describe the diversity of hidden
units under supervised learning settings by formalizing the effect of hidden
layers on the generalization capacity as mutual information. We prove an
inverse relationship between the defined redundancy and the generalization
capacity, i.e., decreasing the redundancy generally improves the
generalization capacity. Experiments show that DNNs using the redundancy as a
regularizer effectively reduce overfitting and decrease the generalization
error, which supports the above points.
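The abstract leaves the exact form of the redundancy regularizer to the paper itself. As a hedged illustration only, the PyTorch sketch below (all names hypothetical) uses label-conditioned activation correlations as a crude surrogate for the information-theoretic redundancy and adds it to the task loss; it shows the general training pattern, not the authors' actual measure.

```python
import torch
import torch.nn as nn

class RedundancyPenalty(nn.Module):
    """Label-conditioned correlation surrogate for hidden-unit redundancy.

    NOTE: the paper defines redundancy information-theoretically via mutual
    information; this class only penalizes the average squared off-diagonal
    correlation between hidden units within each label group, as a rough
    illustrative stand-in.
    """

    def forward(self, hidden, labels):
        # hidden: (batch, num_units) activations of one hidden layer
        # labels: (batch,) integer class labels
        penalty = hidden.new_zeros(())
        groups = 0
        for y in labels.unique():
            h = hidden[labels == y]
            if h.size(0) < 2:  # need at least two samples to correlate
                continue
            h = h - h.mean(dim=0, keepdim=True)
            cov = h.t() @ h / (h.size(0) - 1)
            std = cov.diag().clamp_min(1e-8).sqrt()
            corr = cov / (std[:, None] * std[None, :])
            off_diag = corr - torch.eye(corr.size(0), device=corr.device)
            penalty = penalty + off_diag.pow(2).mean()
            groups += 1
        return penalty / max(groups, 1)

# Hypothetical usage inside a training step:
#   loss = task_loss + lambda_red * RedundancyPenalty()(hidden_acts, y)
```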
Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters
In this paper, we investigate the empirical impact of orthogonality
regularization (OR) in deep learning, either on its own or in combination with
other techniques. Recent works on OR have reported promising accuracy gains. In
our ablation study, however, we do not observe such significant improvements
from existing OR techniques compared with conventional training based on weight
decay, dropout, and batch normalization. To identify the real gain from OR,
inspired by locality-sensitive hashing (LSH) for angle estimation, we propose to
introduce an implicit self-regularization into OR that pushes the mean and
variance of filter angles in a network towards 90 degrees and 0, respectively,
to achieve (near) orthogonality among the filters, without using any other
explicit regularization. Our regularization can be implemented as an
architectural plug-in and integrated with an arbitrary network. We show that OR
helps stabilize the training process and leads to faster convergence and better
generalization.
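The paper implements this goal implicitly as an architectural plug-in; for illustration only, the sketch below (PyTorch, all names hypothetical) expresses the stated target, a mean filter angle near 90 degrees with near-zero variance, as an explicit penalty on a layer's weight tensor.

```python
import math
import torch

def angle_statistics_penalty(weight, eps=1e-8):
    """Penalize deviation of pairwise filter angles from 90 degrees.

    NOTE: the paper realizes this goal implicitly through an LSH-inspired
    architectural plug-in; this function instead writes the stated target
    (mean angle -> 90 degrees, angle variance -> 0) as an explicit penalty,
    purely to illustrate the objective described in the abstract.
    """
    w = weight.flatten(1)                                    # (filters, fan_in)
    w = w / w.norm(dim=1, keepdim=True).clamp_min(eps)       # unit-normalize
    cos = w @ w.t()                                          # pairwise cosines
    i, j = torch.triu_indices(cos.size(0), cos.size(0), offset=1)
    angles = torch.acos(cos[i, j].clamp(-1 + eps, 1 - eps))  # pairwise angles
    return (angles.mean() - math.pi / 2) ** 2 + angles.var(unbiased=False)

# Hypothetical usage, adding the penalty for a convolutional layer:
#   loss = task_loss + lambda_orth * angle_statistics_penalty(conv.weight)
```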