Label-Based Diversity Measure Among Hidden Units of Deep Neural Networks: A Regularization Method
Although the deep structure guarantees the powerful expressivity of deep
networks (DNNs), it also triggers a serious overfitting problem. To improve the
generalization capacity of DNNs, many strategies have been developed to promote
diversity among hidden units. However, most of these strategies are empirical
and heuristic, lacking either a theoretical derivation of the diversity
measure or a clear connection from the diversity to the generalization
capacity. In this paper, from an information-theoretic perspective, we
introduce a new definition of redundancy to describe the diversity of hidden
units under supervised learning settings by formalizing the effect of hidden
layers on the generalization capacity as mutual information. We prove an
inverse relationship between the defined redundancy and the generalization
capacity, i.e., decreasing the redundancy generally improves the
generalization capacity. Experiments show that DNNs using the redundancy as a
regularizer effectively reduce overfitting and decrease the generalization
error, which supports the above points.
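The abstract leaves the exact form of the redundancy regularizer to the paper itself. As a hedged illustration only, the PyTorch sketch below (all names hypothetical) uses label-conditioned activation correlations as a crude surrogate for the information-theoretic redundancy and adds it to the task loss; it shows the general training pattern, not the authors' actual measure.

```python
import torch
import torch.nn as nn

class RedundancyPenalty(nn.Module):
    """Label-conditioned correlation surrogate for hidden-unit redundancy.

    NOTE: the paper defines redundancy information-theoretically via mutual
    information; this class only penalizes the average squared off-diagonal
    correlation between hidden units within each label group, as a rough
    illustrative stand-in.
    """

    def forward(self, hidden, labels):
        # hidden: (batch, num_units) activations of one hidden layer
        # labels: (batch,) integer class labels
        penalty = hidden.new_zeros(())
        groups = 0
        for y in labels.unique():
            h = hidden[labels == y]
            if h.size(0) < 2:  # need at least two samples to correlate
                continue
            h = h - h.mean(dim=0, keepdim=True)
            cov = h.t() @ h / (h.size(0) - 1)
            std = cov.diag().clamp_min(1e-8).sqrt()
            corr = cov / (std[:, None] * std[None, :])
            off_diag = corr - torch.eye(corr.size(0), device=corr.device)
            penalty = penalty + off_diag.pow(2).mean()
            groups += 1
        return penalty / max(groups, 1)

# Hypothetical usage inside a training step:
#   loss = task_loss + lambda_red * RedundancyPenalty()(hidden_acts, y)
```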
Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters
In this paper, we investigate the empirical impact of orthogonality
regularization (OR) in deep learning, either on its own or in combination with
other techniques. Recent works on OR have reported promising accuracy gains. In
our ablation study, however, we do not observe such significant improvements
from existing OR techniques compared with conventional training based on weight
decay, dropout, and batch normalization. To identify the real gain from OR,
inspired by locality-sensitive hashing (LSH) for angle estimation, we propose to
introduce an implicit self-regularization into OR that pushes the mean and
variance of filter angles in a network towards 90 degrees and 0, respectively,
to achieve (near) orthogonality among the filters, without using any other
explicit regularization. Our regularization can be implemented as an
architectural plug-in and integrated with an arbitrary network. We show that OR
helps stabilize the training process and leads to faster convergence and better
generalization.
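The paper implements this goal implicitly as an architectural plug-in; for illustration only, the sketch below (PyTorch, all names hypothetical) expresses the stated target, a mean filter angle near 90 degrees with near-zero variance, as an explicit penalty on a layer's weight tensor.

```python
import math
import torch

def angle_statistics_penalty(weight, eps=1e-8):
    """Penalize deviation of pairwise filter angles from 90 degrees.

    NOTE: the paper realizes this goal implicitly through an LSH-inspired
    architectural plug-in; this function instead writes the stated target
    (mean angle -> 90 degrees, angle variance -> 0) as an explicit penalty,
    purely to illustrate the objective described in the abstract.
    """
    w = weight.flatten(1)                                    # (filters, fan_in)
    w = w / w.norm(dim=1, keepdim=True).clamp_min(eps)       # unit-normalize
    cos = w @ w.t()                                          # pairwise cosines
    i, j = torch.triu_indices(cos.size(0), cos.size(0), offset=1)
    angles = torch.acos(cos[i, j].clamp(-1 + eps, 1 - eps))  # pairwise angles
    return (angles.mean() - math.pi / 2) ** 2 + angles.var(unbiased=False)

# Hypothetical usage, adding the penalty for a convolutional layer:
#   loss = task_loss + lambda_orth * angle_statistics_penalty(conv.weight)
```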