Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
Deep neural networks often consist of a great number of trainable parameters
for extracting powerful features from given datasets. On one hand, massive
trainable parameters significantly enhance the performance of these deep
networks. On the other hand, they bring the problem of over-fitting. To this
end, dropout-based methods disable some elements in the output feature maps
during the training phase to reduce the co-adaptation of neurons. Although
these approaches can enhance the generalization ability of the resulting
models, conventional binary dropout is not the optimal solution.
Therefore, we investigate the empirical Rademacher complexity related to
intermediate layers of deep neural networks and propose a feature distortion
method (Disout) for addressing the aforementioned problem. In the training
period, randomly selected elements in the feature maps will be replaced with
specific values by exploiting the generalization error bound. The superiority
of the proposed feature map distortion in producing deep neural networks with
higher test performance is analyzed and demonstrated on several benchmark
image datasets.
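As a rough illustration of the idea (a minimal sketch, not the paper's exact algorithm), a Disout-style distortion replaces randomly selected feature-map elements with perturbed values rather than zeroing them as binary dropout does; the perturbation scale `alpha` below is a hypothetical stand-in for the value the paper derives from the generalization error bound:

```python
import numpy as np

def disout_like(feature_map, drop_prob=0.1, alpha=1.0, rng=None):
    """Sketch of a Disout-style feature distortion (assumptions:
    the selection rule and replacement values differ in the paper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Randomly select elements, as binary dropout would.
    mask = rng.random(feature_map.shape) < drop_prob
    # Instead of zeroing, shift the selected activations by noise
    # proportional to the feature map's standard deviation.
    noise = alpha * feature_map.std() * rng.standard_normal(feature_map.shape)
    out = feature_map.copy()
    out[mask] = feature_map[mask] + noise[mask]
    return out
```

Binary dropout is the special case where selected elements are set to a fixed value (zero); the distortion view allows the replacement values to be tuned against a generalization bound.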
Label Embedding by Johnson-Lindenstrauss Matrices
We present a simple and scalable framework for extreme multiclass
classification based on Johnson-Lindenstrauss matrices (JLMs). Using the
columns of a JLM to embed the labels, a C-class classification problem is
transformed into a regression problem with O(log C) output dimension. We
derive an excess risk bound, revealing a tradeoff between computational
efficiency and prediction accuracy, and further show that under the Massart
noise condition, the penalty for dimension reduction vanishes. Our approach is
easily parallelizable, and experimental results demonstrate its effectiveness
and scalability in large-scale applications.
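The label-embedding step can be sketched as follows (a minimal sketch; the Gaussian matrix construction, the constant in the O(log C) dimension, and the nearest-column decoding rule are assumptions, not the paper's exact recipe):

```python
import numpy as np

def jl_label_embedding(num_classes, dim, rng=None):
    # Columns of a random Gaussian matrix act as JL embeddings of the
    # labels; scaling by 1/sqrt(dim) gives columns of roughly unit norm.
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal((dim, num_classes)) / np.sqrt(dim)

def decode(prediction, G):
    # Nearest-column decoding: match the regression output against every
    # label embedding by inner product and return the best class index.
    return int(np.argmax(G.T @ prediction))

# Usage: a C-class problem becomes regression onto dim = O(log C) outputs.
C = 1000
dim = int(np.ceil(10 * np.log(C)))  # the constant 10 is an assumption
G = jl_label_embedding(C, dim, rng=np.random.default_rng(0))
# A regressor trained to predict G[:, y] recovers y by nearest column.
```

Because the regression targets live in only O(log C) dimensions, training cost no longer scales linearly with the number of classes, which is the source of the computation/accuracy tradeoff the abstract mentions.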
Within-layer Diversity Reduces Generalization Gap
Neural networks are composed of multiple layers arranged in a hierarchical
structure and trained jointly with gradient-based optimization, where errors
are back-propagated from the last layer back to the first one. At each
optimization step, neurons at a given layer receive feedback from neurons
belonging to higher layers of the hierarchy. In this paper, we propose to
complement this traditional 'between-layer' feedback with additional
'within-layer' feedback to encourage diversity of the activations within the
same layer. To this end, we measure the pairwise similarity between the outputs
of the neurons and use it to model the layer's overall diversity. By penalizing
similarities and promoting diversity, we encourage each neuron to learn a
distinctive representation and, thus, to enrich the data representation learned
within the layer and to increase the total capacity of the model. We
theoretically study how the within-layer activation diversity affects the
generalization performance of a neural network and prove that increasing the
diversity of hidden activations reduces the estimation error. In addition to
the theoretical guarantees, we present an empirical study on three datasets
confirming that the proposed approach enhances the performance of
state-of-the-art neural network models and decreases the generalization gap.
Comment: 18 pages, 1 figure, 3 tables
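A within-layer diversity penalty of this flavor can be sketched as follows (a minimal sketch under assumptions: the paper's exact similarity measure may differ; here we use the mean squared pairwise cosine similarity between neuron output vectors over a batch):

```python
import numpy as np

def within_layer_diversity_penalty(activations):
    """Penalty on pairwise similarity of neurons within one layer.
    `activations` is a (batch, neurons) matrix of that layer's outputs."""
    # Normalize each neuron's output vector across the batch.
    A = activations / (np.linalg.norm(activations, axis=0, keepdims=True) + 1e-12)
    S = A.T @ A                # (neurons, neurons) cosine similarities
    n = S.shape[0]
    off_diag = S - np.eye(n)   # ignore each neuron's self-similarity
    # Mean squared off-diagonal similarity: 0 for orthogonal neurons,
    # 1 when all neurons produce identical outputs.
    return (off_diag ** 2).sum() / (n * (n - 1))
```

Adding this penalty to the task loss pushes neurons in the same layer toward distinct activation patterns, complementing the usual between-layer gradient signal.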