ErfReLU: Adaptive Activation Function for Deep Neural Network
Recent research has found that the activation function (AF) selected for
adding non-linearity into the output can have a significant impact on how
effectively deep learning networks perform. Developing activation functions
that can adapt simultaneously with learning has become a pressing need.
Researchers have recently begun developing activation functions that can be
trained throughout the learning process, known as trainable or adaptive
activation functions (AAFs). Research on AAFs that improve outcomes is still
in its early stages. In this paper, a novel activation function, 'ErfReLU',
is developed that exploits both ReLU and the error function (erf).
State-of-the-art activation functions such as Sigmoid, ReLU, and Tanh,
together with their properties, are briefly explained. Adaptive activation
functions such as Tanhsoft1, Tanhsoft2, Tanhsoft3, TanhLU, SAAF, ErfAct,
Pserf, Smish, and Serf are also described. Lastly, a performance analysis of
these 9 trainable activation functions, along with the proposed ErfReLU, is
presented by applying them in MobileNet, VGG16, and ResNet models on the
CIFAR-10, MNIST, and FMNIST benchmark datasets.
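
The abstract does not state ErfReLU's closed form. As a minimal sketch, assuming an additive combination f(x) = ReLU(x) + a·erf(x) with a single trainable scalar a (a hypothetical parameterization; the paper's exact definition may differ), a trainable activation of this kind can be written as a PyTorch module:

```python
import torch
import torch.nn as nn

class ErfReLU(nn.Module):
    """Trainable activation: f(x) = ReLU(x) + a * erf(x).

    The additive form and the single scalar 'a' are assumptions for
    illustration; 'a' is learned by backprop with the network weights.
    """
    def __init__(self, init_a: float = 0.1):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(init_a))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + self.a * torch.erf(x)

# Usage: drop in wherever nn.ReLU() would appear, e.g. a CNN block
# fed a CIFAR-10-sized input.
block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), ErfReLU())
out = block(torch.randn(1, 3, 32, 32))
```

Because `a` is an nn.Parameter, any optimizer that receives `block.parameters()` adapts the activation's shape jointly with the layer weights, which is what distinguishes an AAF from a fixed non-linearity.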
Adaptive Estimators Show Information Compression in Deep Neural Networks
To improve how neural networks function, it is crucial to understand their
learning process. The information bottleneck theory of deep learning proposes
that neural networks achieve good generalization by compressing their
representations to discard information that is not relevant to the task.
However, empirical evidence for this theory is conflicting, as compression was
observed only when networks used saturating activation functions; networks
with non-saturating activation functions achieved comparable levels of task
performance but did not show compression. In this paper we develop more robust
mutual information estimation techniques that adapt to the hidden activity of
neural networks and produce more sensitive measurements of activations from
all functions, especially unbounded ones. Using these adaptive estimation
techniques, we explore compression in networks with a range of different
activation functions. With these two improved estimation methods, we first
show that saturation of the activation function is not required for
compression, and that the amount of compression varies between activation
functions. We also find a large amount of variation in compression between
different network initializations. Second, we see that L2 regularization
leads to significantly increased compression while preventing overfitting.
Finally, we show that only compression of the last layer is positively
correlated with generalization.

Comment: Accepted as a poster presentation at ICLR 2019 and reviewed on
OpenReview (available at https://openreview.net/forum?id=SkeZisA5t7).
Pages: 11. Figures: …
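
The abstract does not detail the adaptive estimators. As a minimal sketch of the underlying idea, assuming quantile-based (equal-count) binning whose edges adapt to each unit's empirical activity before a discrete mutual information is computed against the labels (the paper's actual estimators may differ), one could write:

```python
import numpy as np

def adaptive_bin(activations: np.ndarray, n_bins: int = 30) -> np.ndarray:
    """Quantile binning: edges adapt to each unit's empirical distribution,
    so unbounded activations (e.g. ReLU output) are covered as evenly as
    saturating ones, instead of relying on fixed-range bins."""
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]       # interior quantiles
    edges = np.quantile(activations, qs, axis=0)   # (n_bins - 1, n_units)
    return np.stack(
        [np.digitize(activations[:, j], np.unique(edges[:, j]))
         for j in range(activations.shape[1])], axis=1)

def mutual_information(x_binned: np.ndarray, y: np.ndarray) -> float:
    """Plug-in estimate of I(T; Y) in bits from discrete joint counts."""
    n = len(y)
    joint, px, py = {}, {}, {}
    for xi, yi in zip(map(tuple, x_binned), y):
        joint[(xi, yi)] = joint.get((xi, yi), 0) + 1
        px[xi] = px.get(xi, 0) + 1
        py[yi] = py.get(yi, 0) + 1
    return sum(
        (c / n) * np.log2((c / n) / ((px[xi] / n) * (py[yi] / n)))
        for (xi, yi), c in joint.items())

# Usage: I(T; Y) for ReLU-like (unbounded, half-saturated) hidden activity.
rng = np.random.default_rng(0)
T = np.maximum(rng.normal(size=(1000, 3)), 0)   # hidden-layer activity
Y = (T.sum(axis=1) > np.median(T.sum(axis=1))).astype(int)
print(mutual_information(adaptive_bin(T, n_bins=10), Y))
```

Tracking such estimates per layer across training epochs is what lets one ask whether representations compress (the layer's information about the input falls) and whether that compression tracks generalization.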