10 research outputs found
Learning activation functions from data using cubic spline interpolation
Neural networks require a careful design in order to perform properly on a
given task. In particular, selecting a good activation function (possibly in a
data-dependent fashion) is a crucial step, which remains an open problem in the
research community. Despite a large amount of investigations, most current
implementations simply select one fixed function from a small set of
candidates, which is not adapted during training, and is shared among all
neurons throughout the different layers. However, neither two of these
assumptions can be supposed optimal in practice. In this paper, we present a
principled way to have data-dependent adaptation of the activation functions,
which is performed independently for each neuron. This is achieved by
leveraging over past and present advances on cubic spline interpolation,
allowing for local adaptation of the functions around their regions of use. The
resulting algorithm is relatively cheap to implement, and overfitting is
counterbalanced by the inclusion of a novel damping criterion, which penalizes
unwanted oscillations from a predefined shape. Experimental results validate
the proposal over two well-known benchmarks.Comment: Submitted to the 27th Italian Workshop on Neural Networks (WIRN 2017
Universal Activation Function For Machine Learning
This article proposes a Universal Activation Function (UAF) that achieves
near optimal performance in quantification, classification, and reinforcement
learning (RL) problems. For any given problem, the optimization algorithms are
able to evolve the UAF to a suitable activation function by tuning the UAF's
parameters. For the CIFAR-10 classification and VGG-8, the UAF converges to the
Mish like activation function, which has near optimal performance when compared to other activation functions. For the
quantification of simulated 9-gas mixtures in 30 dB signal-to-noise ratio (SNR)
environments, the UAF converges to the identity function, which has near
optimal root mean square error of . In the
BipedalWalker-v2 RL dataset, the UAF achieves the 250 reward in
epochs, which proves that the UAF converges in the lowest number of epochs.
Furthermore, the UAF converges to a new activation function in the
BipedalWalker-v2 RL dataset
A survey on modern trainable activation functions
In neural networks literature, there is a strong interest in identifying and
defining activation functions which can improve neural network performance. In
recent years there has been a renovated interest of the scientific community in
investigating activation functions which can be trained during the learning
process, usually referred to as "trainable", "learnable" or "adaptable"
activation functions. They appear to lead to better network performance.
Diverse and heterogeneous models of trainable activation function have been
proposed in the literature. In this paper, we present a survey of these models.
Starting from a discussion on the use of the term "activation function" in
literature, we propose a taxonomy of trainable activation functions, highlight
common and distinctive proprieties of recent and past models, and discuss main
advantages and limitations of this type of approach. We show that many of the
proposed approaches are equivalent to adding neuron layers which use fixed
(non-trainable) activation functions and some simple local rule that
constraints the corresponding weight layers.Comment: Published in "Neural Networks" journal (Elsevier
The Adaptive Quadratic Linear Unit (AQuLU): Adaptive Non Monotonic Piecewise Activation Function
The activation function plays a key role in influencing the performance and training dynamics of neural networks. There are hundreds of activation functions widely used as rectified linear units (ReLUs), but most of them are applied to complex and large neural networks, which often have gradient explosion and vanishing gradient problems. By studying a variety of non-monotonic activation functions, we propose a method to construct a non-monotonic activation function, x·Φ(x), with Φ(x) [0, 1]. With the hardening treatment of Φ(x), we propose an adaptive non-monotonic segmented activation function, called the adaptive quadratic linear unit, abbreviated as AQuLU, which ensures the sparsity of the input data and improves training efficiency. In image classification based on different state-of-the-art neural network architectures, the performance of AQuLUs has significant advantages for more complex and deeper architectures with various activation functions. The ablation experimental study further validates the compatibility and stability of AQuLUs with different depths, complexities, optimizers, learning rates, and batch sizes. We thus demonstrate the high efficiency, robustness, and simplicity of AQuLUs
Improving classification models with context knowledge and variable activation functions
This work proposes two methods to boost the performances of a given classifier: the first one, which works on a Neural Network classifier, is a new type of trainable activation function, that is a function which is adjusted during the learning phase, allowing the network to exploit the data better respect to use a classic activation function with fixed-shape; the second one provides two frameworks to use an external knowledge base to improve the classification results
Learning activation functions from data using cubic spline interpolation
Neural networks require a careful design in order to perform properly on a given task. In particular, selecting a good activation function (possibly in a data-dependent fashion) is a crucial step, which remains an open problem in the research community. Despite a large amount of investigations, most current implementations simply select one fixed function from a small set of candidates, which is not adapted during training, and is shared among all neurons throughout the different layers. However, neither two of these assumptions can be supposed optimal in practice. In this paper, we present a principled way to have data-dependent adaptation of the activation functions, which is performed independently for each neuron. This is achieved by leveraging over past and present advances on cubic spline interpolation, allowing for local adaptation of the functions around their regions of use. The resulting algorithm is relatively cheap to implement, and overfitting is counterbalanced by the inclusion of a novel damping criterion, which penalizes unwanted oscillations from a predefined shape. Preliminary experimental results validate the proposal