Evolutionary Optimization of Deep Learning Activation Functions
The choice of activation function can have a large effect on the performance
of a neural network. While there have been some attempts to hand-engineer novel
activation functions, the Rectified Linear Unit (ReLU) remains the most
commonly used in practice. This paper shows that evolutionary algorithms can
discover novel activation functions that outperform ReLU. A tree-based search
space of candidate activation functions is defined and explored with mutation,
crossover, and exhaustive search. Experiments on training wide residual
networks on the CIFAR-10 and CIFAR-100 image datasets show that this approach
is effective. Replacing ReLU with evolved activation functions results in
statistically significant increases in network accuracy. Optimal performance is
achieved when evolution is allowed to customize activation functions to a
particular task; however, these novel activation functions are shown to
generalize, achieving high performance across tasks. Evolutionary optimization
of activation functions is therefore a promising new dimension of metalearning
in neural networks.
Comment: 8 pages; 9 figures/tables; GECCO 2020
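A minimal sketch of the tree-based search space described above, with mutation and crossover over activation-function expression trees. The operator sets, tree depth, and crossover variant are illustrative assumptions, not the paper's exact configuration; in the paper, fitness is the accuracy of a wide residual network trained on CIFAR with the candidate activation.

```python
import random
import numpy as np

UNARY = {"relu": lambda x: np.maximum(x, 0.0), "tanh": np.tanh,
         "sin": np.sin, "abs": np.abs}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b,
          "max": np.maximum}

def random_tree(depth=2):
    """Grow a random expression tree over the input x."""
    if depth == 0:
        return "x"
    if random.random() < 0.5:
        return (random.choice(list(UNARY)), random_tree(depth - 1))
    op = random.choice(list(BINARY))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Apply the activation function encoded by the tree to x."""
    if tree == "x":
        return x
    if len(tree) == 2:
        return UNARY[tree[0]](evaluate(tree[1], x))
    return BINARY[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def mutate(tree, p=0.3):
    """Replace random subtrees with freshly grown ones."""
    if tree == "x" or random.random() < p:
        return random_tree(depth=2)
    return (tree[0],) + tuple(mutate(child, p) for child in tree[1:])

def crossover(donor, recipient):
    """Graft the donor tree into random positions of the recipient
    (a simplified form of subtree crossover)."""
    if recipient == "x" or random.random() < 0.3:
        return donor
    return (recipient[0],) + tuple(crossover(donor, c) for c in recipient[1:])

# Quick smoke test: sample a candidate and evaluate it on a few points.
f = random_tree()
print(f, evaluate(f, np.linspace(-2.0, 2.0, 5)))
```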
Discovering Parametric Activation Functions
Recent studies have shown that the choice of activation function can
significantly affect the performance of deep learning networks. However, the
benefits of novel activation functions have been inconsistent and task
dependent, and therefore the rectified linear unit (ReLU) is still the most
commonly used. This paper proposes a technique for customizing activation
functions automatically, resulting in reliable improvements in performance.
Evolutionary search is used to discover the general form of the function, and
gradient descent to optimize its parameters for different parts of the network
and over the learning process. Experiments with four different neural network
architectures on the CIFAR-10 and CIFAR-100 image classification datasets show
that this approach is effective. It discovers both general activation functions
and specialized functions for different architectures, consistently improving
accuracy over ReLU and other activation functions by significant margins. The
approach can therefore be used as an automated optimization step in applying
deep learning to new tasks.
Comment: 14 pages, 12 figures/tables, under review
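A minimal PyTorch sketch of the division of labor described above: evolution fixes the general functional form, while per-layer parameters are trained by gradient descent along with the network weights. The Swish-like form and the parameter names alpha and beta are illustrative assumptions, not a function from the paper.

```python
import torch
import torch.nn as nn

class ParametricActivation(nn.Module):
    """An evolved general form with per-instance learnable parameters."""
    def __init__(self, alpha=1.0, beta=1.0):
        super().__init__()
        # Learnable parameters, updated by backprop alongside the weights.
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x):
        # Assumed Swish-like form: alpha * x * sigmoid(beta * x).
        return self.alpha * x * torch.sigmoid(self.beta * x)

# One instance per layer lets gradient descent specialize the parameters
# to different parts of the network, as the abstract describes.
net = nn.Sequential(nn.Linear(32, 64), ParametricActivation(),
                    nn.Linear(64, 10))
```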
Optimizing Loss Functions Through Multivariate Taylor Polynomial Parameterization
Metalearning of deep neural network (DNN) architectures and hyperparameters
has become an increasingly important area of research. Loss functions are a
type of metaknowledge that is crucial to effective training of DNNs; however,
their potential role in metalearning has not yet been fully explored. Whereas
early work focused on genetic programming (GP) on tree representations, this
paper proposes continuous CMA-ES optimization of multivariate Taylor polynomial
parameterizations. This approach, TaylorGLO, makes it possible to represent and
search useful loss functions more effectively. On the MNIST, CIFAR-10, and SVHN
benchmark tasks, TaylorGLO finds new loss functions that outperform functions
previously discovered through GP, as well as the standard cross-entropy loss,
in fewer generations. These functions serve to regularize the learning task by
discouraging overfitting to the labels, which is particularly useful in tasks
where limited training data is available. The results thus demonstrate that
loss function optimization is a productive new avenue for metalearning.
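A minimal sketch of the TaylorGLO idea: a loss function parameterized as a multivariate Taylor polynomial whose coefficient vector is optimized by CMA-ES (here via the third-party `cma` package). The third-order bivariate expansion, the synthetic fitness stand-in, and the CMA-ES settings are illustrative assumptions; in the paper, fitness comes from training a network with the candidate loss and measuring validation accuracy.

```python
import numpy as np
import cma

def taylor_loss(theta, y, s):
    """Third-order bivariate Taylor polynomial in label y and prediction s,
    expanded around (0, 0); theta holds the 10 coefficients."""
    terms = [np.ones_like(s), y, s,
             y * y, y * s, s * s,
             y ** 3, y * y * s, y * s * s, s ** 3]
    return float(np.mean(sum(c * t for c, t in zip(theta, terms))))

def fitness(theta):
    # Stand-in for: train a DNN with taylor_loss(theta, ...) and return
    # negative validation accuracy. A fixed synthetic batch and an
    # arbitrary smooth proxy keep the sketch runnable end to end.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 64).astype(float)
    s = rng.random(64)
    return taylor_loss(theta, y, s) ** 2

# CMA-ES over the 10 Taylor coefficients.
es = cma.CMAEvolutionStrategy(10 * [0.0], 0.5, {"maxiter": 20, "verbose": -9})
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [fitness(np.asarray(c)) for c in candidates])
print("best coefficients:", es.result.xbest)
```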
Lights and Shadows in Evolutionary Deep Learning: Taxonomy, Critical Methodological Analysis, Cases of Study, Learned Lessons, Recommendations and Challenges
Much has been said about the fusion of bio-inspired optimization algorithms
and Deep Learning models for several purposes: from the discovery of network
topologies and hyperparameter configurations with improved performance for a
given task, to the optimization of the model's parameters as a replacement for
gradient-based solvers. Indeed, the literature is rich in proposals showcasing
the application of assorted nature-inspired approaches for these tasks. In this
work we comprehensively review and critically examine contributions made so far
based on three axes, each addressing a fundamental question in this research
avenue: a) optimization and taxonomy (Why?), including a historical
perspective, definitions of optimization problems in Deep Learning, and a
taxonomy associated with an in-depth analysis of the literature, b) critical
methodological analysis (How?), which, together with two case studies, allows us
to address learned lessons and recommendations for good practices following the
analysis of the literature, and c) challenges and new directions of research
(What can be done, and what for?). In summary, these three axes (optimization
and taxonomy, critical analysis, and challenges) outline a complete vision of
the merger of two technologies, drawing up an exciting future for this area of
fusion research.
Comment: 64 pages, 18 figures, under review for consideration in the
Information Fusion journal