Improved Training Speed, Accuracy, and Data Utilization Through Loss Function Optimization
As the complexity of neural network models has grown, it has become
increasingly important to optimize their design automatically through
metalearning. Methods for discovering hyperparameters, topologies, and learning
rate schedules have led to significant increases in performance. This paper
shows that loss functions can be optimized with metalearning as well, and that
doing so results in similar improvements. The method, Genetic Loss-function Optimization
(GLO), discovers loss functions de novo, and optimizes them for a target task.
Leveraging techniques from genetic programming, GLO builds loss functions
hierarchically from a set of operators and leaf nodes. These functions are
repeatedly recombined and mutated to find an optimal structure, and then a
covariance-matrix adaptation evolutionary strategy (CMA-ES) is used to find
optimal coefficients. Networks trained with GLO loss functions are found to
outperform the standard cross-entropy loss on standard image classification
tasks. Training with these new loss functions requires fewer steps, results in
lower test error, and allows for smaller datasets to be used. Loss-function
optimization thus provides a new dimension of metalearning, and constitutes an
important step towards AutoML.
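
To make the tree-based search described in the abstract above more concrete, here is a minimal sketch of how a loss function might be built from operators and leaf nodes and then mutated. The operator set, the leaf set, the mutation probabilities, and every function name are assumptions chosen for illustration; the abstract does not specify them. Recombination and the CMA-ES coefficient-tuning stage are left out.

```python
import math
import random

# Hypothetical operator and leaf sets; the abstract does not list the real ones.
UNARY = {
    "neg": lambda a: -a,
    "log": lambda a: math.log(max(a, 1e-12)),  # clamped to avoid log(0)
    "sq": lambda a: a * a,
}
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
LEAVES = ("x", "y", 1.0)  # x: true-label entry, y: predicted probability


def evaluate(tree, x, y):
    """Evaluate an expression tree for one (label, prediction) pair."""
    if tree == "x":
        return x
    if tree == "y":
        return y
    if isinstance(tree, float):
        return tree
    op, *args = tree
    vals = [evaluate(a, x, y) for a in args]
    return UNARY[op](*vals) if op in UNARY else BINARY[op](*vals)


def loss(tree, labels, preds):
    """Average the tree's value over the classes of one sample."""
    return sum(evaluate(tree, x, y) for x, y in zip(labels, preds)) / len(labels)


def random_tree(rng, depth):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    if rng.random() < 0.5:
        return (rng.choice(sorted(UNARY)), random_tree(rng, depth - 1))
    return (rng.choice(sorted(BINARY)),
            random_tree(rng, depth - 1),
            random_tree(rng, depth - 1))


def mutate(tree, rng, depth=2):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, *args = tree
    i = rng.randrange(len(args))
    args[i] = mutate(args[i], rng, depth)
    return (op, *args)


# Cross-entropy written as a tree: -(x * log(y)), averaged over classes.
CROSS_ENTROPY = ("neg", ("mul", "x", ("log", "y")))
```

For example, `loss(CROSS_ENTROPY, [0, 1, 0], [0.2, 0.7, 0.1])` evaluates the baseline tree on a one-hot label, and `mutate(CROSS_ENTROPY, random.Random(0))` yields a candidate variant. In a full search each candidate would be scored by training a network with it, which this sketch does not attempt.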
Optimizing Loss Functions Through Multivariate Taylor Polynomial Parameterization
Metalearning of deep neural network (DNN) architectures and hyperparameters
has become an increasingly important area of research. Loss functions are a
type of metaknowledge that is crucial to effective training of DNNs; however,
their potential role in metalearning has not yet been fully explored. Whereas
early work focused on genetic programming (GP) on tree representations, this
paper proposes continuous CMA-ES optimization of multivariate Taylor polynomial
parameterizations. This approach, TaylorGLO, makes it possible to represent and
search useful loss functions more effectively. In MNIST, CIFAR-10, and SVHN
benchmark tasks, TaylorGLO finds new loss functions that outperform functions
previously discovered through GP, as well as the standard cross-entropy loss,
in fewer generations. These functions serve to regularize the learning task by
discouraging overfitting to the labels, which is particularly useful in tasks
where limited training data is available. The results thus demonstrate that
loss function optimization is a productive new avenue for metalearning.
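
The shift from trees to a fixed-length parameter vector, as described in the abstract above, can be illustrated with a small sketch. It assumes a second-order bivariate polynomial in the label entry x and the prediction y, with the expansion point and coefficients packed into one vector `theta`; the factorial factors of a Taylor series are absorbed into the coefficients. The order, the parameter layout, and all names are assumptions. A simple (1+1) evolution strategy stands in for CMA-ES to keep the example short, and the fitness is a toy placeholder rather than the network-training evaluation the abstract implies.

```python
import math
import random


def monomials(order):
    """Exponent pairs (i, j) with i + j <= order, in a fixed order."""
    return [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]


def taylor_loss_term(theta, x, y, order=2):
    """Polynomial in (x - a, y - b), where theta = [a, b, c_00, c_01, ...]."""
    a, b, *coeffs = theta
    return sum(c * (x - a) ** i * (y - b) ** j
               for c, (i, j) in zip(coeffs, monomials(order)))


def taylor_loss(theta, labels, preds, order=2):
    """Average the polynomial's value over the classes of one sample."""
    terms = (taylor_loss_term(theta, x, y, order) for x, y in zip(labels, preds))
    return sum(terms) / len(labels)


def toy_fitness(theta):
    """Placeholder objective (lower is better): squared gap to -x*log(y).

    It exists only to make the search loop runnable; it is not the
    objective used in the paper.
    """
    points = [(x, y) for x in (0.0, 1.0) for y in (0.1, 0.5, 0.9)]
    return sum((taylor_loss_term(theta, x, y) + x * math.log(y)) ** 2
               for x, y in points)


def one_plus_one_es(fitness, theta, sigma=0.1, steps=200, seed=0):
    """Stand-in for CMA-ES: keep a Gaussian perturbation if it is no worse."""
    rng = random.Random(seed)
    best, best_f = list(theta), fitness(theta)
    for _ in range(steps):
        cand = [t + rng.gauss(0.0, sigma) for t in best]
        cand_f = fitness(cand)
        if cand_f <= best_f:
            best, best_f = cand, cand_f
    return best, best_f
```

Calling `one_plus_one_es(toy_fitness, [0.0] * 8)` searches the eight-dimensional vector (two expansion-point entries plus six coefficients at order 2). Because every candidate is a point in a continuous space, no structural recombination or tree mutation is needed, which is the design difference the abstract highlights.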