One potential drawback of using aggregated performance measurement in machine
learning is that models may learn to accept higher errors on some training cases in exchange for lower errors on others, with the lower errors actually being instances of overfitting. This can lead to both stagnation at local optima and poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation that selects models based on sequences of individual training-case errors rather than on aggregated metrics such as loss or accuracy. In this paper, we investigate how lexicase
selection, in its general form, can be integrated into the context of deep
learning to enhance generalization. We propose Gradient Lexicase Selection, an
optimization framework that combines gradient descent and lexicase selection in
an evolutionary fashion. Our experimental results demonstrate that the proposed
method improves the generalization performance of various widely-used deep
neural network architectures across three image classification benchmarks.
Additionally, qualitative analysis suggests that our method assists networks in
learning more diverse representations. Our source code is available on GitHub:
https://github.com/ld-ing/gradient-lexicase.
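
For readers unfamiliar with the selection procedure, the following is a minimal sketch of lexicase selection applied to choosing a model from a population based on per-case errors. The function name lexicase_select, its arguments, and the random tie-breaking at the end are illustrative assumptions, not the paper's implementation.

    import random

    def lexicase_select(population, errors):
        # population: list of candidate models
        # errors[i][j]: error of population[i] on training case j
        # (hypothetical helper for illustration, not the authors' code)
        candidates = list(range(len(population)))
        cases = list(range(len(errors[0])))
        random.shuffle(cases)  # consider training cases in a random order
        for case in cases:
            # keep only the candidates with the lowest error on this case
            best = min(errors[i][case] for i in candidates)
            candidates = [i for i in candidates if errors[i][case] == best]
            if len(candidates) == 1:
                break
        # if several candidates survive every case, break the tie at random
        return population[random.choice(candidates)]

Because each selection event shuffles the training cases into a different order, candidates that excel on different subsets of the data all retain a chance of being selected; this is the uncompromising, non-aggregating behavior referred to above.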