The increasing inclusion of Deep Learning (DL) models in safety-critical
systems such as autonomous vehicles has led to the development of multiple
model-based DL testing techniques. One common denominator of these testing
techniques is the automated generation of test cases, e.g., new inputs
transformed from the original training data with the aim of optimizing some test
adequacy criteria. So far, the effectiveness of these approaches has been
hindered by their reliance on random fuzzing or on transformations that do not
always produce sufficiently diverse test cases. To overcome these limitations,
we propose DeepEvolution, a novel search-based approach for testing DL models
that relies on metaheuristics to maximize the diversity of generated test
cases. We assessed the effectiveness of DeepEvolution in testing computer-vision
DL models and found that it significantly increases the neuronal coverage of
generated test cases. Moreover, using DeepEvolution, we successfully uncovered
several corner-case behaviors. Finally, DeepEvolution outperformed TensorFuzz
(a coverage-guided fuzzing tool developed at Google Brain) in detecting latent
defects introduced during the quantization of the models. These results suggest
that search-based approaches can help build effective testing tools for DL
systems.
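
To make the core loop concrete, below is a minimal sketch of search-based test
generation for a DL model: a metaheuristic mutates the parameters of
semantic-preserving input transformations and keeps mutants that increase
neuron coverage of the growing test suite. Everything here is an illustrative
assumption, not the paper's implementation: the abstract does not specify which
metaheuristic is used, so a simple (1+lambda) evolutionary loop stands in, and
a toy random-weight layer stands in for a real DNN's activations; the helper
names (activations, transform, coverage, search) are hypothetical.

    # Illustrative sketch only; hypothetical helpers, not DeepEvolution's code.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in for a DL model: one hidden layer with random weights.
    W = rng.normal(size=(64, 28 * 28))

    def activations(x):
        """Hidden-layer ReLU activations for a flattened 28x28 image."""
        return np.maximum(0.0, W @ x.ravel())

    def transform(x, params):
        """Parameterized semantic-preserving transformation:
        contrast scale, brightness shift, and additive Gaussian noise."""
        shift, scale, noise = params
        out = x * scale + shift + rng.normal(0.0, abs(noise), size=x.shape)
        return np.clip(out, 0.0, 1.0)

    def coverage(test_suite, threshold=0.25):
        """Neuron coverage: fraction of neurons activated above `threshold`
        by at least one input in the suite."""
        fired = np.zeros(W.shape[0], dtype=bool)
        for x in test_suite:
            fired |= activations(x) > threshold
        return fired.mean()

    def search(seed, generations=30, offspring=8):
        """(1+lambda) evolutionary search over transformation parameters,
        keeping mutants that raise the suite's neuron coverage."""
        suite = [seed]
        best = np.array([0.0, 1.0, 0.0])  # identity transformation
        best_cov = coverage(suite)
        for _ in range(generations):
            for _ in range(offspring):
                cand = best + rng.normal(0.0, 0.1, size=3)  # mutate params
                mutant = transform(seed, cand)
                cov = coverage(suite + [mutant])
                if cov > best_cov:  # fitness = coverage gain
                    best, best_cov = cand, cov
                    suite.append(mutant)
        return suite, best_cov

    seed_image = rng.random((28, 28))
    suite, cov = search(seed_image)
    print(f"generated {len(suite)} test inputs, neuron coverage = {cov:.2f}")

In a real setting, the toy layer would be replaced by the activations of the
model under test, and the fitness function could combine coverage with other
test adequacy criteria, as the abstract suggests.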