First-order gradient descent has been the foundation of the most successful
optimization algorithms ever implemented. On supervised learning problems with
very high dimensionality, such as neural network optimization, it is almost
always the algorithm of choice, mainly due to its memory and computational
efficiency. However, it is a classical result in optimization that gradient
descent converges only to local minima on non-convex functions. Even more
importantly, in certain high-dimensional cases, escaping the plateaus of large
saddle points becomes intractable. On the other hand, black-box optimization
methods are not sensitive to the local structure of a loss function's landscape
but suffer from the curse of dimensionality. Memetic algorithms aim to
combine the benefits of both. Inspired by this, we present Population Descent,
a memetic algorithm focused on hyperparameter optimization. We show that an
adaptive m-elitist selection approach combined with a normalized-fitness-based
randomization scheme outperforms more complex state-of-the-art algorithms by up
to 13% on common benchmark tasks.
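
As a rough illustration only (not the authors' reference implementation), the
following NumPy sketch shows how one iteration of a memetic scheme of this kind
might combine local gradient steps with m-elitist selection and
normalized-fitness-based randomization. All function names, hyperparameters,
and the specific perturbation rule here are assumptions made for the example.

```python
import numpy as np

def population_descent_step(population, loss_fn, grad_fn, lr=0.01,
                            elite_fraction=0.5, noise_scale=0.1, rng=None):
    """Hypothetical single iteration: gradient-based local search, then
    m-elitist selection with fitness-scaled randomization of non-elites."""
    rng = rng or np.random.default_rng()

    # Local search: each individual takes one gradient descent step.
    population = [p - lr * grad_fn(p) for p in population]

    # Fitness = negative loss, normalized to [0, 1] across the population.
    losses = np.array([loss_fn(p) for p in population])
    fitness = -losses
    fitness = (fitness - fitness.min()) / (fitness.max() - fitness.min() + 1e-12)

    # m-elitist selection: the top-m individuals survive unchanged.
    m = max(1, int(elite_fraction * len(population)))
    order = np.argsort(-fitness)  # best individuals first
    elites = [population[i] for i in order[:m]]

    # Non-elites are re-sampled near randomly chosen elites; the perturbation
    # magnitude shrinks as normalized fitness grows (worse individuals explore more).
    new_population = list(elites)
    for i in order[m:]:
        parent = elites[rng.integers(m)]
        sigma = noise_scale * (1.0 - fitness[i])
        new_population.append(parent + sigma * rng.standard_normal(parent.shape))
    return new_population
```

In this sketch, the elite fraction plays the role of the m-elitist parameter and
the fitness normalization determines how strongly poorly performing individuals
are randomized; the paper's adaptive variant of these choices is not reproduced
here.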