Reinforcement learning (RL) offers the potential for training generally
capable agents that can interact autonomously in the real world. However, one
key limitation is the brittleness of RL algorithms to core hyperparameters and
network architecture choice. Furthermore, non-stationarities such as evolving
training data and increased agent complexity mean that different
hyperparameters and architectures may be optimal at different points of
training. This motivates AutoRL, a class of methods seeking to automate these
design choices. One prominent class of AutoRL methods is Population-Based
Training (PBT), which has led to impressive performance in several large-scale
settings. In this paper, we introduce two new innovations in PBT-style methods.
First, we employ trust-region-based Bayesian optimization, enabling full
coverage of the high-dimensional, mixed hyperparameter search space. Second, we
show that, using a generational approach, we can learn both architectures
and hyperparameters jointly on the fly in a single training run. Leveraging the
new highly parallelizable Brax physics engine, we show that these innovations
lead to large performance gains, significantly outperforming the tuned baseline
while learning entire configurations on the fly. Code is available at
https://github.com/xingchenwan/bgpbt.

Comment: AutoML Conference 2022. 10 pages, 4 figures, 3 tables (28 pages, 10
figures, 7 tables including references and appendices).