AutoRL Hyperparameter Landscapes
Although Reinforcement Learning (RL) has been shown to be capable of producing
impressive results, its use is limited by the impact of its hyperparameters on
performance, which often makes it difficult to achieve good results in practice.
Automated RL (AutoRL) addresses this difficulty, yet little is known about the
dynamics of the hyperparameter landscapes that hyperparameter optimization
(HPO) methods traverse in search of optimal configurations. Since existing
AutoRL approaches dynamically adjust hyperparameter configurations, we
propose an approach to build and analyze these hyperparameter landscapes not
just at a single point in time but at multiple points throughout training.
Addressing an important open question on the legitimacy of such dynamic AutoRL
approaches, we provide thorough empirical evidence that the hyperparameter
landscapes strongly vary over time across representative algorithms from RL
literature (DQN and SAC) in different kinds of environments (Cartpole and
Hopper). This supports the theory that hyperparameters should be dynamically
adjusted during training and shows the potential of landscape analyses to
yield further insights into AutoRL problems.
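To make the idea concrete, here is a minimal Python sketch of time-indexed landscape construction: a grid of configurations is evaluated at several training checkpoints, yielding one landscape per point in time. The `train_and_eval` function is a hypothetical stand-in for an actual RL training run, and the grid, checkpoints, and return range are illustrative, not the paper's protocol.

```python
# Minimal sketch of time-indexed hyperparameter landscape analysis
# (illustrative only, not the paper's exact protocol).

import itertools
import random

def train_and_eval(learning_rate: float, gamma: float, steps: int) -> float:
    """Hypothetical stand-in: train for `steps` env steps, return mean return."""
    # Seed from the configuration so repeated calls are consistent.
    random.seed(hash((round(learning_rate, 6), round(gamma, 4), steps)) % 2**32)
    return random.uniform(0.0, 500.0)  # placeholder for a real RL run

# Grid over two hyperparameters; checkpoints are points in training time.
learning_rates = [1e-4, 3e-4, 1e-3]
gammas = [0.95, 0.99, 0.999]
checkpoints = [10_000, 50_000, 100_000]  # env steps at which to snapshot

# One landscape per checkpoint: configuration -> performance at that time.
landscapes = {
    t: {
        (lr, g): train_and_eval(lr, g, steps=t)
        for lr, g in itertools.product(learning_rates, gammas)
    }
    for t in checkpoints
}

# A simple way to see whether landscapes shift over time: compare the
# best configuration at each checkpoint.
for t, landscape in landscapes.items():
    best_cfg = max(landscape, key=landscape.get)
    print(f"after {t} steps, best config: lr={best_cfg[0]}, gamma={best_cfg[1]}")
```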
PI is back! Switching Acquisition Functions in Bayesian Optimization
Bayesian Optimization (BO) is a powerful, sample-efficient technique to
optimize expensive-to-evaluate functions. Each of the BO components, such as
the surrogate model, the acquisition function (AF), or the initial design, is
subject to a wide range of design choices. Selecting the right components for a
given optimization task is challenging and can have a significant
impact on the quality of the obtained results. In this work, we initiate the
analysis of which AF to favor for which optimization scenarios. To this end, we
benchmark SMAC3 using Expected Improvement (EI) and Probability of Improvement
(PI) as acquisition functions on the 24 BBOB functions of the COCO environment.
We compare their results with those of schedules switching between AFs. One
schedule aims to use EI's explorative behavior in the early optimization steps,
and then switches to PI for better exploitation in the final steps. We also
compare this to a random schedule and round-robin selection of EI and PI. We
observe that dynamic schedules oftentimes outperform any single static one. Our
results suggest that a schedule that allocates the first 25% of the
optimization budget to EI and the last 75% to PI is a reliable default.
However, we also observe considerable performance differences for the 24
functions, suggesting that a per-instance allocation, possibly learned on the
fly, could offer significant improvement over the state-of-the-art BO designs.
Comment: 2022 NeurIPS Workshop on Gaussian Processes, Spatiotemporal Modeling, and Decision-making Systems
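A minimal sketch of such a switching schedule is given below, assuming a Gaussian surrogate posterior on a minimization problem. The EI and PI formulas are the standard ones; the posterior values are synthetic stand-ins, and `pick_acquisition` is a hypothetical helper, not SMAC3's actual interface.

```python
# Minimal sketch of an EI-to-PI switching schedule (25% EI, then 75% PI),
# assuming a Gaussian surrogate posterior. The surrogate and candidate set
# are synthetic stand-ins, not SMAC3's internals.

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """EI for minimization under a Gaussian posterior N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def probability_of_improvement(mu, sigma, best):
    """PI for minimization under a Gaussian posterior N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-12)
    return norm.cdf((best - mu) / sigma)

def pick_acquisition(iteration: int, budget: int):
    """Static 25/75 schedule: EI early (exploration), PI late (exploitation)."""
    if iteration < 0.25 * budget:
        return expected_improvement
    return probability_of_improvement

# Toy usage: score synthetic candidates under the scheduled AF at two
# points in the run, one in the EI phase and one in the PI phase.
rng = np.random.default_rng(0)
mu = rng.normal(size=100)                 # posterior means of candidates
sigma = rng.uniform(0.1, 1.0, size=100)   # posterior std devs of candidates
best_so_far = mu.min()
budget = 100
for it in (10, 90):
    af = pick_acquisition(it, budget)
    best_candidate = int(np.argmax(af(mu, sigma, best_so_far)))
    print(f"iteration {it}: using {af.__name__}, picks candidate {best_candidate}")
```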
Contextualize Me -- The Case for Context in Reinforcement Learning
While Reinforcement Learning (RL) has made great strides towards solving
increasingly complicated problems, many algorithms are still brittle to even
slight environmental changes. Contextual Reinforcement Learning (cRL) provides
a framework to model such changes in a principled manner, thereby enabling
flexible, precise and interpretable task specification and generation. Our goal
is to show how the framework of cRL contributes to improving zero-shot
generalization in RL through meaningful benchmarks and structured reasoning
about generalization tasks. We confirm the insight that optimal behavior in cRL
requires context information, as in other related areas of partial
observability. To empirically validate this in the cRL framework, we provide
various context-extended versions of common RL environments. They are part of
CARL, the first benchmark library designed for generalization based on cRL
extensions of popular benchmarks, which we propose as a testbed to further
study general agents. We show that in the contextual setting, even simple RL
environments become challenging, and that naive solutions are not enough to
generalize across complex context spaces.
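The flavor of a context-extended environment can be sketched with a plain Gymnasium wrapper that resamples physics parameters ("context") on every reset. This is an illustration only, not CARL's actual API; `ContextualCartPole` and its parameter ranges are hypothetical.

```python
# Minimal sketch of a context-extended environment in the spirit of cRL:
# physics parameters ("context") are resampled at every episode reset.
# Uses plain Gymnasium attributes of CartPole, not CARL's actual API.

import random
import gymnasium as gym

class ContextualCartPole(gym.Wrapper):
    """Hypothetical wrapper that varies gravity and pole length per episode."""

    def __init__(self, env, gravity_range=(5.0, 15.0), length_range=(0.3, 0.7)):
        super().__init__(env)
        self.gravity_range = gravity_range
        self.length_range = length_range

    def reset(self, **kwargs):
        # Sample a new context and write it into the underlying physics.
        core = self.unwrapped
        core.gravity = random.uniform(*self.gravity_range)
        core.length = random.uniform(*self.length_range)
        # `length` feeds a derived quantity in CartPole's dynamics; refresh it.
        core.polemass_length = core.masspole * core.length
        return self.env.reset(**kwargs)

env = ContextualCartPole(gym.make("CartPole-v1"))
obs, info = env.reset()
print(f"gravity={env.unwrapped.gravity:.2f}, "
      f"half pole length={env.unwrapped.length:.2f}")
```

Note that this wrapper keeps the sampled context hidden from the agent's observations; in line with the abstract's insight, exposing that context to the agent is what enables optimal behavior across the context space.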