PBIL for Optimizing Hyperparameters of Convolutional Neural Networks and STL Decomposition
The optimization of hyperparameters in Deep Neural Networks is a
critical task for final performance, but it involves many subjective
decisions based on researchers' prior expertise. This paper presents the
implementation of Population-based Incremental Learning for the automatic
optimization of hyperparameters in Deep Learning architectures. Namely, the
proposed architecture is a combination of preprocessing the time-series input with
Seasonal Decomposition of Time Series by Loess, a classical method for decomposing
time series, and forecasting with Convolutional Neural Networks. In the past, this
combination has produced promising results, but at the cost of an increased
number of parameters. The proposed architecture is applied to the prediction of the
222Rn level at the Canfranc Underground Laboratory (Spain). By predicting the
low-level periods of 222Rn, the potential contamination during the maintenance
operations in the experiments hosted in the laboratory could be minimized. This
paper shows that Population-based Incremental Learning can be used for the
choice of optimized hyperparameters in Deep Learning architectures at a reasonable
computational cost.
Ministerio de Economía y Competitividad MDM-2015-050
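The PBIL loop this abstract describes can be sketched roughly as follows. The binary encoding, learning rate, and the toy `fitness` function (OneMax here, where the paper would instead evaluate a trained CNN's validation performance) are illustrative assumptions, not the authors' exact implementation.

```python
import random

def pbil(fitness, n_bits, pop_size=20, lr=0.1, generations=50, seed=0):
    """Population-based Incremental Learning over a binary encoding.

    `fitness` scores a bit-string (higher is better); in the paper's
    setting it would decode the bits into CNN hyperparameters and
    return validation performance.
    """
    rng = random.Random(seed)
    p = [0.5] * n_bits  # probability that each bit is 1
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # sample a population from the current probability vector
        pop = [[1 if rng.random() < pi else 0 for pi in p]
               for _ in range(pop_size)]
        elite = max(pop, key=fitness)
        if fitness(elite) > best_fit:
            best, best_fit = elite, fitness(elite)
        # shift the probability vector toward the elite individual
        p = [(1 - lr) * pi + lr * bi for pi, bi in zip(p, elite)]
    return best

# toy stand-in objective: maximise the number of ones ("OneMax")
solution = pbil(sum, n_bits=16)
```

The same loop applies to hyperparameter search once the bit-string is decoded into, say, filter counts and kernel sizes; only `fitness` changes.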
AutoRL Hyperparameter Landscapes
Although Reinforcement Learning (RL) has been shown to be capable of producing
impressive results, its use is limited by the impact of its hyperparameters on
performance. This often makes it difficult to achieve good results in practice.
Automated RL (AutoRL) addresses this difficulty, yet little is known about the
dynamics of the hyperparameter landscapes that hyperparameter optimization
(HPO) methods traverse in search of optimal configurations. Since existing
AutoRL approaches dynamically adjust hyperparameter configurations, we
propose an approach to build and analyze these hyperparameter landscapes not
just for one point in time but at multiple points in time throughout training.
Addressing an important open question on the legitimacy of such dynamic AutoRL
approaches, we provide thorough empirical evidence that the hyperparameter
landscapes strongly vary over time across representative algorithms from RL
literature (DQN and SAC) in different kinds of environments (Cartpole and
Hopper). This supports the theory that hyperparameters should be dynamically
adjusted during training and shows the potential for further insights into
AutoRL problems to be gained through landscape analyses.
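The core of such a landscape analysis can be sketched as below: given logged returns for each hyperparameter configuration at several training checkpoints, the per-checkpoint argmax shows whether the best configuration drifts over time. The data and the single `learning_rate` axis are illustrative assumptions, not the paper's actual landscapes.

```python
# (checkpoint, learning_rate) -> mean evaluation return (illustrative data)
landscape = {
    (0, 1e-2): 40.0, (0, 1e-3): 25.0, (0, 1e-4): 10.0,
    (1, 1e-2): 55.0, (1, 1e-3): 70.0, (1, 1e-4): 30.0,
    (2, 1e-2): 50.0, (2, 1e-3): 90.0, (2, 1e-4): 80.0,
}

def best_per_checkpoint(landscape):
    """Return the best (config, return) pair at each checkpoint."""
    best = {}
    for (t, lr), ret in landscape.items():
        if t not in best or ret > best[t][1]:
            best[t] = (lr, ret)
    return best

print(best_per_checkpoint(landscape))
```

A drifting argmax across checkpoints, as in this toy table, is the kind of evidence that supports adjusting hyperparameters dynamically during training.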
Generating Diverse Teammates to Train Robust Agents For Ad Hoc Teamwork
Ad hoc teamwork (AHT) is the challenge of designing a learner that
effectively collaborates with unknown teammates without prior coordination
mechanisms. Early approaches address the AHT challenge by training the learner
with a diverse set of handcrafted teammate policies, usually designed based on
an expert's domain knowledge about the policies the learner may encounter.
However, implementing teammate policies for training based on domain knowledge
is not always feasible. In such cases, recent approaches attempted to improve
the robustness of the learner by training it with teammate policies generated
by optimising information-theoretic diversity metrics. However, optimising
information-theoretic diversity metrics may generate teammates with
superficially different behaviours, which does not necessarily result in a
robust learner that can effectively collaborate with unknown teammates. In this
paper, we present an automated teammate policy generation method optimising the
Best-Response Diversity (BRDiv) metric, which measures diversity based on the
compatibility of teammate policies in terms of returns. We evaluate our
approach in environments with multiple valid coordination strategies, comparing
against methods optimising information-theoretic diversity metrics and an
ablation not optimising any diversity metric. Our experiments indicate that
optimising BRDiv yields a diverse set of training teammate policies that
improve the learner's performance relative to previous teammate generation
approaches when collaborating with near-optimal previously unseen teammate
policies.
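A BRDiv-style objective on a cross-play return matrix might look like the sketch below. The matrix layout and the diagonal-minus-off-diagonal scoring are assumptions chosen to illustrate "diversity based on the compatibility of teammate policies in terms of returns"; the paper's exact formulation may differ.

```python
import itertools

def brdiv_score(returns):
    """Score a teammate set by return-based compatibility.

    returns[i][j] is the return when the best-response policy for
    teammate i is paired with teammate j. A diverse set has high
    diagonal entries (each best response works with its own teammate)
    and low off-diagonal entries (best responses do not transfer).
    """
    n = len(returns)
    diag = sum(returns[i][i] for i in range(n))
    off = sum(returns[i][j]
              for i, j in itertools.product(range(n), repeat=2) if i != j)
    return diag - off / (n - 1)
```

Under this scoring, two teammates whose best responses are interchangeable contribute nothing, while genuinely distinct coordination conventions are rewarded.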
Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL
Despite a series of recent successes in reinforcement learning (RL), many RL
algorithms remain sensitive to hyperparameters. As such, there has recently
been interest in the field of AutoRL, which seeks to automate design decisions
to create more general algorithms. Recent work suggests that population-based
approaches may be effective AutoRL algorithms by learning hyperparameter
schedules on the fly. In particular, the PB2 algorithm is able to achieve
strong performance in RL tasks by formulating online hyperparameter
optimization as a time-varying GP-bandit problem, while also providing
theoretical guarantees. However, PB2 is only designed to work for continuous
hyperparameters, which severely limits its utility in practice. In this paper
we introduce a new (provably) efficient hierarchical approach for optimizing
both continuous and categorical variables, using a new time-varying bandit
algorithm specifically designed for the population based training regime. We
evaluate our approach on the challenging Procgen benchmark, where we show that
explicitly modelling dependence between data augmentation and other
hyperparameters improves generalization.
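A single population-based-training step over mixed hyperparameters can be sketched as follows. The member fields (`lr`, `aug`, `score`) are hypothetical, and the paper's time-varying GP-bandit suggestion step is replaced here by simple random perturbation and resampling for brevity.

```python
import random

def exploit_explore(population, rng=None):
    """One PBT-style step over mixed hyperparameters.

    Each member is a dict with a continuous 'lr', a categorical 'aug',
    and a 'score'. The bottom quartile copies a top performer, then
    perturbs the continuous variable and resamples the categorical one
    (where PB2-style methods would instead query a bandit model).
    """
    rng = rng or random.Random(0)
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cutoff = max(1, len(ranked) // 4)
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for member in bottom:
        donor = rng.choice(top)
        member["lr"] = donor["lr"] * rng.choice([0.8, 1.2])     # perturb continuous
        member["aug"] = rng.choice(["crop", "cutout", "none"])  # resample categorical
        member["score"] = donor["score"]  # real PBT also copies the weights
    return population

pop = [{"score": s, "lr": 1e-3, "aug": "none"} for s in (1.0, 2.0, 3.0, 4.0)]
exploit_explore(pop)
```

The key difference from this sketch in PB2-style methods is that the continuous and categorical choices are not random but come from a time-varying bandit model fit to the population's recent performance.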