From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning
Deep Reinforcement Learning has proved to be able to solve many control tasks
in different fields, but the behavior of these systems is not always as
expected when deployed in real-world scenarios. This is mainly due to the lack
of domain adaptation between simulated and real-world data together with the
absence of distinction between train and test datasets. In this work, we
investigate these problems in the autonomous driving field, especially for a
maneuver planning module for roundabout insertions. In particular, we present a
system based on multiple environments in which agents are trained
simultaneously, evaluating the behavior of the model in different scenarios.
Finally, we analyze techniques aimed at reducing the gap between simulated and
real-world data, showing that they increase the generalization capabilities of
the system on both unseen and real-world scenarios.
Comment: Intelligent Vehicle Symposium 2020 (IV2020)
Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL
Despite a series of recent successes in reinforcement learning (RL), many RL
algorithms remain sensitive to hyperparameters. As such, there has recently
been interest in the field of AutoRL, which seeks to automate design decisions
to create more general algorithms. Recent work suggests that population based
approaches may be effective AutoRL algorithms, by learning hyperparameter
schedules on the fly. In particular, the PB2 algorithm is able to achieve
strong performance in RL tasks by formulating online hyperparameter
optimization as a time-varying GP-bandit problem, while also providing
theoretical guarantees. However, PB2 is only designed to work for continuous
hyperparameters, which severely limits its utility in practice. In this paper
we introduce a new (provably) efficient hierarchical approach for optimizing
both continuous and categorical variables, using a new time-varying bandit
algorithm specifically designed for the population based training regime. We
evaluate our approach on the challenging Procgen benchmark, where we show that
explicitly modelling dependence between data augmentation and other
hyperparameters improves generalization.
Video game Design and Development Degree Technical Report of the Final Degree Project
Final Degree Project in Video Game Design and Development. Code: VJ1241. Academic year: 2017/2018
This section is the technical proposal of my Final Degree Project in Video game Design and
Development. The project consists of the development of a 3D Roguelike game using
Unity3D. The main features are the design of a procedural dungeon generation system and the
use of Machine Learning and Artificial Neural Networks for the behavior of the NPCs (non-player
characters). This artificial intelligence will be implemented with the new Machine Learning
agents of Unity3D.