7 research outputs found

From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning

Authors: Bacchiani Giulio, Broggi Alberto, Capasso Alessandro Paolo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 26/12/2020

Deep Reinforcement Learning has proved to be able to solve many control tasks in different fields, but the behavior of these systems is not always as expected when deployed in real-world scenarios. This is mainly due to the lack of domain adaptation between simulated and real-world data together with the absence of distinction between train and test datasets. In this work, we investigate these problems in the autonomous driving field, especially for a maneuver planning module for roundabout insertions. In particular, we present a system based on multiple environments in which agents are trained simultaneously, evaluating the behavior of the model in different scenarios. Finally, we analyze techniques aimed at reducing the gap between simulated and real-world data showing that this increased the generalization capabilities of the system both on unseen and real-world scenarios.

Comment: Intelligent Vehicle Symposium 2020 (IV2020)

arXiv.org e-Print Archive

Crossref

Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL

Authors: Desai Shaan, Nguyen Vu, Parker-Holder Jack, Roberts Stephen
Publication date: 30/06/2021

Despite a series of recent successes in reinforcement learning (RL), many RL algorithms remain sensitive to hyperparameters. As such, there has recently been interest in the field of AutoRL, which seeks to automate design decisions to create more general algorithms. Recent work suggests that population-based approaches may be effective AutoRL algorithms, by learning hyperparameter schedules on the fly. In particular, the PB2 algorithm is able to achieve strong performance in RL tasks by formulating online hyperparameter optimization as a time-varying GP-bandit problem, while also providing theoretical guarantees. However, PB2 is only designed to work for continuous hyperparameters, which severely limits its utility in practice. In this paper we introduce a new (provably) efficient hierarchical approach for optimizing both continuous and categorical variables, using a new time-varying bandit algorithm specifically designed for the population-based training regime. We evaluate our approach on the challenging Procgen benchmark, where we show that explicitly modelling dependence between data augmentation and other hyperparameters improves generalization.

arXiv.org e-Print Archive

Oxford University Research Archive

Video game Design and Development Degree Technical Report of the Final Degree Project

Author: Pinilla Bermejo Andoni
Publication venue: Universitat Jaume I
Publication date: 01/07/2018

Final Degree Project in Video Game Design and Development. Code: VJ1241. Academic year: 2017/2018

This section is the technical proposal of my Final Degree Project in Video Game Design and Development. The project consists of the development of a 3D Roguelike game using Unity3D. The main features are the design of a procedural dungeon generation system and the use of Machine Learning and Artificial Neural Networks for the behavior of the NPCs (non-player characters). This artificial intelligence will be implemented with the new Machine Learning agents of Unity3D.

Repositori Institucional de la Universitat Jaume I