7 research outputs found

    From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning

    Deep Reinforcement Learning has proved able to solve many control tasks in different fields, but these systems do not always behave as expected when deployed in real-world scenarios. This is mainly due to the lack of domain adaptation between simulated and real-world data, together with the absence of a distinction between training and test datasets. In this work, we investigate these problems in the autonomous driving field, specifically for a maneuver planning module for roundabout insertions. In particular, we present a system based on multiple environments in which agents are trained simultaneously, evaluating the behavior of the model in different scenarios. Finally, we analyze techniques aimed at reducing the gap between simulated and real-world data, showing that they increase the generalization capabilities of the system in both unseen and real-world scenarios. Comment: Intelligent Vehicle Symposium 2020 (IV2020)

    Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL

    Despite a series of recent successes in reinforcement learning (RL), many RL algorithms remain sensitive to hyperparameters. As such, there has recently been interest in the field of AutoRL, which seeks to automate design decisions to create more general algorithms. Recent work suggests that population-based approaches may be effective AutoRL algorithms, by learning hyperparameter schedules on the fly. In particular, the PB2 algorithm is able to achieve strong performance in RL tasks by formulating online hyperparameter optimization as a time-varying GP-bandit problem, while also providing theoretical guarantees. However, PB2 is only designed to work for continuous hyperparameters, which severely limits its utility in practice. In this paper we introduce a new (provably) efficient hierarchical approach for optimizing both continuous and categorical variables, using a new time-varying bandit algorithm specifically designed for the population-based training regime. We evaluate our approach on the challenging Procgen benchmark, where we show that explicitly modelling dependence between data augmentation and other hyperparameters improves generalization.

    Video game Design and Development Degree Technical Report of the Final Degree Project

    Final Degree Project in Video Game Design and Development. Code: VJ1241. Academic year: 2017/2018. This section is the technical proposal of my Final Degree Project in Video Game Design and Development. The project consists of the development of a 3D Roguelike game using Unity3D. The main features are the design of a procedural dungeon generation system and the use of Machine Learning and Artificial Neural Networks for the behavior of the NPCs (non-player characters). This artificial intelligence will be implemented with the new Machine Learning agents of Unity3D.