Online Meta-learning by Parallel Algorithm Competition

Authors: Baker James E.; Bertsekas D. P.; Downey Carlton; Gabillon V.; Goodfellow Ian; Mnih Volodymyr; Snoek Jasper; Springenberg Jost T.; Sutton S.; Szita I.; Unemi T.; Wu Jian
Publication date: 24/02/2017

The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which has arguably become more pressing with the recent success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games make comprehensive searches for appropriate meta-parameter values infeasible. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before learning continues, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in standard Tetris with a smaller, $10 \times 10$, board, by 31% and 84%, respectively, and by improving the results for deep Sarsa($\lambda$) agents in three Atari 2600 games by 62% or more. The experiments also show the ability of the OMPAC method to adapt the meta-parameters according to the learning progress in different tasks.

Comment: 15 pages, 10 figures. arXiv admin note: text overlap with arXiv:1702.0311
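The select-then-perturb cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which selection scheme is used, so the sketch swaps in simple truncation selection (keep the better half); `evaluate` is a hypothetical stand-in for running one learner instance for a fixed number of episodes; and only the meta-parameters are tracked, not the learned weights. The names `ompac_generation`, `noise_prob`, and `noise_scale` are illustrative.

```python
import random


def ompac_generation(population, evaluate, noise_prob=0.5, noise_scale=0.05, rng=None):
    """One selection-and-perturbation step over a list of meta-parameter dicts.

    `evaluate` (hypothetical) returns a performance score for one instance,
    standing in for a fixed number of training episodes.
    """
    rng = rng or random.Random()
    scores = [evaluate(params) for params in population]

    # Assumption: truncation selection. Rank instances by score (best first)
    # and keep the top half; the abstract does not specify the scheme.
    order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    survivors = [population[i] for i in order[: max(1, len(population) // 2)]]

    next_population = []
    for i in range(len(population)):
        parent = survivors[i % len(survivors)]
        # With probability `noise_prob`, add zero-mean Gaussian noise to each
        # meta-parameter before learning continues.
        child = {
            name: value + rng.gauss(0.0, noise_scale) if rng.random() < noise_prob else value
            for name, value in parent.items()
        }
        next_population.append(child)
    return next_population


if __name__ == "__main__":
    # Toy score: instances whose learning rate is closer to 0.5 do better.
    start = [{"alpha": a} for a in (0.1, 0.5, 0.3, 0.9)]
    score = lambda p: -abs(p["alpha"] - 0.5)
    print(ompac_generation(start, score, noise_prob=1.0, rng=random.Random(0)))
```

In a fuller version, each surviving instance would also carry its learned value-function weights forward, so that selection copies the whole learner rather than only its meta-parameters.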

arXiv.org e-Print Archive

Crossref

Deep Reinforcement Learning and its application to games

Author: Torrejón Valenzuela Alberto
Publication date: 01/08/2021

This project aims to review the main concepts of Reinforcement Learning and combine them with the tools of Deep Learning, studying in depth the resulting Deep Reinforcement Learning algorithms, which are having a major impact today in fields such as autonomous driving, robot control, gaming and many more. To this end, chapter 1 gives a general overview of Deep Reinforcement Learning as an introduction, together with the motivation for studying this topic. Chapter 2 gives a brief review of Deep Learning, since it is fundamental to achieving our goal. Chapter 3 goes into detail, defining Reinforcement Learning mathematically and formalizing the concepts needed to build the classic solution algorithms in chapter 4. As an application of these techniques, chapter 5 presents the implementation of the algorithms for the game of Blackjack. Finally, chapter 6 reaches our initial objective by building the algorithm behind Deep Q-Networks, which we apply to the Gridworld games in chapter 7. A section on conclusions and improvements for the project closes the text.

Universidad de Sevilla. Grado en Matemáticas y Estadístic

idUS. Depósito de Investigación Universidad de Sevilla

A REINFORCEMENT LEARNING APPROACH TO VEHICLE PATH OPTIMIZATION IN URBAN ENVIRONMENTS

Author: Al Hassani Shamsa Abdulla
Publication venue: Scholarworks@UAEU
Publication date: 01/06/2021

Road traffic management in metropolitan cities and urban areas in general is an important component of Intelligent Transportation Systems (ITS). With the world's population and the number of vehicles growing, a dramatic increase in road traffic is expected to put pressure on the transportation infrastructure. Therefore, there is a pressing need to devise new ways to optimize traffic flow in order to accommodate the growing needs of transportation systems. This work proposes an Artificial Intelligence (AI) method based on reinforcement learning techniques for computing near-optimal vehicle itineraries, applied to Vehicular Ad-hoc Networks (VANETs). These itineraries are optimized based on the vehicle's travel distance, travel time, and road traffic congestion. The problem of traffic density is formulated as a Markov Decision Process (MDP). In particular, this work introduces a new reward function that takes traffic congestion into account when learning the vehicle's best action (best turn) to take in different situations. To evaluate this approach, the work investigates different learning algorithms, such as Q-Learning and SARSA, in conjunction with two exploration strategies: (a) ε-greedy and (b) Softmax. A comparative performance study of these methods is presented to determine the most effective solution for enabling vehicles to find a fast and reliable path. Simulation experiments illustrate the effectiveness of the proposed methods in computing optimal itineraries, allowing vehicles to avoid traffic congestion while maintaining reasonable travel times and distances.
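The two exploration strategies named in the abstract can be sketched as below. This is a generic illustration of ε-greedy and Softmax (Boltzmann) action selection, not code from the thesis: the Q-values, the `temperature` parameter, and the function names are assumptions, and each action index could stand for one possible turn at an intersection.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over actions; higher temperature means more exploration."""
    top = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, temperature, rng):
    """Sample an action index in proportion to its Boltzmann probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    q = [1.0, 3.0, 2.0]  # hypothetical Q-values for three possible turns
    print(epsilon_greedy(q, epsilon=0.1, rng=rng))
    print(softmax_probs(q, temperature=1.0))
    print(softmax_action(q, temperature=1.0, rng=rng))
```

ε-greedy explores uniformly at random regardless of the value estimates, whereas Softmax explores in proportion to them, so clearly poor actions are tried less often.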

United Arab Emirates University: Scholarworks@UAEU / جامعة الامارات