93 research outputs found
Online Meta-learning by Parallel Algorithm Competition
The efficiency of reinforcement learning algorithms depends critically on a
few meta-parameters that modulate the learning updates and the trade-off
between exploration and exploitation. The adaptation of the meta-parameters is
an open question in reinforcement learning, which arguably has become more of
an issue recently with the success of deep reinforcement learning in
high-dimensional state spaces. The long learning times in domains such as Atari
2600 video games make it infeasible to perform comprehensive searches of
appropriate meta-parameter values. We propose the Online Meta-learning by
Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several
instances of a reinforcement learning algorithm are run in parallel with small
differences in the initial values of the meta-parameters. After a fixed number
of episodes, the instances are selected based on their performance in the task
at hand. Before continuing the learning, Gaussian noise is added to the
meta-parameters with a predefined probability. We validate the OMPAC method by
improving the state-of-the-art results in stochastic SZ-Tetris and in standard
Tetris with a smaller, 10×10, board, by 31% and 84%, respectively, and
by improving the results for deep Sarsa(λ) agents in three Atari 2600
games by 62% or more. The experiments also show the ability of the OMPAC method
to adapt the meta-parameters according to the learning progress in different
tasks.
Comment: 15 pages, 10 figures. arXiv admin note: text overlap with
arXiv:1702.0311
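The loop described in this abstract (parallel instances, periodic selection, probabilistic Gaussian perturbation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `ompac_sketch`, the `evaluate` callback (a stand-in for running an instance for a fixed number of episodes), the keep-the-top-half selection rule, and all default values are assumptions. A real instance would also carry its learned weights, which this toy version omits.

```python
import random


def ompac_sketch(evaluate, init_params, n_instances=8, n_generations=20,
                 noise_prob=0.5, noise_scale=0.05, seed=0):
    """Toy population loop over meta-parameter dicts (all names hypothetical)."""
    rng = random.Random(seed)
    # Start the instances with small differences in their initial meta-parameters.
    population = [
        {k: v + rng.gauss(0.0, noise_scale) for k, v in init_params.items()}
        for _ in range(n_instances)
    ]
    for _ in range(n_generations):
        # Stand-in for "run every instance for a fixed number of episodes".
        scores = [evaluate(p) for p in population]
        ranked = [p for _, p in sorted(zip(scores, population),
                                       key=lambda pair: pair[0], reverse=True)]
        # Assumed selection rule: keep the better half and copy it to refill.
        survivors = ranked[: max(1, n_instances // 2)]
        population = [dict(survivors[i % len(survivors)])
                      for i in range(n_instances)]
        # With a predefined probability, add Gaussian noise to each meta-parameter.
        for params in population:
            for key in params:
                if rng.random() < noise_prob:
                    params[key] += rng.gauss(0.0, noise_scale)
    scores = [evaluate(p) for p in population]
    best = max(range(n_instances), key=lambda i: scores[i])
    return population[best], scores[best]


if __name__ == "__main__":
    # Hypothetical score that peaks when the learning rate "alpha" is 0.3.
    def toy_score(params):
        return -(params["alpha"] - 0.3) ** 2

    print(ompac_sketch(toy_score, {"alpha": 0.1}))
```

Because selection and perturbation happen while learning continues, the meta-parameters can drift over time instead of being fixed by an up-front search.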
Deep Reinforcement Learning and its application to games
This project reviews the main concepts of Reinforcement Learning and
combines them with the tools of Deep Learning, studying in depth the resulting
Deep Reinforcement Learning algorithms, which are having considerable
impact today in fields such as autonomous driving, robot control, gaming,
and many more. To do this, chapter 1 gives a general overview of Deep
Reinforcement Learning as an introduction and explains the motivation for studying this topic.
Then, in chapter 2, since it will be fundamental to achieve our goal, we give a brief review of
Deep Learning. We get into details with chapter 3, where we define Reinforcement Learning
mathematically, formalizing the concepts in order to build the classic solution algorithms in
chapter 4. As an application of these techniques, the implementation of the algorithms for the
game of Blackjack is presented in chapter 5. Finally, in chapter 6, we reach our initial objective by
building the algorithm behind Deep Q-Networks, which we apply to the Gridworld
games in chapter 7. A section on conclusions and possible improvements closes the text.
Universidad de Sevilla. Grado en Matemáticas y Estadístic
A REINFORCEMENT LEARNING APPROACH TO VEHICLE PATH OPTIMIZATION IN URBAN ENVIRONMENTS
Road traffic management in metropolitan cities and urban areas in general is an important component of Intelligent Transportation Systems (ITS). As the world population and the number of vehicles grow, a dramatic increase in road traffic is expected to put pressure on the transportation infrastructure. There is therefore a pressing need to devise new ways to optimize traffic flow in order to accommodate the growing needs of transportation systems. This work proposes an Artificial Intelligence (AI) method based on reinforcement learning techniques for computing near-optimal vehicle itineraries in Vehicular Ad-hoc Networks (VANETs). These itineraries are optimized based on the vehicle’s travel distance, travel time, and road traffic congestion. The problem of traffic density is formulated as a Markov Decision Process (MDP). In particular, this work introduces a new reward function that takes traffic congestion into account when learning the vehicle’s best action (best turn) in different situations. To evaluate this approach, the work investigates two learning algorithms, Q-Learning and SARSA, in conjunction with two exploration strategies: (a) ε-greedy and (b) Softmax. A comparative performance study of these methods is presented to determine the most effective solution for enabling vehicles to find a fast and reliable path. Simulation experiments illustrate the effectiveness of the proposed methods in computing near-optimal itineraries that allow vehicles to avoid traffic congestion while maintaining reasonable travel times and distances.
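For orientation, the two tabular update rules and the two exploration strategies named in this abstract can be sketched as below. These are the standard textbook forms. The `congestion_reward` function is purely hypothetical: the abstract says the reward accounts for congestion but gives no formula, so the weighted negative sum and its weights are assumptions for illustration only.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_action(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def congestion_reward(distance, travel_time, congestion, weights=(1.0, 1.0, 1.0)):
    """Hypothetical reward: penalize distance, time, and congestion (assumed form)."""
    w_d, w_t, w_c = weights
    return -(w_d * distance + w_t * travel_time + w_c * congestion)
```

The only difference between the two updates is the bootstrap term: Q-Learning uses the maximum over next actions, while SARSA uses the next action selected by the exploration strategy, so the choice of ε-greedy or Softmax affects SARSA's targets directly.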
- …