300 research outputs found
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies
Using deep neural networks as function approximators for reinforcement learning
tasks has recently been shown to be very powerful for solving problems
approaching real-world complexity. Using these results as a benchmark, we
discuss the role that the discount factor may play in the quality of the
learning process of a deep Q-network (DQN). We empirically show that
progressively increasing the discount factor up to its final value makes it
possible to significantly reduce the number of learning steps. When this
schedule is used in conjunction with a varying learning rate, we empirically
show that it outperforms the original DQN on several experiments. We relate
this phenomenon to the instabilities of neural networks when they are used in
an approximate Dynamic Programming setting. We also describe the possibility
of falling into a local optimum during the learning process, thus connecting
our discussion with the exploration/exploitation dilemma.
Comment: NIPS 2015 Deep Reinforcement Learning Workshop
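As a rough illustration of the idea of increasing the discount factor during training (the initial value, geometric rate, and schedule shape below are assumptions for the sketch, not the paper's exact recipe):

```python
import numpy as np

def gamma_schedule(step, gamma0=0.9, gamma_final=0.99, rate=0.98):
    """Shrink the gap (1 - gamma) geometrically toward the final discount.

    Early targets are short-sighted and more stable; later targets
    approach the intended horizon. All constants here are assumptions.
    """
    gap = max((1.0 - gamma0) * rate ** step, 1.0 - gamma_final)
    return 1.0 - gap

def td_target(reward, q_next, step, done):
    # Standard Q-learning target, but with the scheduled discount factor.
    gamma = gamma_schedule(step)
    return reward + (0.0 if done else gamma * np.max(q_next))
```

A small discount early in training damps the bootstrapped targets, which is one way to counter the approximate-Dynamic-Programming instabilities the abstract refers to.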
Performing Deep Recurrent Double Q-Learning for Atari Games
Currently, many applications in machine learning are based on defining new models to extract more information from data. Deep reinforcement learning, whose most common applications are video games such as Atari, Mario, and others, has changed how computers can learn by themselves using only information, called rewards, obtained from their actions. Many algorithms have been modeled and implemented based on Deep Recurrent Q-Learning, proposed by DeepMind and used in AlphaZero and Go. In this document, we propose Deep Recurrent Double Q-Learning, an implementation of deep reinforcement learning that combines the Double Q-Learning algorithm with recurrent networks such as LSTM and DRQN.
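A minimal sketch of how a Double Q-Learning target can be combined with a recurrent Q-network (the class name, layer sizes, and tensor shapes below are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """DRQN-style network: an LSTM over per-frame features, then a linear
    head producing one Q-value per action (sizes are placeholders)."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, seq_len, obs_dim)
        out, state = self.lstm(obs_seq, state)
        return self.head(out), state

def double_q_target(online, target, next_obs_seq, reward, done, gamma=0.99):
    """Double Q-learning: the online net picks the greedy action at the
    last timestep, the target net evaluates it (reduces overestimation)."""
    with torch.no_grad():
        q_online, _ = online(next_obs_seq)
        best = q_online[:, -1].argmax(dim=1, keepdim=True)   # (batch, 1)
        q_target, _ = target(next_obs_seq)
        q_eval = q_target[:, -1].gather(1, best).squeeze(1)  # (batch,)
    # reward, done: float tensors of shape (batch,)
    return reward + gamma * (1.0 - done) * q_eval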
Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful Real-Time Strategy
(RTS) games. In recent years, SC has also become widely accepted as a
challenging testbed for AI research because of its enormous state space,
partially observed information, multi-agent collaboration, and so on. With the
help of the annual AIIDE and CIG competitions, a growing number of SC bots
have been proposed and continuously improved. However, a large gap remains
between the top-level bots and professional human players. One vital reason is
that current SC bots mainly rely on predefined rules to select macro actions
during their games. These rules are not scalable or efficient enough to cope
with the enormous yet partially observed state space of the game. In this
paper, we propose a deep reinforcement learning (DRL) framework to improve the
selection of macro actions. Our framework is based on the combination of the
Ape-X DQN and Long Short-Term Memory (LSTM). We use this framework to build
our bot, named LastOrder. Our evaluation, based on training against all bots
from the AIIDE 2017 StarCraft AI competition set, shows that LastOrder
achieves an 83% win rate, outperforming 26 of the 28 entrants.
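As a sketch of what recurrent macro-action selection might look like (assuming a recurrent Q-network such as the RecurrentQNet sketch above; the distributed Ape-X machinery of parallel actors and prioritized replay is omitted, and all names here are hypothetical):

```python
import torch

def select_macro_action(qnet, obs, lstm_state, epsilon, n_actions):
    """Epsilon-greedy macro-action choice that threads the LSTM hidden
    state across decisions, so the partially observed game state is
    summarized over time. obs is assumed to be a flat feature vector."""
    with torch.no_grad():
        q, lstm_state = qnet(obs.view(1, 1, -1), lstm_state)
    if torch.rand(()).item() < epsilon:
        action = int(torch.randint(n_actions, ()).item())
    else:
        action = int(q[0, -1].argmax().item())
    return action, lstm_state
```

Carrying `lstm_state` forward between calls is what lets the network integrate information over time despite partial observability.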
Parallelized Interactive Machine Learning on Autonomous Vehicles
Deep reinforcement learning (deep RL) has achieved superior performance in
complex sequential tasks by learning directly from image input. A deep neural
network is used as a function approximator and requires no specific state
information. However, one drawback of using only images as input is that this
approach requires a prohibitively large amount of training time and data for
the model to learn the state feature representation and approach reasonable
performance. This is not feasible in real-world applications, especially when
the data are expensive to collect and the training phase could introduce
disasters that affect human safety. In this work, we use a human demonstration
approach to speed up the learning of features and use the resulting
pre-trained model to replace the neural network in the deep RL Deep Q-Network
(DQN), followed by human interaction to further refine the model. We
empirically evaluate our approach in the Microsoft AirSim car simulator, using
both a model trained on human demonstrations alone and a modified DQN that
incorporates the human demonstration model. Our results show that (1)
pre-training with human demonstrations in a supervised learning approach is
better and much faster at discovering features than DQN alone, (2)
initializing the DQN with a pre-trained model provides a significant
improvement in training time and performance even with limited human
demonstration, and (3) allowing humans to supply suggestions during DQN
training can speed up the network's convergence to an optimal policy, as well
as allow it to learn more complex policies that are harder to discover by
random exploration.
Comment: 6 pages, NAECON 2018 - IEEE National Aerospace and Electronics Conference
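A minimal sketch of the supervised pre-training step described above, i.e. behavior cloning on human demonstrations followed by reusing the weights to initialize the DQN (the network, sizes, and helper names are assumptions, not the paper's implementation):

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Feature extractor plus action head. It is first trained by
    behavior cloning on human demonstrations, then its weights seed
    the DQN's Q-network (names and sizes are placeholders)."""
    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        return self.head(self.features(obs))

def pretrain_on_demos(net, demo_obs, demo_actions, epochs=10, lr=1e-3):
    """Supervised phase: classify the human's action from the observation,
    forcing the feature layers to learn a useful state representation
    before any reinforcement learning begins."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        loss = loss_fn(net(demo_obs), demo_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()

# The pre-trained weights can then initialize the DQN's Q-network, e.g.:
# dqn_qnet.load_state_dict(pretrained.state_dict())
```

Starting RL from features that already predict human actions is what yields the reported savings in training time relative to learning the representation from random exploration alone.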