
    How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

    Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). We empirically show that progressively increasing the discount factor up to its final value makes it possible to significantly reduce the number of learning steps, and that, when used in conjunction with a varying learning rate, this strategy outperforms the original DQN on several experiments. We relate this phenomenon to the instabilities of neural networks when they are used in an approximate dynamic programming setting. We also describe the possibility of falling into a local optimum during the learning process, thus connecting our discussion with the exploration/exploitation dilemma.
    Comment: NIPS 2015 Deep Reinforcement Learning Workshop
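
    The progressive-discount idea can be sketched in a few lines: grow gamma geometrically toward its final value across training epochs. The Python sketch below is a minimal illustration, assuming an update of the form gamma <- 1 - rho * (1 - gamma); the hyperparameter values and the train_one_epoch call are hypothetical placeholders, not the paper's exact settings.

    # Illustrative hyperparameters; the paper's exact schedule and values may differ.
    def discount_schedule(gamma0=0.90, gamma_final=0.99, rho=0.98, n_epochs=50):
        """Yield one discount factor per training epoch, growing toward gamma_final."""
        gamma = gamma0
        for _ in range(n_epochs):
            yield gamma
            # Shrink the gap to 1 geometrically, then cap at the final value.
            gamma = min(gamma_final, 1.0 - rho * (1.0 - gamma))

    for epoch, gamma in enumerate(discount_schedule()):
        # train_one_epoch(dqn, replay_buffer, gamma)  # hypothetical training call
        if epoch % 10 == 0:
            print(f"epoch {epoch}: gamma = {gamma:.4f}")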

    Performing Deep Recurrent Double Q-Learning for Atari Games

    Currently, many applications in machine learning are based on defining new models to extract more information from data. Deep reinforcement learning, with its most common applications in video games such as Atari, Mario, and others, has had an impact on how computers can learn by themselves using only information, called rewards, obtained from their actions. Many algorithms have been modeled and implemented based on Deep Recurrent Q-Learning, proposed by DeepMind and used in AlphaZero and Go. In this document, we propose Deep Recurrent Double Q-Learning, an implementation of deep reinforcement learning that combines Double Q-Learning with recurrent networks such as LSTM and DRQN.
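
    The core combination the abstract names, a Double Q-learning target computed with a recurrent (LSTM) Q-network, might look as follows in PyTorch. This is a minimal sketch: the layer sizes and the linear encoder standing in for an Atari conv stack are illustrative assumptions, not the authors' architecture.

    import torch
    import torch.nn as nn

    class DRQN(nn.Module):
        """Recurrent Q-network: encoder -> LSTM -> per-timestep Q-values."""
        def __init__(self, obs_dim, n_actions, hidden=128):
            super().__init__()
            # A linear encoder stands in for the conv stack used on Atari frames.
            self.encoder = nn.Linear(obs_dim, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_actions)

        def forward(self, obs_seq, state=None):
            x = torch.relu(self.encoder(obs_seq))      # (batch, time, hidden)
            x, state = self.lstm(x, state)
            return self.head(x), state                 # Q-values per timestep

    def double_q_target(online, target, next_obs_seq, rewards, dones, gamma=0.99):
        """Double Q-learning: the online net picks the action, the target net scores it."""
        with torch.no_grad():
            q_online, _ = online(next_obs_seq)
            best = q_online.argmax(dim=-1, keepdim=True)
            q_target, _ = target(next_obs_seq)
            next_q = q_target.gather(-1, best).squeeze(-1)
            return rewards + gamma * (1.0 - dones) * next_q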

    Macro action selection with deep reinforcement learning in StarCraft

    StarCraft (SC) is one of the most popular and successful Real Time Strategy (RTS) games. In recent years, SC has also become widely accepted as a challenging testbed for AI research because of its enormous state space, partially observed information, multi-agent collaboration, and so on. With the help of the annual AIIDE and CIG competitions, a growing number of SC bots have been proposed and continuously improved. However, a large gap remains between the top-level bots and professional human players. One vital reason is that current SC bots mainly rely on predefined rules to select macro actions during their games. These rules are not scalable and efficient enough to cope with the enormous yet partially observed state space of the game. In this paper, we propose a deep reinforcement learning (DRL) framework to improve the selection of macro actions. Our framework is based on the combination of the Ape-X DQN and Long Short-Term Memory (LSTM). We use this framework to build our bot, named LastOrder. Our evaluation, based on training against all bots from the AIIDE 2017 StarCraft AI competition set, shows that LastOrder achieves an 83% win rate, outperforming 26 of the 28 entrants.
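
    The abstract does not detail the framework's internals, but the Ape-X side of the combination can be illustrated by its per-actor exploration scheme applied to a discrete macro-action set. The sketch below follows the per-actor epsilon formula reported in the Ape-X paper; the macro-action names are hypothetical placeholders, not LastOrder's actual action space.

    import random

    # Hypothetical macro-action set; the bot's real action space is not given here.
    MACRO_ACTIONS = ["expand", "build_army", "attack", "defend", "tech_up"]

    def actor_epsilon(i, n_actors, eps=0.4, alpha=7.0):
        """Per-actor exploration rate, following the Ape-X paper's formula."""
        if n_actors == 1:
            return eps
        return eps ** (1 + (i / (n_actors - 1)) * alpha)

    def select_macro_action(q_values, epsilon):
        """Epsilon-greedy choice over the macro-action set."""
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda a: q_values[a])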

    Parallelized Interactive Machine Learning on Autonomous Vehicles

    Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by learning directly from image input. A deep neural network is used as a function approximator and requires no specific state information. However, one drawback of using only images as input is that this approach requires a prohibitively large amount of training time and data for the model to learn the state feature representation and reach reasonable performance. This is not feasible in real-world applications, especially when the data are expensive and the training phase could introduce disasters that affect human safety. In this work, we use a human demonstration approach to speed up training for learning features, use the resulting pre-trained model to replace the neural network in the deep RL Deep Q-Network (DQN), and then rely on human interaction to further refine the model. We empirically evaluate our approach in the Microsoft AirSim car simulator, using both a human demonstration model alone and a modified DQN that incorporates the human demonstration model. Our results show that (1) pre-training with human demonstrations in a supervised learning approach discovers features better and much faster than DQN alone, (2) initializing the DQN with a pre-trained model provides a significant improvement in training time and performance even with limited human demonstrations, and (3) allowing humans to supply suggestions during DQN training can speed up the network's convergence to an optimal policy, as well as allow it to learn more complex policies that are harder to discover by random exploration.
    Comment: 6 pages, NAECON 2018 - IEEE National Aerospace and Electronics Conference
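
    The pre-train-then-transfer step, supervised learning on (observation, human action) pairs whose weights then warm-start the DQN, might look like this in PyTorch. This is a minimal sketch under stated assumptions: the network shapes and the demo_loader interface are illustrative, not the paper's exact setup.

    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        """Q-network whose feature layers are first trained on demonstrations."""
        def __init__(self, obs_dim, n_actions, hidden=256):
            super().__init__()
            self.features = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.head = nn.Linear(hidden, n_actions)

        def forward(self, obs):
            return self.head(self.features(obs))

    def pretrain_on_demos(net, demo_loader, epochs=5, lr=1e-3):
        """Supervised imitation: predict the human's action from the observation."""
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for obs, human_action in demo_loader:   # tensors from recorded demos
                loss = loss_fn(net(obs), human_action)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return net

    # dqn = pretrain_on_demos(QNet(obs_dim=84, n_actions=5), demo_loader)
    # ...then continue standard DQN training, warm-started from these weights.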