95 research outputs found

    Learn to Interpret Atari Agents

    Full text link
    Deep Reinforcement Learning (DeepRL) agents surpass human-level performance in a multitude of tasks. However, the direct mapping from states to actions makes it hard to interpret the rationale behind the decision making of agents. In contrast to previous a-posteriori methods of visualizing DeepRL policies, we propose an end-to-end trainable framework based on Rainbow, a representative Deep Q-Network (DQN) agent. Our method automatically learns important regions in the input domain, which enables characterizations of the decision making and interpretations of non-intuitive behaviors. Hence we name it Region Sensitive Rainbow (RS-Rainbow). RS-Rainbow utilizes a simple yet effective mechanism to incorporate visualization ability into the learning model, not only improving model interpretability but also improving performance. Extensive experiments on the challenging Atari 2600 platform demonstrate the superiority of RS-Rainbow. In particular, our agent achieves state-of-the-art performance using just 25% of the training frames. Demonstrations and code are available at https://github.com/yz93/Learn-to-Interpret-Atari-Agents
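
    The abstract describes a learned mechanism that identifies important input regions and re-weights them before the Q-value head. The authors' RS-Rainbow code is linked above; the snippet below is only a minimal PyTorch-style sketch of such a region-importance gate placed on top of a standard DQN convolutional encoder, with class and layer names chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class RegionGate(nn.Module):
    """Soft importance mask over the spatial positions of a feature map (illustrative)."""
    def __init__(self, channels):
        super().__init__()
        # One importance logit per spatial location
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, features):                      # features: (B, C, H, W)
        logits = self.score(features)                 # (B, 1, H, W)
        weights = torch.softmax(logits.flatten(2), dim=-1).view_as(logits)
        # Re-weight features by learned region importance; the weights also
        # serve as a saliency map for visualizing the agent's focus.
        return features * weights, weights

class GatedQNetwork(nn.Module):
    """DQN-style network with a region gate between the encoder and the Q-head."""
    def __init__(self, in_channels, num_actions):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
        )
        self.gate = RegionGate(64)
        self.q_head = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(512), nn.ReLU(), nn.Linear(512, num_actions)
        )

    def forward(self, frames):                        # frames: stacked Atari frames
        gated, saliency = self.gate(self.encoder(frames))
        return self.q_head(gated), saliency
```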

    Reuse of Neural Modules for General Video Game Playing

    Full text link
    A general approach to knowledge transfer is introduced in which an agent controlled by a neural network adapts how it reuses existing networks as it learns in a new domain. Networks trained for a new domain can improve their performance by routing activation selectively through previously learned neural structure, regardless of how or for what it was learned. A neuroevolution implementation of this approach is presented with application to high-dimensional sequential decision-making domains. This approach is more general than previous approaches to neural transfer for reinforcement learning. It is domain-agnostic and requires no prior assumptions about the nature of task relatedness or mappings. The method is analyzed in a stochastic version of the Arcade Learning Environment, demonstrating that it improves performance in some of the more complex Atari 2600 games, and that the success of transfer can be predicted based on a high-level characterization of game dynamics. Comment: Accepted at AAAI 1
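
    The transfer mechanism described here routes activation selectively through previously learned networks while learning a new domain. The paper itself uses neuroevolution; purely as an illustration of the routing idea, the sketch below mixes frozen source-task modules with a fresh target-task module through learned mixing weights. All names are hypothetical and not the paper's implementation.

```python
import torch
import torch.nn as nn

class ModuleRouter(nn.Module):
    """Mixes frozen source-task modules with a fresh target-task module (illustrative)."""
    def __init__(self, source_modules, new_module):
        super().__init__()
        self.sources = nn.ModuleList(source_modules)
        for module in self.sources:                   # previously learned structure stays fixed
            for p in module.parameters():
                p.requires_grad_(False)
        self.new = new_module
        # One routing weight per candidate module, adapted while learning the new domain
        self.routing = nn.Parameter(torch.zeros(len(source_modules) + 1))

    def forward(self, x):
        outputs = [m(x) for m in self.sources] + [self.new(x)]
        weights = torch.softmax(self.routing, dim=0)
        return sum(w * out for w, out in zip(weights, outputs))
```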

    Experience Replay in Sparse Rewards Problems using Deep Reinforcement Techniques

    Get PDF
    This work introduces the reader to Reinforcement Learning, an area of Machine Learning that has seen a great deal of research in recent years. It then presents several modifications to ACER, a well-known and very interesting algorithm that makes use of Experience Replay. The goal is to improve its performance on general problems, and in particular on sparse-reward problems. To assess the merit of the proposed ideas, the modifications are evaluated on Montezuma's Revenge, a game developed for the Atari 2600 and considered among the hardest to tackle.
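
    The abstract does not detail the specific changes made to ACER, so no attempt is made to reproduce them here. Purely as a generic illustration of why experience replay matters in sparse-reward settings such as Montezuma's Revenge, the sketch below shows a simple buffer that over-samples the rare transitions carrying a non-zero reward; the class and parameter names are invented for the example.

```python
import random
from collections import deque

class SparseRewardReplay:
    """Illustrative replay buffer that over-samples rewarded transitions.

    Transitions with non-zero reward are kept in a separate queue so that a
    fixed fraction of each minibatch comes from them, even when such
    transitions are rare (as in Montezuma's Revenge).
    """
    def __init__(self, capacity=100_000, rewarded_fraction=0.25):
        self.ordinary = deque(maxlen=capacity)
        self.rewarded = deque(maxlen=capacity)
        self.rewarded_fraction = rewarded_fraction

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        (self.rewarded if reward != 0 else self.ordinary).append(transition)

    def sample(self, batch_size):
        n_rewarded = min(int(batch_size * self.rewarded_fraction), len(self.rewarded))
        batch = random.sample(self.rewarded, n_rewarded)
        batch += random.sample(self.ordinary,
                               min(batch_size - n_rewarded, len(self.ordinary)))
        return batch
```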

    Implementing an Adaptive Genetic Algorithm in the Atari Environment

    Get PDF
    This thesis implements a genetic algorithm for training agents in the Atari game environments. Training is performed on widely available hardware, so the results indicate how well these models perform on relatively inexpensive equipment accessible to many people. The Space Invaders environment was chosen for training and testing the models. As a baseline, a Deep Q-Network (DQN) algorithm is implemented within TensorFlow's TF-Agents framework; the DQN is a popular model that has inspired many new algorithms and is often used as a point of comparison for alternative approaches. An adaptive genetic algorithm called ACROMUSE was implemented and compared against the DQN in the same environment. This algorithm adaptively determines crossover rates, mutation rates, and tournament selection size, and uses measures of diversity and fitness to maintain two subpopulations that avoid converging toward a local optimum. Based on the results found here, the genetic algorithm did not converge or produce high-performing agents and, importantly, performed worse than the DQN approach. The reasons why this algorithm fails while other genetic algorithms have succeeded are discussed; the large number of weight parameters in the network appears to be a barrier to good performance. It is suggested that a parallel training approach is necessary to reach the number of agents and generations at which a good solution could be found. It is also shown that the number of frames skipped in the environment had a significant impact on the performance of the baseline DQN model.
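
    ACROMUSE's exact adaptation formulas and its two-subpopulation scheme are not reproduced from the thesis. The sketch below only illustrates, under simplified assumptions, how a genetic algorithm might derive its mutation rate and tournament size from a population-diversity measure, as the abstract describes; the function names and constants are illustrative, and individuals are assumed to be flat weight vectors.

```python
import random
import numpy as np

def diversity(population):
    """Mean per-gene standard deviation, clipped to [0, 1] for illustration."""
    genes = np.stack(population)                     # (pop_size, n_weights)
    return float(np.clip(genes.std(axis=0).mean(), 0.0, 1.0))

def adaptive_generation(population, fitnesses, p_mut_max=0.10, tour_max=8):
    """Produce one generation with diversity-driven mutation rate and tournament size."""
    d = diversity(population)
    p_mut = p_mut_max * (1.0 - d)                    # low diversity -> mutate more
    tour_size = max(2, int(round(tour_max * d)))     # low diversity -> weaker selection pressure

    def tournament():
        contenders = random.sample(range(len(population)), tour_size)
        return population[max(contenders, key=lambda i: fitnesses[i])]

    children = []
    for _ in range(len(population)):
        a, b = tournament(), tournament()
        mask = np.random.rand(a.size) < 0.5          # uniform crossover
        child = np.where(mask, a, b)
        noise = np.random.randn(child.size) * 0.02   # small Gaussian perturbation
        child = child + (np.random.rand(child.size) < p_mut) * noise
        children.append(child)
    return children
```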