Learn to Interpret Atari Agents
Deep Reinforcement Learning (DeepRL) agents surpass human-level performance
in a multitude of tasks. However, the direct mapping from states to actions
makes it hard to interpret the rationale behind the decision making of agents.
In contrast to previous a-posteriori methods of visualizing DeepRL policies, we
propose an end-to-end trainable framework based on Rainbow, a representative
Deep Q-Network (DQN) agent. Our method automatically learns important regions
in the input domain, which enables characterizations of the decision making and
interpretations for non-intuitive behaviors. Hence we name it Region Sensitive
Rainbow (RS-Rainbow). RS-Rainbow uses a simple yet effective mechanism to
incorporate visualization ability into the learning model, not only improving
model interpretability but also yielding better performance. Extensive
experiments on the challenging Atari 2600 platform demonstrate the
superiority of RS-Rainbow. In particular, our agent achieves state-of-the-art
performance with just 25% of the training frames. Demonstrations and code are available at
https://github.com/yz93/Learn-to-Interpret-Atari-Agents
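The region-sensitive mechanism described in this abstract can be pictured as a soft spatial attention layer over convolutional features: each location in the feature map gets a score, the scores are normalized into an attention map, and that map both weights the features and serves as the visualization. The sketch below is an illustrative assumption about the general idea, not the authors' actual RS-Rainbow architecture; the function name, shapes, and scoring scheme are all hypothetical.

```python
import numpy as np

def region_sensitive_pool(features, score_weights):
    """Soft spatial attention over conv features (illustrative sketch).

    features: (H, W, C) feature map from a convolutional trunk.
    score_weights: (C,) hypothetical learned vector scoring each location.
    Returns the (H, W) attention map (the "important regions" visualization)
    and the attention-weighted (C,) feature summary fed to later layers.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    scores = flat @ score_weights            # one scalar score per region
    scores -= scores.max()                   # numerical stability for softmax
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax over H*W regions
    pooled = attn @ flat                     # (C,) attention-weighted summary
    return attn.reshape(h, w), pooled

rng = np.random.default_rng(0)
attn, pooled = region_sensitive_pool(rng.standard_normal((7, 7, 64)),
                                     rng.standard_normal(64))
```

Because the attention map is produced inside the forward pass, it is trained end-to-end with the Q-network rather than computed a posteriori, which is the property the abstract emphasizes.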
Reuse of Neural Modules for General Video Game Playing
A general approach to knowledge transfer is introduced in which an agent
controlled by a neural network adapts how it reuses existing networks as it
learns in a new domain. Networks trained for a new domain can improve their
performance by routing activation selectively through previously learned neural
structure, regardless of how or for what it was learned. A neuroevolution
implementation of this approach is presented with application to
high-dimensional sequential decision-making domains. This approach is more
general than previous approaches to neural transfer for reinforcement learning.
It is domain-agnostic and requires no prior assumptions about the nature of
task relatedness or mappings. The method is analyzed in a stochastic version of
the Arcade Learning Environment, demonstrating that it improves performance in
some of the more complex Atari 2600 games, and that the success of transfer can
be predicted based on a high-level characterization of game dynamics. Comment: Accepted at AAAI 1
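The reuse scheme described above, routing activation selectively through previously learned networks, can be sketched as a gated mixture over frozen source modules plus a new trainable module. Everything below is a hypothetical illustration of that routing idea, not the paper's neuroevolution implementation; the function and module names are assumptions.

```python
import numpy as np

def route_through_modules(x, frozen_modules, new_module, gates):
    """Blend a new network's output with frozen, previously trained modules.

    frozen_modules / new_module: callables mapping a state to action scores.
    gates: one learnable mixing logit per module; softmax turns them into
    routing weights, so training can up-weight whichever old network helps,
    regardless of what it was originally trained for.
    """
    outputs = [m(x) for m in frozen_modules] + [new_module(x)]
    g = np.exp(gates - np.max(gates))
    g = g / g.sum()
    return sum(w * o for w, o in zip(g, outputs))

# Toy usage: two frozen source policies and one fresh network (hypothetical).
source_a = lambda s: np.array([1.0, 0.0])   # policy from an earlier game
source_b = lambda s: np.array([0.0, 1.0])   # policy from another earlier game
new_net = lambda s: np.array([0.5, 0.5])    # network being trained now
scores = route_through_modules(None, [source_a, source_b], new_net,
                               gates=np.array([2.0, -2.0, 0.0]))
```

In a neuroevolution setting, the gate values would simply be extra genome parameters evolved alongside the new network's weights.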
Experience Replay in Sparse Rewards Problems using Deep Reinforcement Techniques
This work introduces the reader to Reinforcement Learning, an area of Machine Learning that has seen extensive research in recent years. It then presents several modifications to ACER, a well-known and very interesting algorithm that makes use of Experience Replay. The goal is to improve its performance on general problems, and in particular on sparse-reward problems. To evaluate the proposed ideas, Montezuma's Revenge is used, a game developed for the Atari 2600 and considered among the hardest to tackle.
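The experience replay that ACER builds on can be shown in miniature: store transitions as the agent acts, then sample past mini-batches for off-policy updates, so that rare rewarding transitions in a sparse-reward game can be learned from many times. This is a minimal uniform replay buffer for illustration only; ACER's actual replay stores trajectories with behavior-policy probabilities for importance-weighted corrections, which is not reproduced here.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform experience replay (sketch, not ACER's mechanism)."""

    def __init__(self, capacity):
        # deque with maxlen silently evicts the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling without replacement from stored transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for i in range(10):
    buf.push(i, 0, 0.0, i + 1, False)
batch = buf.sample(4)
```

In sparse-reward settings like Montezuma's Revenge, replay matters because the few transitions that carry reward would otherwise be seen once and discarded.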
Implementing an Adaptive Genetic Algorithm in the Atari Environment
This thesis implements a genetic algorithm for training agents in the Atari game environments. Training is performed on widely available hardware, so the results indicate how well these models perform on relatively inexpensive equipment accessible to many people. The Atari environment Space Invaders was chosen for training and testing. As a baseline, a Deep Q-Network (DQN) algorithm is implemented within TensorFlow's TF-Agents framework. The DQN is a popular model that has inspired many new algorithms and is often used as a comparison for alternative approaches. An adaptive genetic algorithm called ACROMUSE was implemented and compared with the performance of the DQN within the environment. This algorithm adaptively determines crossover rates, mutation rates, and tournament selection size. Using measures of diversity and fitness, two subpopulations are maintained to avoid converging toward a local optimum. Based on the results found here, the algorithm did not converge or produce high-performing agents, and notably performed worse than the DQN approach. The reasons why this algorithm fails while other genetic algorithms have succeeded are discussed. The large number of weight parameters in the network seems to be a barrier to good performance. It is suggested that a parallel training approach is necessary to reach the number of agents and generations at which a good solution could be found. It is also shown that the number of frames skipped in the environment had a significant impact on the performance of the baseline DQN model.
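The adaptive part of an ACROMUSE-style algorithm, raising crossover when the population is diverse and raising mutation when it has converged, can be sketched with a simple diversity measure. The constants, the coefficient-of-variation diversity proxy, and the linear schedules below are illustrative assumptions, not ACROMUSE's actual formulas (which use separate standard and healthy population diversity measures).

```python
import numpy as np

def adaptive_rates(population, spd_max=0.4, pc_min=0.2, pc_max=0.8, pm_max=0.1):
    """Diversity-driven crossover/mutation rates (illustrative constants).

    population: (n_individuals, n_genes) array of weight genomes.
    Diversity here is the mean coefficient of variation across genes,
    clipped to spd_max. Low diversity -> more mutation (explore);
    high diversity -> more crossover (exploit).
    """
    mean = population.mean(axis=0)
    spd = float(np.mean(population.std(axis=0) / (np.abs(mean) + 1e-8)))
    spd = min(spd, spd_max)
    pm = pm_max * (1.0 - spd / spd_max)                 # mutation shrinks with diversity
    pc = pc_min + (pc_max - pc_min) * (spd / spd_max)   # crossover grows with it
    return pc, pm

rng = np.random.default_rng(1)
pc_hi, pm_hi = adaptive_rates(rng.standard_normal((20, 8)))  # diverse population
pc_lo, pm_lo = adaptive_rates(np.ones((20, 8)))              # fully converged clones
```

With genomes as large as a DQN's weight vector, any per-gene diversity measure like this becomes expensive and noisy, which is consistent with the thesis's observation that the parameter count is a barrier for the GA.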