Atari games and Intel processors
The asynchronous nature of state-of-the-art reinforcement learning algorithms,
such as the Asynchronous Advantage Actor-Critic (A3C) algorithm, makes them
exceptionally well suited to CPU computation. However, because deep
reinforcement learning often involves interpreting visual information, a large
part of the training and inference time is spent performing convolutions. In
this work we present our results on learning strategies in Atari games using a
Convolutional Neural Network, the Math Kernel Library and the TensorFlow 0.11rc0
machine learning framework. We also analyze the effects of asynchronous
computations on the convergence of reinforcement learning algorithms.
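For context, the update each asynchronous A3C worker contributes is built from n-step returns and advantages; this is the CPU-friendly, convolution-free part of the algorithm the abstract alludes to. The sketch below is a minimal plain-NumPy illustration of that computation, not the paper's TensorFlow/MKL code, and all function and variable names are assumptions.

import numpy as np

def a3c_targets(rewards, values, bootstrap_value, gamma=0.99):
    """Compute n-step return targets and advantages for one A3C worker rollout.

    rewards:  rewards collected by the worker, r_0 .. r_{T-1}
    values:   critic estimates V(s_0) .. V(s_{T-1}) for the same states
    bootstrap_value: V(s_T), or 0.0 if the episode terminated
    """
    returns = np.zeros(len(rewards))
    R = bootstrap_value
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R              # discounted n-step return
        returns[t] = R
    advantages = returns - np.asarray(values)   # A_t = R_t - V(s_t)
    return returns, advantages

# Example: a 4-step rollout from one asynchronous worker.
rets, advs = a3c_targets([0.0, 0.0, 1.0, 0.0], [0.1, 0.2, 0.5, 0.1], 0.0)
print(rets, advs)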
Action-Conditional Video Prediction using Deep Networks in Atari Games
Motivated by vision-based reinforcement learning (RL) problems, in particular
Atari games from the recent benchmark Arcade Learning Environment (ALE), we
consider spatio-temporal prediction problems where future (image-)frames are
dependent on control variables or actions as well as previous frames. While not
composed of natural scenes, frames in Atari games are high-dimensional in size,
can involve tens of objects with one or more objects being controlled by the
actions directly and many other objects being influenced indirectly, can
involve entry and departure of objects, and can involve deep partial
observability. We propose and evaluate two deep neural network architectures
that consist of encoding, action-conditional transformation, and decoding
layers based on convolutional neural networks and recurrent neural networks.
Experimental results show that the proposed architectures are able to generate
visually-realistic frames that are also useful for control over approximately
100-step action-conditional futures in some games. To the best of our
knowledge, this paper is the first to make and evaluate long-term predictions
on high-dimensional video conditioned by control inputs.
Comment: Published at NIPS 2015 (Advances in Neural Information Processing Systems 28).
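The encoding, action-conditional transformation, and decoding layers the abstract mentions can be pictured as a small network of the following shape. This is only a generic sketch of that idea, not the architecture evaluated in the paper: the 84x84 input size, filter sizes, and the multiplicative action interaction are assumptions for illustration.

import torch
import torch.nn as nn

class ActionConditionalPredictor(nn.Module):
    """Minimal encode -> action-conditional transform -> decode frame predictor."""

    def __init__(self, num_actions, feat_dim=256):
        super().__init__()
        # Encoder: convolutions mapping an 84x84 grayscale frame to a feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 9 * 9, feat_dim), nn.ReLU(),
        )
        # Action-conditional transformation: multiplicative interaction between
        # the frame encoding and an embedding of the chosen action.
        self.action_embed = nn.Linear(num_actions, feat_dim)
        # Decoder: map the transformed feature back to a predicted next frame.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 64 * 9 * 9), nn.ReLU(),
            nn.Unflatten(1, (64, 9, 9)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),
        )

    def forward(self, frame, action_onehot):
        h = self.encoder(frame)
        h = h * self.action_embed(action_onehot)  # condition the encoding on the action
        return self.decoder(h)

Feeding a batch of frames and one-hot actions through this module yields predicted next frames of the same 84x84 size, which is the prediction problem the abstract sets up.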
Combining Experience Replay with Exploration by Random Network Distillation
Our work is a simple extension of the paper "Exploration by Random Network
Distillation". In more detail, we show how to efficiently combine Intrinsic
Rewards with Experience Replay in order to achieve more efficient and robust
exploration (with respect to PPO/RND) and, consequently, better results in terms
of agent performance and sample efficiency. We achieve this with a new technique
named Prioritized Oversampled Experience Replay (POER), built upon a definition
of which experience is important to replay. Finally, we evaluate our technique
on the famous Atari game Montezuma's Revenge and on some other hard-exploration
Atari games.
Comment: 8 pages, 6 figures, accepted as a full paper at the IEEE Conference on Games (CoG) 201
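The combination the abstract describes can be pictured as an RND-style intrinsic bonus used to prioritize a replay buffer. The sketch below illustrates only that general idea with toy linear networks; it is not the POER algorithm itself, and the names, the priority rule, and the dimensions are assumptions.

import numpy as np

rng = np.random.default_rng(0)

# RND-style intrinsic reward: prediction error of a trainable predictor
# against a fixed, randomly initialized target network (linear maps for brevity).
OBS_DIM, EMB_DIM = 16, 8
target_W = rng.normal(size=(OBS_DIM, EMB_DIM))            # frozen random target
predictor_W = rng.normal(size=(OBS_DIM, EMB_DIM)) * 0.1   # trained to match the target

def intrinsic_reward(obs):
    err = obs @ target_W - obs @ predictor_W
    return float(np.mean(err ** 2))  # large where the predictor is still wrong, i.e. novel states

# Prioritized replay: sample transitions with probability proportional to their
# priority, here taken to be the intrinsic reward so novel experience is replayed more often.
buffer, priorities = [], []

def store(transition, obs):
    buffer.append(transition)
    priorities.append(intrinsic_reward(obs) + 1e-6)

def sample(batch_size):
    p = np.asarray(priorities)
    idx = rng.choice(len(buffer), size=batch_size, p=p / p.sum())
    return [buffer[i] for i in idx]

# Usage: store a few random transitions and draw a prioritized batch.
for _ in range(32):
    obs = rng.normal(size=OBS_DIM)
    store(("s", "a", "r", "s_next"), obs)
print(len(sample(8)))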
Learning Actions and Control of Focus of Attention with a Log-Polar-like Sensor
With the long-term goal of reducing image processing time on an autonomous
mobile robot in mind, we explore in this paper the use of log-polar-like image
data with gaze control. The gaze control is performed not on the Cartesian
image but on the log-polar-like image data. For this we start out from the
classic deep reinforcement learning approach for Atari games. We extend an A3C
deep RL approach with an LSTM network, and we learn the policy for playing
three Atari games and a policy for gaze control. While the Atari games already
use low-resolution images of 80 by 80 pixels, we are able to further reduce the
number of image pixels by a factor of 5 without losing any gaming performance.
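A log-polar-like sensor samples the frame densely near the gaze point and coarsely in the periphery, which is how the pixel count can drop without hurting play. The sketch below is a generic nearest-neighbour log-polar sampler, not the paper's sensor; the ring/wedge counts and sampling scheme are assumptions.

import numpy as np

def log_polar_sample(image, cx, cy, n_rings=16, n_wedges=24, r_min=1.0):
    """Sample an image on a log-polar-like grid centered on a gaze point (cx, cy).

    Ring radii grow exponentially, so resolution is high near the fixation point
    and coarse in the periphery; the output has n_rings * n_wedges values instead
    of the full pixel grid.
    """
    h, w = image.shape
    r_max = min(h, w) / 2.0
    radii = np.geomspace(r_min, r_max, n_rings)               # log-spaced rings
    angles = np.linspace(0.0, 2.0 * np.pi, n_wedges, endpoint=False)
    out = np.zeros((n_rings, n_wedges), dtype=image.dtype)
    for i, r in enumerate(radii):
        for j, a in enumerate(angles):
            x = int(round(cx + r * np.cos(a)))
            y = int(round(cy + r * np.sin(a)))
            if 0 <= y < h and 0 <= x < w:                     # ignore samples outside the frame
                out[i, j] = image[y, x]
    return out

# Usage: an 80x80 Atari-like frame reduced to 16*24 = 384 samples,
# a small fraction of the original 6400 pixels.
frame = np.random.rand(80, 80)
print(log_polar_sample(frame, cx=40, cy=40).shape)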
Model-Based Reinforcement Learning for Atari
Model-free reinforcement learning (RL) can be used to learn effective
policies for complex tasks, such as Atari games, even from image observations.
However, this typically requires very large amounts of interaction --
substantially more, in fact, than a human would need to learn the same games.
How can people learn so quickly? Part of the answer may be that people can
learn how the game works and predict which actions will lead to desirable
outcomes. In this paper, we explore how video prediction models can similarly
enable agents to solve Atari games with fewer interactions than model-free
methods. We describe Simulated Policy Learning (SimPLe), a complete model-based
deep RL algorithm based on video prediction models and present a comparison of
several model architectures, including a novel architecture that yields the
best results in our setting. Our experiments evaluate SimPLe on a range of
Atari games in a low-data regime of 100k interactions between the agent and the
environment, which corresponds to two hours of real-time play. In most games
SimPLe outperforms state-of-the-art model-free algorithms, in some games by
over an order of magnitude.
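The training scheme the abstract describes alternates between collecting real experience, fitting a video-prediction world model, and improving the policy purely inside that model. A skeleton of such a loop is sketched below; the function names, iteration count, and step budgets are illustrative assumptions standing in for the paper's components and schedule.

# Skeleton of a SimPLe-style model-based RL loop: alternate between
# (1) collecting real experience with the current policy,
# (2) fitting a video-prediction world model to that experience, and
# (3) improving the policy using rollouts simulated by the model.

def collect_real_experience(env, policy, n_steps):
    """Run the policy in the real environment and return (obs, action, reward) tuples."""
    raise NotImplementedError

def train_world_model(model, experience):
    """Fit the action-conditional video prediction model on real experience."""
    raise NotImplementedError

def train_policy_in_model(policy, model, n_simulated_steps):
    """Improve the policy (e.g. with a policy-gradient method) inside the learned model."""
    raise NotImplementedError

def model_based_loop(env, policy, model, iterations=15, real_steps_per_iter=6400):
    # 15 * 6400 ~= 100k real interactions, matching the low-data regime described above;
    # the exact split per iteration is an assumption.
    experience = []
    for _ in range(iterations):
        experience += collect_real_experience(env, policy, real_steps_per_iter)
        train_world_model(model, experience)
        train_policy_in_model(policy, model, n_simulated_steps=800_000)
    return policy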
- …