Bootstrapping a DQN Replay Memory with Synthetic Experiences
An important component of many Deep Reinforcement Learning algorithms is the
Experience Replay, which serves as a storage mechanism, or memory, of past
experiences. These experiences are used for training and help the agent to
stably find an optimal trajectory through the problem space. The classic
Experience Replay, however, makes use only of the experiences the agent
actually made, although the stored samples hold great potential in the form of
knowledge about the problem that can be extracted. We present an algorithm
that creates synthetic experiences in a nondeterministic discrete environment
to assist the learner. The Interpolated Experience Replay is evaluated on the
FrozenLake environment, and we show that it can help the agent learn faster
and even better than the classic version.
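To make the idea concrete, the following is a minimal sketch of a replay buffer that augments real transitions with synthetic ones, assuming a nondeterministic discrete environment in which the expected reward of a (state, action) pair can be estimated from the stored samples. The class and method names are illustrative and do not reproduce the paper's exact algorithm.

```python
import random
from collections import defaultdict, deque

class InterpolatedReplayBuffer:
    """Replay buffer that mixes real and synthetic experiences.

    Hypothetical sketch: expected rewards for each (state, action)
    pair are interpolated from stored outcomes, not the paper's
    exact Interpolated Experience Replay algorithm.
    """

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)
        # (state, action) -> list of observed (reward, next_state, done)
        self.outcomes = defaultdict(list)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))
        self.outcomes[(state, action)].append((reward, next_state, done))

    def synthesize(self, state, action):
        """Create a synthetic experience by averaging observed rewards."""
        seen = self.outcomes.get((state, action))
        if not seen:
            return None
        avg_reward = sum(r for r, _, _ in seen) / len(seen)
        # Pick a plausible successor from the observed transitions.
        _, next_state, done = random.choice(seen)
        return (state, action, avg_reward, next_state, done)

    def sample(self, batch_size, synthetic_ratio=0.25):
        """Draw real transitions, then append synthetic counterparts."""
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        n_syn = int(len(batch) * synthetic_ratio)
        for state, action, _, _, _ in batch[:n_syn]:
            syn = self.synthesize(state, action)
            if syn is not None:
                batch.append(syn)
        return batch
```

In this sketch the synthetic reward is the running mean over all observed outcomes of a (state, action) pair, which is one simple way to exploit the knowledge latent in the stored samples of a nondeterministic environment.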
XCS Classifier System with Experience Replay
XCS constitutes the most deeply investigated classifier system today. It
bears strong potential and comes with inherent capabilities for mastering a
variety of different learning tasks. Besides outstanding successes in various
classification and regression tasks, XCS has also proved very effective in
certain multi-step environments from the domain of reinforcement learning.
Especially in the latter domain, recent advances have been driven mainly by
algorithms that model their policies with deep neural networks -- among which
the Deep Q-Network (DQN) is a prominent representative. Experience Replay (ER)
constitutes one of the crucial factors behind the DQN's successes, since it
facilitates stabilized training of the neural-network-based Q-function
approximators. Surprisingly, XCS barely takes advantage of similar mechanisms
that leverage the raw experiences stored so far. To bridge this gap, this
paper investigates the benefits of extending XCS with ER. On the one hand, we
demonstrate that for single-step tasks ER bears massive potential for
improvements in terms of sample efficiency. On the downside, however, we
reveal that the use of ER might further aggravate well-studied issues, not yet
solved for XCS, that arise in sequential decision problems demanding long
action chains.
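To illustrate the mechanism being grafted onto XCS, here is a minimal sketch of the DQN-style experience replay loop: raw experiences are stored and random mini-batches are replayed into the learner's update rule. The `update_fn` parameter is a placeholder for whatever update the learner performs (an XCS classifier update, a DQN gradient step); this shows only the replay mechanics, not either algorithm.

```python
import random
from collections import deque

def make_replay_trainer(update_fn, capacity=5000, batch_size=32):
    """Wrap a learner's update rule with experience replay.

    Hypothetical sketch: `update_fn` receives one stored experience
    per call; replaying random mini-batches decorrelates consecutive
    updates, which is the stabilizing effect ER is credited with.
    """
    memory = deque(maxlen=capacity)  # bounded store of raw experiences

    def step(experience):
        memory.append(experience)
        # Once enough samples exist, replay a uniformly drawn batch.
        if len(memory) >= batch_size:
            for sample in random.sample(memory, batch_size):
                update_fn(sample)

    return step
```

Sampling uniformly from a bounded memory is the classic ER scheme; for multi-step tasks with long action chains, this decorrelation is precisely what can interfere with mechanisms that rely on temporally ordered updates, which is the tension the abstract points to.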