Combining Experience Replay with Exploration by Random Network Distillation
Our work is a simple extension of the paper "Exploration by Random Network
Distillation". In more detail, we show how to efficiently combine intrinsic
rewards with experience replay in order to achieve more efficient and robust
exploration (with respect to PPO/RND) and, consequently, better agent
performance and sample efficiency. We achieve this with a new technique named
Prioritized Oversampled Experience Replay (POER), built upon a definition of
which experience is important to replay. Finally, we evaluate our technique on
the famous Atari game Montezuma's Revenge and on several other hard-exploration
Atari games.
Comment: 8 pages, 6 figures, accepted as a full paper at IEEE Conference on
Games (CoG) 201
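The exploration bonus underlying this line of work can be illustrated with a minimal sketch of Random Network Distillation: a fixed random target network is never trained, a predictor network is trained to match it, and the prediction error serves as the intrinsic reward (high for novel observations, shrinking for familiar ones). The single-layer linear networks, dimensions, and learning rate below are illustrative assumptions, not the paper's POER implementation.

```python
# Minimal RND sketch: intrinsic reward = prediction error against a
# fixed random target network. All sizes/names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, FEAT_DIM, LR = 8, 16, 0.01

W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))  # fixed, never trained
W_pred = rng.normal(size=(OBS_DIM, FEAT_DIM))    # trained online

def intrinsic_reward(obs):
    """Mean squared prediction error of the predictor vs. the target."""
    err = obs @ W_pred - obs @ W_target
    return float(np.mean(err ** 2))

def train_predictor(obs):
    """One gradient step reducing the prediction error on obs."""
    global W_pred
    err = obs @ W_pred - obs @ W_target            # shape (FEAT_DIM,)
    grad = np.outer(obs, err) * 2.0 / FEAT_DIM     # d(mean err^2)/dW_pred
    W_pred -= LR * grad

obs = rng.normal(size=OBS_DIM)
before = intrinsic_reward(obs)
for _ in range(200):
    train_predictor(obs)
after = intrinsic_reward(obs)
# The bonus for a frequently seen observation shrinks as the predictor
# learns, so the agent is pushed toward states it has not visited.
assert after < before
```

A prioritized replay scheme such as POER would then decide, from a notion of experience importance, which stored transitions to oversample when training the policy; the sketch above only covers the intrinsic-reward side.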
Universal Trading for Order Execution with Oracle Policy Distillation
As a fundamental problem in algorithmic trading, order execution aims at
fulfilling a specific trading order, either liquidation or acquisition, for a
given instrument. Toward effective execution strategies, recent years have
witnessed a shift from the analytical view with model-based market assumptions
to a model-free perspective, i.e., reinforcement learning, owing to its nature
of sequential decision optimization. However, the noisy and imperfect market
information that the policy can leverage has made it quite challenging to build
sample-efficient reinforcement learning methods for effective order execution.
In this paper, we propose a novel universal trading policy optimization
framework to bridge the gap between noisy, imperfect market states and the
optimal action sequences for order execution. In particular, the framework
leverages a policy distillation method in which an oracle teacher with perfect
information, approximating the optimal trading strategy, guides the learning of
a common policy toward practically optimal execution. Extensive experiments
show significant improvements of our method over various strong baselines, with
reasonable trading actions.
Comment: Accepted at AAAI 2021; the code and supplementary materials are at
https://seqml.github.io/opd
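The core mechanism, distilling a teacher policy into a student, can be sketched as minimizing the KL divergence from the teacher's action distribution to the student's. The toy logits, action count, and plain gradient descent below are illustrative assumptions, not the paper's OPD framework (which trains the oracle teacher on perfect information and the student on noisy observations).

```python
# Hedged sketch of policy distillation: the student's logits are updated
# to minimize KL(teacher || student). All quantities are illustrative.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(1)
N_ACTIONS, LR = 4, 0.5

teacher_logits = np.array([2.0, 0.0, -1.0, -1.0])  # "oracle" preferences
student_logits = rng.normal(size=N_ACTIONS)

p_teacher = softmax(teacher_logits)
before = kl(p_teacher, softmax(student_logits))
for _ in range(100):
    # For q = softmax(student_logits), the gradient of
    # KL(p_teacher || q) w.r.t. the logits is simply q - p_teacher.
    student_logits -= LR * (softmax(student_logits) - p_teacher)
after = kl(p_teacher, softmax(student_logits))
assert after < before  # student distribution moves toward the teacher's
```

In the oracle setting, the teacher's distribution is computed from privileged (perfect) market information while the student only sees the noisy observable state, so the distillation loss transfers knowledge the student could not learn directly.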
DisCoRL: Continual reinforcement learning via policy distillation: A preprint
In multi-task reinforcement learning there are two main challenges: at training time, learning different policies with a single model; at test time, inferring which of those policies to apply without an external signal. In continual reinforcement learning a third challenge arises: learning tasks sequentially without forgetting the previous ones. In this paper, we tackle these challenges by proposing DisCoRL, an approach combining state representation learning and policy distillation. We experiment on a sequence of three simulated 2D navigation tasks with a three-wheel omnidirectional robot. Moreover, we test our approach's robustness by transferring the final policy to a real-life setting. The policy can solve all tasks and automatically infer which one to run.
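The continual-learning idea, distilling per-task teachers sequentially into one student while replaying stored distillation targets from earlier tasks to avoid forgetting, can be sketched as follows. The linear student, one-hot toy states, and rehearsal buffer are illustrative assumptions, not the DisCoRL implementation.

```python
# Hedged sketch of sequential policy distillation with rehearsal:
# two task-specific teachers are distilled, one after the other, into a
# single linear student; task-1 targets are replayed during task 2.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
STATE_DIM, N_ACTIONS, LR = 6, 3, 0.5

# Two "teacher" policies, each defined only on its own task's states.
task1_states = np.eye(STATE_DIM)[:3]   # states 0-2 belong to task 1
task2_states = np.eye(STATE_DIM)[3:]   # states 3-5 belong to task 2
teach1 = softmax(rng.normal(size=(3, N_ACTIONS)) * 3)
teach2 = softmax(rng.normal(size=(3, N_ACTIONS)) * 3)

W = np.zeros((STATE_DIM, N_ACTIONS))   # single student policy

def distill(states, targets, steps=300):
    """Gradient steps on the distillation loss toward target distributions."""
    global W
    for _ in range(steps):
        q = softmax(states @ W)
        W -= LR * states.T @ (q - targets) / len(states)

distill(task1_states, teach1)                         # learn task 1
buf_states, buf_targets = task1_states, teach1        # rehearsal buffer
distill(np.vstack([task2_states, buf_states]),        # learn task 2 + replay
        np.vstack([teach2, buf_targets]))

# The student still matches the task-1 teacher after learning task 2,
# and it infers the right behavior from the state alone.
assert (softmax(task1_states @ W).argmax(1) == teach1.argmax(1)).all()
assert (softmax(task2_states @ W).argmax(1) == teach2.argmax(1)).all()
```

Because both tasks' behaviors live in one model, the student needs no external task signal at test time: the state itself (here, which one-hot block it falls in) determines which distilled policy fires, mirroring the automatic task inference claimed in the abstract.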
- …