Combining Experience Replay with Exploration by Random Network Distillation

Author: Sovrano Francesco
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

Our work is a simple extension of the paper "Exploration by Random Network Distillation". In more detail, we show how to efficiently combine intrinsic rewards with experience replay to achieve more efficient and robust exploration (with respect to PPO/RND) and, consequently, better agent performance and sample efficiency. We do this with a new technique named Prioritized Oversampled Experience Replay (POER), which is built on a definition of which experience is important to replay. Finally, we evaluate our technique on the Atari game Montezuma's Revenge and several other hard-exploration Atari games.
Comment: 8 pages, 6 figures, accepted as a full paper at IEEE Conference on Games (CoG) 201

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
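The abstract above combines two mechanisms: an intrinsic reward from Random Network Distillation (RND), where a trainable predictor tries to match a frozen, randomly initialized target network and its error serves as a novelty bonus, and prioritized replay of stored experience. The Python sketch below illustrates both in miniature under stated assumptions. It is not the POER algorithm: the abstract does not give POER's definition of important experience, so the sampler uses standard proportional prioritization (probability proportional to priority**alpha) as a stand-in. The names `RNDSketch` and `sample_prioritized`, the linear networks (real RND uses neural networks), and the `alpha` and `lr` values are hypothetical.

```python
import random


class RandomLinear:
    """Frozen random linear map; an assumed stand-in for RND's target network."""

    def __init__(self, in_dim, out_dim, rng):
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]


class RNDSketch:
    """Intrinsic reward = squared error of a trainable predictor vs. a frozen target."""

    def __init__(self, in_dim, out_dim, lr=0.01, seed=0):
        rng = random.Random(seed)
        self.target = RandomLinear(in_dim, out_dim, rng)  # never updated
        self.pred = [[0.0] * in_dim for _ in range(out_dim)]  # trained by SGD
        self.lr = lr

    def _predict(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.pred]

    def intrinsic_reward(self, x):
        t, p = self.target(x), self._predict(x)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t))

    def update(self, x):
        # One SGD step on 0.5 * ||pred(x) - target(x)||^2; gradient is err_i * x_j.
        t, p = self.target(x), self._predict(x)
        for i, row in enumerate(self.pred):
            err = p[i] - t[i]
            for j in range(len(row)):
                row[j] -= self.lr * err * x[j]


def sample_prioritized(buffer, k, alpha=0.6, rng=None):
    """Draw k transitions from (priority, transition) pairs with
    probability proportional to priority**alpha (proportional prioritization)."""
    if rng is None:
        rng = random.Random(0)
    weights = [prio ** alpha for prio, _ in buffer]
    return rng.choices([t for _, t in buffer], weights=weights, k=k)
```

In this toy setup, repeatedly updating the predictor on one state drives that state's intrinsic reward toward zero, while a state orthogonal to it keeps its original reward; this shrinking error on familiar states is the novelty signal RND relies on. Using the intrinsic reward as the replay priority would be one possible way to connect the two parts, but that choice is an assumption here, not POER's stated rule.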

Universal Trading for Order Execution with Oracle Policy Distillation

Authors: Bian Jiang, Fang Yuchen, Liu Tie-Yan, Liu Weiqing, Ren Kan, Yu Yong, Zhang Weinan, Zhou Dong
Publication date: 28/01/2021

As a fundamental problem in algorithmic trading, order execution aims to fulfill a specific trading order, either liquidation or acquisition, for a given instrument. In pursuit of effective execution strategies, recent years have witnessed a shift from the analytical view, with model-based market assumptions, to a model-free perspective, i.e., reinforcement learning, owing to its nature of sequential decision optimization. However, the noisy and imperfect market information available to the policy has made it quite challenging to build sample-efficient reinforcement learning methods for effective order execution. In this paper, we propose a novel universal trading policy optimization framework to bridge the gap between noisy, imperfect market states and the optimal action sequences for order execution. In particular, this framework leverages a policy distillation method in which an oracle teacher with perfect information, approximating the optimal trading strategy, guides the learning of the common policy toward practically optimal execution. Extensive experiments show significant improvements of our method over various strong baselines, with reasonable trading actions.
Comment: Accepted at AAAI 2021; the code and supplementary materials are available at https://seqml.github.io/opd

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
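The distillation step in the abstract above, where a teacher with perfect information guides a student that only sees noisy market states, can be illustrated with a generic policy-distillation objective. The Python sketch below is an assumption, not the paper's exact loss: it adds to the student's ordinary RL loss a term KL(teacher || student) between the teacher's and the student's action distributions, which is one common choice in policy distillation. The names `student_loss` and `distill_weight` are hypothetical, and the RL loss is passed in as a plain number rather than computed.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student_logits, teacher_probs, rl_loss, distill_weight=1.0):
    """Hypothetical combined objective: the student's RL loss plus a KL term
    pulling its action distribution toward the oracle teacher's."""
    student_probs = softmax(student_logits)
    return rl_loss + distill_weight * kl_divergence(teacher_probs, student_probs)
```

The distillation term is zero when the student already reproduces the teacher's action probabilities and grows as they diverge. In this sketch only the student's logits depend on the noisy observations, so the teacher's privileged information is used during training alone and only the student would be needed at execution time.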

DISCORL: Continual reinforcement learning via policy distillation: A preprint

Authors: Cai Guanghang, Caselles-Dupré Hugo, Díaz-Rodríguez Natalia, Filliat David, Lesort Timothée, Sun Te, Traoré René
Publication venue: HAL CCSD
Publication date: 14/12/2019

In multi-task reinforcement learning there are two main challenges: at training time, the ability to learn different policies with a single model; at test time, inferring which of those policies to apply without an external signal. In continual reinforcement learning, a third challenge arises: learning tasks sequentially without forgetting the previous ones. In this paper, we tackle these challenges by proposing DisCoRL, an approach combining state representation learning and policy distillation. We experiment on a sequence of three simulated 2D navigation tasks with a three-wheeled omnidirectional robot. Moreover, we test our approach's robustness by transferring the final policy to a real-life setting. The policy can solve all tasks and automatically infer which one to run.

INRIA a CCSD electronic archive server
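The continual-learning idea in the DisCoRL abstract, distilling several task-specific policies into one student that infers the task from its input, can be sketched as follows. This Python sketch rests on stated assumptions: state representation learning is replaced by fixed, hand-chosen feature vectors, the student is a linear softmax policy rather than a neural network, and forgetting is avoided by keeping every task's distillation data and retraining on all of it as each new task arrives. `LinearStudent`, `distill_continually`, and their parameters are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LinearStudent:
    """Single softmax-linear policy distilled from several task-specific teachers."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def probs(self, x):
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in self.w])

    def distill_step(self, x, teacher_probs, lr):
        # Gradient of cross-entropy H(teacher, student) w.r.t. the logits
        # is (student_probs - teacher_probs); chain rule multiplies by x.
        p = self.probs(x)
        for a, row in enumerate(self.w):
            g = p[a] - teacher_probs[a]
            for j in range(len(row)):
                row[j] -= lr * g * x[j]


def distill_continually(task_datasets, n_features, n_actions, epochs=200, lr=0.5, seed=0):
    """Tasks arrive one at a time. Each adds its (state, teacher_probs) pairs to a
    growing memory, and the student is retrained on everything seen so far
    (an assumed anti-forgetting strategy for this sketch)."""
    rng = random.Random(seed)
    student = LinearStudent(n_features, n_actions)
    memory = []
    for dataset in task_datasets:
        memory.extend(dataset)
        for _ in range(epochs):
            rng.shuffle(memory)
            for x, t in memory:
                student.distill_step(x, t, lr)
    return student
```

Because the states of different tasks differ in their features, a single student trained this way can select the right behavior without an external task label, a very reduced analogue of the abstract's claim that the policy infers which task to run.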