Self-Supervised Reinforcement Learning that Transfers using Random Features

Authors: Agrawal Pulkit, Chen Boyuan, Gupta Abhishek, Zhang Kaiqing, Zhu Chuning
Publication date: 26/05/2023

Model-free reinforcement learning algorithms have exhibited great potential in solving single-task sequential decision-making problems with high-dimensional observations and long horizons, but are known to be hard to generalize across tasks. Model-based RL, on the other hand, learns task-agnostic models of the world that naturally enable transfer across different reward functions, but struggles to scale to complex environments due to compounding errors. To get the best of both worlds, we propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards, while circumventing the challenges of model-based RL. In particular, we show that self-supervised pre-training of model-free reinforcement learning with a number of random features as rewards allows implicit modeling of long-horizon environment dynamics. Then, planning techniques like model-predictive control using these implicit models enable fast adaptation to problems with new reward functions. Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks. We validate that our proposed method enables transfer across tasks on a variety of manipulation and locomotion domains in simulation, opening the door to generalist decision-making agents.
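Why random features as rewards help transfer comes down to linearity: if a new task's reward can be written as a weighted sum of the random features, its discounted return is the same weighted sum of the per-feature returns, so adapting only requires fitting the weights. The sketch below illustrates just that step. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes random Fourier features, a new reward that lies exactly in their span, and per-feature returns computed from given placeholder trajectories rather than learned value functions. All names (`random_features`, `feature_returns`, `fit_reward_weights`, `new_task_reward`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, NUM_FEATURES = 3, 2, 64
GAMMA = 0.99

# Assumption: random Fourier features phi_k(s, a) = cos(w_k . [s; a] + b_k),
# fixed before any task is seen.
W = rng.normal(size=(NUM_FEATURES, STATE_DIM + ACTION_DIM))
B = rng.uniform(0.0, 2.0 * np.pi, size=NUM_FEATURES)


def random_features(states, actions):
    """(T, state_dim) states and (T, action_dim) actions -> (T, K) feature matrix."""
    x = np.concatenate([states, actions], axis=1)
    return np.cos(x @ W.T + B)


def feature_returns(states, actions, gamma=GAMMA):
    """psi_k = sum_t gamma^t phi_k(s_t, a_t): one discounted return per random-feature reward."""
    discounts = gamma ** np.arange(len(states))
    return discounts @ random_features(states, actions)


def fit_reward_weights(states, actions, rewards):
    """Least-squares fit of r(s, a) ~ sum_k w_k phi_k(s, a) from reward-labelled transitions."""
    w, *_ = np.linalg.lstsq(random_features(states, actions), rewards, rcond=None)
    return w


# Hypothetical new task whose reward lies in the span of the random features.
true_w = rng.normal(size=NUM_FEATURES)


def new_task_reward(states, actions):
    return random_features(states, actions) @ true_w


# Adaptation: fit the weights on a small labelled batch.
s_lab = rng.normal(size=(200, STATE_DIM))
a_lab = rng.normal(size=(200, ACTION_DIM))
w_hat = fit_reward_weights(s_lab, a_lab, new_task_reward(s_lab, a_lab))

# Planning step: score placeholder candidate trajectories by w_hat . psi, keep the best.
candidates = [(rng.normal(size=(50, STATE_DIM)), rng.normal(size=(50, ACTION_DIM)))
              for _ in range(8)]
scores = [w_hat @ feature_returns(s, a) for s, a in candidates]
best = int(np.argmax(scores))
print("best candidate:", best)
```

In the abstract's framing, the per-feature returns would instead be predicted by model-free value functions pre-trained offline, which is what lets a planner such as model-predictive control score candidate actions without rolling out an explicit dynamics model; the least-squares fit is the only task-specific computation.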

arXiv.org e-Print Archive

Online learning in financial time series

Author: Borrageiro Gabriel Franceisco
Publication venue: UCL (University College London)
Publication date: 28/01/2023

We wish to understand whether additional forms of learning can be combined with sequential optimisation to provide superior benefit over batch learning in various tasks involving financial time series. In chapter 4, Online learning with radial basis function networks, we provide multi-horizon forecasts on the returns of financial time series. Our sequentially optimised radial basis function network (RBFNet) outperforms a random-walk baseline and several powerful supervised learners. Our RBFNets naturally measure the similarity between test samples and prototypes that capture the characteristics of the feature space. In chapter 5, Reinforcement learning for systematic FX trading, we perform feature representation transfer from an RBFNet to a direct, recurrent reinforcement learning (DRL) agent. Earlier academic work saw mixed results. We use better features, second-order optimisation methods and adapt our model parameters sequentially. As a result, our DRL agents cope better with statistical changes to the data distribution, achieving higher risk-adjusted returns than a funding and a momentum baseline. In chapter 6, The recurrent reinforcement learning crypto agent, we construct a digital assets trading agent that performs feature space representation transfer from an echo state network to a DRL agent. The agent learns to trade the XBTUSD perpetual swap contract on BitMEX. Our meta-model can process data as a stream and learn sequentially; this helps it cope with the nonstationary environment. In chapter 7, Sequential asset ranking in nonstationary time series, we create an online learning long/short portfolio selection algorithm that can detect the best- and worst-performing portfolio constituents as they change over time; in particular, we successfully handle the higher transaction costs associated with using daily-sampled data, and achieve higher total and risk-adjusted returns than the long-only holding of the S&P 500 index with hindsight.
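The sequentially optimised RBFNet of chapter 4 pairs a similarity-based hidden layer with online learning. As a rough illustration only, the sketch below uses fixed Gaussian prototypes and updates the linear output layer with recursive least squares (RLS) with a forgetting factor, a standard second-order online method. The choice of RLS, the prototype placement, the class name `OnlineRBFNet`, and the synthetic data are assumptions for illustration, not details taken from the thesis. Each step forecasts before it learns, so the recorded errors are out-of-sample (test-then-train) errors, which are compared against a random-walk forecast of zero return.

```python
import numpy as np


class OnlineRBFNet:
    """Hypothetical Gaussian RBF network; only the linear output layer is learned, one sample at a time."""

    def __init__(self, prototypes, width, forgetting=0.999, delta=100.0):
        self.prototypes = np.asarray(prototypes, dtype=float)  # (M, d) centres
        self.width = float(width)                               # shared Gaussian width
        self.lam = forgetting                                   # RLS forgetting factor
        m = len(self.prototypes) + 1                            # +1 for a bias unit
        self.w = np.zeros(m)                                    # output weights
        self.P = delta * np.eye(m)                              # inverse-correlation estimate

    def features(self, x):
        """Similarity of x to each prototype, plus a constant bias feature."""
        d2 = np.sum((self.prototypes - x) ** 2, axis=1)
        return np.append(np.exp(-d2 / (2.0 * self.width ** 2)), 1.0)

    def predict(self, x):
        return self.features(x) @ self.w

    def update(self, x, y):
        """One RLS step. Returns the error of the forecast made before seeing y."""
        h = self.features(x)
        Ph = self.P @ h
        k = Ph / (self.lam + h @ Ph)          # gain vector
        err = y - h @ self.w                  # a-priori forecast error
        self.w = self.w + k * err
        self.P = (self.P - np.outer(k, Ph)) / self.lam
        return err


# Synthetic stream (assumption): target depends nonlinearly on two lagged features.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# Prototypes drawn from a separate warm-up sample (assumption).
net = OnlineRBFNet(prototypes=rng.normal(size=(25, 2)), width=1.0)
errors = np.array([net.update(x_t, y_t) for x_t, y_t in zip(X, y)])

# Test-then-train MSE on the second half vs a random-walk forecast of zero return.
model_mse = np.mean(errors[n // 2:] ** 2)
random_walk_mse = np.mean(y[n // 2:] ** 2)
print(f"RBFNet MSE {model_mse:.3f} vs random walk {random_walk_mse:.3f}")
```

The forgetting factor discounts old observations geometrically, one simple way to let the weights track a drifting data distribution; values closer to 1 give smoother but slower adaptation.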

UCL Discovery

Continual State Representation Learning for Reinforcement Learning using Generative Replay

Authors: Caselles-Dupré H., Filliat D., Garcia Ortiz M.

We consider the problem of building a state representation model in a continual fashion. As the environment changes, the aim is to efficiently compress the sensory state's information without losing past knowledge. The learned features are then fed to a Reinforcement Learning algorithm to learn a policy. We propose to use Variational Auto-Encoders for state representation, and Generative Replay, i.e. the use of generated samples, to maintain past knowledge. We also provide a general and statistically sound method for automatic environment change detection. Our method provides efficient state representation as well as forward transfer, and avoids catastrophic forgetting. The resulting model is capable of incrementally learning information without using past data and with a bounded system size.

City Research Online

Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems

Authors: Liu Boyi, Liu Ming, Wang Lujia
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/05/2019

This paper was motivated by the problem of how to make robots fuse and transfer their experience so that they can effectively use prior knowledge and quickly adapt to new environments. To address the problem, we present a learning architecture for navigation in cloud robotic systems: Lifelong Federated Reinforcement Learning (LFRL). In this work, we propose a knowledge fusion algorithm for upgrading a shared model deployed on the cloud. We then introduce effective transfer learning methods in LFRL. LFRL is consistent with human cognitive science and fits well in cloud robotic systems. Experiments show that LFRL greatly improves the efficiency of reinforcement learning for robot navigation. The cloud robotic system deployment also shows that LFRL is capable of fusing prior knowledge. In addition, we release a cloud robotic navigation-learning website based on LFRL.

arXiv.org e-Print Archive

Crossref

Grounding Language for Transfer in Deep Reinforcement Learning

Authors: Barzilay Regina, Jaakkola Tommi, Narasimhan Karthik
Publication date: 01/12/2018

In this paper, we explore the use of natural language to drive transfer for reinforcement learning (RL). Despite the widespread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. Specifically, by learning to ground the meaning of text to the dynamics of the environment such as transitions and rewards, an autonomous agent can effectively bootstrap policy learning on a new domain given its description. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively use entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. For instance, we achieve up to 14% and 11.5% absolute improvement over previously existing models in terms of average and initial rewards, respectively. Comment: JAIR 201

arXiv.org e-Print Archive

Princeton University Open Access Repository

DSpace@MIT