109,046 research outputs found

    Self-Supervised Reinforcement Learning that Transfers using Random Features

    Model-free reinforcement learning algorithms have exhibited great potential in solving single-task sequential decision-making problems with high-dimensional observations and long horizons, but are known to be hard to generalize across tasks. Model-based RL, on the other hand, learns task-agnostic models of the world that naturally enable transfer across different reward functions, but struggles to scale to complex environments due to compounding error. To get the best of both worlds, we propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards, while circumventing the challenges of model-based RL. In particular, we show that self-supervised pre-training of model-free reinforcement learning with a number of random features as rewards allows implicit modeling of long-horizon environment dynamics. Then, planning techniques like model-predictive control using these implicit models enable fast adaptation to problems with new reward functions. Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks. We validate that our proposed method enables transfer across tasks on a variety of manipulation and locomotion domains in simulation, opening the door to generalist decision-making agents.

    Online learning in financial time series

    We wish to understand if additional learning forms can be combined with sequential optimisation to provide superior benefit over batch learning in various tasks operating in financial time series. In chapter 4, Online learning with radial basis function networks, we provide multi-horizon forecasts on the returns of financial time series. Our sequentially optimised radial basis function network (RBFNet) outperforms a random-walk baseline and several powerful supervised learners. Our RBFNets naturally measure the similarity between test samples and prototypes that capture the characteristics of the feature space. In chapter 5, Reinforcement learning for systematic FX trading, we perform feature representation transfer from an RBFNet to a direct, recurrent reinforcement learning (DRL) agent. Earlier academic work saw mixed results. We use better features and second-order optimisation methods, and adapt our model parameters sequentially. As a result, our DRL agents cope better with statistical changes to the data distribution, achieving higher risk-adjusted returns than a funding and a momentum baseline. In chapter 6, The recurrent reinforcement learning crypto agent, we construct a digital assets trading agent that performs feature space representation transfer from an echo state network to a DRL agent. The agent learns to trade the XBTUSD perpetual swap contract on BitMEX. Our meta-model can process data as a stream and learn sequentially; this helps it cope with the nonstationary environment. In chapter 7, Sequential asset ranking in nonstationary time series, we create an online learning long/short portfolio selection algorithm that can detect the best and worst performing portfolio constituents that change over time; in particular, we successfully handle the higher transaction costs associated with using daily-sampled data, and achieve higher total and risk-adjusted returns than the long-only holding of the S&P 500 index with hindsight.

    Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems

    This paper was motivated by the problem of how to make robots fuse and transfer their experience so that they can effectively use prior knowledge and quickly adapt to new environments. To address the problem, we present a learning architecture for navigation in cloud robotic systems: Lifelong Federated Reinforcement Learning (LFRL). In this work, we propose a knowledge fusion algorithm for upgrading a shared model deployed on the cloud. Then, effective transfer learning methods in LFRL are introduced. LFRL is consistent with human cognitive science and fits well in cloud robotic systems. Experiments show that LFRL greatly improves the efficiency of reinforcement learning for robot navigation. The cloud robotic system deployment also shows that LFRL is capable of fusing prior knowledge. In addition, we release a cloud robotic navigation-learning website based on LFRL.

    Grounding Language for Transfer in Deep Reinforcement Learning

    In this paper, we explore the utilization of natural language to drive transfer for reinforcement learning (RL). Despite the widespread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. Specifically, by learning to ground the meaning of text to the dynamics of the environment such as transitions and rewards, an autonomous agent can effectively bootstrap policy learning on a new domain given its description. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively use entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. For instance, we achieve up to 14% and 11.5% absolute improvement over previously existing models in terms of average and initial rewards, respectively. Comment: JAIR 201