7,088 research outputs found

    Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems

    This paper was motivated by the problem of how to make robots fuse and transfer their experience so that they can effectively use prior knowledge and quickly adapt to new environments. To address the problem, we present a learning architecture for navigation in cloud robotic systems: Lifelong Federated Reinforcement Learning (LFRL). In this work, we propose a knowledge fusion algorithm for upgrading a shared model deployed on the cloud. Then, effective transfer learning methods in LFRL are introduced. LFRL is consistent with human cognitive science and fits well in cloud robotic systems. Experiments show that LFRL greatly improves the efficiency of reinforcement learning for robot navigation. The cloud robotic system deployment also shows that LFRL is capable of fusing prior knowledge. In addition, we release a cloud robotic navigation-learning website based on LFRL.

    Intrinsic Motivation and Mental Replay enable Efficient Online Adaptation in Stochastic Recurrent Networks

    Autonomous robots need to interact with unknown, unstructured and changing environments, constantly facing novel challenges. Therefore, continuous online adaptation for lifelong learning and sample-efficient mechanisms for adapting to changes in the environment, the constraints, the tasks, or the robot itself are crucial. In this work, we propose a novel framework for probabilistic online motion planning with online adaptation based on a bio-inspired stochastic recurrent neural network. By using learning signals which mimic the intrinsic motivation signal cognitive dissonance, in combination with a mental replay strategy to intensify experiences, the stochastic recurrent network can learn from few physical interactions and adapt to novel environments in seconds. We evaluate our online planning and adaptation framework on an anthropomorphic KUKA LWR arm. The rapid online adaptation is shown by learning unknown workspace constraints sample-efficiently from few physical interactions while following given waypoints. Comment: accepted in Neural Network

    Lifelong Multi-Agent Path Finding in Large-Scale Warehouses

    Multi-Agent Path Finding (MAPF) is the problem of moving a team of agents to their goal locations without collisions. In this paper, we study the lifelong variant of MAPF, where agents are constantly engaged with new goal locations, such as in large-scale automated warehouses. We propose a new framework, Rolling-Horizon Collision Resolution (RHCR), for solving lifelong MAPF by decomposing the problem into a sequence of Windowed MAPF instances, where a Windowed MAPF solver resolves collisions among the paths of the agents only within a bounded time horizon and ignores collisions beyond it. RHCR is particularly well suited to generating pliable plans that adapt to continually arriving new goal locations. We empirically evaluate RHCR with a variety of MAPF solvers and show that it can produce high-quality solutions for up to 1,000 agents (= 38.9% of the empty cells on the map) for simulated warehouse instances, significantly outperforming existing work. Comment: Published at AAAI 202
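
The windowing idea in the abstract above (collisions are considered only inside a bounded time horizon) can be illustrated with a minimal Python sketch. It is not the RHCR implementation: the function names, the path representation (a list of grid cells per agent, one cell per timestep), and the assumption that an agent waits at its last cell once its path ends are all hypothetical choices made for illustration. It only detects conflicts; resolving them is left to whatever MAPF solver is plugged in.

```python
from itertools import combinations


def position_at(path, t):
    """Cell occupied at timestep t; the agent is assumed to wait at its last cell."""
    return path[min(t, len(path) - 1)]


def windowed_conflicts(paths, window):
    """Return (agent_a, agent_b, t, kind) for conflicts at timesteps 1..window only.

    Conflicts after the window are deliberately ignored. Start cells (t = 0)
    are assumed to be distinct and are not checked.
    """
    conflicts = []
    for (a, pa), (b, pb) in combinations(enumerate(paths), 2):
        for t in range(1, window + 1):
            if position_at(pa, t) == position_at(pb, t):
                # Both agents occupy the same cell at the same time.
                conflicts.append((a, b, t, "vertex"))
            elif (position_at(pa, t) == position_at(pb, t - 1)
                  and position_at(pb, t) == position_at(pa, t - 1)):
                # The agents swap cells between t - 1 and t.
                conflicts.append((a, b, t, "edge"))
    return conflicts


paths = [
    [(0, 0), (0, 1), (0, 2), (0, 3)],  # agent 0 moves along row 0
    [(1, 3), (1, 3), (1, 3), (0, 3)],  # agent 1 waits, then steps into (0, 3)
]
near = windowed_conflicts(paths, window=2)  # the t = 3 collision lies outside the window
far = windowed_conflicts(paths, window=3)   # the same collision is now inside the window
```

With a window of 2 the collision at timestep 3 is not reported, so a planner would leave it for a later replanning round; widening the window to 3 exposes it.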

    Experimental analysis of sample-based maps for long-term SLAM

    This paper presents a system for long-term SLAM (simultaneous localization and mapping) by mobile service robots and its experimental evaluation in a real dynamic environment. To deal with the stability-plasticity dilemma (the trade-off between adaptation to new patterns and preservation of old patterns), the environment is represented at multiple timescales simultaneously (5 in our experiments). A sample-based representation is proposed, where older memories fade at different rates depending on the timescale, and robust statistics are used to interpret the samples. The dynamics of this representation are analysed in a five-week experiment, measuring the relative influence of short- and long-term memories over time, and further demonstrating the robustness of the approach.
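
A rough Python sketch of the multi-timescale idea in the abstract above: several fixed-size sample sets for one map cell, each refreshed at a different rate, interpreted with a robust statistic. Everything specific here is an assumption for illustration and is not taken from the paper: the class name, the deterministic "accept every n-th observation" update rule, the buffer capacity, and the choice of the median as the robust statistic.

```python
from collections import deque
from statistics import median


class MultiTimescaleCell:
    """Range samples for one map cell, kept at several timescales (hypothetical)."""

    def __init__(self, periods=(1, 4, 16), capacity=5):
        # One bounded buffer per timescale; a larger period means slower fading.
        self.buffers = {p: deque(maxlen=capacity) for p in periods}
        self.step = 0

    def observe(self, value):
        """Feed one measurement; each timescale accepts only every period-th one."""
        self.step += 1
        for period, buf in self.buffers.items():
            if self.step % period == 0:
                buf.append(value)  # the oldest sample drops out when the buffer is full

    def estimates(self):
        """Median per timescale, which tolerates a minority of outdated samples."""
        return {p: median(buf) for p, buf in self.buffers.items() if buf}


cell = MultiTimescaleCell(periods=(1, 4), capacity=5)
for _ in range(20):
    cell.observe(1.0)  # the environment is stable for a while
for _ in range(8):
    cell.observe(3.0)  # then something moves
result = cell.estimates()
```

In this run the fast timescale has already switched to the new value, while the slow one still holds mostly old samples and keeps reporting the old value, which is the stability-plasticity trade-off in miniature.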