
    Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal

    Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment comprising more than 2 km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings, achieving a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer and allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation to enable the learned policy to generalise beyond these precomputed embeddings, and demonstrate successful deployment of the learned policy on the real robot without fine-tuning, despite environmental appearance differences at test time. The dataset and code required to reproduce these results and apply the technique to other datasets and robots are made publicly available at rl-navigation.github.io/deployable
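    The throughput claim hinges on caching encoder outputs so that RL training never touches raw images. Below is a minimal sketch of that idea, assuming a frozen off-the-shelf ResNet-18 from torchvision as the pretrained encoder and simple Gaussian noise as one form of embedding-space augmentation; the paper compares several encoders and augmentation forms, so every concrete choice here is illustrative rather than the authors' implementation.

```python
import torch
import torchvision

# Frozen off-the-shelf encoder (an assumption; the paper compares several
# pretrained visual encoders).
encoder = torchvision.models.resnet18(weights="IMAGENET1K_V1")
encoder.fc = torch.nn.Identity()               # expose 512-d features
encoder.eval()

@torch.no_grad()
def precompute_embeddings(frames):
    # frames: (N, 3, H, W) images from the single coverage traversal.
    # Run once and cache; RL training then reads only these vectors, which
    # is what makes tens of thousands of transitions per second feasible.
    return encoder(frames)

def stochastic_augment(z, noise_scale=0.1):
    # One assumed form of cheap augmentation applied in embedding space so
    # the policy generalises beyond the exact precomputed vectors.
    return z + noise_scale * torch.randn_like(z)
```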

    Reset-free Trial-and-Error Learning for Robot Damage Recovery

    The high probability of hardware failures prevents many advanced robots (e.g., legged robots) from being confidently deployed in real-world situations (e.g., post-disaster rescue). Instead of attempting to diagnose the failures, robots could adapt by trial-and-error in order to be able to complete their tasks. In this situation, damage recovery can be seen as a Reinforcement Learning (RL) problem. However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode, that is, the robot is not learning autonomously. In addition, most of the RL methods for robotics do not scale well with complex robots (e.g., walking robots) and either cannot be used at all or take too long to converge to a solution (e.g., hours of learning). In this paper, we introduce a novel learning algorithm called "Reset-free Trial-and-Error" (RTE) that (1) breaks the complexity by pre-generating hundreds of possible behaviors with a dynamics simulator of the intact robot, and (2) allows complex robots to quickly recover from damage while completing their tasks and taking the environment into account. We evaluate our algorithm on a simulated wheeled robot, a simulated six-legged robot, and a real six-legged walking robot that are damaged in several ways (e.g., a missing leg, a shortened leg, a faulty motor, etc.) and whose objective is to reach a sequence of targets in an arena. Our experiments show that the robots can recover most of their locomotion abilities in an environment with obstacles, and without any human intervention.
    Comment: 18 pages, 16 figures, 3 tables, 6 pseudocodes/algorithms, video at https://youtu.be/IqtyHFrb3BU, code at https://github.com/resibots/chatzilygeroudis_2018_rt
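    The core of RTE is a repertoire of behaviours generated in simulation with the intact robot, corrected online as the damaged robot tries them. The sketch below illustrates that loop under strong simplifying assumptions (random behaviour parameters, a plain scikit-learn Gaussian process as the error model); the authors' actual implementation, linked above, is considerably more sophisticated, so this shows the idea rather than their algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical repertoire: each behaviour has parameters and a displacement
# (dx, dy) predicted by simulating the *intact* robot.
rng = np.random.default_rng(0)
behaviour_params = rng.uniform(-1, 1, (200, 2))
sim_outcomes = rng.uniform(-1, 1, (200, 2))

gp = GaussianProcessRegressor()                # models damage-induced error
tried, errors = [], []

def select_behaviour(target):
    # Correct the simulator's predictions with the learned error model,
    # then pick the behaviour expected to land closest to the target.
    correction = gp.predict(behaviour_params) if tried else 0.0
    predicted = sim_outcomes + correction
    return int(np.argmin(np.linalg.norm(predicted - target, axis=1)))

def update(idx, real_outcome):
    # After executing a behaviour on the damaged robot, record how far the
    # real displacement deviated from simulation and refit the GP.
    tried.append(behaviour_params[idx])
    errors.append(real_outcome - sim_outcomes[idx])
    gp.fit(np.array(tried), np.array(errors))
```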

    Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning

    Developing a safe and efficient collision avoidance policy for multiple robots is challenging in decentralized scenarios where each robot generates its paths without observing other robots' states and intents. While other distributed multi-robot collision avoidance systems exist, they often require extracting agent-level features to plan a local collision-free action, which can be computationally prohibitive and not robust. More importantly, in practice the performance of these methods is much lower than that of their centralized counterparts. We present a decentralized sensor-level collision avoidance policy for multi-robot systems, which directly maps raw sensor measurements to an agent's steering commands in terms of movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to find an optimal policy, which is trained over a large number of robots in rich, complex environments simultaneously using a policy-gradient-based reinforcement learning algorithm. We validate the learned sensor-level collision avoidance policy in a variety of simulated scenarios with thorough performance evaluations and show that the final learned policy is able to find time-efficient, collision-free paths for a large-scale robot system. We also demonstrate that the learned policy generalizes well to scenarios that do not appear at any point during training, including navigating a heterogeneous group of robots and a large-scale scenario with 100 robots. Videos are available at https://sites.google.com/view/drlmac
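    To make the sensor-level mapping concrete, here is a hypothetical PyTorch policy network in the spirit of the abstract: stacked raw lidar scans, the relative goal, and the current velocity are mapped directly to a velocity command with no agent-level feature extraction. All sizes and layer choices are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SensorLevelPolicy(nn.Module):
    """Maps raw sensor input straight to a (linear, angular) velocity."""

    def __init__(self, scan_size=512):
        super().__init__()
        # 1-D convolutions over 3 stacked lidar scans.
        self.scan_net = nn.Sequential(
            nn.Conv1d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * (((scan_size - 5) // 2 + 1 - 3) // 2 + 1)
        self.head = nn.Sequential(
            nn.Linear(feat + 4, 128), nn.ReLU(),
            nn.Linear(128, 2),                 # velocity command
        )

    def forward(self, scans, goal, vel):
        # scans: (B, 3, scan_size); goal: (B, 2) relative; vel: (B, 2)
        z = self.scan_net(scans)
        return self.head(torch.cat([z, goal, vel], dim=1))
```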

    Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

    A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.
    Comment: CVPR 2018 Spotlight presentation
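    The "visually grounded sequence-to-sequence translation" framing suggests a natural baseline: encode the instruction with a recurrent network and decode one navigation action per step, attending back to the instruction. The sketch below is an assumed architecture in that spirit, not the paper's model; the vocabulary size, feature dimensions, and discrete action count are all placeholders.

```python
import torch
import torch.nn as nn

class InstructionFollower(nn.Module):
    """Assumed seq2seq baseline: instruction in, action sequence out."""

    def __init__(self, vocab=1000, img_feat=2048, hidden=512, actions=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, 256)
        self.encoder = nn.LSTM(256, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(img_feat, hidden)
        self.attn = nn.MultiheadAttention(hidden, 4, batch_first=True)
        self.act = nn.Linear(2 * hidden, actions)

    def forward(self, tokens, image_feats):
        # tokens: (B, T) instruction; image_feats: (B, steps, img_feat)
        ctx, (h, c) = self.encoder(self.embed(tokens))
        h, c = h[0], c[0]
        logits = []
        for t in range(image_feats.size(1)):    # one step per observed view
            h, c = self.decoder(image_feats[:, t], (h, c))
            a, _ = self.attn(h.unsqueeze(1), ctx, ctx)  # attend to words
            logits.append(self.act(torch.cat([h, a.squeeze(1)], dim=1)))
        return torch.stack(logits, dim=1)       # (B, steps, actions)
```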

    Energy efficient path planning: the effectiveness of Q-learning algorithm in saving energy

    In this thesis the author investigated how effective a Q-learning-based path planning algorithm is at saving energy. Pursuing any means of saving energy is important, both because of the excessive exploitation of natural resources and in order to prevent drops in production in industrial environments where downtime must be minimized, as well as in applications where a mobile robot running out of energy can be costly or even disastrous, such as search-and-rescue operations or navigation of dangerous environments. The study was undertaken by implementing a Q-learning-based path planning algorithm in several unstructured and unknown environments. A cell decomposition method was used to generate the search space representation of the environments within which the algorithm operated. The results show that the Q-learning planner's paths on average consumed 3.04% less energy than those of the A* path planning algorithm in a square environment with 20% obstacle density, and on average 5.79% more energy than the least-energy paths for the same environment. In the case of rectangular environments, the Q-learning path planner uses 1.68% less energy than the A* algorithm and 3.26% more energy than the least-energy paths. The implication of this study is to highlight the value of learning algorithms for problems whose existing solutions are not learning-based, in order to obtain better solutions
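    For readers unfamiliar with the method, the planner reduces to tabular Q-learning over a grid produced by cell decomposition, with rewards standing in for energy cost. The sketch below assumes a hypothetical cost model (a fixed energy cost per move, a larger penalty for hitting obstacles, a bonus at the goal) purely for illustration; the thesis' actual energy model is not reproduced here.

```python
import numpy as np

# Tabular Q-learning on a cell-decomposed grid with an assumed
# energy-style reward.
rng = np.random.default_rng(0)
SIZE, GOAL = 20, (19, 19)
grid = rng.random((SIZE, SIZE)) < 0.2          # ~20% obstacle density
grid[0, 0] = grid[GOAL] = False                # keep start and goal free
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
Q = np.zeros((SIZE, SIZE, len(moves)))

def step(s, a):
    r, c = s[0] + moves[a][0], s[1] + moves[a][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or grid[r, c]:
        return s, -5.0, False                  # collision: wasted energy
    if (r, c) == GOAL:
        return (r, c), 100.0, True
    return (r, c), -1.0, False                 # per-move energy cost

for episode in range(5000):
    s = (0, 0)
    for _ in range(400):                       # cap episode length
        a = int(rng.integers(4)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        s2, reward, done = step(s, a)
        Q[s][a] += 0.1 * (reward + 0.95 * np.max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break
```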

    Task Assignment and Path Planning for Autonomous Mobile Robots in Stochastic Warehouse Systems

    The material handling industry is in the middle of a transformation from manual operations to automation, due to the rapid growth in e-commerce. Autonomous mobile robots (AMRs) are being widely implemented to replace manually operated forklifts in warehouse systems to fulfil large shipping demand, extend warehouse operating hours, and mitigate safety concerns. Two open questions in AMR management are task assignment and path planning. This dissertation addresses the task assignment and path planning (TAPP) problem for AMRs in a warehouse environment. The goals are to maximize system productivity by avoiding AMR traffic conflicts and reducing travel time. The first topic in this dissertation is the development of a discrete-event simulation modeling framework that can be used to evaluate alternative traffic control rules, task assignment methods, and path planning algorithms. The second topic, Risk Interval Path Planning (RIPP), is an algorithm designed to avoid conflicts among AMRs under uncertainty in robot motion. The third topic is a deep reinforcement learning (DRL) model developed to solve the task assignment and path planning problems simultaneously. Experimental results demonstrate the effectiveness of these methods in stochastic warehouse systems
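    Interval-based conflict avoidance among AMRs can be illustrated with a much-simplified planner: robots plan sequentially over (cell, time) states, and a shared reservation table blocks cells already claimed at a given timestep. This is not the dissertation's RIPP algorithm, which additionally reasons about risk under motion uncertainty; it is a minimal sketch of time-indexed reservation planning for contrast.

```python
import heapq

# Each robot searches over (cell, time) states; cells reserved by earlier
# robots at a timestep are treated as blocked, so planned paths never collide.
def plan(grid, start, goal, reserved, horizon=200):
    # grid: set of free (row, col) cells; reserved: set of ((row, col), t)
    frontier = [(0, start, 0, [start])]
    seen = set()
    while frontier:
        cost, cell, t, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if (cell, t) in seen or t >= horizon:
            continue
        seen.add((cell, t))
        for dr, dc in [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]:  # wait/move
            nxt = (cell[0] + dr, cell[1] + dc)
            if nxt in grid and (nxt, t + 1) not in reserved:
                heapq.heappush(frontier, (cost + 1, nxt, t + 1, path + [nxt]))
    return None                                # no conflict-free path found

# Usage: plan one robot, reserve its (cell, time) pairs, then plan the next.
free = {(r, c) for r in range(5) for c in range(5)}
p1 = plan(free, (0, 0), (4, 4), set())
reserved = {(cell, t) for t, cell in enumerate(p1)}
p2 = plan(free, (4, 0), (0, 4), reserved)
print(p1, p2, sep="\n")
```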