Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal
Model-free reinforcement learning has recently been shown to be effective at
learning navigation policies from complex image input. However, these
algorithms tend to require large amounts of interaction with the environment,
which can be prohibitively costly to obtain on robots in the real world. We
present an approach for efficiently learning goal-directed navigation policies
on a mobile robot, from only a single coverage traversal of recorded data. The
navigation agent learns an effective policy over a diverse action space in a
large heterogeneous environment consisting of more than 2km of travel, through
buildings and outdoor regions that collectively exhibit large variations in
visual appearance, self-similarity, and connectivity. We compare pretrained
visual encoders that enable precomputation of visual embeddings to achieve a
throughput of tens of thousands of transitions per second at training time on a
commodity desktop computer, allowing agents to learn from millions of
trajectories of experience in a matter of hours. We propose multiple forms of
computationally efficient stochastic augmentation to enable the learned policy
to generalise beyond these precomputed embeddings, and demonstrate successful
deployment of the learned policy on the real robot without fine-tuning, despite
environmental appearance differences at test time. The dataset and code
required to reproduce these results and apply the technique to other datasets
and robots are made publicly available at rl-navigation.github.io/deployable
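The abstract does not spell out which augmentations are used, so the following is only a minimal sketch of the general idea under stated assumptions: embeddings are computed once by a frozen encoder and cached, and each training step applies a cheap random perturbation to the cached vector. The two perturbations shown (feature dropout and additive Gaussian noise), and the names `EmbeddingCache`, `augment_embedding`, and `sample_observation`, are hypothetical illustrations, not the paper's method.

```python
import random
from typing import Dict, List, Optional

# Hypothetical cache: frame id -> embedding computed once by a frozen, pretrained encoder.
EmbeddingCache = Dict[int, List[float]]


def augment_embedding(embedding: List[float],
                      noise_std: float = 0.05,
                      dropout_p: float = 0.1,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Return a perturbed copy of a cached embedding (the input is not modified).

    Illustrative augmentations (assumed, not taken from the paper):
      * feature dropout: each dimension is zeroed with probability `dropout_p`,
        and surviving dimensions are rescaled by 1 / (1 - dropout_p);
      * additive Gaussian noise with standard deviation `noise_std`.
    """
    if not 0.0 <= dropout_p < 1.0:
        raise ValueError("dropout_p must be in [0, 1)")
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - dropout_p)
    out: List[float] = []
    for x in embedding:
        if rng.random() < dropout_p:
            out.append(0.0)  # dropped feature
        else:
            out.append((x + rng.gauss(0.0, noise_std)) * keep_scale)
    return out


def sample_observation(cache: EmbeddingCache, frame_id: int,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Fetch a precomputed embedding and augment it on the fly.

    The encoder runs once per frame when the cache is built; each training
    step then pays only for a cheap elementwise perturbation.
    """
    return augment_embedding(cache[frame_id], rng=rng)
```

For example, `sample_observation({0: [0.5, -1.2, 3.0, 0.0]}, 0, random.Random(0))` returns a fresh four-element list and leaves the cached entry untouched. Keeping the perturbation elementwise is what preserves the throughput benefit of precomputation: no image ever has to be re-encoded during training.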
Reset-free Trial-and-Error Learning for Robot Damage Recovery
The high probability of hardware failures prevents many advanced robots
(e.g., legged robots) from being confidently deployed in real-world situations
(e.g., post-disaster rescue). Instead of attempting to diagnose the failures,
robots could adapt by trial-and-error in order to be able to complete their
tasks. In this situation, damage recovery can be seen as a Reinforcement
Learning (RL) problem. However, the best RL algorithms for robotics require the
robot and the environment to be reset to an initial state after each episode,
that is, the robot is not learning autonomously. In addition, most of the RL
methods for robotics do not scale well with complex robots (e.g., walking
robots) and either cannot be used at all or take too long to converge to a
solution (e.g., hours of learning). In this paper, we introduce a novel
learning algorithm called "Reset-free Trial-and-Error" (RTE) that (1) breaks
the complexity by pre-generating hundreds of possible behaviors with a dynamics
simulator of the intact robot, and (2) allows complex robots to quickly recover
from damage while completing their tasks and taking the environment into
account. We evaluate our algorithm on a simulated wheeled robot, a simulated
six-legged robot, and a real six-legged walking robot that are damaged in
several ways (e.g., a missing leg, a shortened leg, a faulty motor) and
whose objective is to reach a sequence of targets in an arena. Our experiments
show that the robots can recover most of their locomotion abilities in an
environment with obstacles, and without any human intervention.
Comment: 18 pages, 16 figures, 3 tables, 6 pseudocodes/algorithms, video at
https://youtu.be/IqtyHFrb3BU, code at
https://github.com/resibots/chatzilygeroudis_2018_rt
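To make the "pre-generated behaviors plus on-line adaptation" idea concrete, here is a deliberately simplified sketch. It is not RTE itself: the abstract only states that hundreds of behaviors are pre-generated with a simulator of the intact robot. Here a per-behavior running mean of the observed prediction error stands in for whatever outcome model the paper uses, and one-step greedy selection stands in for its planner. The class `RepertoireAdapter` and the example behaviors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


class RepertoireAdapter:
    """Greedy behavior selection over a precomputed repertoire with a learned correction.

    Simplifying assumptions (not from the paper): the correction for each
    behavior is the running mean of (observed - predicted) displacement, and
    the next behavior is chosen greedily, one step at a time.
    """

    def __init__(self, repertoire: Dict[str, Vec]):
        # Behavior name -> displacement predicted by the intact-robot simulator.
        self.repertoire = repertoire
        self.err_sum: Dict[str, List[float]] = {b: [0.0, 0.0] for b in repertoire}
        self.count: Dict[str, int] = {b: 0 for b in repertoire}

    def expected(self, behavior: str) -> Vec:
        """Simulator prediction corrected by the errors observed on the damaged robot."""
        px, py = self.repertoire[behavior]
        n = self.count[behavior]
        if n == 0:
            return px, py  # no evidence yet: trust the simulator prior
        ex, ey = self.err_sum[behavior]
        return px + ex / n, py + ey / n

    def choose(self, position: Vec, target: Vec) -> str:
        """Pick the behavior whose expected outcome lands closest to the target."""
        def dist_after(b: str) -> float:
            dx, dy = self.expected(b)
            return math.hypot(target[0] - (position[0] + dx),
                              target[1] - (position[1] + dy))
        return min(self.repertoire, key=dist_after)

    def update(self, behavior: str, observed: Vec) -> None:
        """Record the displacement actually produced by executing `behavior`."""
        px, py = self.repertoire[behavior]
        self.err_sum[behavior][0] += observed[0] - px
        self.err_sum[behavior][1] += observed[1] - py
        self.count[behavior] += 1
```

With `{"forward": (1.0, 0.0), "left": (0.7, 0.7), "right": (0.7, -0.7)}` and a target straight ahead, the adapter first picks `"forward"`; once a damaged leg makes that behavior drift to `(0.2, -0.5)`, it switches to another behavior. The robot's position simply carries over between steps, which is the reset-free aspect: no episode boundary or manual repositioning is needed.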
Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning
Developing a safe and efficient collision avoidance policy for multiple
robots is challenging in decentralized scenarios where each robot generates
its paths without observing other robots' states and intents. While other
distributed multi-robot collision avoidance systems exist, they often require
extracting agent-level features to plan a local collision-free action, which
can be computationally prohibitive and not robust. More importantly, in
practice the performance of these methods is much lower than that of their
centralized counterparts.
We present a decentralized sensor-level collision avoidance policy for
multi-robot systems, which directly maps raw sensor measurements to an agent's
steering commands in terms of movement velocity. As a first step toward
reducing the performance gap between decentralized and centralized methods, we
present a multi-scenario multi-stage training framework to find an optimal
policy which is trained over a large number of robots on rich, complex
environments simultaneously using a policy gradient based reinforcement
learning algorithm. We validate the learned sensor-level collision avoidance
policy in a variety of simulated scenarios with thorough performance
evaluations and show that the final learned policy is able to find
time-efficient, collision-free paths for a large-scale robot system. We also
demonstrate that the learned policy generalizes well to new scenarios
that never appear during training, including navigating a
heterogeneous group of robots and a large-scale scenario with 100 robots.
Videos are available at https://sites.google.com/view/drlmac
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
A robot that can carry out a natural-language instruction has been a dream
since before the Jetsons cartoon series imagined a life of leisure mediated by
a fleet of attentive robot helpers. It is a dream that remains stubbornly
distant. However, recent advances in vision and language methods have made
incredible progress in closely related areas. This is significant because a
robot interpreting a natural-language navigation instruction on the basis of
what it sees is carrying out a vision and language process that is similar to
Visual Question Answering. Both tasks can be interpreted as visually grounded
sequence-to-sequence translation problems, and many of the same methods are
applicable. To enable and encourage the application of vision and language
methods to the problem of interpreting visually-grounded navigation
instructions, we present the Matterport3D Simulator -- a large-scale
reinforcement learning environment based on real imagery. Using this simulator,
which can in future support a range of embodied vision and language tasks, we
provide the first benchmark dataset for visually-grounded natural language
navigation in real buildings -- the Room-to-Room (R2R) dataset.
Comment: CVPR 2018 Spotlight presentation
Energy efficient path planning: the effectiveness of Q-learning algorithm in saving energy
Includes bibliographical references.
In this thesis, the author investigated how effective a Q-learning-based path planning algorithm is at saving energy. Saving energy matters today because of the excessive exploitation of natural resources, and because a mobile robot running out of energy can be costly or even disastrous: in industrial environments it causes production drops where downtime must be minimized, and in applications such as search-and-rescue operations or navigation in dangerous environments the consequences can be severe. The study implemented a Q-learning-based path planning algorithm in several unstructured and unknown environments, using a cell decomposition method to generate the search-space representation within which the algorithm operated. In a square environment with 20% obstacle density, paths from the Q-learning planner consumed on average 3.04% less energy than those from the A* path planning algorithm, and 5.79% more energy than the least-energy paths. In rectangular environments, the Q-learning planner used 1.68% less energy than A* and 3.26% more energy than the least-energy paths. The study highlights the need to apply learning algorithms to problems whose existing solutions are not learning-based, in order to obtain better solutions.
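The abstract does not give the thesis's state, action, or reward design, so the following is only a minimal sketch of tabular Q-learning for energy-aware path planning under assumed choices: a 4-connected grid produced by cell decomposition, a hypothetical per-cell energy cost for entering each cell (with `math.inf` marking obstacles), and a reward equal to the negative energy of each move. The function name `q_learning_path` and all hyperparameter defaults are illustrative, not the author's.

```python
import math
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
ACTIONS: List[Cell] = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up


def q_learning_path(energy: List[List[float]], start: Cell, goal: Cell,
                    episodes: int = 2000, alpha: float = 0.5, gamma: float = 1.0,
                    epsilon: float = 0.2, max_steps: int = 200,
                    seed: int = 0) -> List[Cell]:
    """Learn a low-energy route on a cell-decomposed grid with tabular Q-learning.

    energy[r][c] is the assumed energy cost of entering cell (r, c);
    math.inf marks an obstacle cell. Reward = -energy, so maximizing return
    minimizes the total energy spent on the way to `goal`.
    """
    rng = random.Random(seed)
    rows, cols = len(energy), len(energy[0])
    q: Dict[Tuple[Cell, int], float] = {}  # unseen pairs default to 0.0

    def moves(cell: Cell) -> List[int]:
        # Indices of actions that stay on the grid and avoid obstacles.
        valid = []
        for i, (dr, dc) in enumerate(ACTIONS):
            r, c = cell[0] + dr, cell[1] + dc
            if 0 <= r < rows and 0 <= c < cols and energy[r][c] != math.inf:
                valid.append(i)
        return valid

    def step(cell: Cell, a: int) -> Cell:
        return cell[0] + ACTIONS[a][0], cell[1] + ACTIONS[a][1]

    def greedy(cell: Cell) -> int:
        return max(moves(cell), key=lambda a: q.get((cell, a), 0.0))

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            acts = moves(s)
            if s == goal or not acts:
                break
            # Epsilon-greedy exploration.
            a = rng.choice(acts) if rng.random() < epsilon else greedy(s)
            s2 = step(s, a)
            reward = -energy[s2[0]][s2[1]]
            # The goal is terminal, so it contributes no future value.
            future = 0.0 if s2 == goal else max(
                (q.get((s2, b), 0.0) for b in moves(s2)), default=0.0)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (reward + gamma * future - old)
            s = s2

    # Follow the learned greedy policy; cap the length in case training has not converged.
    path = [start]
    while path[-1] != goal and len(path) <= rows * cols and moves(path[-1]):
        path.append(step(path[-1], greedy(path[-1])))
    return path
```

On the grid `[[1.0, 9.0, 1.0], [1.0, 9.0, 1.0], [1.0, 1.0, 1.0]]` from `(0, 0)` to `(0, 2)`, the direct route through the expensive middle column costs 10 energy units, whereas the detour along the bottom row costs 6, and the greedy path recovered after training takes the detour. Because every reward is negative, initializing Q-values at zero is optimistic, which by itself pushes the agent to try actions it has not yet taken.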
Task Assignment and Path Planning for Autonomous Mobile Robots in Stochastic Warehouse Systems
The material handling industry is in the middle of a transformation from manual operations to automation, driven by the rapid growth of e-commerce. Autonomous mobile robots (AMRs) are being widely deployed to replace manually operated forklifts in warehouse systems in order to fulfil large shipping demand, extend warehouse operating hours, and mitigate safety concerns. Two open questions in AMR management are task assignment and path planning. This dissertation addresses the task assignment and path planning (TAPP) problem for AMRs in a warehouse environment. The goals are to maximize system productivity by avoiding AMR traffic and reducing travel time. The first topic in this dissertation is the development of a discrete event simulation modeling framework that can be used to evaluate alternative traffic control rules, task assignment methods, and path planning algorithms. The second topic, Risk Interval Path Planning (RIPP), is an algorithm designed to avoid conflicts among AMRs while accounting for uncertainty in robot motion. The third topic is a deep reinforcement learning (DRL) model that solves the task assignment and path planning problems simultaneously. Experimental results demonstrate the effectiveness of these methods in stochastic warehouse systems.
- …