Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios
In this paper, we present a decentralized sensor-level collision avoidance
policy for multi-robot systems, which shows promising results in practical
applications. In particular, our policy directly maps raw sensor measurements
to an agent's steering commands, expressed as movement velocities. As a first
step toward reducing the performance gap between decentralized and centralized
methods, we present a multi-scenario multi-stage training framework to learn an
optimal policy. The policy is trained over a large number of robots in rich,
complex environments simultaneously using a policy gradient based reinforcement
learning algorithm. The learning algorithm is also integrated into a hybrid
control framework to further improve the policy's robustness and effectiveness.
We validate the learned sensor-level collision avoidance policy in a variety
of simulated and real-world scenarios with thorough performance evaluations for
large-scale multi-robot systems. The generalization of the learned policy is
verified in a set of unseen scenarios including the navigation of a group of
heterogeneous robots and a large-scale scenario with 100 robots. Although the
policy is trained using simulation data only, we have successfully deployed it
on physical robots whose shapes and dynamic characteristics differ from those
of the simulated agents, demonstrating the controller's robustness against
sim-to-real modeling error. Finally, we show that the
collision-avoidance policy learned from multi-robot navigation tasks provides
an excellent solution for the safe and effective autonomous navigation of a
single robot working in a dense real human crowd. Our learned policy enables a
robot to make effective progress in a crowd without getting stuck. Videos are
available at https://sites.google.com/view/hybridmrc
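A minimal sketch of the kind of sensor-level policy network described above, assuming a 1D laser scan plus a small goal/velocity state vector as inputs and a (linear, angular) velocity command as output; the architecture, layer sizes, and dimensions are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class SensorLevelPolicy(nn.Module):
    """Maps a raw laser scan plus a small goal/velocity state directly to a
    (linear, angular) velocity command. Layer sizes, input dimensions, and
    the action parameterization are illustrative assumptions."""
    def __init__(self, scan_size=512, state_size=4, hidden=256):
        super().__init__()
        self.scan_encoder = nn.Sequential(          # 1D convs over the scan
            nn.Conv1d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten())
        with torch.no_grad():                       # infer flattened feature size
            feat = self.scan_encoder(torch.zeros(1, 1, scan_size)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(feat + state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))                   # (v, w) command

    def forward(self, scan, state):
        z = self.scan_encoder(scan.unsqueeze(1))
        return self.head(torch.cat([z, state], dim=1))

policy = SensorLevelPolicy()
cmd = policy(torch.rand(8, 512), torch.rand(8, 4))  # a batch of 8 robots
```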
Verification for Machine Learning, Autonomy, and Neural Networks Survey
This survey presents an overview of verification techniques for autonomous
systems, with a focus on safety-critical autonomous cyber-physical systems
(CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances
in artificial intelligence (AI) and machine learning (ML) through approaches
such as deep neural networks (DNNs), embedded in so-called learning-enabled
components (LECs) that accomplish tasks from classification to control.
Recently, the formal methods and formal verification community has developed
methods to characterize behaviors in these LECs with eventual goals of formally
verifying specifications for LECs, and this article presents a survey of many
of these recent approaches.
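As one concrete illustration of the kind of technique such surveys cover, the sketch below runs interval bound propagation through a small ReLU network to soundly bound its outputs over an input box; this is a generic DNN-verification primitive, not a method attributed to the survey.

```python
import numpy as np

def ibp_affine(lo, hi, W, b):
    """Soundly propagate the box [lo, hi] through x -> W x + b by splitting
    W into its positive and negative parts."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def ibp_relu_net(lo, hi, layers):
    """Bound the outputs of a feed-forward ReLU network over an input box."""
    for i, (W, b) in enumerate(layers):
        lo, hi = ibp_affine(lo, hi, W, b)
        if i < len(layers) - 1:                   # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

# Toy 2-4-1 network: check the output interval over a small input box.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 2)), np.zeros(4)),
          (rng.normal(size=(1, 4)), np.zeros(1))]
lo, hi = ibp_relu_net(np.array([-0.1, -0.1]), np.array([0.1, 0.1]), layers)
print("output lies in", (lo, hi))
```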
Reinforcement Learning Meets Hybrid Zero Dynamics: A Case Study for RABBIT
The design of feedback controllers for bipedal robots is challenging due to
the hybrid nature of their dynamics and the complexity imposed by
high-dimensional bipedal models. In this paper, we present a novel approach for
the design of feedback controllers using Reinforcement Learning (RL) and Hybrid
Zero Dynamics (HZD). Existing RL approaches for bipedal walking are inefficient
as they do not consider the underlying physics, often require substantial
training, and may yield controllers that are not applicable to real robots.
HZD is a powerful tool for bipedal control with local stability guarantees of
the walking limit cycles. In this paper, we propose a non-traditional RL
structure that embeds the HZD framework into the policy learning. More
specifically, we propose to use RL to find a control policy that maps from the
robot's reduced order states to a set of parameters that define the desired
trajectories for the robot's joints through the virtual constraints. Then,
these trajectories are tracked using an adaptive PD controller. The method
results in a stable and robust control policy that is able to track variable
speeds within a continuous interval. The robustness of the policy is evaluated by
applying external forces to the torso of the robot. The proposed RL framework
is implemented and demonstrated in OpenAI Gym with the MuJoCo physics engine
based on the well-known RABBIT robot model.
Comment: Supplemental video: https://www.youtube.com/watch?v=dhHMfnl7Yl
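A minimal sketch of the structure described above: the RL policy emits parameters defining the desired joint trajectories (a Bezier parameterization, a common HZD choice, is assumed here), which a PD law then tracks; all dimensions and gains are illustrative, and the paper's PD gains are adaptive rather than fixed.

```python
import numpy as np
from math import comb

def bezier(alpha, s):
    """Evaluate degree-(M-1) Bezier curves at gait phase s in [0, 1].
    alpha: (n_joints, M) coefficients emitted by the RL policy; the Bezier
    parameterization is assumed here for illustration."""
    M = alpha.shape[1]
    basis = np.array([comb(M - 1, k) * s**k * (1 - s)**(M - 1 - k)
                      for k in range(M)])
    return alpha @ basis                      # desired joint positions q_d(s)

def pd_torque(q, dq, q_d, dq_d, kp, kd):
    """PD tracking of the virtual-constraint trajectories (fixed gains here;
    the paper adapts them online)."""
    return kp * (q_d - q) + kd * (dq_d - dq)

# One toy control step: the policy has mapped the reduced-order state to alpha.
alpha = np.random.randn(4, 6)                 # 4 joints, 6 coefficients (assumed)
q_d = bezier(alpha, s=0.3)                    # s: gait phase
tau = pd_torque(q=np.zeros(4), dq=np.zeros(4), q_d=q_d,
                dq_d=np.zeros(4), kp=80.0, kd=2.0)
```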
ADAPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems
Model-free policy learning has enabled robust performance of complex tasks
with relatively simple algorithms. However, this simplicity comes at the cost
of requiring an oracle and arguably very poor sample complexity. This renders
such methods unsuitable for physical systems. Variants of model-based methods
address this problem through the use of simulators; however, this gives rise to
the problem of policy transfer from the simulated to the physical system. Model
mismatch due to systematic parameter shift and unmodelled dynamics error may
cause sub-optimal or unsafe behavior upon direct transfer. We introduce the
Adaptive Policy Transfer for Stochastic Dynamics (ADAPT) algorithm that
achieves provably safe and robust, dynamically-feasible zero-shot transfer of
RL-policies to new domains with dynamics error. ADAPT combines the strengths of
offline policy learning in a black-box source simulator with online tube-based
MPC to attenuate bounded model mismatch between the source and target dynamics.
ADAPT allows online transfer of a policy, trained solely offline in simulation,
to a family of unknown targets without fine-tuning. We also formally show that
(i) ADAPT guarantees state and control safety through state-action tubes under
the assumption of Lipschitz continuity of the divergence in dynamics and, (ii)
ADAPT results in a bounded loss of reward accumulation relative to a policy
trained and evaluated in the source environment. We evaluate ADAPT on 2
continuous, non-holonomic simulated dynamical systems with 4 different
disturbance models, and find that ADAPT performs 50%-300% better in mean
reward accrual than direct policy transfer.
Comment: International Symposium on Robotics Research (ISRR), 201
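A minimal sketch of the tube idea behind ADAPT: the frozen source-domain policy supplies a nominal action, and an ancillary feedback law corrects for the bounded dynamics mismatch so the real state stays near the nominal trajectory; the fixed gain below stands in for a proper tube-MPC design.

```python
import numpy as np

def tube_correction(u_nominal, x_nominal, x_real, K):
    """Ancillary feedback keeping the real state in a tube around the nominal
    trajectory: u = u_nom + K (x_real - x_nom). In tube-based MPC the gain
    comes from an online optimization; a fixed K is a placeholder."""
    return u_nominal + K @ (x_real - x_nominal)

# One transfer step: the frozen simulator policy gives the nominal action and
# the tube controller attenuates the bounded sim-to-real dynamics mismatch.
K = -0.5 * np.eye(2)                          # illustrative stabilizing gain
x_nom, x_real = np.array([1.0, 0.0]), np.array([1.05, -0.02])
u_nom = np.array([0.3, 0.1])                  # action from the source policy
u = tube_correction(u_nom, x_nom, x_real, K)
```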
Learning to Herd Agents Amongst Obstacles: Training Robust Shepherding Behaviors using Deep Reinforcement Learning
The robotic shepherding problem considers the control and navigation of a group
of coherent agents (e.g., a flock of birds or a fleet of drones) through the
motion of an external robot, called the shepherd. Machine learning based methods
have successfully solved this problem in an empty environment with no
obstacles. Rule-based methods, on the other hand, can handle more complex
scenarios in which environments are cluttered with obstacles and allow multiple
shepherds to work collaboratively. However, these rule-based methods are
fragile due to the difficulty in defining a comprehensive set of rules that can
handle all possible cases. To overcome these limitations, we propose the first
known learning-based method that can herd agents amongst obstacles. By using
deep reinforcement learning techniques combined with probabilistic
roadmaps, we train a shepherding model using noisy but controlled environmental
and behavioral parameters. Our experimental results show that the proposed
method is robust, namely, it is insensitive to the uncertainties originating
from both the environmental and behavioral models. Consequently, the proposed
method achieves a higher success rate, shorter completion time, and shorter
path length than the rule-based behavioral methods. These advantages are
particularly prominent in more challenging scenarios involving more difficult
groups and strenuous passages.
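A minimal sketch of the roadmap side of the approach: build a probabilistic roadmap over free space whose waypoints can serve as obstacle-aware subgoals for the learned shepherding policy. The sampler, the k-nearest connection rule, and the omission of edge collision checks are simplifications for illustration.

```python
import numpy as np
import networkx as nx

def build_prm(sampler, is_free, k=8, n=200):
    """Probabilistic roadmap: sample free configurations and connect each to
    its k nearest neighbors (edge collision checks omitted for brevity)."""
    nodes = [q for q in (sampler() for _ in range(n)) if is_free(q)]
    g = nx.Graph()
    for i, q in enumerate(nodes):
        g.add_node(i, q=q)
        dists = [float(np.linalg.norm(q - p)) for p in nodes]
        for j in np.argsort(dists)[1:k + 1]:      # skip self at index 0
            g.add_edge(i, int(j), weight=dists[int(j)])
    return g, nodes

# Waypoints along a roadmap path become subgoals for the learned shepherd.
g, nodes = build_prm(sampler=lambda: np.random.rand(2),
                     is_free=lambda q: np.linalg.norm(q - 0.5) > 0.2)
```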
Semi-parametric Topological Memory for Navigation
We introduce a new memory architecture for navigation in previously unseen
environments, inspired by landmark-based navigation in animals. The proposed
semi-parametric topological memory (SPTM) consists of a (non-parametric) graph
with nodes corresponding to locations in the environment and a (parametric)
deep network capable of retrieving nodes from the graph based on observations.
The graph stores no metric information, only connectivity of locations
corresponding to the nodes. We use SPTM as a planning module in a navigation
system. Given only 5 minutes of footage of a previously unseen maze, an
SPTM-based navigation agent can build a topological map of the environment and
use it to confidently navigate towards goals. The average success rate of the
SPTM agent in goal-directed navigation across test environments is higher than
the best-performing baseline by a factor of three. A video of the agent is
available at https://youtu.be/vRF7f4lhswo
Comment: Published at the International Conference on Learning Representations
(ICLR) 2018. Project website at https://sites.google.com/view/SPT
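A minimal sketch of the two SPTM components: a non-parametric graph over stored observations (edges encode connectivity only, no metric information) and an embedding used to retrieve the node closest to the current observation. The normalized-dot-product embedding below is a placeholder for the learned retrieval network.

```python
import numpy as np
import networkx as nx

class TopologicalMemory:
    """Semi-parametric memory in the spirit of SPTM: a graph over stored
    observations plus an embedding for retrieval. `embed` stands in for the
    parametric retrieval network learned from exploration footage."""
    def __init__(self, embed):
        self.embed = embed
        self.graph = nx.Graph()
        self.keys = []                      # one embedding per stored node

    def add(self, obs):
        i = len(self.keys)
        self.keys.append(self.embed(obs))
        self.graph.add_node(i)
        if i > 0:                           # consecutive frames are connected
            self.graph.add_edge(i - 1, i)
        return i

    def localize(self, obs):
        """Retrieve the stored node most similar to the current observation."""
        z = self.embed(obs)
        return int(np.argmax([float(z @ k) for k in self.keys]))

    def next_waypoint(self, obs, goal_node):
        """Plan in the graph only: no metric map, just connectivity."""
        path = nx.shortest_path(self.graph, self.localize(obs), goal_node)
        return path[1] if len(path) > 1 else goal_node

mem = TopologicalMemory(embed=lambda o: o / (np.linalg.norm(o) + 1e-8))
for frame in np.random.rand(50, 16):        # toy stand-in for traversal footage
    mem.add(frame)
print(mem.next_waypoint(np.random.rand(16), goal_node=49))
```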
Asynchronous Methods for Deep Reinforcement Learning
We propose a conceptually simple and lightweight framework for deep
reinforcement learning that uses asynchronous gradient descent for optimization
of deep neural network controllers. We present asynchronous variants of four
standard reinforcement learning algorithms and show that parallel
actor-learners have a stabilizing effect on training, allowing all four methods
to successfully train neural network controllers. The best performing method,
an asynchronous variant of actor-critic, surpasses the current state-of-the-art
on the Atari domain while training for half the time on a single multi-core CPU
instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds
on a wide variety of continuous motor control problems as well as on a new task
of navigating random 3D mazes using a visual input.
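A minimal sketch of the asynchronous structure: several actor-learner processes apply gradients directly to shared network parameters without locks (the Hogwild-style pattern PyTorch supports); the state and loss below are stand-ins for a real environment and RL objective.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def actor_learner(shared_net, steps=100):
    """One asynchronous actor-learner: compute local gradients and apply
    them straight to the shared parameters, without locks."""
    opt = torch.optim.SGD(shared_net.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(1, 4)                    # stand-in for an env state
        loss = shared_net(x).pow(2).mean()       # stand-in for the RL loss
        opt.zero_grad()
        loss.backward()
        opt.step()                               # updates the shared weights

if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
    net.share_memory()                           # parameters live in shared memory
    workers = [mp.Process(target=actor_learner, args=(net,)) for _ in range(4)]
    for w in workers: w.start()
    for w in workers: w.join()
```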
Deep Reinforcement Learning for Dexterous Manipulation with Concept Networks
Deep reinforcement learning yields great results for a large array of
problems, but models are generally retrained anew for each new problem to be
solved. Prior learning and knowledge are difficult to incorporate when training
new models, requiring increasingly longer training as problems become more
complex. This is especially problematic for problems with sparse rewards. We
provide a solution to these problems by introducing Concept Network
Reinforcement Learning (CNRL), a framework which allows us to decompose
problems using a multi-level hierarchy. Concepts in a concept network are
reusable, and flexible enough to encapsulate feature extractors, skills, or
other concept networks. With this hierarchical learning approach, deep
reinforcement learning can be used to solve complex tasks in a modular way,
through problem decomposition. We demonstrate the strength of CNRL by training
a model to grasp a rectangular prism and precisely stack it on top of a cube
using a gripper on a Kinova JACO arm, simulated in MuJoCo. Our experiments show
that our use of hierarchy results in a 45x reduction in environment
interactions compared to the state-of-the-art on this task.
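A minimal sketch of the concept-network idea: concepts are reusable nodes that are either primitive skills or selectors delegating to child concepts. In CNRL both levels are trained with RL, whereas the selector and skills below are hard-coded placeholders for illustration.

```python
class Concept:
    """A node in a concept network: either a primitive skill (a callable
    policy) or a selector that delegates to reusable child concepts."""
    def __init__(self, name, policy=None, children=None, selector=None):
        self.name, self.policy = name, policy
        self.children, self.selector = children or [], selector

    def act(self, state):
        if self.policy is not None:                # leaf concept: a skill
            return self.policy(state)
        return self.children[self.selector(state)].act(state)

# Toy hierarchy for the grasp-and-stack task described in the abstract.
reach = Concept("reach", policy=lambda s: "move gripper toward prism")
grasp = Concept("grasp", policy=lambda s: "close gripper")
stack = Concept("stack", policy=lambda s: "place prism on cube")
task = Concept("grasp_and_stack", children=[reach, grasp, stack],
               selector=lambda s: s["phase"])      # learned in CNRL; fixed here
print(task.act({"phase": 1}))                      # -> "close gripper"
```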
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
We describe a learning-based approach to hand-eye coordination for robotic
grasping from monocular images. To learn hand-eye coordination for grasping, we
trained a large convolutional neural network to predict the probability that
task-space motion of the gripper will result in successful grasps, using only
monocular camera images and independently of camera calibration or the current
robot pose. This requires the network to observe the spatial relationship
between the gripper and objects in the scene, thus learning hand-eye
coordination. We then use this network to servo the gripper in real time to
achieve successful grasps. To train our network, we collected over 800,000
grasp attempts over the course of two months, using between 6 and 14 robotic
manipulators at any given time, with differences in camera placement and
hardware. Our experimental evaluation demonstrates that our method achieves
effective real-time control, can successfully grasp novel objects, and corrects
mistakes by continuous servoing.
Comment: This is an extended version of "Learning Hand-Eye Coordination for
Robotic Grasping with Large-Scale Data Collection," ISER 2016. Draft modified
to correct a typo in Algorithm 1 and add a link to the publicly available
dataset.
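A minimal sketch of the outer servo loop implied above: sample candidate task-space gripper motions, score each with the grasp-success predictor, and execute the best. The Gaussian sampler is an assumption (the paper optimizes over motions with a derivative-free method), and `score_fn` stands in for the trained CNN g(image, motion) -> p(success).

```python
import numpy as np

def servo_step(image, score_fn, n_candidates=64, scale=0.05):
    """One servo step: sample candidate task-space gripper motions, score
    each with the grasp-success predictor, return the best-scoring motion."""
    candidates = np.random.normal(scale=scale, size=(n_candidates, 3))
    scores = [score_fn(image, m) for m in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage: a fake scorer preferring a small downward motion stands in for
# the trained grasp-success CNN.
best = servo_step(image=None,
                  score_fn=lambda img, m: -np.linalg.norm(m - [0.0, 0.0, -0.02]))
```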
Driving Decision and Control for Autonomous Lane Change based on Deep Reinforcement Learning
We apply a Deep Q-Network (DQN), with safety taken into consideration during
the task, to decide whether to conduct the lane-change maneuver. Furthermore, we
design two similar deep Q-learning frameworks with a quadratic approximator for
deciding how to select a comfortable gap and how to follow the preceding
vehicle. Finally, a
polynomial lane change trajectory is generated and Pure Pursuit Control is
implemented for path tracking. We demonstrate the effectiveness of this
framework in simulation, from both the decision-making and control layers. The
proposed architecture also has the potential to be extended to other autonomous
driving scenarios.
Comment: This paper has been submitted to ITSC 201
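A minimal sketch of the two lower layers named above: a polynomial lateral trajectory for the lane change (a quintic with rest-to-rest boundary conditions is assumed here; the paper's exact polynomial is not specified) and a Pure Pursuit steering law to track it. The lane width, duration, and wheelbase are illustrative values.

```python
import numpy as np

def quintic_lane_change(width, T):
    """Lateral offset y(t) moving `width` meters over T seconds with zero
    lateral velocity and acceleration at both ends (rest-to-rest quintic)."""
    a3, a4, a5 = 10 * width / T**3, -15 * width / T**4, 6 * width / T**5
    return lambda t: a3 * t**3 + a4 * t**4 + a5 * t**5

def pure_pursuit_steer(pose, target, wheelbase):
    """Pure Pursuit steering for a bicycle model toward a lookahead point."""
    x, y, yaw = pose
    dx, dy = target[0] - x, target[1] - y
    alpha = np.arctan2(dy, dx) - yaw          # heading error to lookahead point
    ld = np.hypot(dx, dy)                     # lookahead distance
    return np.arctan2(2.0 * wheelbase * np.sin(alpha), ld)

# 3.5 m lane change over 4 s; steer toward a lookahead point on the trajectory.
y = quintic_lane_change(width=3.5, T=4.0)
delta = pure_pursuit_steer(pose=(0.0, 0.0, 0.0),
                           target=(8.0, y(1.0)), wheelbase=2.7)
```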