1,135 research outputs found
Trajectory Optimization for Unknown Constrained Systems using Reinforcement Learning
In this paper, we propose a reinforcement learning-based algorithm for
trajectory optimization for constrained dynamical systems. This problem is
motivated by the fact that for most robotic systems, the dynamics may not
always be known. Generating smooth, dynamically feasible trajectories could be
difficult for such systems. Using sampling-based algorithms for motion planning
may result in trajectories that are prone to undesirable control jumps.
However, they can usually provide a good reference trajectory which a
model-free reinforcement learning algorithm can then exploit by limiting the
search domain and quickly finding a dynamically smooth trajectory. We use this
idea to train a reinforcement learning agent to learn a dynamically smooth
trajectory in a curriculum learning setting. Furthermore, for generalization,
we parameterize the policies with goal locations, so that the agent can be
trained for multiple goals simultaneously. We show results in both simulated
environments and in real experiments on a -DoF manipulator arm operated in
position-controlled mode to validate the proposed idea. We compare
the proposed ideas against a PID controller which is used to track a designed
trajectory in configuration space. Our experiments show that our RL agent
trained with a reference path outperformed a model-free PID controller of the
type commonly used on many robotic platforms for trajectory tracking. Comment: 8 pages, 6 figures, Accepted to IROS 201
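The abstract above describes using a sampling-based reference path to limit the RL search domain while rewarding dynamically smooth motion. A minimal sketch of a reward of that kind is below; the weights, the nearest-waypoint distance metric, and the smoothness term are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def shaped_reward(q, q_prev, reference_path, w_ref=1.0, w_smooth=0.1):
    """Reward = -(distance to nearest reference waypoint) - (control jump)."""
    dists = np.linalg.norm(reference_path - q, axis=1)
    ref_cost = dists.min()                    # stay near the reference path
    smooth_cost = np.linalg.norm(q - q_prev)  # penalize abrupt joint jumps
    return -(w_ref * ref_cost + w_smooth * smooth_cost)

# Example: a straight-line reference in a 2-joint configuration space.
path = np.linspace([0.0, 0.0], [1.0, 1.0], num=20)
r_on_path = shaped_reward(np.array([0.5, 0.5]), np.array([0.45, 0.45]), path)
r_off_path = shaped_reward(np.array([0.5, -0.5]), np.array([0.45, 0.45]), path)
assert r_on_path > r_off_path  # staying near the reference scores higher
```

A reward shaped this way keeps the agent's exploration close to the reference trajectory while the smoothness penalty discourages the control jumps that sampling-based planners tend to produce.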
A Fast Integrated Planning and Control Framework for Autonomous Driving via Imitation Learning
For safe and efficient planning and control in autonomous driving, we need a
driving policy which can achieve desirable driving quality in long-term horizon
with guaranteed safety and feasibility. Optimization-based approaches, such as
Model Predictive Control (MPC), can provide such optimal policies, but their
computational complexity is generally unacceptable for real-time
implementation. To address this problem, we propose a fast integrated planning
and control framework that combines learning- and optimization-based approaches
in a two-layer hierarchical structure. The first layer, defined as the "policy
layer", is established by a neural network which learns the long-term optimal
driving policy generated by MPC. The second layer, called the "execution
layer", is a short-term optimization-based controller that tracks the reference
trajectories given by the "policy layer" with guaranteed short-term safety and
feasibility. Moreover, with efficient and highly-representative features, a
small-size neural network is sufficient in the "policy layer" to handle many
complicated driving scenarios. This enables online imitation learning with
Dataset Aggregation (DAgger), so that the performance of the "policy layer" can
be improved rapidly and continuously online. Several example driving scenarios
are demonstrated to verify the effectiveness and efficiency of the proposed
framework.
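The DAgger-style loop the abstract relies on can be sketched as follows: a small policy imitates an MPC "expert", and states visited by the policy are relabeled by the expert and aggregated into the training set. The linear expert, linear policy, and toy dynamics here are stand-in assumptions; the real framework uses a neural-network policy layer and a driving-specific MPC.

```python
import numpy as np

rng = np.random.default_rng(0)

def mpc_expert(state):
    """Stand-in for the long-horizon MPC policy layer (assumption)."""
    return -0.5 * state  # drive the state toward the origin

class LinearPolicy:
    def __init__(self, dim):
        self.W = np.zeros((dim, dim))
    def act(self, state):
        return self.W @ state
    def fit(self, states, actions):
        # Least-squares imitation of the aggregated expert labels.
        self.W = np.linalg.lstsq(states, actions, rcond=None)[0].T

def dagger(n_iters=5, horizon=20, dim=2):
    policy = LinearPolicy(dim)
    states, labels = [], []
    for _ in range(n_iters):
        s = rng.normal(size=dim)
        for _ in range(horizon):
            a = policy.act(s)              # roll out the learner...
            states.append(s)
            labels.append(mpc_expert(s))   # ...but label with the expert
            s = s + a
        policy.fit(np.array(states), np.array(labels))
    return policy

policy = dagger()
s = np.array([1.0, -1.0])
# After training, the policy closely matches the expert's action here.
assert np.allclose(policy.act(s), mpc_expert(s), atol=1e-6)
```

The key point DAgger addresses is that the dataset is collected under the learner's own state distribution, which avoids the compounding-error problem of plain behavior cloning.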
Realtime Collision Avoidance for Mobile Robots in Dense Crowds using Implicit Multi-sensor Fusion and Deep Reinforcement Learning
We present a novel learning-based collision avoidance algorithm, CrowdSteer,
for mobile robots operating in dense and crowded environments. Our approach is
end-to-end and uses multiple perception sensors such as a 2-D lidar along with
a depth camera to sense surrounding dynamic agents and compute collision-free
velocities. Our training approach is based on the sim-to-real paradigm and uses
high fidelity 3-D simulations of pedestrians and the environment to train a
policy using Proximal Policy Optimization (PPO). We show that our learned
navigation model is directly transferable to previously unseen virtual and
dense real-world environments. We have integrated our algorithm with
differential drive robots and evaluated its performance in narrow scenarios
such as dense crowds, narrow corridors, T-junctions, L-junctions, etc. In
practice, our approach can perform real-time collision avoidance and generate
smooth trajectories in such complex scenarios. We also compare the performance
with prior methods based on metrics such as trajectory length, mean time to
goal, success rate, and smoothness, and observe considerable improvement. Comment: 8 pages, 7 figures
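The policy above is trained with Proximal Policy Optimization (PPO). As a reminder of what that optimizer does, here is a minimal sketch of PPO's clipped surrogate objective; the ratio and advantage values are illustrative, and the sensor fusion and simulator from the paper are out of scope.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Mean of min(ratio*A, clip(ratio, 1-eps, 1+eps)*A) over a batch."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * advantage, clipped * advantage))

# For a positive advantage, the objective stops rewarding probability-ratio
# increases beyond 1 + eps, which keeps each policy update conservative.
inside = ppo_clip_objective(np.array([1.1]), np.array([1.0]))   # 1.1
outside = ppo_clip_objective(np.array([2.0]), np.array([1.0]))  # capped at 1.2
assert outside - inside < 0.2
```

This conservatism is one reason PPO-trained navigation policies transfer comparatively well from high-fidelity simulation to the real robot.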
Learning Agile Robotic Locomotion Skills by Imitating Animals
Reproducing the diverse and agile locomotion skills of animals has been a
longstanding challenge in robotics. While manually-designed controllers have
been able to emulate many complex behaviors, building such controllers involves
a time-consuming and difficult development process, often requiring substantial
expertise of the nuances of each skill. Reinforcement learning provides an
appealing alternative for automating the manual effort involved in the
development of controllers. However, designing learning objectives that elicit
the desired behaviors from an agent can also require a great deal of
skill-specific expertise. In this work, we present an imitation learning system
that enables legged robots to learn agile locomotion skills by imitating
real-world animals. We show that by leveraging reference motion data, a single
learning-based approach is able to automatically synthesize controllers for a
diverse repertoire of behaviors for legged robots. By incorporating
sample-efficient domain adaptation techniques into the training process, our system is
able to learn adaptive policies in simulation that can then be quickly adapted
for real-world deployment. To demonstrate the effectiveness of our system, we
train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging
from different locomotion gaits to dynamic hops and turns.
Imitating Driver Behavior with Generative Adversarial Networks
The ability to accurately predict and simulate human driving behavior is
critical for the development of intelligent transportation systems. Traditional
modeling methods have employed simple parametric models and behavioral cloning.
This paper adopts a method for overcoming the problem of cascading errors
inherent in prior approaches, resulting in realistic behavior that is robust to
trajectory perturbations. We extend Generative Adversarial Imitation Learning
to the training of recurrent policies, and we demonstrate that our model
outperforms rule-based controllers and maximum likelihood models in realistic
highway simulations. Our model reproduces emergent behavior of human
drivers, such as lane change rates, while maintaining realistic control over
long time horizons. Comment: 8 pages, 6 figures
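The core GAIL mechanism the abstract extends can be sketched briefly: a discriminator is trained to separate expert from policy state-action pairs, and its output provides a surrogate reward for the policy. The logistic-regression discriminator and random Gaussian features below are toy stand-ins for the recurrent networks and driving features in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gail_surrogate_reward(d_out, eps=1e-8):
    """Policy reward from the discriminator: high when the policy fools D."""
    return -np.log(1.0 - d_out + eps)

def discriminator_step(w, expert_sa, policy_sa, lr=0.1):
    """One gradient step of a logistic discriminator; expert labeled 1."""
    X = np.vstack([expert_sa, policy_sa])
    y = np.concatenate([np.ones(len(expert_sa)), np.zeros(len(policy_sa))])
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)   # gradient of binary cross-entropy
    return w - lr * grad

expert = rng.normal(loc=1.0, size=(64, 3))    # toy expert (state, action) features
policy = rng.normal(loc=-1.0, size=(64, 3))   # toy policy rollout features
w = np.zeros(3)
for _ in range(200):
    w = discriminator_step(w, expert, policy)

# Expert-like samples now receive higher surrogate reward, so optimizing
# this reward pushes the policy toward the expert distribution.
r_expert = gail_surrogate_reward(sigmoid(expert @ w)).mean()
r_policy = gail_surrogate_reward(sigmoid(policy @ w)).mean()
assert r_expert > r_policy
```

Because the policy is trained against this learned reward rather than by supervised action matching, errors do not cascade the way they do in behavioral cloning, which is the robustness property the abstract highlights.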
Operation and Imitation under Safety-Aware Shared Control
We describe a shared control methodology that can, without knowledge of the
task, be used to improve a human's control of a dynamic system, be used as a
training mechanism, and be used in conjunction with Imitation Learning to
generate autonomous policies that recreate novel behaviors. Our algorithm
introduces autonomy that assists the human partner by enforcing safety and
stability constraints. The autonomous agent has no a priori knowledge of the
desired task and therefore only adds control information when there is concern
for the safety of the system. We evaluate the efficacy of our approach with a
human subjects study consisting of 20 participants. We find that our shared
control algorithm significantly improves the rate at which users are able to
successfully execute novel behaviors. Experimental results suggest that the
benefits of our safety-aware shared control algorithm also extend to the human
partner's understanding of the system and their control skill. Finally, we
demonstrate how a combination of our safety-aware shared control algorithm and
Imitation Learning can be used to autonomously recreate the demonstrated
behaviors. Comment: Published in WAFR 201
Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
Adversarial learning methods have been proposed for a wide range of
applications, but the training of adversarial models can be notoriously
unstable. Effectively balancing the performance of the generator and
discriminator is critical, since a discriminator that achieves very high
accuracy will produce relatively uninformative gradients. In this work, we
propose a simple and general technique to constrain information flow in the
discriminator by means of an information bottleneck. By enforcing a constraint
on the mutual information between the observations and the discriminator's
internal representation, we can effectively modulate the discriminator's
accuracy and maintain useful and informative gradients. We demonstrate that our
proposed variational discriminator bottleneck (VDB) leads to significant
improvements across three distinct application areas for adversarial learning
algorithms. Our primary evaluation studies the applicability of the VDB to
imitation learning of dynamic continuous control skills, such as running. We
show that our method can learn such skills directly from \emph{raw} video
demonstrations, substantially outperforming prior adversarial imitation
learning methods. The VDB can also be combined with adversarial inverse
reinforcement learning to learn parsimonious reward functions that can be
transferred and re-optimized in new settings. Finally, we demonstrate that VDB
can train GANs more effectively for image generation, improving upon a number
of prior stabilization methods.
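The bottleneck constraint described above can be sketched concretely: the discriminator sees a stochastic encoding z ~ N(mu(x), sigma(x)^2), the expected KL divergence from the prior r(z) = N(0, I) is constrained below a target I_c, and the multiplier beta is updated by dual ascent. The encoder statistics and step size below are placeholder assumptions; in the paper they come from a learned network.

```python
import numpy as np

def gaussian_kl(mu, log_sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(2 * log_sigma) + mu**2 - 1 - 2 * log_sigma,
                        axis=-1)

def vdb_loss(logits, labels, mu, log_sigma, beta, i_c=0.5):
    # Binary cross-entropy of the discriminator on the encoded samples.
    p = 1.0 / (1.0 + np.exp(-logits))
    bce = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    kl = np.mean(gaussian_kl(mu, log_sigma))
    # Dual ascent on the multiplier: grow beta when the bottleneck is violated.
    beta_new = max(0.0, beta + 1e-2 * (kl - i_c))
    return bce + beta * (kl - i_c), beta_new, kl

logits = np.array([2.0, -1.0])
labels = np.array([1.0, 0.0])
# A tight encoder (mu ~ 0, sigma ~ 1) satisfies the constraint...
_, beta_tight, kl_tight = vdb_loss(logits, labels,
                                   mu=np.zeros((8, 4)),
                                   log_sigma=np.zeros((8, 4)), beta=0.1)
# ...while a loose encoder violates it, so its multiplier increases.
_, beta_loose, kl_loose = vdb_loss(logits, labels,
                                   mu=3 * np.ones((8, 4)),
                                   log_sigma=np.zeros((8, 4)), beta=0.1)
assert beta_loose > beta_tight
```

Limiting the information in z this way keeps the discriminator from becoming too accurate, so its gradients remain informative for the generator or policy.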
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent
years, with notable achievements such as DeepMind's AlphaGo. It has been
successfully deployed in commercial vehicles like Mobileye's path planning
system. However, a vast majority of work on DRL is focused on toy examples in
controlled synthetic car simulator environments such as TORCS and CARLA. In
general, DRL is still in its infancy in terms of usability in real-world
applications. Our goal in this paper is to encourage real-world deployment of
DRL in various autonomous driving (AD) applications. We first provide an
overview of the tasks in autonomous driving systems, reinforcement learning
algorithms and applications of DRL to AD systems. We then discuss the
challenges which must be addressed to enable further progress towards
real-world deployment. Comment: Accepted for Oral Presentation at VISAPP 201
Motion Synthesis and Control for Autonomous Agents using Generative Models and Reinforcement Learning
Imitating and predicting human motions have wide applications in both graphics and robotics, from developing realistic models of human movement and behavior in immersive virtual worlds and games to improving autonomous navigation for service agents deployed in the real world. Traditional approaches for motion imitation and prediction typically rely on pre-defined rules to model agent behaviors or use reinforcement learning with manually designed reward functions. Despite impressive results, such approaches cannot effectively capture the diversity of motor behaviors and the decision-making capabilities of human beings. Furthermore, manually designing a model or reward function to explicitly describe human motion characteristics often involves laborious fine-tuning and repeated experiments, and may suffer from generalization issues.

In this thesis, we explore data-driven approaches using generative models and reinforcement learning to study and simulate human motions. Specifically, we begin with motion synthesis and control of physically simulated agents imitating a wide range of human motor skills, and then focus on improving the local navigation decisions of autonomous agents in multi-agent interaction settings.

For physics-based agent control, we introduce an imitation learning framework built upon generative adversarial networks and reinforcement learning that enables humanoid agents to learn motor skills from a few examples of human reference motion data. Our approach generates high-fidelity motions and robust controllers without the need to manually design and fine-tune a reward function, while also allowing interactive switching between different controllers based on user input. Based on this framework, we further propose a multi-objective learning scheme for composite and task-driven control of humanoid agents. Our multi-objective learning scheme balances the simultaneous learning of disparate motions from multiple reference sources and multiple goal-directed control objectives in an adaptive way, enabling the training of efficient composite motion controllers. Additionally, we present a general framework for fast and robust learning of motor control skills. Our framework exploits particle filtering to dynamically explore and discretize the high-dimensional action space involved in continuous control tasks, and provides a multi-modal policy as a substitute for the commonly used Gaussian policies.

For navigation learning, we leverage human crowd data to train a human-inspired collision avoidance policy by combining knowledge distillation and reinforcement learning. Our approach enables autonomous agents to take human-like actions during goal-directed steering in fully decentralized, multi-agent environments. To inform better control in such environments, we propose SocialVAE, a variational autoencoder-based architecture that uses timewise latent variables with socially-aware conditions and a backward posterior approximation to perform agent trajectory prediction. Our approach improves upon current state-of-the-art performance on trajectory prediction tasks in daily human interaction scenarios and in more complex scenes involving interactions between NBA players. We further extend SocialVAE by exploiting semantic maps as context conditions to generate map-compliant trajectory predictions. Our approach processes the context conditions and the social conditions occurring during agent-agent interactions in an integrated manner through a dual-attention mechanism. We demonstrate the real-time performance of our approach and its ability to provide high-fidelity, multi-modal predictions on various large-scale vehicle trajectory prediction tasks.
Socially Compliant Navigation through Raw Depth Inputs with Generative Adversarial Imitation Learning
We present an approach for mobile robots to learn to navigate in dynamic
environments with pedestrians via raw depth inputs, in a socially compliant
manner. To achieve this, we adopt a generative adversarial imitation learning
(GAIL) strategy, which improves upon a pre-trained behavior cloning policy. Our
approach overcomes a key disadvantage of previous methods, which heavily depend
on full knowledge of the locations and velocities of nearby pedestrians: this
not only requires specific sensors, but extracting such state information from
raw sensory input can also consume substantial computation time. In contrast,
our proposed GAIL-based model performs directly on raw
depth inputs and plans in real-time. Experiments show that our GAIL-based
approach greatly improves the safety and efficiency of the behavior of mobile
robots from pure behavior cloning. The real-world deployment also shows that
our method is capable of guiding autonomous vehicles to navigate in a socially
compliant manner directly through raw depth inputs. In addition, we release a
simulation plugin for modeling pedestrian behaviors based on the social force
model. Comment: ICRA 2018 camera-ready version. 7 pages, video link:
https://www.youtube.com/watch?v=0hw0GD3lkA