Transferring Autonomous Driving Knowledge on Simulated and Real Intersections
We view intersection handling on autonomous vehicles as a reinforcement
learning problem, and study its behavior in a transfer learning setting. We
show that a network trained on one type of intersection generally is not able
to generalize to other intersections. However, a network that is pre-trained on
one intersection and fine-tuned on another performs better on the new task
compared to training in isolation. This network also retains knowledge of the
prior task, even though some forgetting occurs. Finally, we show that the
benefits of fine-tuning hold when transferring simulated intersection handling
knowledge to a real autonomous vehicle.
Comment: Appeared in Lifelong Learning Workshop @ ICML 2017. arXiv admin note:
text overlap with arXiv:1705.0119
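The fine-tuning result above can be illustrated with a toy stand-in (a linear least-squares model, not the paper's driving network; task weights and step counts are invented): starting from weights pre-trained on a related task reaches a lower loss in a few updates than training from scratch.

```python
# Toy sketch: pre-train on task A, fine-tune briefly on related task B,
# and compare against training on B in isolation with the same budget.
import numpy as np

def train(X, y, w0, steps=50, lr=0.1):
    w = w0.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_a = np.array([1.0, -2.0, 0.5])        # "intersection A" target weights
w_b = np.array([1.2, -1.8, 0.4])        # related "intersection B" weights
y_a, y_b = X @ w_a, X @ w_b

w_pre  = train(X, y_a, np.zeros(3))            # pre-train on task A
w_fine = train(X, y_b, w_pre, steps=5)         # fine-tune briefly on B
w_iso  = train(X, y_b, np.zeros(3), steps=5)   # train on B in isolation

loss = lambda w: float(np.mean((X @ w - y_b) ** 2))
print(loss(w_fine) < loss(w_iso))  # fine-tuning reaches lower loss
```

Because the two tasks' optima are close, the pre-trained weights start near the new optimum, which is the essence of the transfer benefit reported above.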
Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight
Deep reinforcement learning provides a promising approach for vision-based
control of real-world robots. However, the generalization of such models
depends critically on the quantity and variety of data available for training.
This data can be difficult to obtain for some types of robotic systems, such as
fragile, small-scale quadrotors. Simulated rendering and physics can provide
for much larger datasets, but such data is inherently of lower quality: many of
the phenomena that make the real-world autonomous flight problem challenging,
such as complex physics and air currents, are modeled poorly or not at all, and
the systematic differences between simulation and the real world are typically
impossible to eliminate. In this work, we investigate how data from both
simulation and the real world can be combined in a hybrid deep reinforcement
learning algorithm. Our method uses real-world data to learn about the dynamics
of the system, and simulated data to learn a generalizable perception system
that can enable the robot to avoid collisions using only a monocular camera. We
demonstrate our approach on a real-world nano aerial vehicle collision
avoidance task, showing that with only an hour of real-world data, the
quadrotor can avoid collisions in new environments with various lighting
conditions and geometry. Code, instructions for building the aerial vehicles,
and videos of the experiments can be found at github.com/gkahn13/GtS
Comment: First three authors contributed equally. Accepted to ICRA 201
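A schematic sketch of the hybrid idea (toy 1-D features and linear fits, not the paper's image-based models; the dataset sizes and the 0.5 safety margin are invented): a perception model fit on plentiful simulated data is composed with a dynamics model fit on scarce real data to judge which actions are collision-free.

```python
# Toy sketch: sim data trains perception, real data trains dynamics,
# and the two are composed to pick collision-free actions.
import numpy as np
rng = np.random.default_rng(7)

# Perception: predict obstacle distance from a 1-D "image" feature,
# using abundant simulated data where rendering gives ground truth.
sim_feat = rng.uniform(0, 1, 1000)
sim_dist = 5.0 * sim_feat
w = float(np.sum(sim_feat * sim_dist) / np.sum(sim_feat ** 2))   # ~5.0

# Dynamics: how far an action moves the vehicle, from scarce real data.
real_act = rng.uniform(0, 1, 50)
real_disp = 2.0 * real_act + 0.05 * rng.normal(size=50)
k = float(np.sum(real_act * real_disp) / np.sum(real_act ** 2))  # ~2.0

def safe(action, feature, margin=0.5):
    # Action is safe if predicted motion stays short of predicted distance.
    return k * action + margin < w * feature

print(safe(0.4, 0.5), safe(0.9, 0.3))
```

The division of labor mirrors the abstract: the simulator supplies variety for perception, while a small amount of real data corrects the dynamics.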
Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience
We consider the problem of transferring policies to the real world by
training on a distribution of simulated scenarios. Rather than manually tuning
the randomization of simulations, we adapt the simulation parameter
distribution using a few real world roll-outs interleaved with policy training.
In doing so, we are able to change the distribution of simulations to improve
the policy transfer by matching the policy behavior in simulation and the real
world. We show that policies trained with our method are able to reliably
transfer to different robots in two real world tasks: swing-peg-in-hole and
opening a cabinet drawer. The video of our experiments can be found at
https://sites.google.com/view/simop
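The adaptation loop above can be sketched with a cross-entropy-style update (a stand-in for the paper's actual objective; the one-parameter toy dynamics, sample counts, and elite fraction are invented): the simulation parameter distribution is shifted toward values whose rollouts match a real trajectory.

```python
# Toy sketch: adapt a Gaussian over one simulation parameter so that
# simulated rollouts match a single real-world rollout.
import numpy as np

def rollout(friction, steps=20):
    v, xs = 1.0, []
    for _ in range(steps):          # toy dynamics: velocity decays with friction
        v *= (1.0 - friction)
        xs.append(v)
    return np.array(xs)

real = rollout(0.3)                 # pretend this came from the real robot
rng = np.random.default_rng(1)
mu, sigma = 0.1, 0.2                # initial randomization distribution

for _ in range(10):                 # interleaved with policy training in practice
    params = np.clip(rng.normal(mu, sigma, 64), 0.01, 0.99)
    costs = [np.mean((rollout(p) - real) ** 2) for p in params]
    elite = params[np.argsort(costs)[:8]]   # keep best-matching samples
    mu, sigma = elite.mean(), elite.std() + 1e-3

print(round(mu, 2))  # should approach the real friction of 0.3
```

The distribution narrows around parameters that reproduce real behavior, which is what lets the transferred policy see realistic dynamics during training.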
Policy Transfer across Visual and Dynamics Domain Gaps via Iterative Grounding
The ability to transfer a policy from one environment to another is a
promising avenue for efficient robot learning in realistic settings where task
supervision is not available. This can allow us to take advantage of
environments well suited for training, such as simulators or laboratories, to
learn a policy for a real robot in a home or office. To succeed, such policy
transfer must overcome both the visual domain gap (e.g. different illumination
or background) and the dynamics domain gap (e.g. different robot calibration or
modelling error) between source and target environments. However, prior policy
transfer approaches either cannot handle a large domain gap or can only address
one type of domain gap at a time. In this paper, we propose a novel policy
transfer method with iterative "environment grounding", IDAPT, that alternates
between (1) directly minimizing both visual and dynamics domain gaps by
grounding the source environment in the target environment domains, and (2)
training a policy on the grounded source environment. This iterative training
progressively aligns the domains between the two environments and adapts the
policy to the target environment. Once trained, the policy can be directly
executed on the target environment. The empirical results on locomotion and
robotic manipulation tasks demonstrate that our approach can effectively
transfer a policy across visual and dynamics domain gaps with minimal
supervision and interaction with the target environment. Videos and code are
available at https://clvrai.com/idapt.
Comment: Robotics: Science and Systems (RSS), 202
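One grounding iteration can be illustrated in an action-transformation flavor (a toy 1-D system with an action-scale mismatch; IDAPT also grounds the visual domain, which is omitted here, and all dynamics below are invented): fit a correction from a few target transitions, then train the policy in the grounded source.

```python
# Toy sketch: (1) ground the source dynamics in the target, (2) train a
# policy on the grounded source, then deploy directly on the target.
import numpy as np
rng = np.random.default_rng(2)

def target_step(x, a):          # "real" environment, dynamics unknown
    return x + 0.5 * a

def source_step(x, a, g=1.0):   # simulator with an action-scale mismatch
    return x + g * a

# (1) Grounding: fit g from a few target transitions (least squares).
xs = rng.normal(size=20); acts = rng.normal(size=20)
nxt = target_step(xs, acts)
g = float(np.sum(acts * (nxt - xs)) / np.sum(acts * acts))

# (2) "Train" a policy in the grounded source: reach the goal in one step.
goal = 3.0
policy = lambda x: (goal - x) / g

x = 0.0
x = target_step(x, policy(x))   # execute on the target environment
print(round(x, 3))              # lands on the goal despite the dynamics gap
```

Alternating such grounding and policy-training steps is what progressively aligns the two environments in the abstract's description.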
Data Efficient Lithography Modeling with Transfer Learning and Active Data Selection
Lithography simulation is one of the key steps in physical verification,
enabled by the substantial optical and resist models. A resist model bridges
the aerial image simulation to printed patterns. While the effectiveness of
learning-based solutions for resist modeling has been demonstrated, they are
considerably data-demanding. Meanwhile, a set of manufactured data for a
specific lithography configuration is only valid for the training of one single
model, indicating low data efficiency. Due to the complexity of the
manufacturing process, obtaining enough data for acceptable accuracy becomes
very expensive in terms of both time and cost, especially during the evolution
of technology generations when the design space is intensively explored. In
this work, we propose a new resist modeling framework for contact layers,
utilizing existing data from old technology nodes and active selection of data
in a target technology node, to reduce the amount of data required from the
target lithography configuration. Our framework, based on transfer learning and
active learning techniques, achieves a 3-10X reduction in the amount of training
data while maintaining accuracy comparable to the state-of-the-art learning
approach.
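The active-selection step can be sketched with ensemble disagreement as the acquisition signal (a common choice, though the paper's actual criterion may differ; the 1-D inputs and the transferred ensemble below are invented): label the target-node samples the current models disagree on most.

```python
# Toy sketch: query the unlabeled samples where an ensemble of
# transferred models disagrees the most.
import numpy as np
rng = np.random.default_rng(3)

pool = rng.uniform(-3, 3, size=200)            # unlabeled target-node samples
# Ensemble transferred from the old node; members differ in slope, so
# their predictions diverge far from the origin.
ensemble = [lambda x, w=w: w * x for w in (0.5, 0.8, 1.1)]

preds = np.stack([m(pool) for m in ensemble])  # (models, samples)
disagreement = preds.std(axis=0)               # epistemic-uncertainty proxy
query = np.argsort(disagreement)[-10:]         # 10 most informative samples
print(np.abs(pool[query]).min() > 2.0)         # picks the extreme inputs
```

Concentrating the measurement budget on such points is what drives the data-efficiency gain claimed above.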
Modelling Generalized Forces with Reinforcement Learning for Sim-to-Real Transfer
Learning robotic control policies in the real world gives rise to challenges
in data efficiency, safety, and controlling the initial condition of the
system. On the other hand, simulations are a useful alternative as they provide
an abundant source of data without the restrictions of the real world.
Unfortunately, simulations often fail to accurately model complex real-world
phenomena. Traditional system identification techniques are limited in
expressiveness by the analytical model parameters, and usually are not
sufficient to capture such phenomena. In this paper we propose a general
framework for improving the analytical model by optimizing state dependent
generalized forces. State dependent generalized forces are expressive enough to
model constraints in the equations of motion, while maintaining a clear
physical meaning and intuition. We use reinforcement learning to efficiently
optimize the mapping from states to generalized forces over a discounted
infinite horizon. We show that using only minutes of real world data improves
the sim-to-real control policy transfer. We demonstrate the feasibility of our
approach by validating it on a nonprehensile manipulation task on the Sawyer
robot.
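The core idea, augmenting an analytical model with a state-dependent generalized force, can be sketched with a supervised least-squares fit standing in for the paper's RL-based optimization over a discounted horizon (the hidden drag term and data sizes are invented):

```python
# Toy sketch: learn a state-dependent residual force that closes the
# gap between an analytical model and observed real accelerations.
import numpy as np
rng = np.random.default_rng(4)

def real_accel(v, u):           # real system: hidden linear drag
    return u - 0.6 * v

# Real transitions: (velocity, control, observed acceleration)
v = rng.normal(size=100); u = rng.normal(size=100)
a_obs = real_accel(v, u)

# Analytical model predicts a = u; fit a residual force k * v on top.
resid = a_obs - u
k = float(np.sum(v * resid) / np.sum(v * v))   # least squares, recovers -0.6

def sim_accel(v, u):
    return u + k * v            # analytical model + generalized force

err_plain = float(np.mean((u - a_obs) ** 2))
err_aug   = float(np.mean((sim_accel(v, u) - a_obs) ** 2))
print(err_aug < err_plain)     # residual force closes the model gap
```

Because the residual force is a function of state, it keeps the physical interpretability the abstract emphasizes while absorbing unmodeled effects.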
Predicting Sim-to-Real Transfer with Probabilistic Dynamics Models
We propose a method to predict the sim-to-real transfer performance of RL
policies. Our transfer metric simplifies the selection of training setups (such
as algorithm, hyperparameters, randomizations) and policies in simulation,
without the need for extensive and time-consuming real-world rollouts. A
probabilistic dynamics model is trained alongside the policy and evaluated on a
fixed set of real-world trajectories to obtain the transfer metric. Experiments
show that the transfer metric is highly correlated with policy performance in
both simulated and real-world robotic environments for complex manipulation
tasks. We further show that the transfer metric can predict the effect of
training setups on policy transfer performance.
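The metric can be sketched as follows (a fixed-variance Gaussian dynamics model and hand-picked candidate setups; the paper learns the model end-to-end alongside the policy): score each training setup by how well its dynamics model explains a fixed set of real transitions.

```python
# Toy sketch: rank two training setups by the log-likelihood their
# dynamics models assign to held-out real transitions.
import numpy as np
rng = np.random.default_rng(5)

x = rng.normal(size=200); u = rng.normal(size=200)
real_next = x + 0.1 * (u - 0.4 * x)           # held-out real transitions

def log_lik(pred_mean, sigma=0.05):
    return float(np.mean(-0.5 * ((real_next - pred_mean) / sigma) ** 2
                         - np.log(sigma * np.sqrt(2 * np.pi))))

# Two candidate setups whose dynamics models differ in a damping term.
setup_a = x + 0.1 * (u - 0.4 * x)             # well-matched model
setup_b = x + 0.1 * u                         # ignores damping

metric = {"A": log_lik(setup_a), "B": log_lik(setup_b)}
best = max(metric, key=metric.get)
print(best)  # "A": its model better predicts the real rollouts
```

No policy rollouts on the real system are needed, only a fixed trajectory set, which is what makes the metric cheap to evaluate across many setups.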
Learning Fast Adaptation with Meta Strategy Optimization
The ability to walk in new scenarios is a key milestone on the path toward
real-world applications of legged robots. In this work, we introduce Meta
Strategy Optimization (MSO), a meta-learning algorithm for training policies
with latent variable inputs that can quickly adapt to new scenarios with a
handful
of trials in the target environment. The key idea behind MSO is to expose the
same adaptation process, Strategy Optimization (SO), to both the training and
testing phases. This allows MSO to effectively learn locomotion skills as well
as a latent space that is suitable for fast adaptation. We evaluate our method
on a real quadruped robot and demonstrate successful adaptation in various
scenarios, including sim-to-real transfer, walking with a weakened motor, and
climbing up a slope. Furthermore, we quantitatively analyze the generalization
capability of the trained policy in simulated environments. Both real and
simulated experiments show that our method outperforms previous methods in
adaptation to novel tasks.
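The test-time adaptation step can be sketched with a coarse grid search standing in for the paper's Strategy Optimization (the latent grid, environment parameter, and return function are all invented): only the latent input is searched, using a handful of trials.

```python
# Toy sketch: adapt a latent-conditioned policy to a new environment by
# searching over the latent input with a few evaluation rollouts.
import numpy as np

def episode_return(z, env_param):
    # Stand-in for one rollout of the latent-conditioned policy:
    # return is highest when the latent matches the environment.
    return -(z - env_param) ** 2

env = 0.7                                  # e.g. a weakened motor
trials = np.linspace(-1.0, 1.0, 9)         # a handful of trial latents
returns = [episode_return(z, env) for z in trials]
z_star = trials[int(np.argmax(returns))]
print(z_star)  # 0.75, the grid point nearest the true environment
```

Exposing this same search during training is what shapes a latent space in which a few such trials suffice.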
Policy Transfer via Kinematic Domain Randomization and Adaptation
Transferring reinforcement learning policies trained in physics simulation to
the real hardware remains a challenge, known as the "sim-to-real" gap. Domain
randomization is a simple yet effective technique to address dynamics
discrepancies across source and target domains, but its success generally
depends on heuristics and trial-and-error. In this work we investigate the
impact of randomized parameter selection on policy transferability across
different types of domain discrepancies. Contrary to common practice in which
kinematic parameters are carefully measured while dynamic parameters are
randomized, we found that virtually randomizing kinematic parameters (e.g.,
link lengths) during training in simulation generally outperforms dynamic
randomization. Based on this finding, we introduce a new domain adaptation
algorithm that leverages variation of the simulated kinematic parameters. Our
algorithm, Multi-Policy Bayesian Optimization, trains an ensemble of universal
policies conditioned on virtual kinematic parameters and efficiently adapts to
the target environment using a limited number of target domain rollouts. We
showcase our findings on a simulated quadruped robot in five different target
environments covering different aspects of domain discrepancies.
Comment: Submitted to the 2021 IEEE International Conference on Robotics and
Automation (ICRA
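The adaptation stage can be sketched as follows (a one-link toy reach task with a grid search standing in for the paper's Multi-Policy Bayesian Optimization; the link lengths and goal are invented): a policy conditioned on a virtual kinematic parameter is tuned with a few target rollouts.

```python
# Toy sketch: pick the virtual link length whose conditioned policy
# performs best on the target environment, using a few rollouts.
import numpy as np

def step(angle, true_len=1.2):       # target env: fingertip height
    return true_len * np.sin(angle)

goal = 0.6
def universal_policy(virtual_len):   # trained in sim with randomized lengths
    return np.arcsin(np.clip(goal / virtual_len, -1, 1))

candidates = np.linspace(0.8, 1.6, 9)        # virtual kinematic parameters
errors = [abs(step(universal_policy(L)) - goal) for L in candidates]
best = candidates[int(np.argmin(errors))]
print(round(best, 1))  # selects the length matching the real 1.2
```

The virtual parameter need not equal any physically measured quantity; it only has to index a policy that behaves well on the target, which is why kinematic randomization helps even against dynamic discrepancies.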
TuneNet: One-Shot Residual Tuning for System Identification and Sim-to-Real Robot Task Transfer
As researchers teach robots to perform more and more complex tasks, the need
for realistic simulation environments is growing. Existing techniques for
closing the reality gap by approximating real-world physics often require
extensive real world data and/or thousands of simulation samples. This paper
presents TuneNet, a new machine learning-based method to directly tune the
parameters of one model to match another using an *iterative residual tuning*
technique. TuneNet estimates the parameter difference between two models using
a single observation from the target and minimal simulation, allowing rapid,
accurate and sample-efficient parameter estimation. The system can be trained
via supervised learning over an auto-generated simulated dataset. We show that
TuneNet can perform system identification, even when the true parameter values
lie well outside the distribution seen during training, and demonstrate that
simulators tuned with TuneNet outperform existing techniques for predicting
rigid body motion. Finally, we show that our method can estimate real-world
parameter values, allowing a robot to perform sim-to-real task transfer on a
dynamic manipulation task unseen during training. Code and videos are available
online at http://bit.ly/2lf1bAw.
Comment: Published at CoRL 201
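The iterative residual tuning loop can be sketched with a closed-form residual step standing in for TuneNet's learned network (the bouncing-ball observation model, step size, and iteration count are invented): from one target observation, repeatedly estimate a parameter correction and apply it to the simulator.

```python
# Toy sketch: iteratively nudge a simulator's restitution parameter
# toward the value that reproduces a single target observation.
def observe(restitution):
    # Bounce height after one bounce from a fixed unit drop height.
    return restitution ** 2

target_obs = observe(0.8)            # single observation of the "real" ball
eta = 0.5                            # step size on the estimated residual
p = 0.3                              # initial simulator guess

for _ in range(20):                  # iterative residual tuning loop
    resid = target_obs - observe(p)  # mismatch in observation space
    p += eta * resid / (2 * p)       # residual step, using d(obs)/dp = 2p
print(round(p, 3))                   # converges to the real restitution 0.8
```

Taking several small residual steps rather than one large jump is what makes the estimate robust when the initial guess is far from the true parameter.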