Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
Simulations are attractive environments for training agents as they provide
an abundant source of data and alleviate certain safety concerns during the
training process. But the behaviours developed by agents in simulation are
often specific to the characteristics of the simulator. Due to modeling error,
strategies that are successful in simulation may not transfer to their real
world counterparts. In this paper, we demonstrate a simple method to bridge
this "reality gap". By randomizing the dynamics of the simulator during
training, we are able to develop policies that are capable of adapting to very
different dynamics, including ones that differ significantly from the dynamics
on which the policies were trained. This adaptivity enables the policies to
generalize to the dynamics of the real world without any training on the
physical system. Our approach is demonstrated on an object pushing task using a
robotic arm. Despite being trained exclusively in simulation, our policies are
able to maintain a similar level of performance when deployed on a real robot,
reliably moving an object to a desired location from random initial
configurations. We explore the impact of various design decisions and show that
the resulting policies are robust to significant calibration error.
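The idea of randomizing simulator dynamics during training can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the parameter names, the sampling ranges, the toy one-dimensional "push" simulator, and the constant-force stand-in for a policy. The only point it shows is that a fresh set of dynamics parameters is drawn at the start of every episode.

```python
import random
from dataclasses import dataclass


@dataclass
class Dynamics:
    mass: float      # object mass (hypothetical parameter)
    friction: float  # lumped friction coefficient (hypothetical parameter)
    damping: float   # lumped damping coefficient (hypothetical parameter)


# Hypothetical sampling ranges, chosen only for illustration.
RANGES = {
    "mass": (0.1, 1.0),
    "friction": (0.2, 1.2),
    "damping": (0.5, 2.0),
}


def sample_dynamics(rng):
    """Draw one set of dynamics parameters uniformly from RANGES."""
    return Dynamics(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def simulate_push(dyn, force, steps=50, dt=0.02):
    """Toy 1-D point mass pushed by a constant force with linear drag.

    This is a stand-in for a real physics simulator, not a model of one.
    """
    pos, vel = 0.0, 0.0
    drag = dyn.friction + dyn.damping
    for _ in range(steps):
        accel = (force - drag * vel) / dyn.mass
        vel += accel * dt  # explicit Euler integration
        pos += vel * dt
    return pos


def run_episodes(n_episodes, seed=0):
    """Run episodes, resampling the dynamics at the start of each one."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_episodes):
        dyn = sample_dynamics(rng)                 # new dynamics every episode
        final_pos = simulate_push(dyn, force=1.0)  # constant force stands in for a policy
        results.append((dyn, final_pos))
    return results
```

In a real training setup, the constant force would be replaced by a learned policy and the per-episode outcome would feed a policy update; the sketch only shows where the resampling happens.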
Effects of Saltatory Rewards and Generalized Advantage Estimation on Reference-Based Deep Reinforcement Learning of Humanlike Motions
When learning physics-based character skills, deep reinforcement learning (DRL) can converge slowly and settle on locally optimal solutions while training a reinforcement learning (RL) agent. In an environment with reward saltation, those saltatory rewards can be magnified, from the perspective of sample usage, to enlarge the agent’s experience pool during training. We propose two modified algorithms. The first adds a parameter-based reward optimization process to the proximal policy optimization (PPO) algorithm; it magnifies the saltatory rewards and thereby increases the agent’s utilization of previous experiences. The second introduces generalized advantage estimation into the advantage computation of the advantage actor-critic (A2C) algorithm, which resulted in faster convergence to the globally optimal solutions of DRL. We ran all experiments in a custom reinforcement learning environment built with the PyBullet physics engine. In that environment, the RL agent has a humanoid body that learns humanlike motions, e.g., walk, run, spin, cartwheel, spinkick, and backflip, by imitating example reference motions using the RL algorithms. Our experiments show significant improvements in the performance and convergence speed of DRL in this environment for learning humanlike motions when the modified versions of PPO and A2C are compared with their vanilla versions.
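For readers unfamiliar with generalized advantage estimation, the standard recursion can be sketched as below. This is the textbook form of the estimator, not the authors' implementation; the function name, the default values of `gamma` and `lam`, and the assumption of a single trajectory with no mid-trajectory termination are all illustrative choices.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute generalized advantage estimates for one trajectory.

    rewards: list of T rewards.
    values:  list of T + 1 state-value estimates; the last entry is the
             bootstrap value of the state after the final step.
    Assumes the trajectory does not terminate before its last step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the accumulated later terms.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of the residuals from step t onward.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` the estimate reduces to the one-step temporal-difference residual, and with `lam=1` it becomes the discounted return minus the value baseline; intermediate values trade bias against variance.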
PhysDiff: Physics-Guided Human Motion Diffusion Model
Denoising diffusion models hold great promise for generating diverse and
realistic human motions. However, existing motion diffusion models largely
disregard the laws of physics in the diffusion process and often generate
physically-implausible motions with pronounced artifacts such as floating, foot
sliding, and ground penetration. This seriously impacts the quality of
generated motions and limits their real-world application. To address this
issue, we present a novel physics-guided motion diffusion model (PhysDiff),
which incorporates physical constraints into the diffusion process.
Specifically, we propose a physics-based motion projection module that uses
motion imitation in a physics simulator to project the denoised motion of a
diffusion step to a physically-plausible motion. The projected motion is
further used in the next diffusion step to guide the denoising diffusion
process. Intuitively, the use of physics in our model iteratively pulls the
motion toward a physically-plausible space. Experiments on large-scale human
motion datasets show that our approach achieves state-of-the-art motion quality
and improves physical plausibility drastically (>78% for all datasets).
Comment: Project page: https://nvlabs.github.io/PhysDif
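The alternation between a denoising step and a projection toward plausible motion can be illustrated with a toy loop. Nothing below reflects the actual PhysDiff model: the "denoiser" is a hypothetical stand-in that moves values toward a target, and the "projection" is a simple clamp above ground level standing in for the paper's physics-simulator-based motion imitation. The sketch only shows the control flow in which each projected result is fed to the next step.

```python
def toy_denoise(motion, target, rate=0.5):
    """Hypothetical stand-in for one denoising step: move toward a target."""
    return [m + rate * (t - m) for m, t in zip(motion, target)]


def project_to_plausible(motion, ground=0.0):
    """Hypothetical stand-in for the physics projection.

    Clamps heights so that nothing penetrates the ground plane.
    """
    return [max(h, ground) for h in motion]


def guided_sampling(noisy, target, steps=10):
    """Alternate denoising and projection; the projected motion feeds the next step."""
    motion = list(noisy)
    for _ in range(steps):
        motion = toy_denoise(motion, target)
        motion = project_to_plausible(motion)
    return motion
```

Here `motion` is just a list of heights; in this toy setting the clamp guarantees that every intermediate result stays at or above the ground.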