QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars
Real-time tracking of human body motion is crucial for interactive and
immersive experiences in AR/VR. However, very limited sensor data about the
body is available from standalone wearable devices such as HMDs (Head-Mounted
Devices) or AR glasses. In this work, we present a reinforcement learning
framework that takes in sparse signals from an HMD and two controllers, and
simulates plausible and physically valid full body motions. Using high quality
full body motion as dense supervision during training, a simple policy network
can learn to output appropriate torques for the character to balance, walk, and
jog, while closely following the input signals. Our results demonstrate leg
motions surprisingly similar to the ground truth without any observations of
the lower body, even when the input is only the 6D transformations of the HMD.
We also show that a single policy can be robust to diverse locomotion styles,
different body sizes, and novel environments.
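As a rough illustration of the setup described above, the sketch below pairs a small policy network with a mocap-tracking reward. The dimensions, history window, and reward shape are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative dimensions, not the paper's actual configuration.
N_FRAMES = 4          # short history of sensor readings
SENSOR_DIM = 3 * 9    # 3 devices (HMD + 2 controllers) x 9D (6D rotation + 3D position)
N_JOINTS = 33         # actuated DoFs of the simulated character

class SparseTrackingPolicy(nn.Module):
    """Maps sparse upper-body sensor signals to full-body joint torques."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FRAMES * SENSOR_DIM + N_JOINTS * 2, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, N_JOINTS),  # one torque per actuated DoF
        )

    def forward(self, sensor_window, joint_pos, joint_vel):
        # Condition on the sensor history plus the character's proprioceptive state.
        x = torch.cat([sensor_window.flatten(1), joint_pos, joint_vel], dim=-1)
        return self.net(x)

def imitation_reward(sim_joint_pos, mocap_joint_pos, sigma=0.5):
    """Dense supervision: reward closeness to the full-body mocap reference."""
    err = torch.sum((sim_joint_pos - mocap_joint_pos) ** 2, dim=-1)
    return torch.exp(-err / (2 * sigma ** 2))
```

The key idea the sketch captures is that the reward sees the full body during training even though the policy's inputs never do.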
DROP: Dynamics Responses from Human Motion Prior and Projective Dynamics
Synthesizing realistic human movements, dynamically responsive to the
environment, is a long-standing objective in character animation, with
applications in computer vision, sports, and healthcare, for motion prediction
and data augmentation. Recent kinematics-based generative motion models offer
impressive scalability in modeling extensive motion data, albeit without an
interface to reason about and interact with physics. While
simulator-in-the-loop learning approaches enable highly physically realistic
behaviors, the challenges in training often affect scalability and adoption. We
introduce DROP, a novel framework for modeling Dynamics Responses of humans
using generative mOtion prior and Projective dynamics. DROP can be viewed as a
highly stable, minimalist physics-based human simulator that interfaces with a
kinematics-based generative motion prior. Utilizing projective dynamics, DROP
allows flexible and simple integration of the learned motion prior as one of
the projective energies, seamlessly incorporating control provided by the
motion prior with Newtonian dynamics. Serving as a model-agnostic plug-in, DROP
enables us to fully leverage recent advances in generative motion models for
physics-based motion synthesis. We conduct extensive evaluations of our model
across different motion tasks and various physical perturbations, demonstrating
the scalability and diversity of responses.
Comment: SIGGRAPH Asia 2023. Video: https://youtu.be/tF5WW7qNMLI. Website: https://stanford-tml.github.io/drop
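To make the projective-dynamics coupling concrete, here is a minimal toy step in which a learned motion prior's predicted pose enters as one extra quadratic energy alongside inertia and spring constraints. The chain of particles, weights, and single local/global pass are illustrative assumptions, not DROP's actual formulation.

```python
import numpy as np

# Toy projective-dynamics step: a chain of particles with spring constraints
# plus one extra quadratic "prior" energy pulling toward a pose predicted by
# a learned motion prior. All sizes and weights are illustrative.
n, h, m = 5, 1.0 / 60.0, 1.0
rest = 0.25                      # spring rest length between neighbors
k_spring, k_prior = 1e4, 1e2
g = np.array([0.0, -9.8, 0.0])

def pd_step(x, v, x_prior):
    """One local/global pass. x, v, x_prior: (n, 3) arrays."""
    y = x + h * v + h * h * g                  # inertial prediction
    A = np.eye(n) * (m / h**2 + k_prior)       # inertia + prior energy (both diagonal)
    b = (m / h**2) * y + k_prior * x_prior     # prior enters as one more energy
    for i in range(n - 1):
        # Local step: project each edge onto its rest length.
        d = x[i + 1] - x[i]
        p = rest * d / (np.linalg.norm(d) + 1e-9)
        A[i, i] += k_spring; A[i + 1, i + 1] += k_spring
        A[i, i + 1] -= k_spring; A[i + 1, i] -= k_spring
        b[i] -= k_spring * p
        b[i + 1] += k_spring * p
    x_new = np.linalg.solve(A, b)              # global step: one linear solve
    return x_new, (x_new - x) / h

x0 = np.linspace(0.0, 1.0, n)[:, None] * np.array([1.0, 0.0, 0.0])
x1, v1 = pd_step(x0, np.zeros((n, 3)), x_prior=x0)   # prior says "hold this pose"
```

In DROP itself, x_prior would come from the kinematics-based generative motion prior and the local/global passes would iterate; the point of the sketch is that the prior's control and Newtonian dynamics meet in one linear solve.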
QuestEnvSim: Environment-Aware Simulated Motion Tracking from Sparse Sensors
Replicating a user's pose from only wearable sensors is important for many
AR/VR applications. Most existing methods for motion tracking avoid environment
interaction apart from foot-floor contact due to their complex dynamics and
hard constraints. However, in daily life people regularly interact with their
environment, e.g. by sitting on a couch or leaning on a desk. Using
Reinforcement Learning, we show that headset and controller poses, if combined
with physics simulation and environment observations, can generate realistic
full-body poses even in highly constrained environments. The physics simulation
automatically enforces the various constraints necessary for realistic poses,
instead of manually specifying them as in many kinematic approaches. These hard
constraints allow us to achieve high-quality interaction motions without
typical artifacts such as penetration or contact sliding. We discuss three
features crucial to the performance of the method: the environment
representation, the contact reward, and scene randomization. We demonstrate the
generality of the approach through various examples, such as sitting on chairs,
a couch and boxes, stepping over boxes, rocking a chair and turning an office
chair. We believe these are some of the highest-quality results achieved for
motion tracking from sparse sensors with scene interaction.
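The three ingredients called out above lend themselves to a short sketch. The reward shape, height-map encoding, and randomization ranges below are invented for illustration rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_scene():
    """Scene randomization: vary object type, size, and placement every
    episode so the policy cannot overfit to one layout (illustrative ranges)."""
    return {
        "object": rng.choice(["chair", "couch", "box"]),
        "seat_height": rng.uniform(0.35, 0.55),      # meters
        "position_offset": rng.uniform(-0.1, 0.1, size=2),
    }

def contact_reward(contacts, reference_contacts):
    """Contact reward: encourage the body-environment contacts seen in the
    reference (e.g. pelvis on the seat when sitting) and penalize spurious
    ones. A simple illustrative formulation."""
    matched = sum(1 for c in reference_contacts if c in contacts)
    spurious = sum(1 for c in contacts if c not in reference_contacts)
    return matched - 0.5 * spurious

def observation(sensor_pose, local_height_map):
    """Environment representation: sensor signals plus a height-map patch
    sampled around the character (one possible encoding)."""
    return np.concatenate([sensor_pose.ravel(), local_height_map.ravel()])
```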
Leveraging Demonstrations with Latent Space Priors
Demonstrations provide insight into relevant state or action space regions,
bearing great potential to boost the efficiency and practicality of
reinforcement learning agents. In this work, we propose to leverage
demonstration datasets by combining skill learning and sequence modeling.
Starting with a learned joint latent space, we separately train a generative
model of demonstration sequences and an accompanying low-level policy. The
sequence model forms a latent space prior over plausible demonstration
behaviors to accelerate learning of high-level policies. We show how to acquire
such priors from state-only motion capture demonstrations and explore several
methods for integrating them into policy learning on transfer tasks. Our
experimental results confirm that latent space priors provide significant gains
in learning speed and final performance. We benchmark our approach on a set of
challenging sparse-reward environments with a complex, simulated humanoid, and
on offline RL benchmarks for navigation and object manipulation. Videos, source
code and pre-trained models are available at the corresponding project website
at https://facebookresearch.github.io/latent-space-priors .
Comment: Published in Transactions on Machine Learning Research (03/2023).
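One plausible way to wire a sequence-model prior into high-level policy learning is sketched below: an autoregressive model over the skill latent space supplies a distribution that regularizes the high-level actor. The GRU architecture, KL weight, and loss form are assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn

LATENT_DIM, HIDDEN = 16, 128   # illustrative sizes

class LatentPrior(nn.Module):
    """Autoregressive sequence model over the learned skill latent space,
    trained on encoded demonstration sequences (sketch)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(LATENT_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, 2 * LATENT_DIM)  # mean and log-std

    def forward(self, z_seq):
        h, _ = self.rnn(z_seq)                 # (B, T, HIDDEN)
        mu, log_std = self.head(h).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_std.exp())

def regularized_actor_loss(policy_dist, prior_dist, advantage, z):
    """One way to use the prior: a KL term keeps high-level actions (latents)
    close to plausible demonstration behavior while RL maximizes return."""
    pg = -(policy_dist.log_prob(z).sum(-1) * advantage).mean()
    kl = torch.distributions.kl_divergence(policy_dist, prior_dist).sum(-1).mean()
    return pg + 0.1 * kl
```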
ACE: Adversarial Correspondence Embedding for Cross Morphology Motion Retargeting from Human to Nonhuman Characters
Motion retargeting is a promising approach for generating natural and
compelling animations for nonhuman characters. However, it is challenging to
translate human movements into semantically equivalent motions for target
characters with different morphologies due to the ambiguous nature of the
problem. This work presents a novel learning-based motion retargeting
framework, Adversarial Correspondence Embedding (ACE), to retarget human
motions onto target characters with different body dimensions and structures.
Our framework is designed to produce natural and feasible robot motions by
leveraging generative-adversarial networks (GANs) while preserving high-level
motion semantics by introducing an additional feature loss. In addition, we
pretrain a robot motion prior that can be controlled in a latent embedding
space and seek to establish a compact correspondence. We demonstrate that the
proposed framework can produce retargeted motions for three different
characters -- a quadrupedal robot with a manipulator, a crab character, and a
wheeled manipulator. We further validate the design choices of our framework by
conducting baseline comparisons and a user study. We also showcase sim-to-real
transfer of the retargeted motions by deploying them on a real Spot robot.
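A hedged sketch of the adversarial-plus-feature-loss idea: an encoder maps human motion into the latent space of a pretrained robot motion prior, a discriminator pushes those codes toward the prior's latent distribution, and a feature loss preserves semantics. Every network and dimension below is a stand-in.

```python
import torch
import torch.nn as nn

H_DIM, Z_DIM, F_DIM = 64, 16, 32   # illustrative feature/latent sizes

encoder = nn.Sequential(nn.Linear(H_DIM, 128), nn.ReLU(), nn.Linear(128, Z_DIM))
discriminator = nn.Sequential(nn.Linear(Z_DIM, 128), nn.ReLU(), nn.Linear(128, 1))
feature_net = nn.Linear(H_DIM, F_DIM)       # stand-in motion-semantics extractor
robot_feature_net = nn.Linear(Z_DIM, F_DIM)

def generator_loss(human_motion):
    z = encoder(human_motion)
    # Adversarial term: make encoded human motion look like a latent code
    # of the pretrained robot motion prior.
    adv = -torch.log(torch.sigmoid(discriminator(z)) + 1e-8).mean()
    # Feature loss: preserve high-level motion semantics across morphologies.
    feat = torch.mean((feature_net(human_motion) - robot_feature_net(z)) ** 2)
    return adv + feat

def discriminator_loss(human_motion, prior_latents):
    # Real samples are latents from the pretrained robot motion prior;
    # fakes are encoded human motions.
    z_fake = encoder(human_motion).detach()
    real = -torch.log(torch.sigmoid(discriminator(prior_latents)) + 1e-8).mean()
    fake = -torch.log(1.0 - torch.sigmoid(discriminator(z_fake)) + 1e-8).mean()
    return real + fake
```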
FastMimic: Model-Based Motion Imitation for Agile, Diverse and Generalizable Quadrupedal Locomotion
Robots operating in human environments require a diverse set of skills, including slow and fast walking, turning, side-stepping, and more. However, developing robot controllers capable of exhibiting such a broad range of behaviors is a challenging problem that necessitates meticulous investigation for each task. To address this challenge, we introduce a trajectory optimization method that resolves the kinematic infeasibility of reference animal motions. This method, combined with a model-based controller, results in a unified data-driven model-based control framework capable of imitating various animal gaits without the need for expensive simulation training or real-world fine-tuning. Our framework is capable of imitating a variety of motor skills such as trotting, pacing, turning, and side-stepping with ease. It shows superior tracking capabilities in both simulations and the real world compared to other imitation controllers, including a model-based one and a learning-based motion imitation technique.
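The kinematic-infeasibility repair can be illustrated with a deliberately simple stand-in: projecting reference foot positions into the robot's reachable workspace, frame by frame. The paper formulates this as a trajectory optimization; the MAX_REACH bound and closed-form projection below are illustrative simplifications.

```python
import numpy as np

# Toy version of the retargeting step: keep a foot trajectory as close as
# possible to the animal reference while respecting the robot's maximum leg
# reach (a simple stand-in for full kinematic feasibility).
MAX_REACH = 0.30   # meters from hip, illustrative

def make_feasible(ref_foot_pos, hip_pos):
    """Project each reference foot position into the robot's reachable sphere.
    ref_foot_pos, hip_pos: (T, 3) arrays over the trajectory."""
    offset = ref_foot_pos - hip_pos
    dist = np.linalg.norm(offset, axis=-1, keepdims=True)
    scale = np.minimum(1.0, MAX_REACH / np.maximum(dist, 1e-9))
    return hip_pos + offset * scale   # closest feasible point per frame
```

The feasible trajectory would then be handed to the model-based controller for tracking, with no simulation training or fine-tuning involved.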
PMP: Learning to Physically Interact with Environments using Part-wise Motion Priors
We present a method to animate a character incorporating multiple part-wise
motion priors (PMP). While previous works allow creating realistic articulated
motions from reference data, the range of motion is largely limited by the
available samples. Especially in interaction-rich scenarios, it is impractical
to acquire every possible interaction motion, as the combinations of physical
parameters grow exponentially. The proposed PMP
allows us to assemble multiple part skills to animate a character, creating a
diverse set of motions with different combinations of existing data. In our
pipeline, we can train an agent with a wide range of part-wise priors.
Each body part can thus obtain kinematic insight into style from the motion
captures, or extract dynamics-related information from additional
part-specific simulation. For example, we can first train a
general interaction skill, e.g. grasping, only for the dexterous part, and then
combine the expert trajectories from the pre-trained agent with the kinematic
priors of other limbs. Eventually, our whole-body agent learns a novel physical
interaction skill even in the absence of object trajectories in the reference
motion sequence.
Comment: 13 pages, 11 figures.
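One concrete way to realize part-wise priors is a separate adversarial discriminator per body part, with the style reward combining their scores. The part split, network sizes, and reward combination below are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Illustrative part-wise split of the character's state features.
PARTS = {"arms": 24, "legs": 28, "torso": 12}

discriminators = nn.ModuleDict({
    name: nn.Sequential(nn.Linear(dim * 2, 128), nn.ReLU(), nn.Linear(128, 1))
    for name, dim in PARTS.items()   # each judges (s, s') transitions of one part
})

def partwise_style_reward(part_transitions):
    """Combine per-part discriminator scores into one style reward, so each
    body part can follow its own prior (e.g. grasping data for the hand,
    mocap for the rest). A sketch, not the paper's exact formulation."""
    total = 0.0
    for name, trans in part_transitions.items():
        score = torch.sigmoid(discriminators[name](trans))
        total = total + torch.log(score + 1e-8).mean()
    return total / len(part_transitions)
```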
Transformer Inertial Poser: Real-time Human Motion Reconstruction from Sparse IMUs with Simultaneous Terrain Generation
Real-time human motion reconstruction from a sparse set of (e.g. six)
wearable IMUs provides a non-intrusive and economic approach to motion capture.
Without the ability to acquire position information directly from IMUs, recent
works took data-driven approaches that utilize large human motion datasets to
tackle this under-determined problem. Still, challenges remain such as temporal
consistency, drifting of global and joint motions, and diverse coverage of
motion types on various terrains. We propose a novel method to simultaneously
estimate full-body motion and generate plausible visited terrain from only six
IMU sensors in real time. Our method incorporates (1) a conditional Transformer
decoder model that gives consistent predictions by explicitly reasoning about
its prediction history; (2) a simple yet general learning target named
"stationary body points" (SBPs), which can be stably predicted by the
Transformer model and utilized by analytical routines to correct joint and
global drifting; and (3) an algorithm that generates regularized terrain
height maps from noisy SBP predictions, which
can in turn correct noisy global motion estimation. We evaluate our framework
extensively on synthesized and real IMU data, and with real-time live demos,
and show superior performance over strong baseline methods.
Comment: SIGGRAPH Asia 2022. Video: https://youtu.be/rXb6SaXsnc
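The drift-correction role of SBPs can be sketched analytically: once a point is predicted stationary, any apparent motion of that point is drift to be subtracted from the global translation. The bookkeeping below is a simplified illustration, not the paper's exact routine.

```python
import numpy as np

def correct_drift(root_pos, sbp_world, sbp_is_stationary, anchors):
    """Analytical drift correction from stationary body points (SBPs): if a
    point is predicted stationary, the difference between its current world
    position and where it was first planted is pure drift. Simplified sketch.

    root_pos:          (3,) current estimated root translation
    sbp_world:         (K, 3) current world positions of candidate points
    sbp_is_stationary: (K,) boolean predictions from the Transformer
    anchors:           dict point-index -> pinned world position
    """
    drift = np.zeros(3)
    count = 0
    for k in np.flatnonzero(sbp_is_stationary):
        if k not in anchors:
            anchors[k] = sbp_world[k].copy()   # plant the point on first contact
        drift += sbp_world[k] - anchors[k]
        count += 1
    for k in list(anchors):                    # release points that start moving
        if not sbp_is_stationary[k]:
            del anchors[k]
    if count:
        root_pos = root_pos - drift / count    # remove the average drift
    return root_pos, anchors
```

In the full method, the anchored SBPs additionally seed the terrain height-map estimate, which in turn feeds back into the global motion correction.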