Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening
Human motion synthesis is a long-standing problem with various applications
in digital twins and the Metaverse. However, modern deep learning based motion
synthesis approaches barely consider the physical plausibility of synthesized
motions and consequently they usually produce unrealistic human motions. In
order to solve this problem, we propose a system, "Skeleton2Humanoid", which
performs physics-oriented motion correction at test time by regularizing
synthesized skeleton motions in a physics simulator. Concretely, our system
consists of three sequential stages: (I) test time motion synthesis network
adaptation, (II) skeleton to humanoid matching and (III) motion imitation based
on reinforcement learning (RL). Stage I introduces a test time adaptation
strategy, which improves the physical plausibility of synthesized human
skeleton motions by optimizing skeleton joint locations. Stage II applies an
analytical inverse kinematics strategy, which converts the optimized human
skeleton motions to humanoid robot motions in a physics simulator; the
converted humanoid robot motions then serve as reference motions for the RL
policy to imitate. Stage III introduces a curriculum residual force control
policy, which drives the humanoid robot to mimic complex converted reference
motions in accordance with physical laws. We verify our system on a typical
human motion synthesis task, motion in-betweening. Experiments on the
challenging LaFAN1 dataset show our system can outperform prior methods
significantly in terms of both physical plausibility and accuracy. Code will be
released for research purposes at:
https://github.com/michaelliyunhao/Skeleton2Humanoid
Comment: Accepted by ACMMM202
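To make the motion in-betweening task concrete, here is a minimal sketch of the simplest possible baseline: linearly interpolating joint positions between two keyframes. This is not the Skeleton2Humanoid method; the function name `lerp_inbetween` and the `(J, 3)` joint-position layout are assumptions chosen for illustration. A purely kinematic fill like this ignores physics entirely, which is the kind of gap the abstract's physics-based correction is meant to address.

```python
import numpy as np

def lerp_inbetween(start_pose, end_pose, num_missing):
    """Fill `num_missing` frames between two keyframes by linear interpolation.

    start_pose, end_pose: arrays of shape (J, 3) holding joint positions
    (hypothetical layout, assumed for this sketch).
    Returns an array of shape (num_missing, J, 3).
    """
    start_pose = np.asarray(start_pose, dtype=float)
    end_pose = np.asarray(end_pose, dtype=float)
    # Blend weights strictly between 0 and 1, excluding the keyframes themselves.
    weights = np.linspace(0.0, 1.0, num_missing + 2)[1:-1]
    return np.stack([(1.0 - w) * start_pose + w * end_pose for w in weights])

# Two joints moving from the origin to (4, 4, 4), with three missing frames.
start = np.zeros((2, 3))
end = np.full((2, 3), 4.0)
frames = lerp_inbetween(start, end, num_missing=3)
```

Learned in-betweening models replace the straight-line blend with a network prediction, but the input and output contract (keyframes in, missing frames out) stays the same.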
Embodied Scene-aware Human Pose Estimation
We propose embodied scene-aware human pose estimation where we estimate 3D
poses based on a simulated agent's proprioception and scene awareness, along
with external third-person observations. Unlike prior methods that often resort
to multistage optimization, non-causal inference, and complex contact modeling
to estimate human pose and human-scene interactions, our method is one-stage,
causal, and recovers global 3D human poses in a simulated environment. Since 2D
third-person observations are coupled with the camera pose, we propose to
disentangle the camera pose and use a multi-step projection gradient defined in
the global coordinate frame as the movement cue for our embodied agent.
Leveraging a physics simulation and prescanned scenes (e.g., 3D mesh), we
simulate our agent in everyday environments (libraries, offices, bedrooms,
etc.) and equip our agent with environmental sensors to intelligently navigate
and interact with scene geometries. Our method also relies only on 2D keypoints
and can be trained on synthetic datasets derived from popular human motion
databases. To evaluate, we use the popular H36M and PROX datasets and, for the
first time, achieve a success rate of 96.7% on the challenging PROX dataset
without ever using PROX motion sequences for training.
Comment: Project website: https://embodiedscene.github.io/embodiedpose/
Zhengyi Luo and Shun Iwase contributed equally
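The abstract mentions a multi-step projection gradient defined in the global coordinate frame. The following is a rough sketch of one way such a cue could be computed, assuming a pinhole camera with known rotation `R`, translation `t`, focal length `f`, and principal point `c`. All function names, the squared reprojection loss, and the step size are assumptions made for illustration; the abstract does not specify how the paper defines or consumes the gradient.

```python
import numpy as np

def project(points_world, R, t, f, c):
    """Pinhole projection of (J, 3) world-frame points to (J, 2) image points."""
    cam = points_world @ R.T + t  # world frame -> camera frame
    return f * cam[:, :2] / cam[:, 2:3] + c

def reprojection_loss(points_world, keypoints_2d, R, t, f, c):
    """0.5 * sum of squared 2D reprojection residuals."""
    residual = project(points_world, R, t, f, c) - keypoints_2d
    return 0.5 * np.sum(residual ** 2)

def projection_gradient(points_world, keypoints_2d, R, t, f, c):
    """Analytic gradient of reprojection_loss w.r.t. the world-frame points."""
    cam = points_world @ R.T + t
    x, y, z = cam[:, 0], cam[:, 1], cam[:, 2]
    residual = f * cam[:, :2] / cam[:, 2:3] + c - keypoints_2d
    # Chain rule through the perspective divide, expressed in the camera frame.
    g_cam = np.stack([
        f * residual[:, 0] / z,
        f * residual[:, 1] / z,
        -f * (residual[:, 0] * x + residual[:, 1] * y) / z ** 2,
    ], axis=1)
    # cam = R @ X + t, so dL/dX = R^T @ dL/dcam (row-vector form: g_cam @ R).
    return g_cam @ R

def multi_step_cue(points_world, keypoints_2d, R, t, f, c, steps=10, lr=1.0):
    """Accumulate several gradient steps; return the total world-frame displacement."""
    X = np.array(points_world, dtype=float)
    for _ in range(steps):
        X -= lr * projection_gradient(X, keypoints_2d, R, t, f, c)
    return X - points_world

# Toy setup: identity camera rotation, camera 3 units away, normalized intrinsics.
R, t, f, c = np.eye(3), np.array([0.0, 0.0, 3.0]), 1.0, np.zeros(2)
target_3d = np.array([[0.2, -0.1, 0.0], [-0.3, 0.4, 0.5]])
keypoints = project(target_3d, R, t, f, c)
guess = target_3d + np.array([[0.1, 0.05, 0.0], [-0.05, 0.1, 0.0]])
cue = multi_step_cue(guess, keypoints, R, t, f, c)
loss_before = reprojection_loss(guess, keypoints, R, t, f, c)
loss_after = reprojection_loss(guess + cue, keypoints, R, t, f, c)
```

Because the gradient is rotated back through `R`, the resulting displacement lives in world coordinates rather than camera coordinates, which is the sense in which a cue of this kind is decoupled from the camera pose.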