Cascaded 3D Full-body Pose Regression from Single Depth Image at 100 FPS
Capturing and retargeting 3D human pose plays an important role in the growing
number of real-time applications in virtual reality, yet it remains challenging
to estimate accurate 3D pose from consumer imaging devices such as depth
cameras. This paper presents a novel cascaded 3D full-body pose
regression method to estimate accurate pose from a single depth image at 100
fps. The key idea is to train cascaded regressors, based on the Gradient
Boosting algorithm, on a pre-recorded human motion capture database. By
incorporating a hierarchical kinematic model of human pose into the learning
procedure, we can
directly estimate accurate 3D joint angles instead of joint positions. The
biggest advantage of this model is that the bone length can be preserved during
the whole 3D pose estimation procedure, which leads to more effective features
and higher pose estimation accuracy. Our method can be used as an
initialization procedure when combined with tracking methods. We demonstrate
the power of our method on a wide range of synthesized human motion data from
the CMU mocap database and the Human3.6M dataset, as well as real human
movement data captured in real time. In our comparison against previous 3D pose
estimation methods and commercial systems such as Kinect 2017, we achieve
state-of-the-art accuracy.
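The bone-length guarantee claimed above follows directly from regressing joint angles rather than positions: positions are then produced by a fixed kinematic chain. A minimal sketch of that idea, with an illustrative planar chain and made-up bone lengths (not the paper's actual skeleton model):

```python
import numpy as np

def forward_kinematics(angles, bone_lengths):
    """Planar kinematic chain: each joint rotates relative to its parent.

    angles       -- per-joint relative rotation (radians), shape (J,)
    bone_lengths -- fixed bone lengths, shape (J,)
    Returns joint positions, shape (J + 1, 2), rooted at (0, 0).
    """
    positions = [np.zeros(2)]
    heading = 0.0
    for theta, length in zip(angles, bone_lengths):
        heading += theta  # accumulate relative rotation along the chain
        step = length * np.array([np.cos(heading), np.sin(heading)])
        positions.append(positions[-1] + step)
    return np.stack(positions)

bones = np.array([0.5, 0.4, 0.3])  # illustrative bone lengths
pose = forward_kinematics(np.array([0.1, 0.2, -0.3]), bones)

# Bone lengths are preserved by construction, whatever angles the
# regressor emits.
segment_lengths = np.linalg.norm(np.diff(pose, axis=0), axis=1)
assert np.allclose(segment_lengths, bones)
```

Whatever angle values a cascaded regressor predicts, the output skeleton keeps valid proportions, which is the property the abstract credits for the improved features and accuracy.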
Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling
Most previous work on 3D human pose estimation relies on the network's capacity
to memorize suitable 2D-3D mappings from the training data; few works have
modeled how human posture deforms in motion. In this paper, we propose a new
modeling method for human pose
deformations and design an accompanying diffusion-based motion prior. Inspired
by the field of non-rigid structure-from-motion, we divide the task of
reconstructing 3D human skeletons in motion into the estimation of a 3D
reference skeleton, and a frame-by-frame skeleton deformation. A mixed
spatial-temporal NRSfMformer simultaneously estimates the 3D reference skeleton
and each frame's skeleton deformation from a sequence of 2D observations, then
sums them to obtain the pose of each frame.
Subsequently, a loss term based on the diffusion model ensures that the
pipeline learns the correct motion prior. Finally, we evaluate the proposed
method on mainstream datasets and obtain superior results, outperforming the
state of the art.
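The decomposition described above reduces, at the output stage, to a broadcast sum of one sequence-level reference skeleton and per-frame deformation offsets. A tiny sketch with illustrative shapes and random values (the NRSfMformer that actually estimates both terms is omitted):

```python
import numpy as np

T, J = 4, 17  # frames, joints (Human3.6M-style joint count, for illustration)
rng = np.random.default_rng(0)

reference = rng.normal(size=(J, 3))                # one 3D reference skeleton
deformations = 0.05 * rng.normal(size=(T, J, 3))   # small per-frame offsets

# Per-frame pose = reference + deformation, via broadcasting -> (T, J, 3).
poses = reference[None] + deformations
assert poses.shape == (T, J, 3)
```

Splitting the output this way lets the reference capture the subject's fixed skeletal structure while the deformation term carries only the motion, which is the non-rigid structure-from-motion intuition the abstract cites.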
MoCapDeform: Monocular 3D Human Motion Capture in Deformable Scenes
3D human motion capture from monocular RGB images respecting interactions of a subject with complex and possibly deformable environments is a very challenging, ill-posed and under-explored problem. Existing methods address it only weakly and do not model possible surface deformations often occurring when humans interact with scene surfaces. In contrast, this paper proposes MoCapDeform, i.e., a new framework for monocular 3D human motion capture that is the first to explicitly model non-rigid deformations of a 3D scene for improved 3D human pose estimation and deformable environment reconstruction. MoCapDeform accepts a monocular RGB video and a 3D scene mesh aligned in the camera space. It first localises a subject in the input monocular video along with dense contact labels using a new raycasting based strategy. Next, our human-environment interaction constraints are leveraged to jointly optimise global 3D human poses and non-rigid surface deformations. MoCapDeform achieves superior accuracy to competing methods on several datasets, including our newly recorded one with deforming background scenes.
Motion-DVAE: Unsupervised learning for fast human motion denoising
Pose and motion priors are crucial for recovering realistic and accurate
human motion from noisy observations. Substantial progress has been made on
pose and shape estimation from images, and recent works showed impressive
results using priors to refine frame-wise predictions. However, many motion
priors model only transitions between consecutive poses and are used in
time-consuming optimization procedures, which is problematic for applications
requiring real-time motion capture. We introduce Motion-DVAE, a motion prior
that captures the short-term dependencies of human motion. As part of
the dynamical variational autoencoder (DVAE) models family, Motion-DVAE
combines the generative capability of VAE models and the temporal modeling of
recurrent architectures. Together with Motion-DVAE, we introduce an
unsupervised learned denoising method unifying regression- and
optimization-based approaches in a single framework for real-time 3D human pose
estimation. Experiments show that the proposed approach reaches performance
competitive with state-of-the-art methods while being much faster.
Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks
Recently, skeleton-based action recognition has gained popularity due to
cost-effective depth sensors coupled with real-time skeleton estimation
algorithms. Traditional approaches based on handcrafted features are limited in
representing the complexity of motion patterns. Recent methods that use Recurrent
Neural Networks (RNN) to handle raw skeletons only focus on the contextual
dependency in the temporal domain and neglect the spatial configurations of
articulated skeletons. In this paper, we propose a novel two-stream RNN
architecture to model both temporal dynamics and spatial configurations for
skeleton based action recognition. We explore two different structures for the
temporal stream: stacked RNN and hierarchical RNN. Hierarchical RNN is designed
according to human body kinematics. We also propose two effective methods to
model the spatial structure by converting the spatial graph into a sequence of
joints. To improve generalization of our model, we further exploit 3D
transformation based data augmentation techniques including rotation and
scaling transformation to transform the 3D coordinates of skeletons during
training. Experiments on 3D action recognition benchmark datasets show that our
method brings a considerable improvement for a variety of actions, i.e.,
generic actions, interaction activities and gestures. Comment: Accepted to IEEE
International Conference on Computer Vision and Pattern Recognition (CVPR) 201
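The 3D-transformation augmentation mentioned in this abstract can be sketched as a random rotation about the vertical axis plus a random global scaling applied to the skeleton coordinates during training. The angle and scale ranges below are illustrative choices, not the paper's settings:

```python
import numpy as np

def augment_skeleton(joints, rng):
    """Randomly rotate (about the y axis) and scale a skeleton sequence.

    joints -- 3D joint coordinates, shape (T, J, 3)
    """
    theta = rng.uniform(-np.pi, np.pi)  # random yaw angle
    scale = rng.uniform(0.9, 1.1)       # random global scale factor
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])
    # Rotate every joint, then scale uniformly; shape is unchanged.
    return scale * joints @ rot_y.T

rng = np.random.default_rng(1)
seq = rng.normal(size=(10, 20, 3))  # illustrative sequence: 10 frames, 20 joints
aug = augment_skeleton(seq, rng)
assert aug.shape == seq.shape
```

Because rotation and uniform scaling preserve the skeleton's relative proportions, the augmented samples remain valid poses while varying viewpoint and subject size, which is what improves generalization.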
D&D: Learning Human Dynamics from Dynamic Camera
3D human pose estimation from a monocular video has recently seen significant
improvements. However, most state-of-the-art methods are kinematics-based,
which are prone to physically implausible motions with pronounced artifacts.
Current dynamics-based methods can predict physically plausible motion but are
restricted to simple scenarios with static camera view. In this work, we
present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the
laws of physics to reconstruct 3D human motion from in-the-wild videos with
a moving camera. D&D introduces inertial force control (IFC) to explain the 3D
human motion in the non-inertial local frame by considering the inertial forces
of the dynamic camera. To learn the ground contact with limited annotations, we
develop probabilistic contact torque (PCT), which is computed by differentiable
sampling from contact probabilities and used to generate motions. The contact
state can be weakly supervised by encouraging the model to generate correct
motions. Furthermore, we propose an attentive PD controller that adjusts target
pose states using temporal information to obtain smooth and accurate pose
control. Our approach is entirely neural-based and runs without offline
optimization or simulation in physics engines. Experiments on large-scale 3D
human motion benchmarks demonstrate the effectiveness of D&D, where we exhibit
superior performance against both state-of-the-art kinematics-based and
dynamics-based methods. Code is available at https://github.com/Jeffsjtu/DnD.
Comment: ECCV 2022 (Oral)
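The core of the probabilistic contact torque idea is to avoid a hard contact decision: each candidate contact torque is weighted by its predicted contact probability, so the result stays differentiable with respect to those probabilities. The paper uses differentiable sampling; the plain expectation below is a simpler stand-in, and every number is made up for illustration:

```python
import numpy as np

# Predicted per-joint contact probabilities (illustrative values).
contact_prob = np.array([0.9, 0.1, 0.8, 0.2])

# Torque each joint would contribute if it were in contact (illustrative).
candidate_torque = np.array([[0.0, 1.2, 0.0],
                             [0.0, 0.3, 0.0],
                             [0.0, 0.9, 0.0],
                             [0.0, 0.1, 0.0]])

# Expected contact torque: a soft, differentiable combination, so gradients
# from a motion loss can flow back into the contact probabilities.
pct = (contact_prob[:, None] * candidate_torque).sum(axis=0)
```

This is how the contact state can be weakly supervised, as the abstract says: if a wrong contact probability produces an implausible motion, the motion loss corrects the probability through this differentiable path.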
Robust recognition and segmentation of human actions using HMMs with missing observations
This paper describes the integration of missing observation data with hidden Markov models to create a framework that is able to segment and classify individual actions from a stream of human motion using incomplete 3D human pose estimates. Based on this framework, a model is trained to automatically segment and classify an activity sequence into its constituent subactions during inference. This is achieved by introducing action labels into the observation vector and setting these labels as missing data during inference, thus forcing the system to infer the probability of each action label. Additionally, missing data provides recognition-level support for occlusions and imperfect silhouette segmentation, permitting the use of a fast (real-time) pose estimator that delegates the burden of handling undetected limbs to the action recognition system. Findings show that the use of missing data to segment activities is an accurate and elegant approach. Furthermore, action recognition can be accurate even when almost half of the pose feature data is missing due to occlusions, since not all of the pose data is important all of the time.
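With a diagonal-Gaussian emission model, marginalising out missing observation dimensions amounts to simply dropping them from the per-state likelihood product; masking the action-label dimension at inference time then forces the model to infer it. A minimal sketch of that emission computation, with illustrative (untrained) means and variances:

```python
import numpy as np

def emission_loglik(obs, mask, means, vars_):
    """Per-state log-likelihood of obs, using only observed dimensions.

    obs   -- observation vector, shape (D,); entries where mask is False
             are treated as missing and marginalised out
    mask  -- boolean observed-dimension mask, shape (D,)
    means -- per-state emission means, shape (S, D)
    vars_ -- per-state emission variances, shape (S, D)
    """
    d = obs[None, mask]                 # keep observed dims only
    m, v = means[:, mask], vars_[:, mask]
    # Diagonal Gaussian: sum of 1-D log-densities over observed dims.
    return (-0.5 * (np.log(2 * np.pi * v) + (d - m) ** 2 / v)).sum(axis=1)

means = np.array([[0.0, 0.0], [3.0, 3.0]])  # two states, two feature dims
vars_ = np.ones((2, 2))
obs = np.array([2.9, np.nan])               # second dimension missing
mask = np.array([True, False])

ll = emission_loglik(obs, mask, means, vars_)
assert ll[1] > ll[0]  # state 1 better explains the observed dimension
```

This is why recognition can survive heavy occlusion: each missing dimension just contributes nothing to the likelihood, and the remaining dimensions still discriminate between states.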
Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes
We consider the problem of recovering a single person's 3D human mesh from
in-the-wild crowded scenes. While much progress has been made in 3D human mesh
estimation, existing methods struggle when the test input contains crowded scenes. The
first reason for the failure is a domain gap between training and testing data.
A motion capture dataset, which provides accurate 3D labels for training, lacks
crowd data and impedes a network from learning crowded scene-robust image
features of a target person. The second reason is a feature-processing step
that spatially averages the feature map of a localized bounding box containing
multiple people; averaging the whole feature map makes a target person's
features indistinguishable from others'. We present 3DCrowdNet, which
explicitly targets in-the-wild crowded scenes and estimates a robust 3D human
mesh by addressing the above issues. First, we leverage 2D human pose
estimation that does not require a motion capture dataset with 3D labels for
training and does not suffer from the domain gap. Second, we propose a
joint-based regressor that distinguishes a target person's feature from others.
Our joint-based regressor preserves the spatial activation of a target by
sampling features from the target's joint locations and regresses human model
parameters. As a result, 3DCrowdNet learns target-focused features and
effectively excludes the irrelevant features of nearby persons. We conduct
experiments on various benchmarks and demonstrate the robustness of 3DCrowdNet
to in-the-wild crowded scenes both quantitatively and qualitatively. The code is
available at https://github.com/hongsukchoi/3DCrowdNet_RELEASE. Comment:
Accepted to CVPR 2022, 16 pages including the supplementary material
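The joint-based sampling contrast drawn above can be sketched simply: instead of averaging a whole feature map (which mixes in nearby people), read the map only at the target's 2D joint locations. Nearest-neighbour lookup is used here for brevity (bilinear sampling is the usual differentiable choice), and the map and joint coordinates are illustrative:

```python
import numpy as np

def sample_at_joints(feature_map, joints_xy):
    """Gather per-joint feature vectors from a spatial feature map.

    feature_map -- shape (H, W, C)
    joints_xy   -- (J, 2) joint coordinates as (x, y) pixels
    Returns per-joint features, shape (J, C).
    """
    h, w, _ = feature_map.shape
    xs = np.clip(np.round(joints_xy[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(joints_xy[:, 1]).astype(int), 0, h - 1)
    return feature_map[ys, xs]

fmap = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)  # toy 4x4 map, C=2
joints = np.array([[0.0, 0.0], [3.0, 3.0]])                # two joint locations
feats = sample_at_joints(fmap, joints)                     # shape (2, 2)
```

Because the gathered features come only from the target's joint locations, nearby persons' activations in the same bounding box never enter the regressor, which is the target-focused property the abstract claims.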
VALIDATION OF MODEL-BASED IMAGE-MATCHING TECHNIQUE WITH BONE-PIN MARKER BASED MOTION ANALYSIS ON ANKLE KINEMATICS: A CADAVER STUDY
Krosshaug (2005) introduced a model-based image-matching (MBIM) technique for 3D reconstruction of human motion from uncalibrated video sequences. The aim of this study is to validate the MBIM technique on ankle joint movement with reference to bone-pin marker based motion analysis on a cadaver. One cadaveric below-hip specimen was prepared to perform full-range plantarflexion/dorsiflexion, inversion/eversion and relative circular motion between the two segments. The videos were recorded and analyzed by both the MBIM technique and bone-pin marker based motion analysis. The results are presented as qualitative visual evaluation and root-mean-square (RMS) error. In general, the validation results showed good agreement between the MBIM estimates and the bone-pin marker based motion analysis results. This technique will contribute to motion measurement of ankle joint kinematics in the future, for instance, motion analysis in real game situations and understanding the injury mechanisms of real injury cases.
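The RMS error reported as the validation metric is just the root of the mean squared difference between the two angle sequences over a trial. A minimal sketch with made-up joint-angle values (not the study's data):

```python
import numpy as np

def rms_error(estimate, reference):
    """Root-mean-square error between two equal-length angle sequences."""
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((est - ref) ** 2)))

mbim_deg = [10.0, 12.5, 15.0]      # illustrative MBIM angle estimates (deg)
bone_pin_deg = [10.5, 12.0, 15.5]  # illustrative bone-pin reference (deg)
err = rms_error(mbim_deg, bone_pin_deg)  # 0.5 degrees
```

A single scalar per joint axis makes the agreement between the video-based estimate and the bone-pin reference easy to compare across motion conditions.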