Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs
We address the problem of making human motion capture in the wild more
practical by using a small set of inertial sensors attached to the body. Since
the problem is heavily under-constrained, previous methods either use a large
number of sensors, which is intrusive, or they require additional video input.
We take a different approach and constrain the problem by: (i) making use of a
realistic statistical body model that includes anthropometric constraints and
(ii) using a joint optimization framework to fit the model to orientation and
acceleration measurements over multiple frames. The resulting tracker Sparse
Inertial Poser (SIP) enables 3D human pose estimation using only 6 sensors
(attached to the wrists, lower legs, back and head) and works for arbitrary
human motions. Experiments on the recently released TNT15 dataset show that,
using the same number of sensors, SIP achieves higher accuracy than the dataset
baseline without using any video data. We further demonstrate the effectiveness
of SIP on newly recorded challenging motions in outdoor scenarios such as
climbing or jumping over a wall.
Comment: 12 pages, Accepted at Eurographics 201
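The abstract above describes fitting a body model to orientation and acceleration measurements jointly over multiple frames. A minimal sketch of that idea follows, assuming a toy one-dimensional setting; the function names, the scalar angles, the weights `w_ori` and `w_acc`, and the finite-difference acceleration are illustrative assumptions, not the SIP formulation.

```python
def finite_difference_acceleration(positions, dt):
    """Central second difference: a_t = (p[t-1] - 2*p[t] + p[t+1]) / dt^2."""
    return [
        (positions[t - 1] - 2.0 * positions[t] + positions[t + 1]) / (dt * dt)
        for t in range(1, len(positions) - 1)
    ]


def multi_frame_energy(pred_angles, meas_angles, pred_positions, meas_accels,
                       dt, w_ori=1.0, w_acc=1.0):
    """Hypothetical multi-frame cost: weighted sum of squared orientation
    residuals and squared acceleration residuals (toy 1D version)."""
    # Orientation term: one scalar angle per frame in this simplified sketch.
    ori_term = sum((p - m) ** 2 for p, m in zip(pred_angles, meas_angles))
    # Acceleration term: compare accelerations implied by the predicted
    # positions (interior frames only) against the measured accelerations.
    pred_accels = finite_difference_acceleration(pred_positions, dt)
    acc_term = sum((p - m) ** 2 for p, m in zip(pred_accels, meas_accels))
    return w_ori * ori_term + w_acc * acc_term
```

An optimizer would search for the pose sequence that minimizes such a cost; coupling frames through the acceleration term is what makes the fit a multi-frame problem rather than a per-frame one.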
Deep Autoencoder for Combined Human Pose Estimation and body Model Upscaling
We present a method for simultaneously estimating 3D human pose and body
shape from a sparse set of wide-baseline camera views. We train a symmetric
convolutional autoencoder with a dual loss that enforces learning of a latent
representation that encodes skeletal joint positions, and at the same time
learns a deep representation of volumetric body shape. We harness the latter to
up-scale input volumetric data by a factor of , whilst recovering a
3D estimate of joint positions with equal or greater accuracy than the state of
the art. Inference runs in real-time (25 fps) and has the potential for passive
human behaviour monitoring where there is a requirement for high fidelity
estimation of human body shape and pose.
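The dual loss mentioned in the abstract above can be illustrated with a minimal sketch. It assumes both terms are mean squared errors over flat lists and are combined with a single weight; `dual_loss`, `shape_weight`, and the choice of MSE are assumptions made for illustration, not details taken from the paper.

```python
def mean_squared_error(pred, target):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dual_loss(pred_joints, true_joints, pred_volume, true_volume,
              shape_weight=1.0):
    """Hypothetical dual objective: a joint-position term on the latent
    prediction plus a weighted reconstruction term on the volumetric output."""
    joint_term = mean_squared_error(pred_joints, true_joints)
    shape_term = mean_squared_error(pred_volume, true_volume)
    return joint_term + shape_weight * shape_term
```

Training against a sum like this pushes one network to serve both goals at once, which is the general idea behind a dual loss.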
Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture
Either RGB images or inertial signals have been used for the task of motion
capture (mocap), but combining them is a new and interesting topic. We
believe that the combination is complementary and able to solve the inherent
difficulties of using one modality input, including occlusions, extreme
lighting/texture, and out-of-view for visual mocap and global drifts for
inertial mocap. To this end, we propose a method that fuses monocular images
and sparse IMUs for real-time human motion capture. Our method contains a dual
coordinate strategy to fully explore the IMU signals with different goals in
motion capture. To be specific, besides one branch transforming the IMU signals
to the camera coordinate system to combine with the image information, there is
another branch to learn from the IMU signals in the body root coordinate system
to better estimate body poses. Furthermore, a hidden state feedback mechanism
is proposed for both branches to compensate for their own drawbacks in
extreme input cases. Thus our method can easily switch between the two kinds of
signals or combine them in different cases to achieve robust mocap.
Quantitative and qualitative results demonstrate that, by carefully
designing the fusion method, our technique significantly outperforms the
state-of-the-art vision, IMU, and combined methods on both global orientation
and local pose estimation. Our code is available for research at
https://shaohua-pan.github.io/robustcap-page/.
Comment: Accepted by SIGGRAPH ASIA 2023. Project page:
https://shaohua-pan.github.io/robustcap-page
Deep Learning-Based Human Pose Estimation: A Survey
Human pose estimation aims to locate the human body parts and build human
body representation (e.g., body skeleton) from input data such as images and
videos. It has drawn increasing attention during the past decade and has been
utilized in a wide range of applications including human-computer interaction,
motion analysis, augmented reality, and virtual reality. Although the recently
developed deep learning-based solutions have achieved high performance in human
pose estimation, there still remain challenges due to insufficient training
data, depth ambiguities, and occlusion. The goal of this survey paper is to
provide a comprehensive review of recent deep learning-based solutions for both
2D and 3D pose estimation via a systematic analysis and comparison of these
solutions based on their input data and inference procedures. More than 240
research papers since 2014 are covered in this survey. Furthermore, 2D and 3D
human pose estimation datasets and evaluation metrics are included.
Quantitative performance comparisons of the reviewed methods on popular
datasets are summarized and discussed. Finally, the challenges involved,
applications, and future research directions are presented. We also provide a
regularly updated project page: https://github.com/zczcwh/DL-HPE
HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDAR
We propose Human-centered 4D Scene Capture (HSC4D) to accurately and
efficiently create a dynamic digital world, containing large-scale
indoor-outdoor scenes, diverse human motions, and rich interactions between
humans and environments. Using only body-mounted IMUs and LiDAR, HSC4D is
space-free without any external devices' constraints and map-free without
pre-built maps. Considering that IMUs can capture human poses but always drift
for long-period use, while LiDAR is stable for global localization but rough
for local positions and orientations, HSC4D makes both sensors complement each
other by a joint optimization and achieves promising results for long-term
capture. Relationships between humans and environments are also explored to
make their interaction more realistic. To facilitate many down-stream tasks,
like AR, VR, robots, autonomous driving, etc., we propose a dataset containing
three large scenes (1k-5k ) with accurate dynamic human motions and
locations. Diverse scenarios (climbing gym, multi-story building, slope, etc.)
and challenging human activities (exercising, walking up/down stairs, climbing,
etc.) demonstrate the effectiveness and the generalization ability of HSC4D.
The dataset and code are available at http://www.lidarhumanmotion.net/hsc4d/.
Comment: 10 pages, 8 figures, CVPR202
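The complementarity described in the abstract above (IMU estimates drift over time, while LiDAR is globally stable) can be illustrated with a simple stand-in. The sketch below is a one-dimensional complementary-filter-style correction, not the joint optimization that HSC4D uses; `fuse_positions`, `gain`, and the scalar positions are assumptions made for illustration.

```python
def fuse_positions(imu_deltas, lidar_positions, gain=0.1, start=0.0):
    """Toy 1D fusion: integrate IMU displacement steps (which accumulate
    drift) and pull each estimate toward the LiDAR global position."""
    estimate = start
    fused = []
    for delta, lidar in zip(imu_deltas, lidar_positions):
        estimate += delta                      # dead-reckoning step from the IMU
        estimate += gain * (lidar - estimate)  # correction toward the LiDAR position
        fused.append(estimate)
    return fused
```

With `gain=0.0` the result is pure IMU integration, so any bias in the steps grows without bound; with a positive gain the error stays bounded because every step is pulled back toward the drift-free reference.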
XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera
We present a real-time approach for multi-person 3D motion capture at over 30
fps using a single RGB camera. It operates successfully in generic scenes which
may contain occlusions by objects and by other people. Our method operates in
subsequent stages. The first stage is a convolutional neural network (CNN) that
estimates 2D and 3D pose features along with identity assignments for all
visible joints of all individuals. We contribute a new architecture for this
CNN, called SelecSLS Net, that uses novel selective long and short range skip
connections to improve the information flow allowing for a drastically faster
network without compromising accuracy. In the second stage, a fully connected
neural network turns the possibly partial (on account of occlusion) 2D pose and
3D pose features for each subject into a complete 3D pose estimate per
individual. The third stage applies space-time skeletal model fitting to the
predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose,
and enforce temporal coherence. Our method returns the full skeletal pose in
joint angles for each subject. This is a further key distinction from previous
work that does not produce joint angle results of a coherent skeleton in real
time for multi-person scenes. The proposed system runs on consumer hardware at
a previously unseen speed of more than 30 fps given 512x320 images as input
while achieving state-of-the-art accuracy, which we will demonstrate on a range
of challenging real-world scenes.
Comment: To appear in ACM Transactions on Graphics (SIGGRAPH) 202