Cascaded deep monocular 3D human pose estimation with evolutionary training data
End-to-end deep representation learning has achieved remarkable accuracy for
monocular 3D human pose estimation, yet these models may fail for unseen poses
with limited and fixed training data. This paper proposes a novel data
augmentation method that (1) is scalable for synthesizing massive amounts of
training data (over 8 million valid 3D human poses with corresponding 2D
projections) for training 2D-to-3D networks, and (2) effectively reduces dataset
bias. Our method evolves a limited dataset to synthesize unseen 3D human
skeletons based on a hierarchical human representation and heuristics inspired
by prior knowledge. Extensive experiments show that our approach not only
achieves state-of-the-art accuracy on the largest public benchmark, but also
generalizes significantly better to unseen and rare poses. Code, pre-trained
models and tools are available at this HTTPS URL.
Comment: Accepted to CVPR 2020 as an Oral Presentation
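The abstract above describes evolving a limited pose dataset via a hierarchical human representation. As a rough illustration only (the 16-joint layout, part grouping, and crossover/mutation operators below are hypothetical sketches, not the paper's actual method), an evolutionary step over body-part groups might look like:

```python
import numpy as np

# Hypothetical grouping of a 16-joint skeleton into body parts
# (indices are illustrative, not the paper's actual hierarchy).
PARTS = {
    "torso": [0, 1, 2, 3],
    "left_arm": [4, 5, 6],
    "right_arm": [7, 8, 9],
    "left_leg": [10, 11, 12],
    "right_leg": [13, 14, 15],
}

def crossover(parent_a, parent_b, rng):
    """Build a child pose by swapping randomly chosen body parts
    between two parent poses of shape (16, 3)."""
    child = parent_a.copy()
    for idx in PARTS.values():
        if rng.random() < 0.5:
            child[idx] = parent_b[idx]
    return child

def mutate(pose, rng, sigma=0.02):
    """Perturb joint positions with small Gaussian noise."""
    return pose + rng.normal(scale=sigma, size=pose.shape)

rng = np.random.default_rng(0)
parent_a = rng.normal(size=(16, 3))
parent_b = rng.normal(size=(16, 3))
child = mutate(crossover(parent_a, parent_b, rng), rng)
```

In the actual method, validity of synthesized skeletons would additionally be checked against anatomical heuristics (bone lengths, joint-angle limits), which this toy sketch omits.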
DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation
Due to the lack of diversity in existing datasets, the generalization ability of
pose estimators is poor. To solve this problem, we propose a pose augmentation
solution via the DH forward kinematics model, which we call DH-AUG. We observe
that previous work is all based on single-frame pose augmentation; if it is
directly applied to a video pose estimator, several previously ignored problems
arise: (i) angle ambiguity in bone rotation (multiple solutions); (ii) the
generated skeleton video lacks movement continuity. To solve these
problems, we propose a special generator based on DH forward kinematics model,
which is called DH-generator. Extensive experiments demonstrate that DH-AUG can
greatly increase the generalization ability of the video pose estimator. In
addition, when applied to a single-frame 3D pose estimator, our method
outperforms the previous best pose augmentation method. The source code has
been released at
https://github.com/hlz0606/DH-AUG-DH-Forward-Kinematics-Model-Driven-Augmentation-for-3D-Human-Pose-Estimation
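The DH forward kinematics model referenced above is the standard Denavit-Hartenberg convention from robotics: each link contributes one homogeneous transform, and chaining them yields joint positions with fixed bone lengths. A minimal sketch (the 3-link chain and its parameters are illustrative, not DH-AUG's actual skeleton model):

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard Denavit-Hartenberg homogeneous transform for one link."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_params):
    """Chain per-link DH transforms and return the 3D position of each joint."""
    T = np.eye(4)
    positions = []
    for theta, d, a, alpha in dh_params:
        T = T @ dh_transform(theta, d, a, alpha)
        positions.append(T[:3, 3].copy())
    return np.array(positions)

# A toy 3-link "arm": varying each theta sweeps the chain while the bone
# lengths (the `a` parameters) stay fixed, so generated motion is consistent.
chain = [(np.pi / 4, 0.0, 0.30, 0.0),
         (np.pi / 6, 0.0, 0.25, 0.0),
         (0.0,       0.0, 0.20, 0.0)]
joints = forward_kinematics(chain)
```

Parameterizing augmentation by joint angles in such a kinematic chain, rather than by raw 3D coordinates, is one way to sidestep the bone-rotation ambiguity the abstract mentions.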
Augmenting Vision-Based Human Pose Estimation with Rotation Matrix
Fitness applications are commonly used to monitor activities within the gym,
but they often fail to track indoor gym activities automatically. This study
proposes a model that combines pose estimation with a novel data augmentation
method, namely the rotation matrix, aiming to enhance the classification
accuracy of activity recognition based on pose estimation data. In our
experiments, we evaluate different classification algorithms along with image
augmentation approaches. Our findings demonstrate that an SVM
with SGD optimization, using data augmentation with the Rotation Matrix, yields
the most accurate results, achieving a 96% accuracy rate in classifying five
physical activities. Conversely, without implementing the data augmentation
techniques, the baseline accuracy remains at a modest 64%.
Comment: 24 pages
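The rotation-matrix augmentation described above can be sketched as applying a 2D rotation to pose keypoints about their centroid, keeping the activity label unchanged (the keypoint layout below is hypothetical; the paper's exact transform may differ):

```python
import numpy as np

def rotate_keypoints(keypoints, angle_deg):
    """Rotate 2D pose keypoints of shape (N, 2) about their centroid.

    The rotation matrix R = [[cos t, -sin t], [sin t, cos t]] preserves
    inter-joint distances, so the augmented pose stays geometrically valid.
    """
    theta = np.deg2rad(angle_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    center = keypoints.mean(axis=0)
    return (keypoints - center) @ R.T + center

# Toy 3-keypoint "pose"; a real classifier would be trained on many
# rotated copies of each labeled pose.
pose = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
augmented = rotate_keypoints(pose, 90.0)
```

Generating several rotated copies per sample is a cheap way to expose a classifier (such as the SVM mentioned above) to orientation variation it would not otherwise see.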
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
When applying a pre-trained 2D-to-3D human pose lifting model to an unseen
target dataset, large performance degradation is commonly encountered due to
domain shift. We observe that the degradation is caused by two factors: 1) the
large distribution gap in global pose positions between the source and target
datasets due to varying camera parameters and settings, and 2) the limited
diversity of local pose structures in training. To this end, we
combine \textbf{global adaptation} and \textbf{local generalization} in
\textit{PoseDA}, a simple yet effective framework of unsupervised domain
adaptation for 3D human pose estimation. Specifically, global adaptation aims
to align global positions of poses from the source domain to the target domain
with a proposed global position alignment (GPA) module, while local
generalization is designed to enhance the diversity of 2D-3D pose mapping with
a local pose augmentation (LPA) module. LPA enhances the diversity of 3D poses
through an adversarial training scheme consisting of 1) an augmentation
generator that produces the parameters of pre-defined pose transformations and
2) an anchor discriminator that ensures the realism and quality of the
augmented data. These modules bring significant performance improvements
without introducing additional learnable parameters, and our approach is
applicable to almost all 2D-3D lifting models. \textit{PoseDA} achieves 61.3 mm
MPJPE on MPI-INF-3DHP under a cross-dataset evaluation setup, improving upon
the previous state-of-the-art method by 10.2\%.
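The abstract does not detail how the GPA module aligns global positions. As a hedged sketch only (moment matching of root-joint positions is an assumption here, not necessarily the paper's mechanism), the idea of closing the global-position gap between domains could look like:

```python
import numpy as np

def align_roots(source_roots, target_roots):
    """Shift and scale source root-joint positions so their per-axis
    mean and std match the target domain (simple moment matching)."""
    src_mean, src_std = source_roots.mean(0), source_roots.std(0) + 1e-8
    tgt_mean, tgt_std = target_roots.mean(0), target_roots.std(0) + 1e-8
    return (source_roots - src_mean) / src_std * tgt_std + tgt_mean

# Toy root-position distributions with different centers and spreads,
# standing in for two datasets captured with different camera setups.
rng = np.random.default_rng(1)
source = rng.normal(0.0, 1.0, size=(200, 3))
target = rng.normal(5.0, 2.0, size=(200, 3))
aligned = align_roots(source, target)
```

Note this only addresses the global-position gap; the local structure diversity is what the adversarial LPA scheme targets.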
The Visual Social Distancing Problem
One of the main and most effective measures to contain the recent viral
outbreak is the maintenance of the so-called Social Distancing (SD). To comply
with this constraint, workplaces, public institutions, transports and schools
will likely adopt restrictions over the minimum inter-personal distance between
people. Given this scenario, it is crucial to measure compliance with this
physical constraint at scale, in order to understand the reasons behind breaks
of the distance limit and whether they imply a threat given the scene context,
all while complying with privacy policies and keeping the measurement
acceptable. To this end, we
introduce the Visual Social Distancing (VSD) problem, defined as the automatic
estimation of the inter-personal distance from an image, and the
characterization of the related people aggregations. VSD is pivotal for a
non-invasive analysis of whether people comply with the SD restriction, and to
provide statistics about the level of safety of specific areas whenever this
constraint is violated. We then discuss how VSD relates with previous
literature in Social Signal Processing and indicate which existing Computer
Vision methods can be used to address this problem. We conclude with future
challenges related to the effectiveness of VSD systems, ethical implications,
and future application scenarios.
Comment: 9 pages, 5 figures. All the authors contributed equally to this manuscript and are listed in alphabetical order. Under submission
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
Learning-based methods have dominated the 3D human pose estimation (HPE)
tasks with significantly better performance in most benchmarks than traditional
optimization-based methods. Nonetheless, 3D HPE in the wild remains the
biggest challenge for learning-based models, whether 2D-3D lifting,
image-to-3D, or diffusion-based, since the trained networks implicitly learn
camera intrinsic parameters and domain-specific 3D human pose distributions,
estimating poses as a statistical average. On the other hand, the
optimization-based methods estimate results case-by-case, which can predict
more diverse and sophisticated human poses in the wild. By combining the
advantages of optimization-based and learning-based methods, we propose the
Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the
problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO
achieves state-of-the-art (SOTA) performance in minMPJPE on Human3.6M without
training on any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO
achieves SOTA performance in PA-MPJPE on the 3DPW dataset under cross-dataset
evaluation, even outperforming learning-based methods trained on 3DPW.
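The MPJPE and PA-MPJPE metrics cited throughout these abstracts are standard: mean per-joint position error between predicted and ground-truth 3D joints, optionally after a Procrustes alignment that removes global rotation, scale, and translation. A self-contained implementation of both:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error over joints of shape (J, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment (rotation, scale, translation)."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    P, G = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, s, Vt = np.linalg.svd(P.T @ G)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # avoid reflections
        Vt[-1] *= -1
        s[-1] *= -1
        R = Vt.T @ U.T
    scale = s.sum() / (P ** 2).sum()
    aligned = scale * P @ R.T + mu_g
    return mpjpe(aligned, gt)

# Sanity check: a rotated, scaled, translated copy of the ground truth
# should have (near-)zero PA-MPJPE even though its raw MPJPE is large.
rng = np.random.default_rng(2)
gt = rng.normal(size=(16, 3))
ang = np.pi / 5
Rz = np.array([[np.cos(ang), -np.sin(ang), 0.0],
               [np.sin(ang),  np.cos(ang), 0.0],
               [0.0,          0.0,         1.0]])
pred = 2.0 * gt @ Rz.T + np.array([1.0, -2.0, 0.5])
```

PA-MPJPE is the natural metric for cross-dataset evaluation, as above, because it discounts exactly the global camera-dependent factors that differ between domains.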