Hybrid model for Single-Stage Multi-Person Pose Estimation
In general, human pose estimation methods are categorized into two approaches
according to their architectures: regression (i.e., heatmap-free) and
heatmap-based methods. The former directly estimates precise coordinates of
each keypoint using convolutional and fully-connected layers. Although this
approach can detect overlapping and densely placed keypoints, it can produce
spurious predictions for keypoints that do not exist in the scene. The
latter, in contrast, can filter out non-existent keypoints by using the
predicted heatmap for each keypoint. Nevertheless, it suffers from quantization
error when keypoint coordinates are extracted from the heatmaps. In addition,
unlike the regression approach, it has difficulty distinguishing densely placed
keypoints in an image. To this end, we propose a hybrid model for single-stage multi-person
pose estimation, named HybridPose, which overcomes the drawbacks of
both approaches by combining their strengths. Furthermore, we introduce a
self-correlation loss to inject spatial dependencies between keypoint
coordinates and their visibility. Therefore, HybridPose is capable of not only
detecting densely placed keypoints, but also filtering the non-existent
keypoints in an image. Experimental results demonstrate that the proposed
HybridPose provides keypoint visibility without degrading pose estimation
accuracy.
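
The quantization error mentioned above can be illustrated with a small, self-contained sketch. It is not HybridPose itself: the Gaussian rendering, the stride of 4, the argmax decoder, and the score threshold are all assumptions chosen for illustration, and every function name is hypothetical. The sketch shows that a coordinate decoded from a coarse heatmap can be off by up to half a stride per axis, while the peak score can still be used to reject a keypoint that is absent.

```python
import math

def render_heatmap(x, y, width, height, stride, sigma=1.0):
    """Render a Gaussian heatmap on a grid `stride` times coarser than the image."""
    cx, cy = x / stride, y / stride  # keypoint location in heatmap units
    return [
        [math.exp(-((u - cx) ** 2 + (v - cy) ** 2) / (2 * sigma ** 2))
         for u in range(width)]
        for v in range(height)
    ]

def decode_argmax(heatmap, stride):
    """Return the image-space position of the strongest cell and its score."""
    best_score, best_u, best_v = -1.0, 0, 0
    for v, row in enumerate(heatmap):
        for u, score in enumerate(row):
            if score > best_score:
                best_score, best_u, best_v = score, u, v
    return best_u * stride, best_v * stride, best_score

def quantization_error(x, y, width=16, height=16, stride=4):
    """Distance in pixels between the true keypoint and its decoded position."""
    heatmap = render_heatmap(x, y, width, height, stride)
    px, py, _ = decode_argmax(heatmap, stride)
    return math.hypot(px - x, py - y)

def is_present(heatmap, stride, threshold=0.5):
    """Treat a keypoint as present only if the heatmap peak is strong enough.

    The threshold value is an illustrative assumption.
    """
    _, _, score = decode_argmax(heatmap, stride)
    return score >= threshold
```

A keypoint that falls exactly on a grid cell, such as (20, 20) with stride 4, is recovered without error, whereas (21.5, 29.0) is snapped to (20, 28). A regression head has no such grid, which is the trade-off the abstract describes.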
Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation
In general, hand pose estimation aims to improve the robustness of model
performance in real-world scenes. However, it is difficult to enhance
robustness because existing datasets are captured in restricted environments in
order to annotate 3D information. Although neural networks achieve high
quantitative estimation accuracy, unsatisfactory results can be observed in
visual quality. This discrepancy between quantitative results and visual quality
remains an open issue in hand pose representation. To this end, we propose
a mesh represented recycle learning strategy for 3D hand pose and mesh
estimation, which reinforces the synthesized hand mesh representation during the
training phase. Specifically, a hand pose and mesh estimation model first predicts
parametric 3D hand annotations (i.e., 3D keypoint positions and vertices for
the hand mesh) from real-world hand images in the training phase. Second, synthetic
hand images are generated from the self-estimated hand mesh representations. After
that, the synthetic hand images are fed into the same model again. Thus, the
proposed learning strategy simultaneously improves quantitative results and
visual quality by reinforcing the synthetic mesh representation. To encourage
consistency between the original model output and its recycled counterpart, we
propose a self-correlation loss, which maximizes the accuracy and reliability of our
learning strategy. Consequently, the model effectively conducts self-refinement
on hand pose estimation by learning mesh representation from its own output. To
demonstrate the effectiveness of our learning strategy, we provide extensive
experiments on the FreiHAND dataset. Notably, our learning strategy improves
performance on hand pose and mesh estimation without any extra computational
burden during inference.
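
The three training steps described above (predict from a real image, synthesize an image from the model's own estimate, feed it back into the same model) can be sketched as a single function. This is a minimal sketch under stated assumptions: `model` and `render` are hypothetical callables standing in for the estimation network and the image synthesizer, outputs are flattened to plain lists of floats, and mean squared error is used as a stand-in for both terms. The abstract does not define the self-correlation loss, so the consistency term here should not be read as the authors' formulation.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]

def mean_squared_error(a: Vector, b: Vector) -> float:
    """Average squared difference between two equally long vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def recycle_losses(
    model: Callable[[Vector], Vector],   # hypothetical pose/mesh estimator
    render: Callable[[Vector], Vector],  # hypothetical mesh-to-image synthesizer
    real_image: Vector,
    target: Vector,
) -> Tuple[float, float]:
    """Return (supervised loss, consistency loss) for one recycle pass."""
    first_pass = model(real_image)        # step 1: predict from the real image
    synthetic_image = render(first_pass)  # step 2: synthesize from own estimate
    second_pass = model(synthetic_image)  # step 3: feed back into the same model
    supervised = mean_squared_error(first_pass, target)
    # Stand-in consistency term (assumption): penalize disagreement between
    # the original output and the recycled one.
    consistency = mean_squared_error(second_pass, first_pass)
    return supervised, consistency
```

Because the second pass happens only when the losses are computed for training, a model trained this way runs a single forward pass at test time, which is consistent with the claim of no extra computational burden during inference.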