4 research outputs found
Relation-Based Associative Joint Location for Human Pose Estimation in Videos
Video-based human pose estimation (HPE) is a vital yet challenging task.
While deep learning methods have made significant progress for the HPE, most
approaches to this task detect each joint independently, damaging the pose
structural information. In this paper, unlike the prior methods, we propose a
Relation-based Pose Semantics Transfer Network (RPSTN) to locate joints
associatively. Specifically, we design a lightweight joint relation extractor
(JRE) to model the pose structural features and associatively generate heatmaps
for joints by modeling the relation between any two joints heuristically
instead of building each joint heatmap independently. Actually, the proposed
JRE module models the spatial configuration of human poses through the
relationship between any two joints. Moreover, considering the temporal
semantic continuity of videos, the pose semantic information in the current
frame is beneficial for guiding the location of joints in the next frame.
Therefore, we use the idea of knowledge reuse to propagate the pose semantic
information between consecutive frames. In this way, the proposed RPSTN
captures temporal dynamics of poses. On the one hand, the JRE module can infer
invisible joints according to the relationship between the invisible joints and
other visible joints in space. On the other hand, in the time, the propose
model can transfer the pose semantic features from the non-occluded frame to
the occluded frame to locate occluded joints. Therefore, our method is robust
to the occlusion and achieves state-of-the-art results on the two challenging
datasets, which demonstrates its effectiveness for video-based human pose
estimation. We will release the code and models publicly