Recently, there has been growing interest in developing effective
perception techniques that combine information from multiple modalities. This
involves aligning features obtained from diverse sources, both to enable more
efficient training with larger datasets and additional constraints and to
leverage the rich information contained in each modality. 2D and 3D Human Pose
Estimation (HPE) are two critical perceptual tasks in computer vision with
numerous downstream applications, such as action recognition, human-computer
interaction, and object tracking. Yet the correlation between images and
2D/3D human poses has rarely been investigated explicitly under a
contrastive paradigm. In this paper, we propose
UniHPE, a unified Human Pose Estimation pipeline, which aligns features from
all three modalities, i.e., 2D human pose estimation, lifting-based and
image-based 3D human pose estimation, within a single pipeline. To align more
than two modalities simultaneously, we propose a novel singular-value-based
contrastive learning loss, which better aligns the modalities and further
boosts performance. In our evaluation, UniHPE achieves an MPJPE of 50.5 mm on
the Human3.6M dataset and a PA-MPJPE of 51.6 mm on the 3DPW dataset. Our
proposed method holds great potential to advance the field of computer vision
and to benefit a wide range of applications.
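A singular-value-based alignment objective could be sketched as below. This is a hypothetical formulation for illustration, not the paper's exact loss: it stacks each sample's L2-normalized embeddings from the three modalities into a matrix and penalizes deviation from rank one, since perfectly aligned modalities would concentrate all of the matrix's energy in the top singular value.

```python
# Hypothetical sketch of a singular-value-based alignment loss over three
# modalities (image, 2D pose, 3D pose). The actual UniHPE loss may differ.
import numpy as np

def l2_normalize(x):
    """Normalize each row of x to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def sv_alignment_loss(feats):
    """feats: list of (batch, dim) L2-normalized embeddings, one per modality.

    For each sample, stack its embeddings from all modalities into a
    (num_modalities, dim) matrix. If the modalities are perfectly aligned,
    that matrix has rank 1, so the top singular value carries all the
    energy. The loss drives the ratio sigma_1 / sum(sigma) toward 1.
    """
    batch = feats[0].shape[0]
    losses = []
    for i in range(batch):
        m = np.stack([f[i] for f in feats])     # (num_modalities, dim)
        s = np.linalg.svd(m, compute_uv=False)  # singular values, descending
        losses.append(1.0 - s[0] / s.sum())
    return float(np.mean(losses))
```

With identical embeddings across all three modalities the stacked matrix is exactly rank 1 and the loss is zero; independent random embeddings yield a clearly positive loss, so minimizing it pulls the modalities together.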