Recently, multi-resolution networks (such as Hourglass, CPN, and HRNet)
have achieved strong performance on pose estimation by combining feature
maps of various resolutions. In this paper, we propose a Resolution-wise
Attention Module (RAM) and Gradual Pyramid Refinement (GPR) to learn enhanced
resolution-wise feature maps for precise pose estimation. Specifically, RAM
learns a group of weights that represent the relative importance of feature maps
across resolutions, and GPR gradually merges feature maps two at a time, from
low to high resolution, to regress the final human keypoint heatmaps. With the
enhanced resolution-wise features learned by the CNN, we obtain more accurate human
keypoint locations. The efficacy of the proposed methods is demonstrated on
the MS-COCO dataset, achieving state-of-the-art performance with an average
precision of 77.7 on the COCO val2017 set and 77.0 on the test-dev2017 set
without using any extra human keypoint training data.

Comment: Published at ICIP 202
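
To make the two ideas concrete, the following is a minimal sketch of resolution-wise weighting followed by low-to-high pairwise merging. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not specify how the weights are normalized or how two maps are merged, so the softmax normalization, nearest-neighbour upsampling, and element-wise addition used here are assumptions. All function names are hypothetical, each feature map is reduced to a single channel, and the fixed logits stand in for weights that a network would learn.

```python
import math

def softmax(logits):
    """Normalize raw scores into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def resolution_wise_attention(feature_maps, logits):
    """Scale each resolution branch by one scalar weight.

    Assumption: one weight per resolution, normalized by softmax.
    In a trained model the logits would be learned, not fixed.
    """
    weights = softmax(logits)
    scaled = [[[w * v for v in row] for row in fmap]
              for w, fmap in zip(weights, feature_maps)]
    return scaled, weights

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D map (assumed operator)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def merge_pair(finer, coarser):
    """Merge two adjacent resolutions by upsampling and adding (assumed)."""
    up = upsample2x(coarser)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(finer, up)]

def gradual_pyramid_refinement(feature_maps):
    """Merge maps two at a time, from the lowest resolution upward.

    feature_maps is ordered from high to low resolution, and each map
    is assumed to be half the size of the one before it.
    """
    merged = feature_maps[-1]
    for finer in reversed(feature_maps[:-1]):
        merged = merge_pair(finer, merged)
    return merged

def constant_map(size, value=1.0):
    """Hypothetical helper: a square single-channel map of one value."""
    return [[value] * size for _ in range(size)]

# Four branches at sizes 8, 4, 2, 1 (high to low resolution).
features = [constant_map(8 // 2 ** i) for i in range(4)]
logits = [0.0, 0.0, 0.0, 0.0]  # placeholder for learned scores
weighted, weights = resolution_wise_attention(features, logits)
refined = gradual_pyramid_refinement(weighted)
print(len(refined), len(refined[0]))
```

The merged output has the resolution of the finest branch. In a full model, a final convolution would presumably map it to one heatmap per keypoint, which this sketch omits.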