Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation
Hand pose estimation is more challenging than body pose estimation due to
severe articulation, self-occlusion and high dexterity of the hand. Current
approaches often rely on a popular body pose algorithm, such as the
Convolutional Pose Machine (CPM), to learn 2D keypoint features. These
algorithms cannot adequately address the unique challenges of hand pose
estimation, because they are trained solely based on keypoint positions without
explicitly modeling the structural relationships between them. We propose a
novel Nonparametric Structure Regularization Machine (NSRM) for 2D hand pose
estimation, adopting a cascade multi-task architecture to learn hand structure
and keypoint representations jointly. The structure learning is guided by
synthetic hand mask representations, which are directly computed from keypoint
positions, and is further strengthened by a novel probabilistic representation
of hand limbs and an anatomically inspired composition strategy of mask
synthesis. We conduct extensive studies on two public datasets, OneHand 10k
and CMU Panoptic Hand. Experimental results demonstrate that explicitly
enforcing structure learning consistently improves the pose estimation accuracy
of CPM baseline models, by 1.17% on the first dataset and 4.01% on the second.
The implementation and experiment code is freely available online. Our proposal
of incorporating structural learning into hand pose estimation requires no
additional training information, and can be a generic add-on module to other
pose estimation models.
Comment: The paper has been accepted and will be presented at the 2020 IEEE
Winter Conference on Applications of Computer Vision (WACV). The code is freely
available at https://github.com/HowieMa/NSRMhan
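Note: the abstract above states that the structure supervision is a synthetic hand mask computed directly from keypoint positions, with a probabilistic representation of limbs and a composition step. A minimal sketch of that idea follows. The Gaussian fall-off with distance to the limb segment, the `sigma` value, the pixel-wise maximum used for composition, and all function names are assumptions made for illustration; they are not taken from the authors' code.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from pixel (px, py) to the segment a-b."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate limb: both keypoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project the pixel onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def limb_mask(height, width, kp_a, kp_b, sigma=1.5):
    """Soft limb mask in [0, 1], indexed as mask[y][x].

    Assumption: the value decays as a Gaussian of the distance
    between the pixel and the segment joining the two keypoints.
    """
    (ax, ay), (bx, by) = kp_a, kp_b
    return [
        [
            math.exp(
                -point_segment_distance(x, y, ax, ay, bx, by) ** 2
                / (2.0 * sigma ** 2)
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


def compose_masks(masks):
    """Merge several limb masks into one group mask.

    Assumption: a pixel-wise maximum is used as the merge rule.
    """
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [max(m[y][x] for m in masks) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical keypoints (x, y) on an 8x8 grid: two limbs sharing a joint.
horizontal = limb_mask(8, 8, (1, 1), (5, 1))
vertical = limb_mask(8, 8, (1, 1), (1, 5))
group = compose_masks([horizontal, vertical])
```

Because the masks are derived from the keypoint annotations alone, a sketch like this needs no labels beyond those already used for keypoint training, which matches the abstract's claim that no additional training information is required.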
Fast Monocular Hand Pose Estimation on Embedded Systems
Hand pose estimation is a fundamental task in many human-robot
interaction-related applications. However, previous approaches suffer from
unsatisfactory hand landmark predictions in real-world scenes and a high
computational burden. This paper proposes a fast and accurate framework for
hand pose estimation, dubbed "FastHand". Using a lightweight encoder-decoder
network architecture, FastHand fulfills the requirements of practical
applications running on embedded devices. The encoder consists of deep layers
with a small number of parameters, while the decoder makes use of spatial
location information to obtain more accurate results. Evaluation on two
publicly available datasets demonstrates the improved performance of the
proposed pipeline compared to other state-of-the-art approaches. FastHand
offers high accuracy scores while reaching a speed of 25 frames per second on
an NVIDIA Jetson TX2 graphics processing unit.
SIA-GCN: A Spatial Information Aware Graph Neural Network with 2D Convolutions for Hand Pose Estimation
Graph Neural Networks (GNNs) generalize neural networks from applications on
regular structures to applications on arbitrary graphs, and have shown success
in many application domains such as computer vision, social networks and
chemistry. In this paper, we extend GNNs along two directions: a) allowing
features at each node to be represented by 2D spatial confidence maps instead
of 1D vectors; and b) proposing an efficient operation to integrate information
from neighboring nodes through 2D convolutions with different learnable kernels
at each edge. The proposed SIA-GCN can efficiently extract spatial information
from 2D maps at each node and propagate it through graph convolution. By
associating each edge with a designated convolution kernel, the SIA-GCN can
capture different spatial relationships for different pairs of neighboring
nodes. We demonstrate the utility of SIA-GCN on the task of estimating hand
keypoints from single-frame images, where the nodes represent the 2D coordinate
heatmaps of keypoints and the edges denote the kinetic relationships between
keypoints. Experiments on multiple datasets show that SIA-GCN provides a
flexible yet powerful framework to account for structural constraints
between keypoints, and can achieve state-of-the-art performance on the task of
hand pose estimation.
Comment: 31st British Machine Vision Conference (BMVC), oral presentation
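Note: the abstract above describes node features that are 2D confidence maps and a separate learnable 2D kernel on each edge. A minimal sketch of one such propagation step is given below. The zero "same" padding, the plain summation of incoming messages, the absence of normalization and nonlinearity, and the names `conv2d_same` and `edge_conv_layer` are assumptions for illustration, not the authors' implementation. As in most deep learning libraries, "convolution" here means cross-correlation.

```python
def conv2d_same(feature_map, kernel):
    """Cross-correlate a 2D map with a 2D kernel, zero-padded to keep size."""
    h, w = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Positions outside the map contribute zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += feature_map[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def edge_conv_layer(node_maps, edge_kernels):
    """One propagation step over a graph whose nodes hold 2D maps.

    node_maps:    dict mapping node name -> 2D map (list of rows)
    edge_kernels: dict mapping (src, dst) -> 2D kernel for that edge
    Each destination node receives the sum of its neighbors' maps,
    each filtered by the kernel designated for that edge.
    """
    first = next(iter(node_maps.values()))
    h, w = len(first), len(first[0])
    out = {name: [[0.0] * w for _ in range(h)] for name in node_maps}
    for (src, dst), kernel in edge_kernels.items():
        message = conv2d_same(node_maps[src], kernel)
        for y in range(h):
            for x in range(w):
                out[dst][y][x] += message[y][x]
    return out


# Hypothetical two-keypoint graph on 3x3 heatmaps.
maps = {
    "wrist": [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    "thumb": [[0.0] * 3 for _ in range(3)],
}
kernels = {
    # This kernel moves the wrist peak one pixel to the right.
    ("wrist", "thumb"): [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    # Identity kernel on the reverse edge.
    ("thumb", "wrist"): [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
}
updated = edge_conv_layer(maps, kernels)
```

Giving each edge its own kernel lets the message from one keypoint encode where a neighboring keypoint is likely to lie relative to it, which a single kernel shared across all edges could not express.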