Two-Stream Binocular Network: Accurate Near Field Finger Detection Based On Binocular Images
Fingertip detection plays an important role in human-computer interaction.
Previous works transform binocular images into depth images. Then depth-based
hand pose estimation methods are used to predict 3D positions of fingertips.
Different from previous works, we propose a new framework, named Two-Stream
Binocular Network (TSBnet) to detect fingertips from binocular images directly.
TSBnet first shares convolutional layers for low level features of right and
left images. Then it extracts high level features in two-stream convolutional
networks separately. Further, we add a new layer, the binocular distance
measurement layer, to improve the performance of our model. To verify our scheme, we
build a binocular hand image dataset containing about 117k pairs of images in
the training set and 10k pairs of images in the test set. Our methods achieve an
average error of 10.9 mm on our test set, outperforming previous work by 5.9 mm
(a 35.1% relative reduction).
Comment: Published in: Visual Communications and Image Processing (VCIP), 2017
IEEE. Original IEEE publication available on
https://ieeexplore.ieee.org/abstract/document/8305146/. Dataset available on
https://sites.google.com/view/thuhand1
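
The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that "outperforming previous work by 5.9 mm" means the earlier method's average error on the same test set was 10.9 + 5.9 = 16.8 mm; the abstract does not state that baseline directly, so it is an inference. The helper name `relative_improvement` is made up for illustration.

```python
def relative_improvement(new_error_mm, absolute_gain_mm):
    """Return (inferred baseline error, relative error reduction)."""
    # Assumption: baseline error = new error + reported absolute gain.
    baseline = new_error_mm + absolute_gain_mm
    return baseline, absolute_gain_mm / baseline


baseline_mm, rel = relative_improvement(10.9, 5.9)
print(f"baseline: {baseline_mm:.1f} mm, relative reduction: {rel:.1%}")
# prints "baseline: 16.8 mm, relative reduction: 35.1%"
```

Under that assumption the 35.1% figure is the error reduction relative to the earlier method, not relative to TSBnet's own error.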
Towards Good Practices for Deep 3D Hand Pose Estimation
3D hand pose estimation from a single depth image is an important and
challenging problem for human-computer interaction. Recently deep convolutional
networks (ConvNet) with sophisticated design have been employed to address it,
but the improvement over traditional random forest based methods is not so
apparent. To exploit good practices and improve the performance of hand
pose estimation, we propose a tree-structured Region Ensemble Network (REN) for
direct 3D coordinate regression. It first partitions the last convolution
outputs of the ConvNet into several grid regions. The results from separate
fully-connected (FC) regressors on each region are then integrated by another
FC layer to perform the estimation. By exploiting several training
strategies, including data augmentation and smooth loss, the proposed REN can
significantly improve the performance of ConvNets in localizing hand joints. The
experimental results demonstrate that our approach achieves the best
performance among state-of-the-art algorithms on three public hand pose
datasets. We also evaluate our method on fingertip detection and human pose
datasets and obtain state-of-the-art accuracy.
Comment: Extended version of arXiv:1702.0244
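
The region ensemble idea described above (partition the last feature map into grid regions, run a separate FC regressor on each, then fuse the results with another FC layer) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the feature-map size, the 2x2 grid, the hidden width, the joint count of 14, and all helper names are assumptions, the weights are random, and activations and training are omitted.

```python
import random


def linear(weights, bias, x):
    """Fully-connected layer y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_linear(n_out, n_in, rng):
    """Random weights and zero bias for a hypothetical FC layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def split_into_regions(feature_map, grid):
    """Partition a C x H x W feature map into grid*grid flattened regions."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    rh, rw = height // grid, width // grid
    regions = []
    for gy in range(grid):
        for gx in range(grid):
            regions.append([
                channel[y][x]
                for channel in feature_map
                for y in range(gy * rh, (gy + 1) * rh)
                for x in range(gx * rw, (gx + 1) * rw)
            ])
    return regions


def region_ensemble(feature_map, grid, region_fcs, fusion_fc):
    """One FC regressor per region, then a fusion FC over the concatenation."""
    regions = split_into_regions(feature_map, grid)
    per_region = [linear(w, b, r) for (w, b), r in zip(region_fcs, regions)]
    fused_input = [v for out in per_region for v in out]
    return linear(*fusion_fc, fused_input)


rng = random.Random(0)
C, H, W, GRID, HIDDEN, JOINTS = 4, 6, 6, 2, 8, 14  # illustrative sizes only
feature_map = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
region_dim = C * (H // GRID) * (W // GRID)
region_fcs = [init_linear(HIDDEN, region_dim, rng) for _ in range(GRID * GRID)]
fusion_fc = init_linear(3 * JOINTS, HIDDEN * GRID * GRID, rng)

pose = region_ensemble(feature_map, GRID, region_fcs, fusion_fc)
print(len(pose))  # 42 values: one (x, y, z) triple per assumed joint
```

Each region regressor sees only its own part of the feature map, so the fusion layer acts as a learned ensemble over spatially distinct views of the input.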
Bi-stream Pose Guided Region Ensemble Network for Fingertip Localization from Stereo Images
In human-computer interaction, it is important to accurately estimate the
hand pose, especially the fingertips. However, traditional approaches for fingertip
localization mainly rely on depth images and thus suffer considerably from the
noise and missing values. Instead of depth images, stereo images can also
provide 3D information of hands and promote 3D hand pose estimation. There are
nevertheless limitations on the dataset size, global viewpoints, hand
articulations and hand shapes in the publicly available stereo-based hand pose
datasets. To mitigate these limitations and promote further research on hand
pose estimation from stereo images, we propose a new large-scale binocular hand
pose dataset called THU-Bi-Hand, offering a new perspective for fingertip
localization. In the THU-Bi-Hand dataset, there are 447k pairs of stereo images
of different hand shapes from 10 subjects with accurate 3D location annotations
of the wrist and five fingertips. Captured with minimal restriction on the
range of hand motion, the dataset covers large global viewpoint space and hand
articulation space. To better present the performance of fingertip localization
on THU-Bi-Hand, we propose a novel scheme termed Bi-stream Pose Guided Region
Ensemble Network (Bi-Pose-REN). It extracts more representative feature regions
around joint points in the feature maps under the guidance of the previously
estimated pose. The feature regions are integrated hierarchically according to
the topology of hand joints to regress the refined hand pose. Bi-Pose-REN and
several existing methods are evaluated on THU-Bi-Hand so that benchmarks are
provided for further research. Experimental results show that our new method
has achieved the best performance on THU-Bi-Hand.
Comment: Cairong Zhang and Xinghao Chen contributed equally
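
The pose-guided step described above (crop feature regions around the previously estimated joints, then integrate them following the hand topology) can be illustrated with a small plain-Python sketch. It is not the Bi-Pose-REN implementation: the single-channel 8x8 feature map, the 3x3 window, the joint positions, and the grouping of each fingertip with the wrist are all assumptions for illustration, and the FC layers that would follow each stage are replaced by plain concatenation.

```python
def crop_region(feature_map, center, size):
    """Crop a size x size window around (row, col), shifted to stay in bounds."""
    height, width = len(feature_map), len(feature_map[0])
    half = size // 2
    top = min(max(center[0] - half, 0), height - size)
    left = min(max(center[1] - half, 0), width - size)
    return [row[left:left + size] for row in feature_map[top:top + size]]


def pose_guided_regions(feature_map, joints, size):
    """One feature region per joint of the previously estimated pose."""
    return {name: crop_region(feature_map, pos, size) for name, pos in joints.items()}


def flatten(region):
    return [v for row in region for v in row]


def integrate_hierarchically(regions, fingers, root="wrist"):
    """Stage 1: pair each fingertip with the root; stage 2: join all fingers."""
    per_finger = [flatten(regions[f]) + flatten(regions[root]) for f in fingers]
    return [v for vec in per_finger for v in vec]


# Toy single-channel feature map whose value encodes its own position.
feature_map = [[r * 8 + c for c in range(8)] for r in range(8)]
# Hypothetical previous pose estimate, in feature-map (row, col) coordinates.
joints = {"wrist": (7, 4), "thumb": (0, 0), "index": (3, 3)}

regions = pose_guided_regions(feature_map, joints, size=3)
fused = integrate_hierarchically(regions, fingers=["thumb", "index"])
print(regions["thumb"])  # [[0, 1, 2], [8, 9, 10], [16, 17, 18]]
print(len(fused))        # 36
```

Windows near the border are shifted inward rather than zero-padded; that is a design choice of this sketch, and a real network could handle borders differently.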