13,916 research outputs found
Hybrid One-Shot 3D Hand Pose Estimation by Exploiting Uncertainties
Model-based approaches to 3D hand tracking have been shown to perform well in
a wide range of scenarios. However, they require initialisation and cannot
recover easily from tracking failures that occur due to fast hand motions.
Data-driven approaches, on the other hand, can quickly deliver a solution, but
the results often suffer from lower accuracy or missing anatomical validity
compared to those obtained from model-based approaches. In this work we propose
a hybrid approach for hand pose estimation from a single depth image. First, a
learned regressor is employed to deliver multiple initial hypotheses for the 3D
position of each hand joint. Subsequently, the kinematic parameters of a 3D
hand model are found by deliberately exploiting the inherent uncertainty of the
inferred joint proposals. This way, the method provides anatomically valid and
accurate solutions without requiring manual initialisation or suffering from
track losses. Quantitative results on several standard datasets demonstrate
that the proposed method outperforms state-of-the-art representatives of the
model-based, data-driven and hybrid paradigms.Comment: BMVC 2015 (oral); see also
http://lrs.icg.tugraz.at/research/hybridhape
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal
algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN
have reduced the running time of these detection networks, exposing region
proposal computation as a bottleneck. In this work, we introduce a Region
Proposal Network (RPN) that shares full-image convolutional features with the
detection network, thus enabling nearly cost-free region proposals. An RPN is a
fully convolutional network that simultaneously predicts object bounds and
objectness scores at each position. The RPN is trained end-to-end to generate
high-quality region proposals, which are used by Fast R-CNN for detection. We
further merge RPN and Fast R-CNN into a single network by sharing their
convolutional features---using the recently popular terminology of neural
networks with 'attention' mechanisms, the RPN component tells the unified
network where to look. For the very deep VGG-16 model, our detection system has
a frame rate of 5fps (including all steps) on a GPU, while achieving
state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS
COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015
competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning
entries in several tracks. Code has been made publicly available.Comment: Extended tech repor
S-OHEM: Stratified Online Hard Example Mining for Object Detection
One of the major challenges in object detection is to propose detectors with
highly accurate localization of objects. The online sampling of high-loss
region proposals (hard examples) uses the multitask loss with equal weight
settings across all loss types (e.g, classification and localization, rigid and
non-rigid categories) and ignores the influence of different loss distributions
throughout the training process, which we find essential to the training
efficacy. In this paper, we present the Stratified Online Hard Example Mining
(S-OHEM) algorithm for training higher efficiency and accuracy detectors.
S-OHEM exploits OHEM with stratified sampling, a widely-adopted sampling
technique, to choose the training examples according to this influence during
hard example mining, and thus enhance the performance of object detectors. We
show through systematic experiments that S-OHEM yields an average precision
(AP) improvement of 0.5% on rigid categories of PASCAL VOC 2007 for both the
IoU threshold of 0.6 and 0.7. For KITTI 2012, both results of the same metric
are 1.6%. Regarding the mean average precision (mAP), a relative increase of
0.3% and 0.5% (1% and 0.5%) is observed for VOC07 (KITTI12) using the same set
of IoU threshold. Also, S-OHEM is easy to integrate with existing region-based
detectors and is capable of acting with post-recognition level regressors.Comment: 9 pages, 3 figures, accepted by CCCV 201
Deformable Part-based Fully Convolutional Network for Object Detection
Existing region-based object detectors are limited to regions with fixed box
geometry to represent objects, even if those are highly non-rectangular. In
this paper we introduce DP-FCN, a deep model for object detection which
explicitly adapts to shapes of objects with deformable parts. Without
additional annotations, it learns to focus on discriminative elements and to
align them, and simultaneously brings more invariance for classification and
geometric information to refine localization. DP-FCN is composed of three main
modules: a Fully Convolutional Network to efficiently maintain spatial
resolution, a deformable part-based RoI pooling layer to optimize positions of
parts and build invariance, and a deformation-aware localization module
explicitly exploiting displacements of parts to improve accuracy of bounding
box regression. We experimentally validate our model and show significant
gains. DP-FCN achieves state-of-the-art performances of 83.1% and 80.9% on
PASCAL VOC 2007 and 2012 with VOC data only.Comment: Accepted to BMVC 2017 (oral
- …