9,887 research outputs found
Convolutional Networks for Object Category and 3D Pose Estimation from 2D Images
Current CNN-based algorithms for recovering the 3D pose of an object in an
image assume knowledge about both the object category and its 2D localization
in the image. In this paper, we relax one of these constraints and propose to
solve the task of joint object category and 3D pose estimation from an image
assuming known 2D localization. We design a new architecture for this task
composed of a feature network that is shared between subtasks, an object
categorization network built on top of the feature network, and a collection of
category dependent pose regression networks. We also introduce suitable loss
functions and a training method for the new architecture. Experiments on the
challenging PASCAL3D+ dataset show state-of-the-art performance in the joint
categorization and pose estimation task. Moreover, our performance on the joint
task is comparable to the performance of state-of-the-art methods on the
simpler 3D pose estimation with known object category task
Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn
This paper presents an image classification based approach for skeleton-based
video action recognition problem. Firstly, A dataset independent
translation-scale invariant image mapping method is proposed, which transformes
the skeleton videos to colour images, named skeleton-images. Secondly, A
multi-scale deep convolutional neural network (CNN) architecture is proposed
which could be built and fine-tuned on the powerful pre-trained CNNs, e.g.,
AlexNet, VGGNet, ResNet etal.. Even though the skeleton-images are very
different from natural images, the fine-tune strategy still works well. At
last, we prove that our method could also work well on 2D skeleton video data.
We achieve the state-of-the-art results on the popular benchmard datasets e.g.
NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. Especially on the largest and challenge
NTU RGB+D, UTD-MHAD, and MSRC-12 dataset, our method outperforms other methods
by a large margion, which proves the efficacy of the proposed method
CNN Based Posture-Free Hand Detection
Although many studies suggest high performance hand detection methods, those
methods are likely to be overfitting. Fortunately, the Convolution Neural
Network (CNN) based approach provides a better way that is less sensitive to
translation and hand poses. However the CNN approach is complex and can
increase computational time, which at the end reduce its effectiveness on a
system where the speed is essential.In this study we propose a shallow CNN
network which is fast, and insensitive to translation and hand poses. It is
tested on two different domains of hand datasets, and performs in relatively
comparable performance and faster than the other state-of-the-art hand
CNN-based hand detection method. Our evaluation shows that the proposed shallow
CNN network performs at 93.9% accuracy and reaches much faster speed than its
competitors.Comment: 4 pages, 5 figures, in The 10th International Conference on
Information Technology and Electrical Engineering 2018, ISBN:
978-1-5386-4739-
GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB
We address the highly challenging problem of real-time 3D hand tracking based
on a monocular RGB-only sequence. Our tracking method combines a convolutional
neural network with a kinematic 3D hand model, such that it generalizes well to
unseen data, is robust to occlusions and varying camera viewpoints, and leads
to anatomically plausible as well as temporally smooth hand motions. For
training our CNN we propose a novel approach for the synthetic generation of
training data that is based on a geometrically consistent image-to-image
translation network. To be more specific, we use a neural network that
translates synthetic images to "real" images, such that the so-generated images
follow the same statistical distribution as real-world hand images. For
training this translation network we combine an adversarial loss and a
cycle-consistency loss with a geometric consistency loss in order to preserve
geometric properties (such as hand pose) during translation. We demonstrate
that our hand tracking system outperforms the current state-of-the-art on
challenging RGB-only footage
- …