VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.

Comment: Accepted to SIGGRAPH 201
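The two-stage design described above (per-frame CNN joint predictions, followed by kinematic skeleton fitting and temporal filtering) can be caricatured in a few lines. This is a toy sketch under invented assumptions: a 4-joint chain, made-up bone lengths, and a simple one-pole filter; the actual method optimizes joint angles of a full kinematic skeleton.

```python
import numpy as np

# Hypothetical kinematic chain: parent index per joint (-1 = root).
PARENTS = [-1, 0, 1, 2]          # a toy 4-joint chain, not the paper's skeleton
BONE_LEN = [0.0, 0.5, 0.4, 0.3]  # assumed fixed bone lengths (metres)

def fit_skeleton(raw_joints, parents=PARENTS, bone_len=BONE_LEN):
    """Snap noisy per-joint CNN predictions onto a skeleton with fixed
    bone lengths by walking the chain from root to leaves: each child
    keeps its predicted direction but is placed at the known distance
    from its parent."""
    fitted = np.array(raw_joints, dtype=float)
    for j, p in enumerate(parents):
        if p < 0:
            continue  # root stays where the CNN put it
        d = fitted[j] - fitted[p]
        d /= np.linalg.norm(d) + 1e-9          # predicted bone direction
        fitted[j] = fitted[p] + bone_len[j] * d  # enforce the known length
    return fitted

def smooth(prev, cur, alpha=0.7):
    """One-pole temporal filter for stable real-time output."""
    return alpha * prev + (1 - alpha) * cur
```

The fitted skeleton has exact bone lengths by construction, which is what makes the output a "coherent kinematic skeleton" rather than independent per-joint estimates.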
A fast algorithm for vision-based hand gesture recognition for robot control
We propose a fast algorithm for automatically recognizing a limited set of gestures from hand images for a robot control application. Hand gesture recognition is a challenging problem in its general form. We consider a fixed set of manual commands and a reasonably structured environment, and develop a simple, yet effective, procedure for gesture recognition. Our approach contains steps for segmenting the hand region, locating the fingers, and finally classifying the gesture. The algorithm is invariant to translation, rotation, and scale of the hand. We demonstrate the effectiveness of the technique on real imagery.
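The three steps named in the abstract (segment the hand, locate the fingers, classify the gesture) can be illustrated with a toy sketch. Everything specific here is an invented assumption, not taken from the paper: the intensity threshold, the finger-counting heuristic (counting foreground column runs in the top band of the mask), and the command table.

```python
import numpy as np

# Hypothetical command set keyed by the number of extended fingers.
COMMANDS = {1: "forward", 2: "back", 3: "left", 4: "right", 5: "stop"}

def segment_hand(gray, thresh=128):
    """Step 1: crude hand segmentation by intensity threshold
    (plausible only in a structured environment, as the paper assumes)."""
    return gray > thresh

def count_fingers(mask):
    """Step 2: count fingers as separate runs of foreground columns in
    the top band of the segmented hand region."""
    rows = np.flatnonzero(mask.any(axis=1))
    if rows.size == 0:
        return 0
    band = mask[rows[0]: rows[0] + max(1, len(rows) // 5)].any(axis=0)
    # count rising edges (0 -> 1 transitions) across columns
    return int(np.count_nonzero(np.diff(band.astype(int)) == 1) + band[0])

def classify(gray):
    """Step 3: map the finger count to a robot command."""
    return COMMANDS.get(count_fingers(segment_hand(gray)), "unknown")
```

Note this toy heuristic is not rotation-invariant; the paper's procedure achieves translation, rotation, and scale invariance.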
Bidirectional Conditional Generative Adversarial Networks
Conditional Generative Adversarial Networks (cGANs) are generative models that can produce data samples conditioned on both latent variables and known auxiliary information. We propose the Bidirectional cGAN (BiCoGAN), which effectively disentangles the latent and auxiliary factors in the generation process and provides an encoder that learns inverse mappings from data samples to both, trained jointly with the generator and the discriminator. We present crucial techniques for training BiCoGANs, which involve an extrinsic factor loss along with an associated dynamically tuned importance weight. Compared to other encoder-based cGANs, BiCoGANs encode the auxiliary information more accurately, and utilize the latent and auxiliary factors more effectively and in a more disentangled way to generate samples.

Comment: To appear in Proceedings of ACCV 201
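A forward-only toy sketch of the pieces the abstract names: a conditional generator G(z, c), an encoder E(x) recovering both factors, a joint discriminator over (x, z, c) tuples, and an extrinsic factor loss with a tunable importance weight. The linear maps, dimensions, and toy score are arbitrary placeholders, not the paper's architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
DZ, DC, DX = 4, 3, 8  # toy latent, condition, and data dimensions

# Toy linear generator G(z, c) -> x and encoder E(x) -> (z_hat, c_hat).
Wg = rng.normal(size=(DZ + DC, DX))
We = rng.normal(size=(DX, DZ + DC))

def G(z, c):
    """Generate samples from latent variables and auxiliary information."""
    return np.concatenate([z, c], axis=1) @ Wg

def E(x):
    """Encoder learning the inverse mapping from data to both factors."""
    zc = x @ We
    return zc[:, :DZ], zc[:, DZ:]

def D(x, z, c):
    """Joint discriminator scoring (x, z, c) tuples; real tuples pair
    data with its encoding, fake tuples pair G(z, c) with (z, c)."""
    v = np.concatenate([x, z, c], axis=1)
    return 1 / (1 + np.exp(-v.sum(axis=1)))  # toy score in (0, 1)

def extrinsic_factor_loss(c_hat, c, weight=1.0):
    """Supervised loss tying the encoder's recovered auxiliary factor
    to the known one, scaled by a tunable importance weight."""
    return weight * np.mean((c_hat - c) ** 2)
```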
Human Perambulation as a Self Calibrating Biometric
This paper introduces a novel method of single-camera gait reconstruction which is independent of the walking direction and of the camera parameters. Recognizing people by gait has unique advantages with respect to other biometric techniques: the identification of the walking subject is completely unobtrusive and can be achieved at a distance. Recently, much research has been conducted into the recognition of frontoparallel gait. The proposed method relies on the very nature of walking to achieve independence from walking direction. Three major assumptions are made: human gait is cyclic; the distances between the bone joints are invariant during the execution of the movement; and the articulated leg motion is approximately planar, since almost all of the perceived motion is contained within a single limb-swing plane. The method has been tested on several subjects walking freely along six different directions in a small enclosed area. The results show that recognition can be achieved without calibration and without dependence on view direction. These results are particularly encouraging for future system development and for application in real surveillance scenarios.
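The first assumption, that human gait is cyclic, suggests estimating the gait period from any joint trajectory via autocorrelation. A small illustrative sketch of that idea follows; the peak-picking threshold is an assumption for this example, not a detail from the paper.

```python
import numpy as np

def gait_period(signal):
    """Estimate the gait cycle length (in frames) from a 1-D joint
    trajectory by locating the first off-zero peak of its normalized
    autocorrelation -- exploiting the assumption that gait is cyclic."""
    s = np.asarray(signal, dtype=float)
    s = s - s.mean()
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]  # non-negative lags
    ac /= ac[0] + 1e-12                                # normalize to ac[0] = 1
    # skip lag 0 and pick the first local maximum above a threshold
    for lag in range(1, len(ac) - 1):
        if ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1] and ac[lag] > 0.5:
            return lag
    return None
```

For a periodic trajectory, the first strong autocorrelation peak lands at the cycle length regardless of the viewing direction, which is the sense in which cyclicity helps make the method view-independent.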
Convolutional Networks for Object Category and 3D Pose Estimation from 2D Images
Current CNN-based algorithms for recovering the 3D pose of an object in an
image assume knowledge about both the object category and its 2D localization
in the image. In this paper, we relax one of these constraints and propose to
solve the task of joint object category and 3D pose estimation from an image
assuming known 2D localization. We design a new architecture for this task
composed of a feature network that is shared between subtasks, an object
categorization network built on top of the feature network, and a collection of
category dependent pose regression networks. We also introduce suitable loss
functions and a training method for the new architecture. Experiments on the
challenging PASCAL3D+ dataset show state-of-the-art performance in the joint
categorization and pose estimation task. Moreover, our performance on the joint
task is comparable to the performance of state-of-the-art methods on the
simpler 3D pose estimation task with a known object category.
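The architecture described (a shared feature network, a categorization head on top of it, and a collection of per-category pose regressors) can be sketched as a toy forward pass. All dimensions, the random weights, and the quaternion-style pose output are illustrative assumptions, not the paper's design choices.

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, D_FEAT, N_CAT, D_POSE = 16, 8, 3, 4  # toy sizes; D_POSE ~ a quaternion

W_feat = rng.normal(size=(D_IN, D_FEAT))           # shared feature network
W_cls = rng.normal(size=(D_FEAT, N_CAT))           # categorization head
W_pose = rng.normal(size=(N_CAT, D_FEAT, D_POSE))  # one pose head per category

def forward(x):
    """Joint category + pose prediction: the pose estimate comes from
    the regressor belonging to the predicted category."""
    f = np.maximum(x @ W_feat, 0.0)  # shared ReLU features
    logits = f @ W_cls
    cat = int(np.argmax(logits))     # predicted object category
    q = f @ W_pose[cat]              # category-dependent pose regression
    q /= np.linalg.norm(q) + 1e-9    # normalize to a unit-quaternion pose
    return cat, q
```

Routing the pose through the predicted category's head is what couples the two subtasks: a categorization error also selects the wrong pose regressor.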
Everybody Dance Now
This paper presents a simple method for "do as I do" motion transfer: given a
source video of a person dancing, we can transfer that performance to a novel
(amateur) target after only a few minutes of the target subject performing
standard moves. We approach this problem as video-to-video translation using
pose as an intermediate representation. To transfer the motion, we extract
poses from the source subject and apply the learned pose-to-appearance mapping
to generate the target subject. We predict two consecutive frames for
temporally coherent video results and introduce a separate pipeline for
realistic face synthesis. Although our method is quite simple, it produces
surprisingly compelling results (see video). This motivates us to also provide
a forensics tool for reliable synthetic content detection, which is able to
distinguish videos synthesized by our system from real data. In addition, we
release a first-of-its-kind open-source dataset of videos that can be legally
used for training and motion transfer.

Comment: In ICCV 201
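The transfer pipeline (pose as the intermediate representation, a learned pose-to-appearance mapping for the target subject, and conditioning on the previous output for temporal coherence) can be caricatured frame by frame. The linear map and tanh below are stand-ins for the paper's learned generator, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
D_POSE, D_IMG = 6, 12  # toy dims for a pose vector and an image embedding

W = rng.normal(size=(D_POSE + D_IMG, D_IMG))  # toy pose-to-appearance map

def generate(pose, prev_frame):
    """Render the target subject from a source pose, conditioned on the
    previously generated frame for temporal coherence."""
    return np.tanh(np.concatenate([pose, prev_frame]) @ W)

def transfer(source_poses):
    """Motion transfer: run the source's pose sequence through the
    target's pose-to-appearance mapping, frame by frame."""
    frames, prev = [], np.zeros(D_IMG)
    for p in source_poses:
        prev = generate(p, prev)
        frames.append(prev)
    return np.stack(frames)
```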