2,487 research outputs found
GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB
We address the highly challenging problem of real-time 3D hand tracking based
on a monocular RGB-only sequence. Our tracking method combines a convolutional
neural network with a kinematic 3D hand model, such that it generalizes well to
unseen data, is robust to occlusions and varying camera viewpoints, and leads
to anatomically plausible as well as temporally smooth hand motions. For
training our CNN we propose a novel approach for the synthetic generation of
training data that is based on a geometrically consistent image-to-image
translation network. To be more specific, we use a neural network that
translates synthetic images to "real" images, such that the so-generated images
follow the same statistical distribution as real-world hand images. For
training this translation network we combine an adversarial loss and a
cycle-consistency loss with a geometric consistency loss in order to preserve
geometric properties (such as hand pose) during translation. We demonstrate
that our hand tracking system outperforms the current state-of-the-art on
challenging RGB-only footage.
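As a rough illustration of how the three loss terms named in the abstract might be combined, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the L1 distance used for the cycle and geometric terms, the idea of comparing a "geometric descriptor" before and after translation, and the weights `w_cyc` and `w_geo` are all assumptions made for illustration.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def translation_loss(adv, synth, cycled, geo_in, geo_out, w_cyc=10.0, w_geo=1.0):
    """Weighted sum of the three terms (hypothetical weights).

    adv     -- adversarial loss value, assumed to be computed elsewhere
    synth   -- flattened synthetic input image
    cycled  -- the same image after synthetic -> "real" -> synthetic translation
    geo_in  -- geometric descriptor of the input (assumed, e.g. a hand mask)
    geo_out -- the same descriptor extracted from the translated image
    """
    cycle = l1(synth, cycled)   # cycle-consistency term
    geo = l1(geo_in, geo_out)   # geometric consistency term
    return adv + w_cyc * cycle + w_geo * geo


loss = translation_loss(0.5, [0.0, 1.0], [0.5, 1.0], [1.0, 0.0], [1.0, 1.0])
```

The geometric term is what penalizes a translation that looks realistic but changes the hand pose; without it, the adversarial and cycle terms alone place no direct constraint on pose.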
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.
Comment: Accepted to SIGGRAPH 201
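To give a sense of what making per-joint predictions agree with a coherent kinematic skeleton can involve, here is a deliberately simplified sketch in plain Python. It is a stand-in, not the fitting method of the paper: it only rescales each predicted bone to a fixed length while keeping its predicted direction. The function name, the parent-index representation, and the fixed bone lengths are assumptions for illustration.

```python
import math


def enforce_bone_lengths(joints, parents, lengths):
    """Rebuild 3D joints so every bone has a fixed length.

    joints  -- list of predicted (x, y, z) positions
    parents -- parents[i] is the parent index of joint i, or -1 for the root;
               parents are assumed to appear before their children
    lengths -- lengths[i] is the target length of the bone from parents[i] to i
    """
    fitted = [None] * len(joints)
    for i, p in enumerate(parents):
        if p < 0:
            fitted[i] = tuple(joints[i])  # keep the root where it was predicted
            continue
        # Direction of the predicted bone, from predicted parent to child.
        d = [c - pc for c, pc in zip(joints[i], joints[p])]
        norm = math.sqrt(sum(v * v for v in d)) or 1.0  # zero-length bone: child stays at parent
        # Place the child at the target length from the already fitted parent.
        fitted[i] = tuple(fp + lengths[i] * v / norm for fp, v in zip(fitted[p], d))
    return fitted


pose = enforce_bone_lengths(
    [(0, 0, 0), (0, 2, 0), (0, 2, 3)], parents=[-1, 0, 1], lengths=[0, 1, 1]
)
```

A fitting method of the kind the abstract describes would typically also account for the 2D predictions and temporal smoothness; this sketch covers only the fixed-skeleton constraint.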
DIY Human Action Data Set Generation
The recent successes in applying deep learning techniques to solve standard
computer vision problems have inspired researchers to propose new computer vision
problems in different domains. As previously established in the field, training
data itself plays a significant role in the machine learning process,
especially for deep learning approaches, which are data hungry. In order to solve
each new problem and achieve decent performance, a large amount of data needs to
be captured, which may in many cases pose logistical difficulties. Therefore,
the ability to generate de novo data or expand an existing data set, however
small, in order to satisfy the data requirements of current networks may be
invaluable. Herein, we introduce a novel way to partition an action video clip
into action, subject and context. Each part is manipulated separately and
reassembled with our proposed video generation technique. Furthermore, our
novel human skeleton trajectory generation, along with our proposed video
generation technique, enables us to generate unlimited action recognition
training data. These techniques enable us to generate video action clips from
a small set without costly and time-consuming data acquisition. Lastly, we
show through an extensive set of experiments on two small human action
recognition data sets that this new data generation technique can improve the
performance of current action recognition neural nets.
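The reason partitioning a clip into action, subject and context can expand a small data set is combinatorial, as the following minimal Python sketch illustrates. It only enumerates recombinations as labelled records; the function name and the example labels are hypothetical, and the actual video synthesis step described in the abstract is not modelled here.

```python
from itertools import product


def recombine(actions, subjects, contexts):
    """List every (action, subject, context) combination as a clip description."""
    return [
        {"action": a, "subject": s, "context": c}
        for a, s, c in product(actions, subjects, contexts)
    ]


# Hypothetical parts extracted from a handful of source clips.
clips = recombine(["wave", "jump"], ["s1", "s2", "s3"], ["park", "gym"])
```

With 2 actions, 3 subjects and 2 contexts, the parts of only a few source clips already describe 12 distinct clip combinations.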