322 research outputs found
Rule Of Thumb: Deep derotation for improved fingertip detection
We investigate a novel global orientation regression approach for articulated
objects using a deep convolutional neural network. This is integrated with an
in-plane image derotation scheme, DeROT, to tackle the problem of per-frame
fingertip detection in depth images. The method reduces the complexity of
learning in the space of articulated poses which is demonstrated by using two
distinct state-of-the-art learning based hand pose estimation methods applied
to fingertip detection. Significant classification improvements are shown over
the baseline implementation. Our framework involves no tracking, kinematic
constraints or explicit prior model of the articulated object in hand. To
support our approach we also describe a new pipeline for high accuracy magnetic
annotation and labeling of objects imaged by a depth camera.Comment: To be published in proceedings of BMVC 201
Fast and Robust Hand Tracking Using Detection-Guided Optimization
Markerless tracking of hands and fingers is a promising enabler for human-computer interaction. However, adoption has been limited because of tracking inaccuracies, incomplete coverage of motions, low framerate, complex camera setups, and high computational requirements. In this paper, we present a fast method for accurately tracking rapid and complex articulations of the hand using a single depth camera. Our algorithm uses a novel detection-guided optimization strategy that increases the robustness and speed of pose estimation. In the detection step, a randomized decision forest classifies pixels into parts of the hand. In the optimization step, a novel objective function combines the detected part labels and a Gaussian mixture representation of the depth to estimate a pose that best fits the depth. Our approach needs comparably less computational resources which makes it extremely fast (50 fps without GPU support). The approach also supports varying static, or moving, camera-to-scene arrangements. We show the benefits of our method by evaluating on public datasets and comparing against previous work
Markerless Motion Capture via Convolutional Neural Network
A human motion capture system can be defined as a process that digitally records the movements of a person and then translates them into computer-animated images.
To achieve this goal, motion capture systems usually exploit different types of algorithms, which include techniques such as pose estimation or background subtraction: this latter aims at segmenting moving objects from the background under multiple challenging scenarios. Recently, encoder-decoder-type deep neural networks designed to accomplish this task have reached impressive results, outperforming classical approaches.
The aim of this thesis is to evaluate and discuss the predictions provided by the multi-scale convolutional neural network FgSegNet_v2, a deep learning-based method which represents the current state-of-the-art for implementing scene-specific background subtraction.
In this work, FgSegNet_v2 is trained and tested on BBSoF S.r.l. dataset, extending its scene- specific use to a more general application in several environments
Capturing Hands in Action using Discriminative Salient Points and Physics Simulation
Hand motion capture is a popular research field, recently gaining more
attention due to the ubiquity of RGB-D sensors. However, even most recent
approaches focus on the case of a single isolated hand. In this work, we focus
on hands that interact with other hands or objects and present a framework that
successfully captures motion in such interaction scenarios for both rigid and
articulated objects. Our framework combines a generative model with
discriminatively trained salient points to achieve a low tracking error and
with collision detection and physics simulation to achieve physically plausible
estimates even in case of occlusions and missing visual data. Since all
components are unified in a single objective function which is almost
everywhere differentiable, it can be optimized with standard optimization
techniques. Our approach works for monocular RGB-D sequences as well as setups
with multiple synchronized RGB cameras. For a qualitative and quantitative
evaluation, we captured 29 sequences with a large variety of interactions and
up to 150 degrees of freedom.Comment: Accepted for publication by the International Journal of Computer
Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). A combination into a
single framework of an ECCV'12 multicamera-RGB and a monocular-RGBD GCPR'14
hand tracking paper with several extensions, additional experiments and
detail
Parallelization Strategies for Markerless Human Motion Capture
Markerless Motion Capture (MMOCAP) is the
problem of determining the pose of a person from images
captured by one or several cameras simultaneously without
using markers on the subject. Evaluation of the solutions
is frequently the most time-consuming task, making most
of the proposed methods inapplicable in real-time scenarios.
This paper presents an efficient approach to parallelize
the evaluation of the solutions in CPUs and GPUs. Our proposal
is experimentally compared on six sequences of the
HumanEva-I dataset using the CMAES algorithm. Multiple
algorithm’s configurations were tested to analyze the
best trade-off in regard to the accuracy and computing time.
The proposed methods obtain speedups of 8Ă— in multi-core
CPUs, 30Ă— in a single GPU and up to 110Ă— using 4 GPU
Hybrid One-Shot 3D Hand Pose Estimation by Exploiting Uncertainties
Model-based approaches to 3D hand tracking have been shown to perform well in
a wide range of scenarios. However, they require initialisation and cannot
recover easily from tracking failures that occur due to fast hand motions.
Data-driven approaches, on the other hand, can quickly deliver a solution, but
the results often suffer from lower accuracy or missing anatomical validity
compared to those obtained from model-based approaches. In this work we propose
a hybrid approach for hand pose estimation from a single depth image. First, a
learned regressor is employed to deliver multiple initial hypotheses for the 3D
position of each hand joint. Subsequently, the kinematic parameters of a 3D
hand model are found by deliberately exploiting the inherent uncertainty of the
inferred joint proposals. This way, the method provides anatomically valid and
accurate solutions without requiring manual initialisation or suffering from
track losses. Quantitative results on several standard datasets demonstrate
that the proposed method outperforms state-of-the-art representatives of the
model-based, data-driven and hybrid paradigms.Comment: BMVC 2015 (oral); see also
http://lrs.icg.tugraz.at/research/hybridhape
- …