1,962 research outputs found
3-D Hand Pose Estimation from Kinect's Point Cloud Using Appearance Matching
We present a novel appearance-based approach for pose estimation of a human
hand using the point clouds provided by the low-cost Microsoft Kinect sensor.
Both the free-hand case, in which the hand is isolated from the surrounding
environment, and the hand-object case, in which the different types of
interactions are classified, have been considered. The hand-object case is
clearly the most challenging task having to deal with multiple tracks. The
approach proposed here belongs to the class of partial pose estimation where
the estimated pose in a frame is used for the initialization of the next one.
The pose estimation is obtained by applying a modified version of the Iterative
Closest Point (ICP) algorithm to synthetic models to obtain the rigid
transformation that aligns each model with respect to the input data. The
proposed framework uses a "pure" point cloud as provided by the Kinect sensor
without any other information such as RGB values or normal vector components.
For this reason, the proposed method can also be applied to data obtained from
other types of depth sensor, or RGB-D camera
Non-rigid Reconstruction with a Single Moving RGB-D Camera
We present a novel non-rigid reconstruction method using a moving RGB-D
camera. Current approaches use only non-rigid part of the scene and completely
ignore the rigid background. Non-rigid parts often lack sufficient geometric
and photometric information for tracking large frame-to-frame motion. Our
approach uses camera pose estimated from the rigid background for foreground
tracking. This enables robust foreground tracking in situations where large
frame-to-frame motion occurs. Moreover, we are proposing a multi-scale
deformation graph which improves non-rigid tracking without compromising the
quality of the reconstruction. We are also contributing a synthetic dataset
which is made publically available for evaluating non-rigid reconstruction
methods. The dataset provides frame-by-frame ground truth geometry of the
scene, the camera trajectory, and masks for background foreground. Experimental
results show that our approach is more robust in handling larger frame-to-frame
motions and provides better reconstruction compared to state-of-the-art
approaches.Comment: Accepted in International Conference on Pattern Recognition (ICPR
2018
Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects
In this paper we introduce Co-Fusion, a dense SLAM system that takes a live
stream of RGB-D images as input and segments the scene into different objects
(using either motion or semantic cues) while simultaneously tracking and
reconstructing their 3D shape in real time. We use a multiple model fitting
approach where each object can move independently from the background and still
be effectively tracked and its shape fused over time using only the information
from pixels associated with that object label. Previous attempts to deal with
dynamic scenes have typically considered moving regions as outliers, and
consequently do not model their shape or track their motion over time. In
contrast, we enable the robot to maintain 3D models for each of the segmented
objects and to improve them over time through fusion. As a result, our system
can enable a robot to maintain a scene description at the object level which
has the potential to allow interactions with its working environment; even in
the case of dynamic scenes.Comment: International Conference on Robotics and Automation (ICRA) 2017,
http://visual.cs.ucl.ac.uk/pubs/cofusion,
https://github.com/martinruenz/co-fusio
Skeleton Driven Non-rigid Motion Tracking and 3D Reconstruction
This paper presents a method which can track and 3D reconstruct the non-rigid
surface motion of human performance using a moving RGB-D camera. 3D
reconstruction of marker-less human performance is a challenging problem due to
the large range of articulated motions and considerable non-rigid deformations.
Current approaches use local optimization for tracking. These methods need many
iterations to converge and may get stuck in local minima during sudden
articulated movements. We propose a puppet model-based tracking approach using
skeleton prior, which provides a better initialization for tracking articulated
movements. The proposed approach uses an aligned puppet model to estimate
correct correspondences for human performance capture. We also contribute a
synthetic dataset which provides ground truth locations for frame-by-frame
geometry and skeleton joints of human subjects. Experimental results show that
our approach is more robust when faced with sudden articulated motions, and
provides better 3D reconstruction compared to the existing state-of-the-art
approaches.Comment: Accepted in DICTA 201
V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Most of the existing deep learning-based methods for 3D hand and human pose
estimation from a single depth map are based on a common framework that takes a
2D depth map and directly regresses the 3D coordinates of keypoints, such as
hand or human body joints, via 2D convolutional neural networks (CNNs). The
first weakness of this approach is the presence of perspective distortion in
the 2D depth map. While the depth map is intrinsically 3D data, many previous
methods treat depth maps as 2D images that can distort the shape of the actual
object through projection from 3D to 2D space. This compels the network to
perform perspective distortion-invariant estimation. The second weakness of the
conventional approach is that directly regressing 3D coordinates from a 2D
image is a highly non-linear mapping, which causes difficulty in the learning
procedure. To overcome these weaknesses, we firstly cast the 3D hand and human
pose estimation problem from a single depth map into a voxel-to-voxel
prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood
for each keypoint. We design our model as a 3D CNN that provides accurate
estimates while running in real-time. Our system outperforms previous methods
in almost all publicly available 3D hand and human pose estimation datasets and
placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
The code is available in https://github.com/mks0601/V2V-PoseNet_RELEASE.Comment: HANDS 2017 Challenge Frame-based 3D Hand Pose Estimation Winner (ICCV
2017), Published at CVPR 201
- …