11,770 research outputs found
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.Comment: Accepted to SIGGRAPH 201
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.Comment: 8 pages excluding references (CVPR style
Linear Global Translation Estimation with Feature Tracks
This paper derives a novel linear position constraint for cameras seeing a
common scene point, which leads to a direct linear method for global camera
translation estimation. Unlike previous solutions, this method deals with
collinear camera motion and weak image association at the same time. The final
linear formulation does not involve the coordinates of scene points, which
makes it efficient even for large scale data. We solve the linear equation
based on norm, which makes our system more robust to outliers in
essential matrices and feature correspondences. We experiment this method on
both sequentially captured images and unordered Internet images. The
experiments demonstrate its strength in robustness, accuracy, and efficiency.Comment: Changes: 1. Adopt BMVC2015 style; 2. Combine sections 3 and 5; 3.
Move "Evaluation on synthetic data" out to supplementary file; 4. Divide
subsection "Evaluation on general data" to subsections "Experiment on
sequential data" and "Experiment on unordered Internet data"; 5. Change Fig.
1 and Fig.8; 6. Move Fig. 6 and Fig. 7 to supplementary file; 7 Change some
symbols; 8. Correct some typo
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
Visual localization enables autonomous vehicles to navigate in their
surroundings and augmented reality applications to link virtual to real worlds.
Practical visual localization approaches need to be robust to a wide variety of
viewing condition, including day-night changes, as well as weather and seasonal
variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera
pose estimates. In this paper, we introduce the first benchmark datasets
specifically designed for analyzing the impact of such factors on visual
localization. Using carefully created ground truth poses for query images taken
under a wide variety of conditions, we evaluate the impact of various factors
on 6DOF camera pose estimation accuracy through extensive experiments with
state-of-the-art localization approaches. Based on our results, we draw
conclusions about the difficulty of different conditions, showing that
long-term localization is far from solved, and propose promising avenues for
future work, including sequence-based localization approaches and the need for
better local features. Our benchmark is available at visuallocalization.net.Comment: Accepted to CVPR 2018 as a spotligh
- …