xR-EgoPose: Egocentric 3D Human Pose from an HMD Camera
We present a new solution to egocentric 3D body pose estimation from
monocular images captured from a downward-looking fish-eye camera installed on
the rim of a head-mounted virtual reality device. This unusual viewpoint, just
2 cm away from the user's face, leads to images with unique visual appearance,
characterized by severe self-occlusions and strong perspective distortions that
result in a drastic difference in resolution between lower and upper body. Our
contribution is two-fold. Firstly, we propose a new encoder-decoder
architecture with a novel dual-branch decoder designed specifically to account
for the varying uncertainty in the 2D joint locations. Our quantitative
evaluation, both on synthetic and real-world datasets, shows that our strategy
leads to substantial improvements in accuracy over state-of-the-art egocentric
pose estimation approaches. Our second contribution is a new large-scale
photorealistic synthetic dataset - xR-EgoPose - offering 383K frames of high
quality renderings of people with a diversity of skin tones, body shapes, and
clothing, in a variety of backgrounds and lighting conditions, performing a
range of actions. Our experiments show that the high variability in our new
synthetic training corpus leads to good generalization to real-world footage
and to state-of-the-art results on real-world datasets with ground truth.
Moreover, an evaluation on the Human3.6M benchmark shows that the performance
of our method is on par with top-performing approaches on the more classic
problem of 3D human pose from a third-person viewpoint.
Comment: ICCV 2019
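The "varying uncertainty in the 2D joint locations" handled by the dual-branch decoder can be illustrated with a minimal sketch (an assumption for illustration, not the paper's implementation): one common way to expose per-joint uncertainty is to read both the joint location and its spatial spread off a predicted heatmap.

```python
import numpy as np

# Illustrative sketch: given per-joint heatmaps, the soft-argmax expectation
# gives the 2D joint location, and the spatial spread of the heatmap serves
# as a proxy for the per-joint uncertainty.

def joints_from_heatmaps(heatmaps):
    """heatmaps: (J, H, W) non-negative maps, one per joint.
    Returns (J, 2) expected (x, y) locations and (J,) variance estimates."""
    J, H, W = heatmaps.shape
    ys, xs = np.mgrid[0:H, 0:W]
    locs = np.empty((J, 2))
    var = np.empty(J)
    for j in range(J):
        p = heatmaps[j] / heatmaps[j].sum()      # normalize to a distribution
        mx, my = (p * xs).sum(), (p * ys).sum()  # soft-argmax expectation
        locs[j] = (mx, my)
        # total spatial variance around the expected location
        var[j] = (p * ((xs - mx) ** 2 + (ys - my) ** 2)).sum()
    return locs, var
```

A sharply peaked heatmap yields a low variance (a confident joint, e.g. hands near the camera), while a diffuse heatmap yields a high variance (e.g. heavily foreshortened lower-body joints) — precisely the asymmetry an egocentric viewpoint produces.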
T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects
We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e.
translation and rotation, of texture-less rigid objects. The dataset features
thirty industry-relevant objects with no significant texture and no
discriminative color or reflectance properties. The objects exhibit symmetries
and mutual similarities in shape and/or size. Compared to other datasets, a
unique property is that some of the objects are parts of others. The dataset
includes training and test images that were captured with three synchronized
sensors, specifically a structured-light and a time-of-flight RGB-D sensor and
a high-resolution RGB camera. There are approximately 39K training and 10K test
images from each sensor. Additionally, two types of 3D models are provided for
each object, i.e. a manually created CAD model and a semi-automatically
reconstructed one. Training images depict individual objects against a black
background. Test images originate from twenty test scenes having varying
complexity, which increases from simple scenes with several isolated objects to
very challenging ones with multiple instances of several objects and with a
high amount of clutter and occlusion. The images were captured from a
systematically sampled view sphere around the object/scene, and are annotated
with accurate ground truth 6D poses of all modeled objects. Initial evaluation
results indicate that the state of the art in 6D object pose estimation has
ample room for improvement, especially in difficult cases with significant
occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.
Comment: WACV 2017
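The 6D pose described above — a rotation plus a translation — can be sketched as a rigid transform applied to an object's model points, which is how ground-truth poses are typically used when computing pose-error metrics. This is an illustrative sketch, not code from the T-LESS toolkit:

```python
import numpy as np

# A 6D pose is a rigid transform: a 3x3 rotation matrix R (3 DoF) and a
# translation vector t (3 DoF). Applying it maps model coordinates into
# camera/scene coordinates.

def rotation_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def transform_points(R, t, points):
    """Apply the 6D pose (R, t) to an (N, 3) array of model points."""
    return points @ R.T + t

# Example: rotate a model vertex 90 degrees about z, then translate.
R = rotation_z(np.pi / 2)
t = np.array([10.0, 0.0, 50.0])
model_points = np.array([[1.0, 0.0, 0.0]])
print(transform_points(R, t, model_points))  # approximately [10, 1, 50]
```

For symmetric objects like those in T-LESS, many distinct (R, t) pairs produce indistinguishable observations, which is one reason the dataset is challenging for pose estimators.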