T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects
We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e.
translation and rotation, of texture-less rigid objects. The dataset features
thirty industry-relevant objects with no significant texture and no
discriminative color or reflectance properties. The objects exhibit symmetries
and mutual similarities in shape and/or size. Compared to other datasets, a
unique property is that some of the objects are parts of others. The dataset
includes training and test images that were captured with three synchronized
sensors, specifically a structured-light and a time-of-flight RGB-D sensor and
a high-resolution RGB camera. There are approximately 39K training and 10K test
images from each sensor. Additionally, two types of 3D models are provided for
each object, i.e. a manually created CAD model and a semi-automatically
reconstructed one. Training images depict individual objects against a black
background. Test images originate from twenty test scenes having varying
complexity, which increases from simple scenes with several isolated objects to
very challenging ones with multiple instances of several objects and with a
high amount of clutter and occlusion. The images were captured from a
systematically sampled view sphere around the object/scene, and are annotated
with accurate ground truth 6D poses of all modeled objects. Initial evaluation
results indicate that the state of the art in 6D object pose estimation has
ample room for improvement, especially in difficult cases with significant
occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.
Comment: WACV 2017
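The abstract defines a 6D pose as a translation plus a rotation. As a minimal illustration (not part of the T-LESS toolkit), the following sketch shows the two error measures that naturally follow from this definition: the Euclidean distance between translations and the angular distance between rotation matrices.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Angular distance in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards numeric drift

def translation_error(t_est, t_gt):
    """Euclidean distance between two translation vectors."""
    return float(np.linalg.norm(np.asarray(t_est, float) - np.asarray(t_gt, float)))

# Toy example: estimate rotated 10 degrees about z, translated 5 units off.
theta = np.radians(10.0)
R_gt = np.eye(3)
R_est = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
```

Note that for symmetric objects such as those in T-LESS, a plain angular distance over-penalizes poses that are visually equivalent; published evaluations use symmetry-aware metrics instead.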
Human Detection by Fourier descriptors and Fuzzy Color Histograms with Fuzzy c-means method
It is difficult to use histograms of oriented gradients (HOG) or other gradient-based features to detect persons in outdoor environments when the background or scale undergoes considerable changes. This study segmented depth images and extracted P-type Fourier descriptors as shape features from the two-dimensional coordinates of contours in the segmented regions. From the P-type Fourier descriptors, a person detector was created with the fuzzy c-means method (general person detection). Furthermore, a fuzzy color histogram was extracted as a color feature from the RGB values of the region surface. From the fuzzy color histogram, a detector of a person wearing specific clothes was created with the fuzzy c-means method (specific person detection). The study has the following characteristics: 1) The general person detection requires fewer training images and is more robust to scale changes than HOG-based methods. 2) The specific person detection yields results closer to human color perception than color indices such as RGB or CIEDE. The method was applied in a person-search application at the Tsukuba Challenge, and the results confirmed its effectiveness. A part of the study was financially supported by the Promotion Grant for Higher Education and Research 2014 at Kansai University under the title "Tsukuba Challenge and RoboCup @ Home."
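Both detectors above rely on fuzzy c-means clustering, which assigns each sample a graded membership to every cluster rather than a hard label. The following is a minimal, self-contained sketch of the standard fuzzy c-means update (alternating center and membership updates); it is not the paper's implementation, and the fuzzifier `m` and iteration count are illustrative defaults.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means. Returns (centers, membership matrix U).

    X: (n, d) data; c: number of clusters; m > 1: fuzzifier.
    U[i, k] is the membership of sample i in cluster k; rows sum to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)              # normalize initial memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # standard membership update
    return centers, U

# Toy usage: two well-separated 1-D blobs at 0 and 10.
X = np.vstack([np.zeros((20, 1)), np.full((20, 1), 10.0)])
centers, U = fuzzy_cmeans(X, c=2)
```

The soft memberships are what make the method attractive for the paper's setting: a contour or color histogram that sits between two learned prototypes is not forced into either class.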
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
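At retrieval time, localization by descriptor matching reduces to finding the map descriptor most similar to the query descriptor. As a hedged sketch of that final step only (the paper's contribution is in how the descriptors are learned, not in this lookup), a cosine-similarity nearest-neighbor search looks like:

```python
import numpy as np

def retrieve(query, db):
    """Index of the database descriptor most similar to the query (cosine similarity)."""
    q = query / np.linalg.norm(query)
    D = db / np.linalg.norm(db, axis=1, keepdims=True)  # normalize each row
    return int(np.argmax(D @ q))                        # best match by dot product

# Toy usage: four orthogonal "map" descriptors, query closest to the second.
db = np.eye(4)
query = np.array([0.1, 0.9, 0.0, 0.1])
```

Robustness to viewpoint and illumination then comes entirely from the descriptors themselves, which is why the paper trains them with semantic scene completion as an auxiliary task.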
HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios
Estimating the 6D pose of objects is a major 3D computer vision problem.
Following the promising outcomes of instance-level approaches, research has
also moved towards category-level pose estimation for more practical
application scenarios. However, unlike well-established instance-level pose
datasets, available category-level datasets lack annotation quality and
sufficient pose annotations. We propose the new category-level 6D pose dataset
HouseCat6D, featuring 1) Multi-modality of polarimetric RGB and depth (RGBD+P),
2) A highly diverse set of 194 objects across 10 household categories, including 2
photometrically challenging categories, 3) High-quality pose annotation with an
error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive
viewpoint coverage and occlusions, 5) Checkerboard-free environment throughout
the entire scene, and 6) Additionally annotated dense 6D parallel-jaw grasps.
Furthermore, we provide benchmark results of state-of-the-art category-level
pose estimation networks.
- …