
    T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects

    We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes of varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene and are annotated with accurate ground-truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less. Comment: WACV 2017
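
    For orientation, a 6D pose of the kind annotated here is simply a rotation R and a translation t applied to an object's model points. The following minimal numpy sketch (not code from the dataset; all names are illustrative) shows how such a pose transforms a model and how a simple average-distance (ADD-style) pose error could be computed between an estimate and the ground truth:

        import numpy as np

        def apply_pose(points, R, t):
            # Transform Nx3 model points by rotation R (3x3) and translation t (3,).
            return points @ R.T + t

        def add_error(points, R_est, t_est, R_gt, t_gt):
            # Mean distance between model points under estimated and ground-truth poses.
            diff = apply_pose(points, R_est, t_est) - apply_pose(points, R_gt, t_gt)
            return np.linalg.norm(diff, axis=1).mean()

        # Hypothetical usage with a stand-in model:
        model = np.random.rand(1000, 3)
        R_gt, t_gt = np.eye(3), np.array([0.0, 0.0, 0.5])
        print(add_error(model, np.eye(3), np.array([0.0, 0.0, 0.51]), R_gt, t_gt))

    For the symmetric objects that T-LESS emphasizes, symmetry-aware variants of this metric are normally used instead, since a naive point-to-point distance penalizes poses that are visually indistinguishable.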

    Human Detection by Fourier descriptors and Fuzzy Color Histograms with Fuzzy c-means method

    It is difficult to use histograms of oriented gradients (HOG) or other gradient-based features to detect persons in outdoor environments when the background or scale changes considerably. This study segments depth images and extracts P-type Fourier descriptors as shape features from the two-dimensional contour coordinates of the segmented regions. From the P-type Fourier descriptors, a person detector was built with the fuzzy c-means method (general person detection). Additionally, a fuzzy color histogram was extracted as a color feature from the RGB values of the region surface, and from it a detector for a person wearing specific clothes was built with the fuzzy c-means method (specific person detection). The study has the following characteristics: 1) the general person detection requires fewer training images and is more robust to scale changes than HOG-based and similar methods; 2) the specific person detection gives results closer to human color perception than color indices such as RGB or CIEDE. The method was applied to a person-search application at the Tsukuba Challenge, and the results confirmed its effectiveness. Part of the study was financially supported by the Promotion Grant for Higher Education and Research 2014 at Kansai University under the title "Tsukuba Challenge and RoboCup @ Home."
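
    As a rough illustration of the pipeline's two building blocks, the sketch below computes standard complex Fourier descriptors of a contour (standing in for the paper's P-type descriptors, which are defined differently) and clusters feature vectors with a plain fuzzy c-means implementation; all names and parameters are illustrative, not the authors' code:

        import numpy as np

        def fourier_descriptors(contour, k=16):
            # contour: Nx2 array of (x, y) boundary points (N > k) from a segmented region.
            z = contour[:, 0] + 1j * contour[:, 1]
            F = np.fft.fft(z)
            # Drop F[0] (translation) and normalize by |F[1]| (scale) for invariance.
            return np.abs(F[1:k + 1]) / (np.abs(F[1]) + 1e-12)

        def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
            # X: n x d feature matrix; returns cluster centers and memberships U (n x c).
            rng = np.random.default_rng(seed)
            U = rng.dirichlet(np.ones(c), size=X.shape[0])
            for _ in range(iters):
                W = U ** m
                centers = (W.T @ X) / W.sum(axis=0)[:, None]
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                inv = d ** (-2.0 / (m - 1.0))
                U = inv / inv.sum(axis=1, keepdims=True)
            return centers, U

        # Hypothetical usage: cluster descriptors of many training contours, then read a
        # new contour's membership in the "person" cluster as a detection score.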

    Semantic Visual Localization

    Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.
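
    The generative descriptor model itself is not reproduced here, but the retrieval step that such 3D descriptors feed into is simple to sketch: match query descriptors against a database of mapped-scene descriptors by nearest neighbor. A minimal, hypothetical version, assuming L2-normalized descriptor vectors:

        import numpy as np

        def match_descriptors(query, database, k=1):
            # query: q x d, database: n x d, rows L2-normalized.
            sims = query @ database.T                 # cosine similarities
            return np.argsort(-sims, axis=1)[:, :k]   # top-k database indices per query

    Each matched database descriptor carries a known 3D location, so correspondences like these can then vote for, or directly solve, the camera pose.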

    HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios

    Estimating the 6D pose of objects is a major 3D computer vision problem. Following the promising results of instance-level approaches, research is also moving towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and pose quantity. We propose the new category-level 6D pose dataset HouseCat6D, featuring 1) multi-modality of polarimetric RGB and depth (RGBD+P), 2) 194 highly diverse objects from 10 household object categories, including 2 photometrically challenging categories, 3) high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage and occlusions, 5) a checkerboard-free environment throughout every scene, and 6) additionally annotated dense 6D parallel-jaw grasps. Furthermore, we provide benchmark results of state-of-the-art category-level pose estimation networks.
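
    As context for item 6), a parallel-jaw grasp is commonly parameterized as a 6D gripper pose plus a jaw opening width. The small structure below illustrates that parameterization; it is a generic sketch under that assumption, not HouseCat6D's actual annotation format:

        import numpy as np
        from dataclasses import dataclass

        @dataclass
        class ParallelJawGrasp:
            rotation: np.ndarray     # 3x3 gripper orientation in the scene frame
            translation: np.ndarray  # (3,) gripper position in meters
            width: float             # jaw opening in meters

            def contact_points(self):
                # Approximate finger contact points along the gripper's closing axis
                # (assumed here to be the gripper's local x-axis).
                axis = self.rotation[:, 0]
                half = 0.5 * self.width * axis
                return self.translation - half, self.translation + half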

    A perception pipeline exploiting trademark databases for service robots
