24 research outputs found

    A Minimalist Approach to Type-Agnostic Detection of Quadrics in Point Clouds

    This paper proposes a segmentation-free, automatic and efficient procedure to detect general geometric quadric forms in point clouds, where clutter and occlusions are inevitable. Our everyday world is dominated by man-made objects which are designed using 3D primitives (such as planes, cones, spheres and cylinders). These objects are also omnipresent in industrial environments. This gives rise to the possibility of abstracting 3D scenes through primitives, thereby positioning these geometric forms as an integral part of perception and high-level 3D scene understanding. In contrast to the state of the art, where a tailored algorithm treats each primitive type separately, we propose to encapsulate all types in a single robust detection procedure. At the center of our approach lies a closed-form 3D quadric fit, operating in both primal and dual spaces and requiring as few as 4 oriented points. Around this fit, we design a novel, local null-space voting strategy to reduce the 4-point case to 3. Voting is coupled with RANSAC and makes our algorithm orders of magnitude faster than its conventional counterparts. This is the first method capable of performing generic cross-type multi-object primitive detection in difficult scenes. Results on synthetic and real datasets support the validity of our method.
    Comment: Accepted for publication at CVPR 2020
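    The paper's own solver is a closed-form primal-dual fit from as few as 4 oriented points; as a simpler point of reference, the sketch below shows only the textbook algebraic least-squares quadric fit on unoriented points. A general quadric is the zero set of a degree-2 polynomial with 10 coefficients defined up to scale, so the best fit is the null-space direction of the monomial design matrix, recovered here via SVD.

```python
import numpy as np

def fit_quadric(points):
    """Least-squares algebraic fit of a general quadric
    q1*x^2 + q2*y^2 + q3*z^2 + q4*xy + q5*xz + q6*yz
      + q7*x + q8*y + q9*z + q10 = 0
    to an (N, 3) array of points. Returns the 10 coefficients (up to scale)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Design matrix: one degree-<=2 monomial per column.
    D = np.column_stack([x*x, y*y, z*z, x*y, x*z, y*z,
                         x, y, z, np.ones_like(x)])
    # The best coefficient vector is the right singular vector with the
    # smallest singular value, i.e. the least-squares null space of D.
    _, _, Vt = np.linalg.svd(D)
    return Vt[-1]

# Example: points sampled on the unit sphere x^2 + y^2 + z^2 - 1 = 0.
rng = np.random.default_rng(0)
p = rng.normal(size=(200, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)
q = fit_quadric(p)
print(q / q[0])  # ~ [1, 1, 1, 0, 0, 0, 0, 0, 0, -1]
```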

    BlenderProc: Reducing the Reality Gap with Photorealistic Rendering

    BlenderProc is an open-source and modular pipeline for rendering photorealistic images of procedurally generated 3D scenes, which can be used for training data-hungry deep learning models. The presented results on the tasks of instance segmentation and surface normal estimation suggest that our photorealistic training images reduce the gap between the synthetic training and real test domains, compared to less realistic training images combined with domain randomization. BlenderProc can be used to train models for various computer vision tasks such as semantic segmentation or estimation of depth, optical flow, and object pose. By offering standard modules for parameterizing and sampling materials, objects, cameras and lights, BlenderProc can simulate various real-world scenarios and provides the means to systematically investigate the essential factors for sim2real transfer.
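    As a minimal usage sketch, the script below roughly follows the BlenderProc 2 quickstart; the module names (`bproc.loader`, `bproc.camera`, `bproc.renderer`, `bproc.writer`) are taken from that documentation and may differ across versions, and the file paths are placeholders. BlenderProc scripts are launched with `blenderproc run script.py`, which executes them inside Blender's bundled Python.

```python
import blenderproc as bproc  # run with: blenderproc run script.py
import numpy as np

bproc.init()

# Load a scene object (placeholder path) and add a point light.
objs = bproc.loader.load_obj("model.obj")
light = bproc.types.Light()
light.set_location([2, -2, 2])
light.set_energy(300)

# Register a single camera pose looking at the scene.
bproc.camera.set_resolution(512, 512)
cam2world = bproc.math.build_transformation_mat([0, -3, 1], [np.pi / 3, 0, 0])
bproc.camera.add_camera_pose(cam2world)

# Render colour, surface normals and instance segmentation, write to HDF5.
bproc.renderer.enable_normals_output()
bproc.renderer.enable_segmentation_output(map_by=["instance"])
data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)
```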

    SC6D: Symmetry-agnostic and Correspondence-free 6D Object Pose Estimation

    This paper presents an efficient symmetry-agnostic and correspondence-free framework, referred to as SC6D, for 6D object pose estimation from a single monocular RGB image. SC6D requires neither the 3D CAD model of the object nor any prior knowledge of its symmetries. The pose estimation is decomposed into three sub-tasks: a) object 3D rotation representation learning and matching; b) estimation of the 2D location of the object center; and c) scale-invariant distance estimation (the translation along the z-axis) via classification. SC6D is evaluated on three benchmark datasets, T-LESS, YCB-V, and ITODD, and achieves state-of-the-art performance on the T-LESS dataset. Moreover, SC6D is computationally much more efficient than the previous state-of-the-art method SurfEmb. The implementation and pre-trained models are publicly available at https://github.com/dingdingcai/SC6D-pose.
    Comment: 3DV 2022
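    The third sub-task, distance estimation via classification, can be pictured as discretising the working depth range into bins and classifying over them, then reading a continuous z back out as a probability-weighted average of bin centres. The sketch below is an illustrative reading of that idea, not the authors' code; the bin count, depth range and log spacing are all assumptions.

```python
import math
import torch
import torch.nn as nn

K = 100                   # number of depth bins (assumed, not from the paper)
z_min, z_max = 0.3, 3.0   # working depth range in metres (assumed)
# Log-spaced bin centres give finer resolution close to the camera.
centers = torch.exp(torch.linspace(math.log(z_min), math.log(z_max), K))

head = nn.Linear(256, K)  # classification head over depth bins

def expected_depth(feature):
    """Soft-argmax over depth bins: probability-weighted bin centres
    turn the classification output back into a continuous z estimate."""
    probs = head(feature).softmax(dim=-1)
    return (probs * centers).sum(dim=-1)

feat = torch.randn(8, 256)          # dummy batch of image features
print(expected_depth(feat).shape)   # torch.Size([8])
```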

    Polarimetric Pose Prediction

    Light has many properties that vision sensors can passively measure. Colour-band separated wavelength and intensity are arguably the most commonly used ones for monocular 6D object pose estimation. This paper explores how complementary polarisation information, i.e. the orientation of light-wave oscillations, influences the accuracy of pose predictions. A hybrid model that leverages physical priors jointly with a data-driven learning strategy is designed and carefully tested on objects with different levels of photometric complexity. Our design significantly improves the pose accuracy compared to state-of-the-art photometric approaches and enables object pose estimation for highly reflective and transparent objects. A new multi-modal instance-level 6D object pose dataset with highly accurate pose annotations for multiple objects with varying photometric complexity is introduced as a benchmark.
    Comment: Accepted at ECCV 2022; 25 pages (14 main paper + References + 7 Appendix)
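    Polarisation cues of this kind are typically captured with a division-of-focal-plane sensor that measures intensity behind linear polarisers at 0°, 45°, 90° and 135°; the degree and angle of linear polarisation then follow from the Stokes parameters. Below is a minimal sketch of that standard preprocessing step, not of the paper's hybrid model.

```python
import numpy as np

def linear_polarisation(i0, i45, i90, i135):
    """Degree (DoLP) and angle (AoLP) of linear polarisation from four
    intensity images taken behind linear polarisers at 0/45/90/135 deg."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical component
    s2 = i45 - i135                     # diagonal component
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-8)
    aolp = 0.5 * np.arctan2(s2, s1)     # in [-pi/2, pi/2]
    return dolp, aolp
```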

    CNOS: A Strong Baseline for CAD-based Novel Object Segmentation

    We propose a simple three-stage approach to segment unseen objects in RGB images using their CAD models. Leveraging recent powerful foundation models, DINOv2 and Segment Anything, we create descriptors and generate proposals, including binary masks, for a given input RGB image. By matching proposals with reference descriptors created from CAD models, we achieve precise object ID assignment along with modal masks. We experimentally demonstrate that our method achieves state-of-the-art results in CAD-based novel object segmentation, surpassing existing approaches on the seven core datasets of the BOP challenge by 19.8% AP using the same BOP evaluation protocol. Our source code is available at https://github.com/nv-nguyen/cnos
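    The matching stage amounts to nearest-neighbour search in descriptor space: each proposal descriptor is compared against descriptors of rendered CAD templates, and the best cosine similarity decides the object ID. The sketch below is schematic, with feature extraction stubbed out (the actual pipeline uses DINOv2 and Segment Anything), and the rejection threshold is an assumed parameter.

```python
import numpy as np

def assign_object_ids(proposal_desc, template_desc, template_ids, thresh=0.5):
    """proposal_desc: (P, D) descriptors of segmentation proposals.
    template_desc: (T, D) descriptors of rendered CAD-model templates.
    template_ids:  (T,) object ID of each template.
    Returns the best-matching object ID per proposal (-1 if below thresh)."""
    # Cosine similarity = dot product of L2-normalised descriptors.
    p = proposal_desc / np.linalg.norm(proposal_desc, axis=1, keepdims=True)
    t = template_desc / np.linalg.norm(template_desc, axis=1, keepdims=True)
    sim = p @ t.T                       # (P, T) similarity matrix
    best = sim.argmax(axis=1)
    ids = template_ids[best].copy()
    ids[sim.max(axis=1) < thresh] = -1  # reject weak matches
    return ids
```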

    Unsupervised learning-based approach for detecting 3D edges in depth maps

    3D edge features, which represent the boundaries between different objects or surfaces in a 3D scene, are crucial for many computer vision tasks, including object recognition, tracking, and segmentation. They also have numerous real-world applications in the field of robotics, such as vision-guided grasping and manipulation of objects. To extract these features from noisy real-world depth data, reliable 3D edge detectors are indispensable. However, currently available 3D edge detection methods are either highly parameterized or require ground-truth labelling, which makes them challenging to use in practical applications. To this end, we present a new 3D edge detection approach using unsupervised classification. Our method learns features from depth maps at three different scales using an encoder-decoder network, from which edge-specific features are extracted. These edge features are then clustered to classify each point as an edge or a non-edge. The proposed method has two key benefits. First, it eliminates the need for manual fine-tuning of data-specific hyper-parameters and automatically selects threshold values for edge classification. Second, the method does not require any labelled training data, unlike many state-of-the-art methods that require supervised training with extensive hand-labelled datasets. The proposed method is evaluated on five benchmark datasets with single- and multi-object scenes, and compared with four state-of-the-art edge detection methods from the literature. Results demonstrate that the proposed method achieves competitive performance, despite not using any labelled data or relying on hand-tuning of key parameters.
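    The final classification step can be understood as two-way clustering of per-point edge features, labelling the cluster with the stronger edge response as "edge"; this is what replaces a hand-tuned threshold. Below is a toy sketch of that idea using k-means; in the actual method the features come from the multi-scale encoder-decoder network, not random inputs.

```python
import numpy as np
from sklearn.cluster import KMeans

def classify_edges(edge_features):
    """edge_features: (N, D) per-point edge-specific features.
    Returns a boolean edge mask of length N, with the decision boundary
    chosen automatically by 2-way clustering instead of manual tuning."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(edge_features)
    # Call the cluster whose centre has the larger magnitude "edge".
    edge_cluster = np.argmax(np.linalg.norm(km.cluster_centers_, axis=1))
    return km.labels_ == edge_cluster

# Toy usage with random features standing in for the learned ones.
mask = classify_edges(np.random.default_rng(0).normal(size=(1000, 16)))
print(mask.sum(), "points classified as edges")
```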

    6D Pose Estimation of Textureless Objects from a Single Camera

    This thesis focuses on estimating the pose of objects from a single RGB image of the scene, i.e. the position of the object along all three axes as well as its rotation about each of them, using 3D models of the objects. Such methods are applied mainly in robotic grasping, autonomous driving, and augmented reality. A great source for discovering suitable methods is the BOP Challenge, a competition that compares the best state-of-the-art public methods on a collection of datasets. I then modify the chosen algorithm and train it on my own dataset. The current state-of-the-art methods use a combination of classifiers: for example, CosyPose uses three different neural networks, and EPOS utilizes six steps, including its own neural network, for the prediction. Both mentioned algorithms have publicly available implementations and strong results in the BOP Challenge. For my proof of concept, I choose 4 objects with their respective 3D models and try to create a basic training dataset using an RGB camera. I then switch to photorealistic rendering of the training images, which is much faster and more practical for the amount of training data a neural network requires, mainly because it allows for automatic annotation of the objects in 6D space.