A Minimalist Approach to Type-Agnostic Detection of Quadrics in Point Clouds
This paper proposes a segmentation-free, automatic and efficient procedure to
detect general geometric quadric forms in point clouds, where clutter and
occlusions are inevitable. Our everyday world is dominated by man-made objects
designed from 3D primitives (such as planes, cones, spheres, cylinders, etc.),
and these objects are also omnipresent in industrial environments. This gives
rise to the possibility of abstracting 3D scenes through primitives, thereby
positioning these geometric forms as an integral part of perception and
high-level 3D scene understanding.
As opposed to the state of the art, where a tailored algorithm treats each
primitive type separately, we propose to encapsulate all types in a single
robust detection procedure. At the center of our approach lies a closed-form 3D
quadric fit, operating in both primal and dual spaces and requiring as few as
four oriented points. Around this fit, we design a novel local null-space voting
strategy that reduces the four-point case to three. Voting is coupled with the
well-known RANSAC paradigm and makes our algorithm orders of magnitude faster
than its conventional counterparts. This is the first method capable of generic
cross-type multi-object primitive detection in difficult scenes. Results on
synthetic and real datasets support the validity of our method.

Comment: Accepted for publication at CVPR 201
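The closed-form nature of such a fit can be illustrated with a plain least-squares version. This is not the paper's primal-dual four-oriented-point solver; it is the textbook fit of a general quadric x^T Q x = 0 from nine or more unoriented points, where each point contributes one linear equation in the ten quadric coefficients and the solution is the null-space direction of the design matrix:

```python
import numpy as np

def fit_quadric(points):
    """Least-squares fit of a general quadric x^T Q x = 0.

    points: (n, 3) array, n >= 9. Returns the symmetric 4x4 matrix Q.
    Each point contributes one row of monomials; the coefficient vector
    is the right-singular vector of the smallest singular value.
    """
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    ones = np.ones_like(X)
    # The 10 monomials of the homogeneous quadratic form
    M = np.column_stack([X*X, Y*Y, Z*Z, X*Y, X*Z, Y*Z, X, Y, Z, ones])
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    a, b, c, d, e, f, g, h, i, j = Vt[-1]   # approximate null-space vector
    Q = np.array([[a,   d/2, e/2, g/2],
                  [d/2, b,   f/2, h/2],
                  [e/2, f/2, c,   i/2],
                  [g/2, h/2, i/2, j  ]])
    return Q

# Sanity check on the unit sphere, whose quadric is x^2 + y^2 + z^2 - 1 = 0
rng = np.random.default_rng(0)
p = rng.normal(size=(50, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)
Q = fit_quadric(p)
Q /= Q[0, 0]   # fix the arbitrary scale/sign of the fit
```

Using oriented points, as the paper does, adds gradient constraints per sample, which is what brings the minimal set down from nine points to four.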
BlenderProc: Reducing the Reality Gap with Photorealistic Rendering
BlenderProc is an open-source and modular pipeline for rendering photorealistic images of procedurally generated 3D scenes which can be used for training data-hungry deep learning models. The presented results on the tasks of instance segmentation and surface normal estimation suggest that our photorealistic training images reduce the gap between the synthetic training and real test domains, compared to less realistic training images combined with domain randomization. BlenderProc can be used to train models for various computer vision tasks such as semantic segmentation or estimation of depth, optical flow, and object pose. By offering standard modules for parameterizing and sampling materials, objects, cameras and lights, BlenderProc can simulate various real-world scenarios and provide means to systematically investigate the essential factors for sim2real transfer.
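The parameter-sampling idea can be sketched in plain Python. This is an illustrative sampler only, not BlenderProc's actual API: it draws one randomized camera position on the upper hemisphere around the scene origin plus a light intensity, the kind of per-render parameters such a pipeline applies before each render:

```python
import math
import random

def sample_scene_params(radius_range=(1.0, 3.0), energy_range=(200.0, 800.0)):
    """Draw one set of randomized render parameters (illustrative only).

    The camera sits on the upper hemisphere around the scene origin at a
    random distance, azimuth, and elevation; the light energy is drawn
    uniformly. A rendering pipeline would apply these to the scene, render,
    and repeat to build a varied synthetic training set.
    """
    r = random.uniform(*radius_range)
    azimuth = random.uniform(0.0, 2.0 * math.pi)
    elevation = random.uniform(math.radians(15), math.radians(75))
    cam_pos = (r * math.cos(elevation) * math.cos(azimuth),
               r * math.cos(elevation) * math.sin(azimuth),
               r * math.sin(elevation))        # z > 0: always above the scene
    return {"camera_position": cam_pos,
            "light_energy": random.uniform(*energy_range)}

random.seed(0)
params = sample_scene_params()
```

Systematically varying which of these parameters are randomized versus fixed is what enables the kind of sim2real factor analysis the abstract describes.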
SC6D: Symmetry-agnostic and Correspondence-free 6D Object Pose Estimation
This paper presents an efficient symmetry-agnostic and correspondence-free
framework, referred to as SC6D, for 6D object pose estimation from a single
monocular RGB image. SC6D requires neither the 3D CAD model of the object nor
any prior knowledge of the symmetries. The pose estimation is decomposed into
three sub-tasks: a) object 3D rotation representation learning and matching; b)
estimation of the 2D location of the object center; and c) scale-invariant
distance estimation (the translation along the z-axis) via classification. SC6D
is evaluated on three benchmark datasets, T-LESS, YCB-V, and ITODD, and results
in state-of-the-art performance on the T-LESS dataset. Moreover, SC6D is
computationally much more efficient than the previous state-of-the-art method
SurfEmb. The implementation and pre-trained models are publicly available at
https://github.com/dingdingcai/SC6D-pose

Comment: 3DV 202
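Sub-tasks b) and c) combine into the full translation through standard pinhole geometry. The sketch below is generic, not the SC6D implementation: the z-distance is taken as the classification bin with the highest score, and the translation follows by back-projecting the predicted 2D object center along its viewing ray:

```python
import numpy as np

def recover_translation(center_uv, z_bins, z_logits, K):
    """Recover the 3D translation from a 2D center and a classified distance.

    center_uv: predicted 2D object center (u, v) in pixels.
    z_bins / z_logits: discretized z-distances and their classification scores.
    K: 3x3 camera intrinsics. Returns t = z * K^{-1} [u, v, 1]^T.
    """
    z = z_bins[int(np.argmax(z_logits))]            # winning distance bin
    u, v = center_uv
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray direction
    return z * ray

# Toy example: object centered at the principal point, 5th bin wins
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
z_bins = np.linspace(0.2, 2.0, 10)
z_logits = np.zeros(10)
z_logits[4] = 1.0
t = recover_translation((320.0, 240.0), z_bins, z_logits, K)
```

Framing the z-distance as classification over bins, rather than direct regression, is what makes the estimate scale-invariant with respect to the crop resolution.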
Polarimetric Pose Prediction
Light has many properties that vision sensors can passively measure.
Colour-band separated wavelength and intensity are arguably the most commonly
used for monocular 6D object pose estimation. This paper explores how
complementary polarisation information, i.e. the orientation of light wave
oscillations, influences the accuracy of pose predictions. A hybrid model that
leverages physical priors jointly with a data-driven learning strategy is
designed and carefully tested on objects with different levels of photometric
complexity. Our design significantly improves the pose accuracy compared to
state-of-the-art photometric approaches and enables object pose estimation for
highly reflective and transparent objects. A new multi-modal instance-level 6D
object pose dataset with highly accurate pose annotations for multiple objects
with varying photometric complexity is introduced as a benchmark.Comment: Accepted at ECCV 2022; 25 pages (14 main paper + References + 7
Appendix
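The physical priors in question come from standard polarisation optics, not from learning. As a minimal sketch (the standard Stokes relations, not the paper's hybrid model), a division-of-focal-plane polarisation camera measures intensity behind 0/45/90/135 degree filters, from which the degree and angle of linear polarisation follow per pixel:

```python
import numpy as np

def stokes_from_polarizer_images(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-angle intensities.

    Returns the degree of linear polarisation (DoLP) and the angle of
    linear polarisation (AoLP), i.e. the orientation of the light wave
    oscillations; both serve as physical cues for surface orientation.
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical component
    s2 = i45 - i135                      # diagonal component
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-8)
    aolp = 0.5 * np.arctan2(s2, s1)      # oscillation orientation in radians
    return dolp, aolp

# Fully horizontally polarised light: all energy passes the 0-degree filter,
# none passes 90 degrees, and the diagonal filters each pass half.
dolp, aolp = stokes_from_polarizer_images(1.0, 0.5, 0.0, 0.5)
```

Cues like these remain informative on reflective and transparent surfaces where photometric intensity alone is ambiguous, which is why the hybrid design helps on exactly those objects.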
CNOS: A Strong Baseline for CAD-based Novel Object Segmentation
We propose a simple three-stage approach to segment unseen objects in RGB
images using their CAD models. Leveraging recent powerful foundation models,
DINOv2 and Segment Anything, we create descriptors and generate proposals,
including binary masks for a given input RGB image. By matching proposals with
reference descriptors created from CAD models, we achieve precise object ID
assignment along with modal masks. We experimentally demonstrate that our
method achieves state-of-the-art results in CAD-based novel object
segmentation, surpassing existing approaches on the seven core datasets of the
BOP challenge by 19.8% AP using the same BOP evaluation protocol. Our source
code is available at https://github.com/nv-nguyen/cnos
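The matching stage can be sketched in a few lines. This is a minimal stand-in for the described pipeline, not the CNOS code: proposal descriptors and per-object reference descriptors (e.g. from DINOv2) are L2-normalized, cosine similarities are computed, and each proposal receives the object ID of its best-scoring reference:

```python
import numpy as np

def assign_object_ids(proposal_desc, reference_desc):
    """Match segmentation proposals to CAD-derived reference descriptors.

    proposal_desc: (P, D) one descriptor per proposal mask.
    reference_desc: (O, D) one aggregated descriptor per object model.
    Returns the best-matching object ID and similarity score per proposal.
    """
    p = proposal_desc / np.linalg.norm(proposal_desc, axis=1, keepdims=True)
    r = reference_desc / np.linalg.norm(reference_desc, axis=1, keepdims=True)
    sim = p @ r.T                  # (P, O) cosine similarity matrix
    ids = sim.argmax(axis=1)       # best object per proposal
    scores = sim.max(axis=1)       # confidence for thresholding/ranking
    return ids, scores

refs = np.eye(3)                   # 3 toy object "templates"
props = np.array([[0.9, 0.1, 0.0],
                  [0.0, 0.2, 1.0]])
ids, scores = assign_object_ids(props, refs)
```

In practice each object is represented by descriptors from many rendered viewpoints of its CAD model and the scores are thresholded, but the ID assignment reduces to this nearest-descriptor lookup.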
Unsupervised learning-based approach for detecting 3D edges in depth maps
3D edge features, which represent the boundaries between different objects or surfaces in a 3D scene, are crucial for many computer vision tasks, including object recognition, tracking, and segmentation. They also have numerous real-world applications in the field of robotics, such as vision-guided grasping and manipulation of objects. To extract these features from noisy real-world depth data, reliable 3D edge detectors are indispensable. However, currently available 3D edge detection methods are either highly parameterized or require ground-truth labelling, which makes them challenging to use in practical applications. To this end, we present a new 3D edge detection approach using unsupervised classification. Our method learns features from depth maps at three different scales using an encoder-decoder network, from which edge-specific features are extracted. These edge features are then clustered to classify each point as edge or non-edge. The proposed method has two key benefits. First, it eliminates the need for manual fine-tuning of data-specific hyper-parameters and automatically selects threshold values for edge classification. Second, the method does not require any labelled training data, unlike many state-of-the-art methods that require supervised training with extensive hand-labelled datasets. The proposed method is evaluated on five benchmark datasets with single- and multi-object scenes, and compared with four state-of-the-art edge detection methods from the literature. Results demonstrate that the proposed method achieves competitive performance, despite not using any labelled data or relying on hand-tuning of key parameters.
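The threshold-free classification step can be illustrated with a tiny clustering loop. This is a minimal stand-in for the described stage (the paper learns the features with an encoder-decoder network; here they are simply given): a two-means loop partitions per-point edge-feature magnitudes, and the cluster with the larger mean response is declared the edge class, so no hand-set threshold appears anywhere:

```python
import numpy as np

def classify_edges(features, iters=20):
    """Label each point edge / non-edge by clustering feature magnitudes.

    features: (N, D) per-point edge features. A two-means loop splits the
    magnitudes into two clusters; the threshold between them emerges from
    the data rather than from a user-chosen parameter.
    """
    mag = np.linalg.norm(features, axis=1)
    centers = np.array([mag.min(), mag.max()])   # deterministic init
    for _ in range(iters):
        labels = np.abs(mag[:, None] - centers[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = mag[labels == k].mean()
    edge_cluster = centers.argmax()              # stronger responses = edges
    return labels == edge_cluster

# Toy input: 50 weak-response (flat surface) and 50 strong-response points
feats = np.vstack([np.full((50, 3), 0.1),
                   np.full((50, 3), 2.0)])
is_edge = classify_edges(feats)
```

The same idea carries over to richer learned feature spaces, where the cluster boundary plays the role of an automatically selected, data-specific threshold.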
6D Pose Estimation of Textureless Objects from a Single Camera
This thesis focuses on estimating the pose of objects from a single RGB image of the scene: both the position of the object along all three axes and its rotation around each of them, using 3D models of the given objects. Such methods are used mainly in robotic grasping, autonomous driving, and augmented reality. A great source for finding a suitable method is the BOP Challenge, a competition that compares the best state-of-the-art public methods on a set of datasets. I then adapt the chosen algorithm and train it on my own dataset. The current best methods for 6D object detection use a combination of classifiers: for example, Cosypose uses three different neural networks, and EPOS uses six prediction steps, including its own neural network. Both mentioned algorithms have publicly available implementations and excellent results in the BOP Challenge.
For my proof of concept, I choose 4 objects with their respective 3D models and try to create a training dataset using an RGB camera. I then switch to photorealistic rendering of the training images, which is much faster and more practical for the amount of training data a neural network requires, mainly because it allows automatic annotation of the objects in the 6D space.
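Why rendered images annotate themselves can be shown in a few lines. This is a hypothetical helper, not code from the thesis: since the renderer places the object at a known rotation R and translation t under known camera intrinsics K, projecting the model vertices yields pixel coordinates, and a 2D bounding-box label (or mask, or full 6D label) falls out for free:

```python
import numpy as np

def auto_annotate_bbox(model_points, R, t, K):
    """Automatic 2D bounding-box label from a known ground-truth 6D pose.

    model_points: (N, 3) model vertices. R, t: ground-truth pose used for
    rendering. K: 3x3 camera intrinsics. Returns (x0, y0, x1, y1) in pixels.
    """
    cam = (R @ model_points.T).T + t        # model frame -> camera frame
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]             # perspective divide to pixels
    x0, y0 = uv.min(axis=0)
    x1, y1 = uv.max(axis=0)
    return x0, y0, x1, y1

# Toy example: a 10 cm cube placed 1 m in front of the camera
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
cube = np.array([[x, y, z] for x in (-0.05, 0.05)
                           for y in (-0.05, 0.05)
                           for z in (-0.05, 0.05)])
bbox = auto_annotate_bbox(cube, np.eye(3), np.array([0.0, 0.0, 1.0]), K)
```

Hand-captured images would require measuring or estimating this pose for every frame, which is exactly the labelling effort that photorealistic rendering removes.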