7 research outputs found

Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

Author: BT Phong, P Vincent, S Hinterstoisser, SR Richter, T Hodaň, T-Y Lin, W Kehl, W Liu, Y Movshovitz-Attias, Z Zhang
Publication date: 10/09/2018

We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: it does not require real, pose-annotated training data, generalizes to various test sensors, and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset in both the RGB and RGB-D domains. We also evaluate on the LineMOD dataset, where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center, and we show extended results.
Comment: Code available at: https://github.com/DLR-RM/AugmentedAutoencode
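The "implicit representation of object orientations defined by samples in a latent space" can be pictured as a lookup table of latent codes. The sketch below is illustrative only: it assumes that each sampled orientation is stored with a latent code and that a query code is matched by cosine similarity. The codebook entries, the labels `view_0` to `view_2`, and the function names are hypothetical, and the hand-written vectors stand in for encoder outputs.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_orientation(query_code, codebook):
    """Return (label, score) of the stored code most similar to the query."""
    best_label, best_score = None, -2.0  # cosine similarity is always >= -1
    for label, code in codebook:
        score = cosine_similarity(query_code, code)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical codebook: one latent code per sampled orientation.
# In a real system these codes would come from a trained encoder.
codebook = [
    ("view_0", [1.0, 0.0, 0.0]),
    ("view_1", [0.0, 1.0, 0.0]),
    ("view_2", [0.0, 0.0, 1.0]),
]

# Hypothetical latent code of a test image crop.
query = [0.1, 0.9, 0.2]
label, score = nearest_orientation(query, codebook)
print(label, round(score, 4))
```

Because the orientation is read off from the closest stored sample rather than regressed directly, views that look identical simply map to nearby codes, which is one way to understand the abstract's claim about handling symmetries.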

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Recovering 6D Object Pose: A Review and Multi-modal Analysis

Author: A Tejani, C Sahin, D Hoiem, E Brachmann, H Azizpour, M Everingham, MY Liu, N Correll, O Russakovsky, S Hinterstoisser, T Hodaň, U Bonde, W Kehl
Publication date: 15/08/2018

A large number of studies analyse object detection and pose estimation at the visual level in 2D, discussing the effects of challenges such as occlusion, clutter, and texture on the performance of methods that work in the RGB modality. Interpreting the depth data, the study in this paper presents thorough multi-modal analyses. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images, comparing the performance of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining "automation" in robotic manipulation? What next steps should the community take for improving "autonomy" in robotics while handling objects? Our findings include: (i) Reasonably accurate results are obtained on textured objects at varying viewpoints with cluttered backgrounds. (ii) Heavy occlusion and clutter severely affect the detectors, and similar-looking distractors are the biggest challenge in recovering instances' 6D pose. (iii) Template-based methods and random forest-based learning algorithms underlie object detection and 6D pose estimation. The recent paradigm is to learn deep discriminative feature representations and to adopt CNNs taking RGB images as input. (iv) Depending on the availability of large-scale 6D annotated depth datasets, feature representations can be learnt on these datasets, and then the learnt representations can be customized for the 6D problem.

arXiv.org e-Print Archive

Crossref

Relative Pose from Deep Learned Depth and a Single Affine Correspondence

Author: D Barath, D Nistér, D Scaramuzza, F Fraundorfer, G Nakano, H Stewenius, I Eichhardt, J Bentolila, J Solomon, K Wilson, MA Fischler, PHS Torr, R Hartley, R Mur-Artal, R Raguram, S Snavely, S Umeyama, T Hodaň, Z Kukelova
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

We propose a new approach for combining deep-learned non-metric monocular depth with affine correspondences (ACs) to estimate the relative pose of two calibrated cameras from a single correspondence. Considering the depth information and affine features, two new constraints on the camera pose are derived. The proposed solver is usable within 1-point RANSAC approaches. Thus, the processing time of the robust estimation is linear in the number of correspondences and, therefore, orders of magnitude faster than that of traditional approaches. The proposed 1AC+D solver is tested both on synthetic data and on 110395 publicly available real image pairs, where we used an off-the-shelf monocular depth network to provide up-to-scale depth per pixel. The proposed 1AC+D achieves accuracy similar to that of traditional approaches while being significantly faster. When solving large-scale problems, e.g., pose-graph initialization for Structure-from-Motion (SfM) pipelines, the overhead of obtaining ACs and monocular depth is negligible compared to the speed-up gained in the pairwise geometric verification, i.e., relative pose estimation. This is demonstrated on scenes from the 1DSfM dataset using a state-of-the-art global SfM algorithm. Source code: https://github.com/eivan/one-ac-pos
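To give a feel for why a 1-point minimal sample matters, the sketch below evaluates the standard RANSAC iteration bound N = log(1 - p) / log(1 - w^s), where w is the inlier ratio, s the minimal sample size, and p the desired confidence. This is the textbook bound, not a result from the paper; the inlier ratio of 0.3 and confidence of 0.99 are assumed values chosen for illustration, and the sample size of 5 stands in for a classical five-point relative pose solver.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Iterations needed to draw at least one all-inlier sample
    with the given confidence (standard RANSAC bound)."""
    p_good_sample = inlier_ratio ** sample_size
    if p_good_sample >= 1.0:
        return 1  # every sample is all-inlier, one draw suffices
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))

# Assumed inlier ratio for illustration only.
inlier_ratio = 0.3
for sample_size in (1, 5):
    n = ransac_iterations(inlier_ratio, sample_size)
    print(f"sample size {sample_size}: {n} iterations")
```

With a single-correspondence solver, the number of hypotheses can also be capped at the number of correspondences by testing each one once, which is consistent with the linear processing time stated in the abstract.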

arXiv.org e-Print Archive

Crossref

SZTAKI Publication Repository

Towards markerless computer-aided surgery combining deep segmentation and geometric pose estimation: application in total knee arthroplasty

Author: Hodaň T, Ronneberger O, Sundermeyer M, Zheng G
Publication venue: 'Informa UK Limited'

Crossref

Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

Author: BT Phong, P Vincent, S Hinterstoisser, SR Richter, T Hodaň, T-Y Lin, W Kehl, W Liu, Y Movshovitz-Attias, Z Zhang
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 10/09/2018

We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: it does not require real, pose-annotated training data, generalizes to various test sensors, and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Experiments on the T-LESS and LineMOD datasets show that our method outperforms similar model-based approaches and competes with state-of-the-art approaches that require real pose-annotated images.

Institute of Transport Research:Publications

Crossref

Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images

Author: B Calli, E Brachmann, J Johnson, J Vidal, M Everingham, M Krainin, M Rad, M Sundermeyer, M Waechter, P Henderson, S Hinterstoisser, T Hodaň, T-Y Lin, Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/08/2020

Recent methods for 6D pose estimation of objects assume either textured 3D models or real images that cover the entire range of target poses. However, it is difficult to obtain textured 3D models and annotate the poses of objects in real scenarios. This paper proposes a method, Neural Object Learning (NOL), that creates synthetic images of objects in arbitrary poses by combining only a few observations from cluttered images. A novel refinement step is proposed to align inaccurate poses of objects in source images, which results in better-quality images. Evaluations performed on two public datasets show that the rendered images created by NOL lead to state-of-the-art performance in comparison to methods that use 13 times the number of real images. Evaluations on our new dataset show that multiple objects can be trained and recognized simultaneously using a sequence of a fixed scene.

arXiv.org e-Print Archive

Crossref

OPUS - University of Technology Sydney

Christian Instrumentality of Sport as a Possible Source of Goodness for Atheists

Crossref