9 research outputs found
Multi-path Learning for Object Pose Estimation Across Domains
We introduce a scalable approach for object pose estimation trained jointly on
simulated RGB views of multiple 3D models. We learn an encoding of
object views that not only describes an implicit orientation of all objects
seen during training, but can also relate views of untrained objects. Our
single-encoder-multi-decoder network is trained using a technique we denote
"multi-path learning": While the encoder is shared by all objects, each decoder
only reconstructs views of a single object. Consequently, views of different
instances do not have to be separated in the latent space and can share common
features. The resulting encoder generalizes well from synthetic to real data
and across various instances, categories, model types and datasets. We
systematically investigate the learned encodings, their generalization, and
iterative refinement strategies on the ModelNet40 and T-LESS dataset. Despite
training jointly on multiple objects, our 6D Object Detection pipeline achieves
state-of-the-art results on T-LESS at much lower runtimes than competing
approaches.
Comment: To appear at CVPR 2020; code will be available here:
https://github.com/DLR-RM/AugmentedAutoencoder/tree/multipat
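The single-encoder-multi-decoder idea can be illustrated with a minimal NumPy sketch: one shared encoder produces a latent code for any view, and the reconstruction path is selected by the object's identity, so gradients never force different objects apart in the latent space. All layer sizes and weights below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class MultiPathAE:
    """Toy single-encoder-multi-decoder network (multi-path learning sketch)."""

    def __init__(self, n_objects, img_dim=64, code_dim=8):
        # one encoder shared by all objects
        self.W_enc = rng.normal(scale=0.1, size=(img_dim, code_dim))
        # one decoder per object
        self.W_dec = [rng.normal(scale=0.1, size=(code_dim, img_dim))
                      for _ in range(n_objects)]

    def encode(self, x):
        return relu(x @ self.W_enc)       # shared latent code

    def reconstruct(self, x, obj_id):
        z = self.encode(x)
        return z @ self.W_dec[obj_id]     # only this object's decoder is used

model = MultiPathAE(n_objects=3)
view = rng.normal(size=(1, 64))
recon = model.reconstruct(view, obj_id=1)
```

During training, the reconstruction loss for a view of object *k* would flow only through `W_dec[k]` and the shared `W_enc`, which is what lets views of different instances share latent features.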
CAD2Real: Deep learning with domain randomization of CAD data for 3D pose estimation of electronic control unit housings
Electronic control units (ECUs) are essential for many automobile components,
e.g. engine, anti-lock braking system (ABS), steering and airbags. For some
products, the 3D pose of each single ECU needs to be determined during series
production. Deep learning approaches can not easily be applied to this problem,
because labeled training data is not available in sufficient numbers. Thus, we
train state-of-the-art artificial neural networks (ANNs) on purely synthetic
training data, which is automatically created from a single CAD file. By
randomizing parameters during rendering of training images, we enable inference
on RGB images of a real sample part. In contrast to classic image processing
approaches, this data-driven approach poses only few requirements regarding the
measurement setup and transfers to related use cases with little development
effort.
Comment: Proc. 30. Workshop Computational Intelligence, Berlin, 202
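The domain randomization step amounts to drawing a fresh set of rendering parameters for every synthetic training image. The following sketch shows the pattern; all parameter names and ranges are invented for illustration, as the abstract does not specify them.

```python
import random

def sample_render_params(rng):
    """Sample one randomized set of rendering parameters
    (hypothetical names and ranges, for illustration only)."""
    return {
        "light_intensity": rng.uniform(0.3, 1.5),
        "light_azimuth_deg": rng.uniform(0.0, 360.0),
        "camera_distance_m": rng.uniform(0.4, 1.2),
        "background_id": rng.randrange(1000),   # random distractor background
        "texture_noise_std": rng.uniform(0.0, 0.1),
    }

rng = random.Random(42)
# one randomized parameter set per synthetic training image
params = [sample_render_params(rng) for _ in range(1000)]
```

Randomizing nuisance factors this way is what lets a network trained purely on renders of a single CAD file transfer to RGB images of a real sample part.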
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
In this paper, we focus on category-level 6D pose and size estimation from a
monocular RGB-D image. Previous methods suffer from inefficient category-level
pose feature extraction, which leads to low accuracy and inference speed. To
tackle this problem, we propose a fast shape-based network (FS-Net) with
efficient category-level feature extraction for 6D pose estimation. First, we
design an orientation-aware autoencoder with 3D graph convolution for latent
feature extraction. The learned latent feature is insensitive to point shift
and object size thanks to the shift- and scale-invariance properties of the 3D
graph convolution. Then, to efficiently decode category-level rotation
information from the latent feature, we propose a novel decoupled rotation
mechanism that employs two decoders to complementarily access the rotation
information. Meanwhile, we estimate translation and size by two residuals,
which are the difference between the mean of object points and ground truth
translation, and the difference between the mean size of the category and
ground truth size, respectively. Finally, to increase the generalization
ability of FS-Net, we propose an online box-cage based 3D deformation mechanism
to augment the training data. Extensive experiments on two benchmark datasets
show that the proposed method achieves state-of-the-art performance in both
category- and instance-level 6D object pose estimation. Especially in
category-level pose estimation, without extra synthetic data, our method
outperforms existing methods by 6.3% on the NOCS-REAL dataset.
Comment: accepted by CVPR 2021, ora
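The residual formulation for translation and size can be made concrete with a toy example: the network's regression targets are the differences between ground truth and easy-to-compute statistics (the mean of the observed points, and the mean size of the category). All numeric values below are hypothetical.

```python
import numpy as np

# Toy point cloud of an observed object instance (hypothetical values).
rng = np.random.default_rng(1)
points = rng.normal(loc=[0.10, 0.20, 0.50], scale=0.02, size=(500, 3))

t_gt = np.array([0.12, 0.18, 0.52])           # ground-truth translation
mean_cat_size = np.array([0.20, 0.10, 0.30])  # mean size of the category
s_gt = np.array([0.22, 0.09, 0.31])           # ground-truth size

# Regression targets: residuals w.r.t. the simple statistics.
t_residual = t_gt - points.mean(axis=0)
s_residual = s_gt - mean_cat_size

# At inference, predicted residuals are added back to the statistics.
t_pred = points.mean(axis=0) + t_residual
s_pred = mean_cat_size + s_residual
```

Regressing small residuals rather than absolute values keeps the targets near zero regardless of where the object sits in the scene, which generally eases learning.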
Few-shot Domain Adaptation for 3D Human Pose and Shape Estimation
Despite recent advancements in monocular 3D human pose and shape estimation, many previous works are susceptible to the domain gap between the training data and the test data. This problem becomes even more severe when the test samples come from challenging in-the-wild scenarios. This paper proposes a domain adaptation approach that mitigates the gap, especially in few-shot test environments, utilizing (1) a continuous metric loss to constrain the feature-space distance relationships between different poses, and (2) a segmentation module that localizes the foreground area so that negative effects from noisy backgrounds can be mitigated. Our method achieved a slight improvement over the baseline on the MPI-INF-3DHP and 3DPW datasets.
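The continuous metric loss idea (feature-space distances should track pose-space distances) can be sketched as a simple pairwise penalty. This is an illustrative formulation only; the paper's exact loss may differ.

```python
import numpy as np

def continuous_metric_loss(feats, poses):
    """Penalize mismatch between feature-space and pose-space distances
    (a sketch of the continuous-metric-loss idea, not the paper's exact form)."""
    n = len(feats)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d_feat = np.linalg.norm(feats[i] - feats[j])
            d_pose = np.linalg.norm(poses[i] - poses[j])
            total += (d_feat - d_pose) ** 2
            pairs += 1
    return total / pairs

# If features already mirror pose distances, the loss is zero.
poses = np.array([[0.0], [1.0], [2.0]])
feats = poses.copy()
loss = continuous_metric_loss(feats, poses)
```

A loss of this shape pulls the embedding toward an isometry of pose space, so nearby poses land on nearby features even for unseen test domains.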
Vision-guided Grasping of Arbitrary Objects through Experience-based Search Optimization
A desired capability for human-robot collaboration is the handover of tools and object parts in a functional, effective way. To achieve this, a robot has to grasp objects at specific spots, also called functional grasps. This thesis presents an approach for such functional grasping of nearly unknown, arbitrary objects. Unlike other approaches, it does not require 3D models of the objects. Given a camera mounted on the robot's end-effector, the grasping position is defined by a single target image provided with human guidance.
During execution, an iterative search for the target viewing angle is performed, based on an appearance similarity measure generated by an Auto-Encoder (AE) specifically trained to encode general object rotation. Further, a process for the fusion of data from previous grasping attempts is presented. This increases robustness and search efficiency by utilizing experience from previous executions. Additionally, a detection module is integrated, which enables the grasping of the target object in cluttered scenes.
The developed method is evaluated in simulation and on a real robotic platform. The results show that the presented method robustly finds the pre-defined target orientation for grasping the objects.
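The iterative search over viewing angles can be sketched as a greedy hill climb guided by a similarity measure between encodings. The 1-D angle parameterization and the toy "encoder" below are simplifications; the thesis operates on full AE encodings of camera images.

```python
import numpy as np

def search_view_angle(encode, target_code, start_deg=0.0, step=10.0, iters=50):
    """Greedy 1-D hill climb over the viewing angle, guided by code
    similarity (a simplified sketch of the iterative view search)."""
    def sim(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    angle = start_deg
    best = sim(encode(angle), target_code)
    for _ in range(iters):
        # try a step in either direction; keep the better angle
        for cand in (angle - step, angle + step):
            s = sim(encode(cand), target_code)
            if s > best:
                best, angle = s, cand
        step *= 0.8   # shrink the step to refine around the optimum
    return angle

# Toy "encoder": maps an angle to a 2-D code on the unit circle.
encode = lambda deg: np.array([np.cos(np.radians(deg)), np.sin(np.radians(deg))])
found = search_view_angle(encode, target_code=encode(40.0))
```

The fusion of previous grasping attempts described above would correspond to reusing earlier (angle, similarity) observations to pick a better `start_deg`, shrinking the search.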