9 research outputs found

    Multi-path Learning for Object Pose Estimation Across Domains

    We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that not only describes an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning": while the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, views of different instances do not have to be separated in the latent space and can share common features. The resulting encoder generalizes well from synthetic to real data and across various instances, categories, model types and datasets. We systematically investigate the learned encodings, their generalization, and iterative refinement strategies on the ModelNet40 and T-LESS datasets. Despite training jointly on multiple objects, our 6D object detection pipeline achieves state-of-the-art results on T-LESS at much lower runtimes than competing approaches. Comment: To appear at CVPR 2020; code will be available here: https://github.com/DLR-RM/AugmentedAutoencoder/tree/multipat
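
    As a rough, hypothetical sketch of the single-encoder-multi-decoder idea described above (not the authors' released code): one shared encoder produces latent codes for all views, and each view is reconstructed only by the decoder belonging to its object, so the per-object reconstruction loss never forces different instances apart in the latent space. Layer sizes, the 128x128 RGB resolution, and the loss routing below are assumptions.

```python
# Hypothetical sketch of a single-encoder-multi-decoder ("multi-path") autoencoder.
# Layer sizes, image resolution (128x128 RGB) and the loss routing are assumptions.
import torch
import torch.nn as nn

class MultiPathAutoencoder(nn.Module):
    def __init__(self, num_objects: int, latent_dim: int = 128):
        super().__init__()
        # One encoder shared by all objects.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, latent_dim),
        )
        # One decoder per object; each reconstructs views of "its" object only.
        self.decoders = nn.ModuleList([
            nn.Sequential(
                nn.Linear(latent_dim, 128 * 16 * 16), nn.ReLU(),
                nn.Unflatten(1, (128, 16, 16)),
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
            )
            for _ in range(num_objects)
        ])

    def forward(self, images: torch.Tensor, object_ids: torch.Tensor) -> torch.Tensor:
        z = self.encoder(images)                 # shared latent codes for all views
        recon = torch.zeros_like(images)
        for obj_id in object_ids.unique():
            mask = object_ids == obj_id          # route each view to its object's decoder
            recon[mask] = self.decoders[int(obj_id)](z[mask])
        return recon

# Usage: per-object reconstruction loss, so different objects may share latent features.
model = MultiPathAutoencoder(num_objects=3)
imgs = torch.rand(8, 3, 128, 128)
ids = torch.randint(0, 3, (8,))
loss = nn.functional.mse_loss(model(imgs, ids), imgs)
```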

    CAD2Real: Deep learning with domain randomization of CAD data for 3D pose estimation of electronic control unit housings

    Electronic control units (ECUs) are essential for many automobile components, e.g. engine, anti-lock braking system (ABS), steering and airbags. For some products, the 3D pose of each single ECU needs to be determined during series production. Deep learning approaches cannot easily be applied to this problem, because labeled training data is not available in sufficient quantities. Thus, we train state-of-the-art artificial neural networks (ANNs) on purely synthetic training data, which is automatically created from a single CAD file. By randomizing parameters during rendering of the training images, we enable inference on RGB images of a real sample part. In contrast to classic image processing approaches, this data-driven approach poses only a few requirements regarding the measurement setup and transfers to related use cases with little development effort. Comment: Proc. 30. Workshop Computational Intelligence, Berlin, 202
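
    The domain-randomization idea lends itself to a small illustration: before rendering each synthetic training image from the CAD model, a fresh set of nuisance parameters is sampled. The parameter names, ranges, and the render() callable in the sketch below are assumptions, not the paper's actual pipeline.

```python
# Hypothetical domain-randomization sketch: sample random rendering parameters per
# synthetic training image. Parameter names, ranges and the render() call are assumptions.
import random

def sample_render_params():
    return {
        "light_intensity": random.uniform(0.3, 3.0),        # lamp strength
        "light_azimuth_deg": random.uniform(0.0, 360.0),     # lamp direction
        "object_hue_shift": random.uniform(-0.1, 0.1),       # slight material color change
        "camera_distance_m": random.uniform(0.4, 1.2),       # distance to the CAD part
        "camera_jitter_deg": [random.gauss(0.0, 5.0) for _ in range(3)],  # orientation noise
        "background_id": random.randrange(1000),             # random background image index
        "gaussian_noise_std": random.uniform(0.0, 0.02),     # sensor noise added post-render
    }

def generate_dataset(cad_file: str, n_images: int, render):
    """Render n_images views of one CAD model with randomized parameters.

    `render` is a placeholder for whatever offline renderer is used; it is expected
    to return an (image, pose_label) pair for the sampled parameters.
    """
    samples = []
    for _ in range(n_images):
        params = sample_render_params()
        image, pose = render(cad_file, params)
        samples.append((image, pose, params))
    return samples
```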

    FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

    In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and slow inference. To tackle this problem, we propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation-aware autoencoder with 3D graph convolution for latent feature extraction. The learned latent feature is insensitive to point shift and object size thanks to the shift- and scale-invariance properties of the 3D graph convolution. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. Meanwhile, we estimate translation and size by two residuals: the difference between the mean of the object points and the ground-truth translation, and the difference between the mean size of the category and the ground-truth size, respectively. Finally, to increase the generalization ability of FS-Net, we propose an online box-cage-based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. Especially in category-level pose estimation, without extra synthetic data, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset. Comment: accepted by CVPR2021, ora
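
    The residual formulation for translation and size can be written out directly: the network regresses the offset between the centroid of the observed object points and the ground-truth translation, and the offset between the category's mean size and the ground-truth size, so at test time the estimates are recovered by adding the predicted residuals back. A minimal sketch, with placeholder residuals standing in for the network outputs:

```python
# Hypothetical sketch of the residual-based translation/size recovery described above.
# The residual predictions would come from the network; here they are placeholder inputs.
import numpy as np

def recover_translation_and_size(object_points: np.ndarray,
                                 category_mean_size: np.ndarray,
                                 pred_translation_residual: np.ndarray,
                                 pred_size_residual: np.ndarray):
    """object_points: (N, 3) observed points of the instance.
    category_mean_size: (3,) average bounding-box size of the category (assumed precomputed).
    The network regresses the two residuals; adding them back gives the final estimates."""
    point_mean = object_points.mean(axis=0)               # centroid of the observed points
    translation = point_mean + pred_translation_residual  # t = mean(points) + delta_t
    size = category_mean_size + pred_size_residual        # s = mean_size(category) + delta_s
    return translation, size

# Toy usage with made-up numbers:
pts = np.random.rand(500, 3)
t, s = recover_translation_and_size(pts,
                                    category_mean_size=np.array([0.2, 0.1, 0.3]),
                                    pred_translation_residual=np.array([0.01, -0.02, 0.00]),
                                    pred_size_residual=np.array([0.00, 0.01, -0.01]))
```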

    Few-shot Domain Adaptation for 3D Human Pose and Shape Estimation

    Despite recent advancements in monocular 3D human pose and shape estimation, many previous works are susceptible to the domain gap between the training data and the test data. This problem becomes even more severe when the test samples come from challenging in-the-wild scenarios. This paper proposes a domain adaptation approach to mitigate the gap, especially in a few-shot test environment, utilizing (1) a continuous metric loss to constrain the feature-space distance relationships between different poses, and (2) a segmentation module to localize the foreground area so that negative effects from noisy backgrounds can be mitigated. Our method achieved a slight improvement over the baseline on the MPI-INF-3DHP and 3DPW datasets.
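
    One plausible reading of the continuous metric loss is a pairwise constraint that feature-space distances should track pose-space distances; the sketch below uses symmetric pairwise L2 distances and a scale factor lam, which are assumptions rather than the paper's exact definition.

```python
# Hypothetical sketch of a "continuous metric loss": feature-space distances between
# samples are encouraged to follow their 3D-pose distances. The exact formulation
# (pairwise L2 on both sides, scale factor lam) is an assumption, not the paper's definition.
import torch

def continuous_metric_loss(features: torch.Tensor, poses: torch.Tensor, lam: float = 1.0):
    """features: (B, D) backbone features; poses: (B, J, 3) 3D joint positions."""
    feat_dist = torch.cdist(features, features)                    # (B, B) feature distances
    pose_dist = torch.cdist(poses.flatten(1), poses.flatten(1))    # (B, B) pose distances
    # Penalize mismatch between feature distances and (scaled) pose distances.
    return torch.abs(feat_dist - lam * pose_dist).mean()

# Toy usage:
f = torch.randn(16, 256)
p = torch.randn(16, 17, 3)
loss = continuous_metric_loss(f, p)
```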

    Vision-guided Grasping of Arbitrary Objects through Experience-based Search Optimization

    A desired capability for human-robot collaboration is the handover of tools and object parts in a functional, effective way. Therefore, a robot has to grasp objects at specific spots, also called functional grasps. In this thesis, an approach for such functional grasping of nearly unknown, arbitrary objects is presented. A differentiating factor from other approaches is that no 3D models of the objects are required. Given a camera mounted on the robot's end-effector, the grasping position is defined by a single target image with human guidance. During execution, an iterative search for the target viewing angle is performed, based on an appearance similarity measure generated by an Auto-Encoder (AE) specifically trained to encode general object rotation. Further, a process for the fusion of data from previous grasping attempts is presented, which increases robustness and search efficiency by utilizing experience from previous executions. Additionally, a detection module is integrated, which enables grasping of the target object in cluttered scenes. The developed method is evaluated in simulation and on a real robotic platform. The evaluation shows that the presented method robustly finds the pre-defined target orientation to grasp the objects.
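
    The core loop of the view search can be illustrated as a simple hill climb over camera orientation: the target image is encoded once, and small orientation perturbations are kept whenever they increase the appearance similarity to the target code. The encoder, robot interface, and greedy perturbation strategy in the sketch are placeholders, not the thesis implementation.

```python
# Hypothetical sketch of the iterative view-angle search: an autoencoder's encoder maps
# the target image and the current camera view to codes, and the camera orientation is
# nudged toward higher appearance similarity. Encoder, robot interface and the simple
# hill-climbing strategy are all assumptions.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def search_target_view(encode, capture, move_camera, target_image,
                       n_steps: int = 30, step_deg: float = 5.0):
    """encode(img) -> latent code, capture() -> current camera image,
    move_camera(delta_angles_deg) applies a small orientation change (all placeholders)."""
    target_code = encode(target_image)
    best_sim = cosine_similarity(encode(capture()), target_code)
    for _ in range(n_steps):
        # Try a small random orientation perturbation; keep it only if similarity improves.
        delta = np.random.uniform(-step_deg, step_deg, size=3)
        move_camera(delta)
        sim = cosine_similarity(encode(capture()), target_code)
        if sim > best_sim:
            best_sim = sim
        else:
            move_camera(-delta)   # revert the unhelpful step
    return best_sim
```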