    Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

    We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: it does not require real, pose-annotated training data, generalizes to various test sensors, and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset in both the RGB and RGB-D domains. We also evaluate on the LineMOD dataset, where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center, and show extended results. Comment: Code available at: https://github.com/DLR-RM/AugmentedAutoencode
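
    A minimal sketch of the implicit orientation lookup described above, assuming a trained encoder, rendered views of the 3D model, and their ground-truth rotations are already available (this is not the authors' released code). Orientations are recovered by comparing the latent code of a detected object crop against a codebook of codes with known rotations; object and view symmetries simply show up as several near-tied entries.

    # Codebook-based orientation lookup, illustrative only.
    import numpy as np

    def build_codebook(encoder, rendered_views, rotations):
        """Encode synthetic views of the 3D model: one unit-norm latent code per known rotation."""
        codes = np.stack([encoder(v) for v in rendered_views])       # (N, D)
        codes /= np.linalg.norm(codes, axis=1, keepdims=True)
        return codes, np.asarray(rotations)                          # rotations: (N, 3, 3)

    def estimate_orientation(encoder, crop, codebook_codes, codebook_rotations, k=1):
        """Return the k rotations whose latent codes are most similar (cosine) to the test crop."""
        z = encoder(crop)
        z = z / np.linalg.norm(z)
        sims = codebook_codes @ z                                     # cosine similarities, shape (N,)
        nearest = np.argsort(-sims)[:k]                               # symmetric views appear as near-ties
        return codebook_rotations[nearest], sims[nearest]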

    Recovering 6D Object Pose: A Review and Multi-modal Analysis

    A large number of studies analyse object detection and pose estimation at the visual level in 2D, discussing the effects of challenges such as occlusion, clutter, and texture on the performance of methods that work in the RGB modality. Going further by interpreting depth data, the study in this paper presents thorough multi-modal analyses. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images, comparing the performance of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining "automation" in robotic manipulation? What next steps should the community take to improve "autonomy" in robotics while handling objects? Our findings include: (i) reasonably accurate results are obtained on textured objects at varying viewpoints with cluttered backgrounds; (ii) heavy occlusion and clutter severely affect the detectors, and similar-looking distractors are the biggest challenge in recovering instances' 6D poses; (iii) template-based methods and random-forest-based learning algorithms underlie object detection and 6D pose estimation, while the recent paradigm is to learn deep discriminative feature representations and to adopt CNNs that take RGB images as input; (iv) given the availability of large-scale 6D-annotated depth datasets, feature representations can be learnt on these datasets and then customized for the 6D problem.
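
    To make finding (iii) concrete, the sketch below shows the kind of CNN-based RGB pipeline the review refers to: a small convolutional backbone that learns a discriminative feature representation from an RGB crop and regresses a 6D pose from it (rotation as a unit quaternion, translation as a 3-vector). The architecture and layer sizes are illustrative assumptions, not taken from any surveyed detector.

    import torch
    import torch.nn as nn

    class PoseRegressionCNN(nn.Module):
        """Toy RGB-to-6D-pose regressor: conv backbone plus two linear heads."""
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),                  # (B, 128, 1, 1)
            )
            self.rotation_head = nn.Linear(128, 4)        # unit quaternion
            self.translation_head = nn.Linear(128, 3)     # metric translation

        def forward(self, rgb):                           # rgb: (B, 3, H, W)
            features = self.backbone(rgb).flatten(1)
            quat = nn.functional.normalize(self.rotation_head(features), dim=1)
            trans = self.translation_head(features)
            return quat, trans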

    Single-Image Depth Prediction Makes Feature Matching Easier

    Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines. The problem is that viewing angle and distance severely impact the recognizability of a local feature. Attempts to improve appearance invariance by choosing better local feature points or by leveraging outside information have come with prerequisites that made some of them impractical. In this paper, we propose a surprisingly effective enhancement to local feature extraction, which improves matching. We show that CNN-based depths inferred from single RGB images are quite helpful, despite their flaws. They allow us to pre-warp images and rectify perspective distortions, significantly enhancing SIFT and BRISK features and enabling more good matches, even when cameras are looking at the same scene from opposite directions. Comment: 14 pages, 7 figures, accepted for publication at the European Conference on Computer Vision (ECCV) 202
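
    A minimal sketch of the pre-warping idea under simplifying assumptions: a single dominant plane normal per image (derived from the predicted depth), known camera intrinsics K, and OpenCV's SIFT. The helper names and the single-plane restriction are illustrative, not the authors' implementation.

    import cv2
    import numpy as np

    def rotation_onto(n, target=np.array([0.0, 0.0, 1.0])):
        """Rotation matrix taking unit vector n onto target (undefined if n is exactly opposite target)."""
        n = n / np.linalg.norm(n)
        v = np.cross(n, target)
        c = float(np.dot(n, target))
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])
        return np.eye(3) + vx + vx @ vx / (1.0 + c)

    def rectify_then_sift(image_bgr, plane_normal, K):
        """Warp with the pure-rotation homography H = K R K^-1 so that the plane whose
        camera-frame normal is plane_normal (pointing away from the camera) becomes
        fronto-parallel, then extract SIFT keypoints on the rectified image."""
        R = rotation_onto(plane_normal)
        H = K @ R @ np.linalg.inv(K)
        h, w = image_bgr.shape[:2]
        rectified = cv2.warpPerspective(image_bgr, H, (w, h))
        sift = cv2.SIFT_create()
        keypoints, descriptors = sift.detectAndCompute(rectified, None)
        return rectified, keypoints, descriptors, H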

    Evidence-based nanoscopic and molecular framework for excipient functionality in compressed orally disintegrating tablets

    The work investigates the adhesive/cohesive molecular and physical interactions, together with nanoscopic features, of the commonly used orally disintegrating tablet (ODT) excipients microcrystalline cellulose (MCC) and D-mannitol. This helps to elucidate the underlying physico-chemical and mechanical mechanisms responsible for powder densification and optimum product functionality. Atomic force microscopy (AFM) contact-mode analysis was performed to measure nano-adhesion forces and surface energies between excipient-drug particles (6–10 different particles per pair). Moreover, surface topography images (100 nm² to 10 ÎŒm²) and roughness data were acquired in AFM tapping mode. The AFM data were related to ODT macro/microscopic properties obtained from SEM, FTIR, XRD, thermal analysis using DSC and TGA, disintegration testing, and Heckel and tabletability profiles. The results showed a good association between the adhesive molecular and physical forces of paired particles and the resultant densification mechanisms responsible for the mechanical strength of tablets. The micro-roughness of MCC was three times that of D-mannitol, which explains the high hardness of MCC ODTs due to mechanical interlocking. Hydrogen bonding between MCC particles could not be established from either the AFM or the FTIR solid-state investigation. In contrast, D-mannitol produced fragile ODTs owing to fragmentation of surface crystallites during compression, a consequence of its weak crystal structure. Furthermore, AFM analysis showed extensive microfibril structures inhabiting nanopores, which further supports the use of MCC as a disintegrant. Overall, the excipients (and model drugs) showed mechanistic behaviour on the nano/micro scale that could be related to the functionality of the materials on the macro scale. © 2014 Al-khattawi et al.
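
    For context on how AFM pull-off (adhesion) forces are typically converted into a work of adhesion, and hence surface energies, a contact-mechanics model such as JKR is usually assumed; the abstract does not state which model the authors applied, so the relations below are a standard illustration rather than a description of their analysis. Here R is the effective radius of the contacting particle/tip and \gamma_1, \gamma_2 are the surface energies of the two materials:

        F_{\text{pull-off}} = \tfrac{3}{2}\,\pi R\, W_{\text{adh}}
        \quad\Longrightarrow\quad
        W_{\text{adh}} = \frac{2\,F_{\text{pull-off}}}{3\,\pi R},
        \qquad
        W_{\text{adh}} \approx 2\sqrt{\gamma_1\,\gamma_2}\,.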
