290 research outputs found

    Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

    We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: it does not require real, pose-annotated training data, generalizes to various test sensors, and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset in both the RGB and RGB-D domains. We also evaluate on the LineMOD dataset, where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center, and we show extended results.
    Comment: Code available at: https://github.com/DLR-RM/AugmentedAutoencode
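The abstract above describes orientations as being represented implicitly by samples in a latent space. One plausible reading is a codebook lookup: latent codes of rendered views are stored with their orientation labels, and a test image's code is matched to the most similar stored code. The sketch below illustrates only that lookup step. The encoder is omitted, and the use of cosine similarity, the function names, the 3-dimensional codes, and the `view_…` labels are all assumptions made for illustration, not details taken from the listing.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def lookup_orientation(query_code, codebook):
    """Return (label, score) of the codebook entry most similar to query_code.

    codebook is a list of (latent_code, orientation_label) pairs; in a real
    system the codes would come from an encoder applied to rendered views
    (hypothetical here).
    """
    best_label, best_score = None, -2.0  # cosine similarity is never below -1
    for code, label in codebook:
        score = cosine_similarity(query_code, code)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy codebook: three made-up latent codes with hypothetical view labels.
codebook = [
    ([1.0, 0.0, 0.0], "view_000"),
    ([0.0, 1.0, 0.0], "view_001"),
    ([0.0, 0.0, 1.0], "view_002"),
]

label, score = lookup_orientation([0.1, 0.9, 0.2], codebook)
print(label, round(score, 3))
```

Because the match is made in latent space rather than by regressing a pose directly, views that look identical (for example under object symmetry) would map to nearby codes, which is consistent with the abstract's claim about handling symmetries.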

    Learning for Free – Object Detectors Trained on Synthetic Data

    A picture is worth a thousand words, or if you want it labeled, it’s worth about four cents per bounding box. Data is the fuel that powers modern technologies run by artificial intelligence engines, and it is increasingly valuable in today’s industry. High-quality labeled data is the most important factor in producing accurate machine learning models, which can be used to make powerful predictions and identify patterns humans may not see. Acquiring high-quality labeled data, however, can be expensive and time-consuming. For small companies, academic researchers, or machine learning hobbyists, gathering large datasets for a specific task is challenging when none are already publicly available. This research paper describes techniques for synthetically generating labeled image data that can be used in supervised learning for object detection. 3D modeling software, in conjunction with Generative Adversarial Networks and image augmentation, can create a realistic and diverse image dataset with bounding boxes and labels. The result of our effort is an accurate object detector for aerial surveillance at no cost to the end user. We achieved a best average precision score of 0.76 for detecting and classifying cars from an aerial perspective using a mix of GAN-refined data and randomized synthetic data.

    Bridging the Domain-Gap in Computer Vision Tasks
