Depth-aware convolutional neural networks for accurate 3D pose estimation in RGB-D images
Most recent approaches to 3D pose estimation from RGB-D images address the problem in a two-stage pipeline. First, they learn a classifier, typically a random forest, to predict the position of each input pixel on the object surface. These estimates are then used to define an energy function that is minimized with respect to the object pose. In this paper, we focus on the first stage of the problem and propose a novel classifier based on a depth-aware convolutional neural network. This classifier learns a scale-adaptive regression model that yields highly accurate pixel-level predictions, allowing the pose to be estimated with a simple RANSAC-based scheme, with no need to optimize complex ad hoc energy functions. Our experiments on publicly available datasets show that our approach achieves remarkable improvements over state-of-the-art methods.
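The final stage described above, fitting a rigid pose from per-pixel 3D correspondences with a simple RANSAC scheme, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: it pairs a minimal-sample RANSAC loop with a Kabsch (SVD) solver for 3D-3D alignment, and all function names are mine.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (both Nx3)."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)                 # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def ransac_pose(P, Q, iters=200, thresh=0.01, seed=0):
    """Fit a pose from noisy 3D-3D correspondences containing outliers."""
    rng = np.random.default_rng(seed)
    best_mask, best_inl = None, -1
    for _ in range(iters):
        idx = rng.choice(len(P), 3, replace=False)     # minimal sample
        R, t = kabsch(P[idx], Q[idx])
        err = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inl = int((err < thresh).sum())
        if inl > best_inl:
            best_inl, best_mask = inl, err < thresh
    # refine on all inliers of the best hypothesis
    return kabsch(P[best_mask], Q[best_mask])
```

With depth available, each pixel's camera-frame 3D point can be paired with the network's predicted object-surface coordinate, so the 3D-3D formulation applies directly; an RGB-only variant would instead use a PnP solver inside the same loop.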
A Survey on Joint Object Detection and Pose Estimation using Monocular Vision
In this survey we present a complete landscape of joint object detection and pose estimation methods that use monocular vision. We describe traditional approaches that involve descriptors or models together with various estimation methods. These descriptors or models include chordiograms, shape-aware deformable parts models, bags of boundaries, distance transform templates, natural 3D markers, and facet features, whereas the estimation methods include iterative clustering estimation, probabilistic networks, and iterative genetic matching. Hybrid approaches that combine handcrafted feature extraction with estimation by deep learning methods are outlined. We have investigated and compared, wherever possible, purely deep-learning-based approaches (single-stage and multi-stage) to this problem. Comprehensive details of the various accuracy measures and metrics are presented. To give a clear overview, the characteristics of relevant datasets are discussed. The trends that have prevailed from the infancy of this problem until now are also highlighted.
Comment: Accepted at the International Joint Conference on Computer Vision and Pattern Recognition (CCVPR) 201
6D Pose Estimation of Textureless Objects from a Single Camera
This thesis focuses on estimating the 6D pose of objects from a single RGB image of the scene: the position of the object along all three axes as well as its rotation around each of them, using 3D models of the objects. Such methods are used mainly in robotic grasping, autonomous driving, and augmented reality. A great source for discovering these methods is the BOP Challenge, a competition that compares the best state-of-the-art public methods on a collection of datasets. I then modify the chosen algorithm and train it on my own dataset. The current state-of-the-art methods use a combination of classifiers: for example, CosyPose uses three different neural networks, and EPOS uses a six-step prediction pipeline that includes its own neural network. Both of the mentioned algorithms have publicly available implementations and strong results in the BOP Challenge.
For my proof of concept, I choose 4 objects with their respective 3D models and try to create a training dataset using an RGB camera. I then switch to photorealistic rendering of the training images, which is much faster and more practical for the amount of training data a neural network requires, mainly because it allows automatic annotation of the objects in the 6D space.
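The automatic annotation that makes rendered data so practical comes from the fact that the ground-truth pose is known at render time: projecting any model point through the camera gives its pixel-level label for free. A minimal pinhole-projection sketch (the function name and example intrinsics are illustrative, not from the thesis):

```python
import numpy as np

def project_points(X, R, t, K):
    """Project 3D model points X (Nx3) with known pose (R, t) through camera
    intrinsics K to pixel coordinates (Nx2). With a synthetic renderer, (R, t)
    is the ground truth, so these projections serve as automatic annotations."""
    Xc = X @ R.T + t                  # model frame -> camera frame
    uv = Xc @ K.T                     # apply pinhole intrinsics
    return uv[:, :2] / uv[:, 2:3]     # perspective divide
```

For example, with focal length 500 px and principal point (320, 240), the model origin placed 5 m straight ahead projects to the image center (320, 240).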
DeepRM: Deep Recurrent Matching for 6D Pose Refinement
Precise 6D pose estimation of rigid objects from RGB images is a critical but challenging task in robotics and augmented reality. To address this problem, we propose DeepRM, a novel recurrent network architecture for 6D pose refinement. DeepRM leverages initial coarse pose estimates to render synthetic images of target objects. The rendered images are then matched with the observed images to predict a rigid transform for updating the previous pose estimate. This process is repeated to incrementally refine the estimate at each iteration. LSTM units are used to propagate information through each refinement step, significantly improving overall performance. In contrast to many two-stage Perspective-n-Point based solutions, DeepRM is trained end-to-end, and uses a scalable backbone that can be tuned via a single parameter for accuracy and efficiency. During training, a multi-scale optical flow head is added to predict the optical flow between the observed and synthetic images. Optical flow prediction stabilizes the training process and enforces the learning of features that are relevant to the task of pose estimation. Our results demonstrate that DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
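The render-match-update loop described above can be sketched generically. This is only the outer composition logic, with poses as 4x4 homogeneous matrices; `predict_delta` is a placeholder for the render-and-compare network and does not represent DeepRM's actual architecture:

```python
import numpy as np

def compose(pose, delta):
    """Left-compose a predicted corrective transform with the current pose."""
    return delta @ pose

def refine(pose, predict_delta, n_iters=4):
    """Iterative refinement: each step renders the object at `pose`, matches
    against the observation, and predicts a delta transform. Here that whole
    render-and-match stage is abstracted into the predict_delta callback."""
    for _ in range(n_iters):
        pose = compose(pose, predict_delta(pose))
    return pose
```

The key property of this scheme is that even a predictor that only removes part of the remaining error per step converges geometrically over a few iterations, which is why coarse initial estimates suffice.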