305 research outputs found

    iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects

    Full text link
    We address the task of 6D pose estimation of known rigid objects from single input images in scenarios where the objects are partly occluded. Recent RGB-D-based methods are robust to moderate degrees of occlusion. For RGB inputs, no previous method works well for partly occluded objects. Our main contribution is to present the first deep learning-based system that estimates accurate poses for partly occluded objects from RGB-D and RGB input. We achieve this with a new instance-aware pipeline that decomposes 6D object pose estimation into a sequence of simpler steps, where each step removes specific aspects of the problem. The first step localizes all known objects in the image using an instance segmentation network, and hence eliminates surrounding clutter and occluders. The second step densely maps pixels to 3D object surface positions, so called object coordinates, using an encoder-decoder network, and hence eliminates object appearance. The third, and final, step predicts the 6D pose using geometric optimization. We demonstrate that we significantly outperform the state-of-the-art for pose estimation of partly occluded objects for both RGB and RGB-D input

    Learning to Navigate the Energy Landscape

    Full text link
    In this paper, we present a novel and efficient architecture for addressing computer vision problems that use `Analysis by Synthesis'. Analysis by synthesis involves the minimization of the reconstruction error which is typically a non-convex function of the latent target variables. State-of-the-art methods adopt a hybrid scheme where discriminatively trained predictors like Random Forests or Convolutional Neural Networks are used to initialize local search algorithms. While these methods have been shown to produce promising results, they often get stuck in local optima. Our method goes beyond the conventional hybrid architecture by not only proposing multiple accurate initial solutions but by also defining a navigational structure over the solution space that can be used for extremely efficient gradient-free local search. We demonstrate the efficacy of our approach on the challenging problem of RGB Camera Relocalization. To make the RGB camera relocalization problem particularly challenging, we introduce a new dataset of 3D environments which are significantly larger than those found in other publicly-available datasets. Our experiments reveal that the proposed method is able to achieve state-of-the-art camera relocalization results. We also demonstrate the generalizability of our approach on Hand Pose Estimation and Image Retrieval tasks

    Real-Time 6D Object Pose Estimation on CPU

    Full text link
    We propose a fast and accurate 6D object pose estimation from a RGB-D image. Our proposed method is template matching based and consists of three main technical components, PCOF-MOD (multimodal PCOF), balanced pose tree (BPT) and optimum memory rearrangement for a coarse-to-fine search. Our model templates on densely sampled viewpoints and PCOF-MOD which explicitly handles a certain range of 3D object pose improve the robustness against background clutters. BPT which is an efficient tree-based data structures for a large number of templates and template matching on rearranged feature maps where nearby features are linearly aligned accelerate the pose estimation. The experimental evaluation on tabletop and bin-picking dataset showed that our method achieved higher accuracy and faster speed in comparison with state-of-the-art techniques including recent CNN based approaches. Moreover, our model templates can be trained only from 3D CAD in a few minutes and the pose estimation run in near real-time (23 fps) on CPU. These features are suitable for any real applications.Comment: accepted to IROS 201

    6D object position estimation from 2D images: a literature review

    Get PDF
    The 6D pose estimation of an object from an image is a central problem in many domains of Computer Vision (CV) and researchers have struggled with this issue for several years. Traditional pose estimation methods (1) leveraged on geometrical approaches, exploiting manually annotated local features, or (2) relied on 2D object representations from different points of view and their comparisons with the original image. The two methods mentioned above are also known as Feature-based and Template-based, respectively. With the diffusion of Deep Learning (DL), new Learning-based strategies have been introduced to achieve the 6D pose estimation, improving traditional methods by involving Convolutional Neural Networks (CNN). This review analyzed techniques belonging to different research fields and classified them into three main categories: Template-based methods, Feature-based methods, and Learning-Based methods. In recent years, the research mainly focused on Learning-based methods, which allow the training of a neural network tailored for a specific task. For this reason, most of the analyzed methods belong to this category, and they have been in turn classified into three sub-categories: Bounding box prediction and Perspective-n-Point (PnP) algorithm-based methods, Classification-based methods, and Regression-based methods. This review aims to provide a general overview of the latest 6D pose recovery methods to underline the pros and cons and highlight the best-performing techniques for each group. The main goal is to supply the readers with helpful guidelines for the implementation of performing applications even under challenging circumstances such as auto-occlusions, symmetries, occlusions between multiple objects, and bad lighting conditions

    Object Pose Estimation in Monocular Image Using Modified FDCM

    Get PDF
    In this paper, a new method for object detection and pose estimation in a monocular image is proposed based on FDCM method. it can detect object with high speed running time, even if the object was under the partial occlusion or in bad illumination. In addition, It requires only single template without any training process. The Modied FDCM based on FDCM with improvments, the LSD method was used in MFDCM instead of the line tting method, besides the integral distance transform was replaced with a distance transform image, and using an angular Voronoi diagram. In addition, the search process depends on Line segments based search instead of the sliding window search in FDCM. The MFDCM was evaluated by comparing it with FDCM in dierent scenarios and with other four methods: COF, HALCON, LINE2D, and BOLD using D-textureless dataset. The comparison results show that MFDCM was at least 14 times faster than FDCM in tested scenarios. Furthermore, it has the highest correct detection rate among all tested method with small advantage from COF and BLOD methods, while it was a little slower than LINE2D which was the fasted method among compared methods. The results proves that MFDCM able to detect and pose estimation of the objects in the clear or clustered background from a monocular image with high speed running time, even if the object was under the partial occlusion which makes it robust and reliable for real-time applications

    A deep learning framework for real-time 3D model registration in robot-assisted laparoscopic surgery

    Get PDF
    Introduction The current study presents a deep learning framework to determine, in real-time, position and rotation of a target organ from an endoscopic video. These inferred data are used to overlay the 3D model of patient's organ over its real counterpart. The resulting augmented video flow is streamed back to the surgeon as a support during laparoscopic robot-assisted procedures. Methods This framework exploits semantic segmentation and, thereafter, two techniques, based on Convolutional Neural Networks and motion analysis, were used to infer the rotation. Results The segmentation shows optimal accuracies, with a mean IoU score greater than 80% in all tests. Different performance levels are obtained for rotation, depending on the surgical procedure. Discussion Even if the presented methodology has various degrees of precision depending on the testing scenario, this work sets the first step for the adoption of deep learning and augmented reality to generalise the automatic registration process
    corecore