    Object Pose Estimation in Monocular Image Using Modified FDCM

    In this paper, a new method for object detection and pose estimation in a monocular image is proposed, based on the FDCM method. It can detect objects with a fast running time, even when the object is partially occluded or under poor illumination. In addition, it requires only a single template and no training process. The Modified FDCM (MFDCM) builds on FDCM with several improvements: the LSD method replaces the line fitting method, the integral distance transform is replaced with a distance transform image, and an angular Voronoi diagram is used. Furthermore, the search process relies on a line-segment-based search instead of FDCM's sliding-window search. MFDCM was evaluated by comparing it with FDCM in different scenarios and with four other methods, COF, HALCON, LINE2D, and BOLD, on the D-textureless dataset. The comparison results show that MFDCM was at least 14 times faster than FDCM in the tested scenarios. It also has the highest correct detection rate among all tested methods, with a small advantage over the COF and BOLD methods, while being slightly slower than LINE2D, the fastest of the compared methods. The results prove that MFDCM is able to detect objects and estimate their pose against clear or cluttered backgrounds in a monocular image with a fast running time, even under partial occlusion, which makes it robust and reliable for real-time applications.
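    As a rough illustration of the chamfer-matching idea that FDCM-style methods build on, the sketch below scores a line-segment template against an edge distance transform of the scene. It is a minimal sketch assuming OpenCV; the function names and parameter values are illustrative and not taken from the paper.

```python
import cv2
import numpy as np

# Minimal sketch of chamfer-style template scoring, the idea underlying
# FDCM-like methods: scene edges are converted into a distance transform,
# and a template (points sampled from its line segments) is scored by the
# mean distance of its points to the nearest scene edge.

def edge_distance_transform(gray):
    """Distance (in pixels) from every pixel to the nearest Canny edge."""
    edges = cv2.Canny(gray, 50, 150)
    # distanceTransform measures distance to the nearest zero pixel,
    # so the edge map is inverted first.
    return cv2.distanceTransform(255 - edges, cv2.DIST_L2, 3)

def chamfer_score(dist, template_points, offset):
    """Mean edge distance of integer template points placed at offset (x, y)."""
    pts = np.asarray(template_points, dtype=int) + np.asarray(offset, dtype=int)
    h, w = dist.shape
    inside = ((pts[:, 0] >= 0) & (pts[:, 0] < w) &
              (pts[:, 1] >= 0) & (pts[:, 1] < h))
    pts = pts[inside]
    if len(pts) == 0:
        return np.inf
    return float(dist[pts[:, 1], pts[:, 0]].mean())  # lower is better
```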

    Depth-aware convolutional neural networks for accurate 3D pose estimation in RGB-D images

    Most recent approaches to 3D pose estimation from RGB-D images address the problem in a two-stage pipeline. First, they learn a classifier, typically a random forest, to predict the position of each input pixel on the object surface. These estimates are then used to define an energy function that is minimized w.r.t. the object pose. In this paper, we focus on the first stage of the problem and propose a novel classifier based on a depth-aware Convolutional Neural Network. This classifier is able to learn a scale-adaptive regression model that yields very accurate pixel-level predictions, allowing the pose to finally be estimated with a simple RANSAC-based scheme, with no need to optimize complex ad hoc energy functions. Our experiments on publicly available datasets show that our approach achieves remarkable improvements over state-of-the-art methods.
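    For illustration, the RANSAC-based pose recovery described above can be sketched with OpenCV's PnP solver, assuming the classifier has already produced pixel-to-object-surface correspondences. The helper below is hypothetical, not the paper's implementation.

```python
import cv2
import numpy as np

# Sketch of the second stage: once per-pixel predictions of object-surface
# coordinates are available, the 6D pose can be recovered with a simple
# RANSAC scheme. OpenCV's solvePnPRansac plays that role here; the
# correspondences and camera matrix K are assumed inputs.

def pose_from_correspondences(obj_pts, img_pts, K):
    """obj_pts: Nx3 predicted object coordinates; img_pts: Nx2 pixel locations."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts.astype(np.float64), img_pts.astype(np.float64),
        K, distCoeffs=None,
        reprojectionError=3.0, iterationsCount=200)
    if not ok:
        raise RuntimeError("RANSAC did not find a consistent pose")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers
```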

    Efficient 3D object recognition via geometric information preservation

    Accurate 3D object recognition and 6-DOF pose estimation have been applied pervasively in a variety of settings, such as unmanned warehouses, cooperative robots, and the manufacturing industry. How to extract a robust and representative feature from point clouds is an unavoidable and important issue. In this paper, an unsupervised feature learning network is introduced to extract 3D keypoint features directly from point clouds, rather than transforming point clouds into voxel grids or projected RGB images, which saves computational time while also preserving the object's geometric information. Specifically, the proposed network features a stacked point feature encoder, which stacks the local discriminative features within each neighborhood onto the original point-wise feature counterparts. The main framework consists of an offline training phase and an online testing phase. In the offline training phase, the stacked point feature encoder is trained first, and a feature database is then generated for all keypoints, which are sampled from synthetic point clouds of multiple model views. In the online testing phase, each feature extracted from the unknown testing scene is matched against the database using a K-D tree voting strategy; the matching results are then refined using a hypothesis-and-verification strategy. The proposed method is extensively evaluated on four public datasets, and the results show that it delivers comparable or even superior performance to the state of the art in terms of F1-score, Average 3D Distance (ADD), and recognition rate.
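    As a hedged sketch of the online matching step, the snippet below performs the nearest-neighbour lookup against a precomputed feature database with a K-D tree (here SciPy's cKDTree). The database layout and distance threshold are assumptions; the paper's voting and hypothesis-and-verification stages are omitted.

```python
import numpy as np
from scipy.spatial import cKDTree

# Online matching step: keypoint features from the test scene are matched
# against the offline feature database by nearest-neighbour lookup in a
# K-D tree. db_feats is an NxF array of database features, db_labels the
# corresponding keypoint identities, scene_feats an MxF array of queries.

def match_features(db_feats, db_labels, scene_feats, max_dist=0.5):
    """Return (scene_index, db_label) pairs for sufficiently close matches."""
    tree = cKDTree(db_feats)                  # built once, offline
    dists, idx = tree.query(scene_feats, k=1)
    return [(i, db_labels[j])
            for i, (d, j) in enumerate(zip(dists, idx)) if d < max_dist]
```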

    3D hand pose estimation using convolutional neural networks

    3D hand pose estimation plays a fundamental role in natural human-computer interaction. The problem is challenging due to complicated variations caused by complex articulations, multiple viewpoints, self-similar parts, severe self-occlusions, and different shapes and sizes. To handle these challenges, the thesis makes the following contributions. First, the problems of multiple viewpoints and complex articulations in hand pose estimation are tackled by decomposing and transforming the input and output spaces with spatial transformations that follow the hand structure. These transformations reduce the variation of both the input and output spaces, which makes learning easier. The second contribution is a probabilistic framework integrating all the hierarchical regressions. Variants with and without sampling, using different regressors and optimization methods, are constructed and compared to provide insight into the components of this framework. The third contribution is based on the observation that, for images with occlusions, multiple plausible configurations exist for the occluded parts. A hierarchical mixture density network is proposed to handle the multi-modality of the locations of occluded hand joints. It leverages state-of-the-art hand pose estimators based on Convolutional Neural Networks to facilitate feature learning, while modeling the multiple modes in a two-level hierarchy to reconcile single-valued (for visible joints) and multi-valued (for occluded joints) mappings in its output. In addition, a fully labeled real hand dataset is collected with a tracking system using six 6D magnetic sensors and inverse kinematics to automatically obtain 21-joint hand pose annotations for depth maps.
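    To make the mixture-density idea concrete, here is a minimal sketch of an MDN output head that predicts a K-component Gaussian mixture over a joint's 3D location, together with its negative log-likelihood loss. PyTorch, the component count, and the layer names are assumptions for illustration; the thesis's exact hierarchical architecture is not reproduced.

```python
import torch
import torch.nn as nn

# Mixture-density output head: instead of a single 3D location, the
# network predicts a K-component isotropic Gaussian mixture over the
# joint position, so multiple plausible configurations for an occluded
# joint can coexist in the output.

class MDNHead(nn.Module):
    def __init__(self, in_dim, n_components=5, out_dim=3):
        super().__init__()
        self.k, self.d = n_components, out_dim
        self.pi = nn.Linear(in_dim, n_components)            # mixture weights
        self.mu = nn.Linear(in_dim, n_components * out_dim)  # component means
        self.log_sigma = nn.Linear(in_dim, n_components)     # isotropic stds

    def forward(self, h):
        log_pi = torch.log_softmax(self.pi(h), dim=-1)       # B x K
        mu = self.mu(h).view(-1, self.k, self.d)             # B x K x 3
        sigma = self.log_sigma(h).exp().unsqueeze(-1)        # B x K x 1
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, target):
    """Negative log-likelihood of target (B x 3) under the mixture."""
    dist = torch.distributions.Normal(mu, sigma)
    log_prob = dist.log_prob(target.unsqueeze(1)).sum(-1)    # B x K
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()
```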