Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting
We propose a novel keypoint voting scheme based on intersecting spheres that
is more accurate than existing schemes and allows for a smaller set of more
dispersed keypoints. The scheme is based on the distance between points, which,
as a 1D quantity, can be regressed more accurately than the 2D and 3D vector and
offset quantities regressed in previous work, yielding more accurate keypoint
localization. The scheme forms the basis of the proposed RCVPose method for 6
DoF pose estimation of 3D objects in RGB-D data, which is particularly
effective at handling occlusions. A CNN is trained to estimate the distance
between the 3D point corresponding to the depth mode of each RGB pixel and each
of a set of 3 dispersed keypoints defined in the object frame. At inference, a
sphere centered at each 3D point is generated, with radius equal to this
estimated distance. The surfaces of these spheres vote to increment a 3D
accumulator space, the peaks of which indicate keypoint locations. The proposed
radial voting scheme is more accurate than previous vector or offset schemes and
is robust to dispersed keypoints. Experiments demonstrate RCVPose to be highly
accurate and competitive, achieving state-of-the-art results on the LINEMOD
(99.7%) and YCB-Video (97.2%) datasets, and notably scoring 71.1% on the
challenging Occlusion LINEMOD dataset, 7.9 percentage points higher than
previous methods.
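The radial voting step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-point radii, which RCVPose obtains from the CNN, are passed in directly, and the cubic grid bounds and voxel size (`lo`, `hi`, `voxel`) and the function name `radial_vote` are hypothetical. A voxel receives one vote from a sphere when its center lies within half a voxel of that sphere's surface, which approximates rasterizing each sphere surface into the 3D accumulator; the voxel with the most votes is taken as the keypoint estimate.

```python
import math
from itertools import product


def radial_vote(points, radii, lo, hi, voxel):
    """Radial keypoint voting on a cubic accumulator grid (illustrative sketch).

    points[i] is a 3D point and radii[i] its (assumed already estimated)
    distance to the keypoint. Each sphere votes for every voxel whose center
    lies within half a voxel of the sphere surface. Returns the center of the
    voxel with the most votes and its vote count.
    """
    n = int(round((hi - lo) / voxel)) + 1  # voxels per axis
    half = voxel / 2
    best, best_votes = None, -1
    for i, j, k in product(range(n), repeat=3):
        center = (lo + i * voxel, lo + j * voxel, lo + k * voxel)
        votes = sum(
            1
            for p, r in zip(points, radii)
            if abs(math.dist(center, p) - r) <= half
        )
        if votes > best_votes:
            best, best_votes = center, votes
    return best, best_votes
```

Because each sphere is described by a single scalar radius, the only regressed quantity per point is 1D, which is the property the abstract credits for the improved accuracy; with exact radii, all spheres pass through the keypoint voxel, so it collects one vote per point.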
An Approach Of Features Extraction And Heatmaps Generation Based Upon CNNs And 3D Object Models
The rapid advancements in artificial intelligence have enabled recent progress in self-driving vehicles. However, the dependence on 3D object models and annotations collected and owned by individual companies has become a major obstacle to the development of new algorithms. This thesis proposes directly using graphics models created from open-source datasets as virtual representations of real-world objects. The approach uses machine learning techniques to extract 3D feature points and to create annotations from graphics models, both for recognizing dynamic objects, such as cars, and for verifying stationary and variable objects, such as buildings and trees. Moreover, it generates heat maps to eliminate stationary and variable objects from real-time images before recognizing dynamic objects. The proposed approach helps to bridge the gap between the virtual and physical worlds and to facilitate the development of new algorithms for self-driving vehicles.
Localization in Unstructured Environments: Towards Autonomous Robots in Forests with Delaunay Triangulation
Autonomous harvesting and transportation is a long-term goal of the forest
industry. One of the main challenges is the accurate localization of both
vehicles and trees in a forest. Forests are unstructured environments in which it
is difficult to find a set of distinctive landmarks for current fast
feature-based place recognition algorithms. This paper proposes a novel
approach in which local observations are matched to a general tree map using the
Delaunay triangulation as the representation format. Instead of point-cloud-based
matching methods, we use a topology-based method. First, tree trunk
positions are registered during a prior run by a forest harvester. Second, the
resulting map is Delaunay triangulated. Third, a local submap of the
autonomous robot is registered, triangulated, and matched using triangular
similarity maximization to estimate the position of the robot. We test our
method on a dataset accumulated from a forestry site at Lieksa, Finland. A
total of 2100 m of harvester path was recorded by an industrial
harvester with a 3D laser scanner and a geolocation unit fixed to the frame.
Our experiments show a localization accuracy of 12 cm (standard deviation) with
real-time data processing at speeds not exceeding 0.5 m/s. This accuracy and
speed limit are realistic for forest operations.
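The matching step can be illustrated with a small sketch. It assumes the map and the local submap have already been Delaunay triangulated (the triangulation itself is omitted) and uses a hypothetical similarity score based on sorted side lengths; the paper's actual similarity measure and search strategy may differ. The best-matching pair of triangles yields three vertex correspondences, from which a 2D rigid transform (rotation and translation) is estimated by least squares. Because the robot sits at the origin of its local frame, the estimated translation is its position in the map. All function names are illustrative.

```python
import math


def canonical(tri):
    """Pair each vertex with the length of its opposite side, sorted by length.

    For a scalene triangle this vertex order is invariant to rotation and
    translation, so it yields vertex correspondences between matched triangles.
    """
    a, b, c = tri
    return sorted([(math.dist(b, c), a), (math.dist(c, a), b), (math.dist(a, b), c)])


def similarity(t1, t2):
    """Hypothetical score in (0, 1]: equals 1 when the sorted side lengths coincide."""
    s1 = [d for d, _ in canonical(t1)]
    s2 = [d for d, _ in canonical(t2)]
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(s1, s2)))


def rigid_transform(src, dst):
    """Least-squares 2D rotation theta and translation t with dst ~ R(theta) @ src + t."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    num = den = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        sx, sy, dx, dy = sx - csx, sy - csy, dx - cdx, dy - cdy
        num += sx * dy - sy * dx  # cross terms -> sin(theta)
        den += sx * dx + sy * dy  # dot terms -> cos(theta)
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return theta, (cdx - (c * csx - s * csy), cdy - (s * csx + c * csy))


def localize(local_tris, map_tris):
    """Match the most similar (local, map) triangle pair and return (position, heading)."""
    _, loc, ref = max(
        ((similarity(lt, mt), lt, mt) for lt in local_tris for mt in map_tris),
        key=lambda x: x[0],
    )
    src = [v for _, v in canonical(loc)]
    dst = [v for _, v in canonical(ref)]
    theta, t = rigid_transform(src, dst)
    return t, theta  # robot is at the local origin, so its map position is t
```

Side lengths are used as the descriptor because they do not change under the rigid motion between the local and map frames. A practical system would score many triangle pairs jointly rather than trusting a single best match.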
Fast Object Pose Estimation Using Adaptive Threshold for Bin-Picking
Robotic bin-picking is a common process in modern manufacturing, logistics, and warehousing that aims to pick up known or unknown objects in random poses out of a bin using a robot-camera system. Rapid and accurate object pose estimation pipelines have become an increasingly pressing issue for robot picking in recent years. In this paper, a fast 6-DoF (degrees of freedom) pose estimation pipeline for random bin-picking is proposed; the pipeline can recognize different types of objects in various cluttered scenarios and uses an adaptive threshold segmentation strategy to accelerate estimation and matching for the robot picking task. In particular, the proposed method can be trained effectively with fewer samples by introducing geometric properties of objects such as contour, normal distribution, and curvature. An experimental setup with a Kinova 6-DoF robot and an Ensenso industrial 3D camera is designed to evaluate the proposed method on four different objects. The results show a 91.25% average success rate and a 0.265 s average estimation time, indicating that the approach provides competitive results for fast object pose estimation and can be applied to robotic random bin-picking tasks.
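The abstract does not specify how the adaptive threshold is computed. As one hedged illustration of the general idea, the sketch below substitutes Otsu's method, a classic adaptive thresholding technique, and applies it to a list of depth values. Otsu's method picks the histogram cut that maximizes the between-class variance, which can separate objects (closer to the camera) from the bin floor without a hand-tuned constant. The function names, the bin count, and the use of raw depth values are assumptions, not the paper's pipeline.

```python
def otsu_threshold(values, bins=64):
    """Otsu's method: return the cut maximizing between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0        # number of samples at or below the candidate cut
    sum0 = 0.0    # sum of bin indices at or below the candidate cut
    best_var, best_i = -1.0, 0
    for i, h in enumerate(hist):
        w0 += h
        sum0 += i * h
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalized)
        if var > best_var:
            best_var, best_i = var, i
    return lo + (best_i + 1) * width  # upper edge of the best bin


def segment_foreground(depths, bins=64):
    """Mark depths closer than the adaptive threshold as object pixels."""
    t = otsu_threshold(depths, bins)
    return [d < t for d in depths]
```

Because the threshold is recomputed from each frame's own depth distribution, it adapts when the bin fill level or camera distance changes, which is the motivation for an adaptive rather than fixed segmentation threshold.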
Imitrob: Imitation Learning Dataset for Training and Evaluating 6D Object Pose Estimators
This paper introduces a dataset for training and evaluating methods for 6D
pose estimation of hand-held tools in task demonstrations captured by a
standard RGB camera. Despite the significant progress in 6D pose estimation
methods, their performance is usually limited for heavily occluded objects,
which is a common case in imitation learning where the object is typically
partially occluded by the manipulating hand. Currently, there is a lack of
datasets that would enable the development of robust 6D pose estimation methods
for these conditions. To overcome this problem, we collect a new dataset
(Imitrob) aimed at 6D pose estimation in imitation learning and other
applications where a human holds a tool and performs a task. The dataset
contains image sequences of three different tools and six manipulation tasks
with two camera viewpoints, four human subjects, and both left and right hands.
Each image is accompanied by an accurate ground-truth measurement of the 6D
object pose, obtained with the HTC Vive motion tracking device. The use of the dataset
is demonstrated by training and evaluating a recent 6D object pose estimation
method (DOPE) in various setups. The dataset and code are publicly available at
http://imitrob.ciirc.cvut.cz/imitrobdataset.php
Visual articulated tracking in cluttered environments
This thesis is concerned with the state estimation of an articulated robotic manipulator during interaction with its environment. Traditionally, robot state estimation has relied on proprioceptive sensors as the sole source of information about the internal state. In this thesis, we are motivated to shift the focus from proprioceptive to exteroceptive sensing, which can provide a holistic interpretation of the entire manipulation scene. When visually observing grasping tasks, the tracked manipulator is subject to visual distractions caused by the background and the manipulated object, and by occlusions from other objects present in the environment. The aim of this thesis is to investigate and develop methods for the robust visual state estimation of articulated kinematic chains in cluttered environments that suffer from partial occlusions. To make these methods widely applicable to a variety of kinematic setups and unseen environments, we intentionally refrain from using prior information about the internal state of the articulated kinematic chain, and we do not explicitly model visual distractions such as the background and manipulated objects in the environment. We approach this problem with model-fitting methods, in which an articulated model is associated with the observed data using discriminative information. We explore model-fitting objectives that are robust to occlusions and unseen environments, methods to generate synthetic training data for data-driven discriminative methods, and robust optimisers to minimise the tracking objective. This thesis contributes (1) an automatic colour and depth image synthesis pipeline for data-driven learning without depending on a real articulated robot; (2) a training strategy for discriminative model-fitting objectives with an implicit representation of objects; (3) a tracking objective that is able to track occluded parts of a kinematic chain; and finally (4) a robust multi-hypothesis optimiser.
These contributions are evaluated on two robotic platforms in different environments and with different manipulated and occluding objects. We demonstrate that our image synthesis pipeline generalises well to colour and depth observations of the real robot without requiring real ground-truth labelled images. While this synthesis approach introduces a visual simulation-to-reality gap, the combination of our robust tracking objective and optimiser enables stable tracking of an occluded end-effector during manipulation tasks.
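Model fitting with a robust objective and a multi-hypothesis optimiser can be sketched on a toy problem. Everything here is a hypothetical stand-in for the thesis's setup: a planar two-link arm with assumed link lengths replaces the real kinematic chain, observed 2D joint positions replace colour and depth images, a Huber loss stands in for the robust objective, and occluded joints are simply marked `None` and skipped. Several joint-angle hypotheses spread over a grid are each refined by a simple coordinate-descent local search, and the lowest-cost result is kept.

```python
import math
from itertools import product

# Hypothetical link lengths of a planar two-link arm.
LINKS = (1.0, 0.8)


def forward_kinematics(q):
    """Return the 2D positions of the elbow and end-effector for joint angles q."""
    x1 = LINKS[0] * math.cos(q[0])
    y1 = LINKS[0] * math.sin(q[0])
    x2 = x1 + LINKS[1] * math.cos(q[0] + q[1])
    y2 = y1 + LINKS[1] * math.sin(q[0] + q[1])
    return [(x1, y1), (x2, y2)]


def huber(r, delta=0.1):
    """Quadratic near zero, linear for large residuals (limits outlier influence)."""
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def objective(q, observations):
    """Robust fitting cost; an observation of None marks an occluded joint."""
    cost = 0.0
    for pred, obs in zip(forward_kinematics(q), observations):
        if obs is not None:
            cost += huber(math.dist(pred, obs))
    return cost


def local_search(q, observations, step=0.5, tol=1e-7, max_iters=10_000):
    """Coordinate descent that halves its step when no move improves the cost."""
    q = list(q)
    cost = objective(q, observations)
    for _ in range(max_iters):
        if step <= tol:
            break
        improved = False
        for j in range(len(q)):
            for delta in (step, -step):
                cand = q.copy()
                cand[j] += delta
                c = objective(cand, observations)
                if c < cost:
                    q, cost, improved = cand, c, True
        if not improved:
            step *= 0.5
    return q, cost


def multi_hypothesis_fit(observations, per_joint=4):
    """Refine a grid of initial joint-angle hypotheses and keep the best one."""
    angles = [-math.pi + (k + 0.5) * 2 * math.pi / per_joint for k in range(per_joint)]
    results = [local_search(h, observations) for h in product(angles, repeat=len(LINKS))]
    return min(results, key=lambda r: r[1])
```

Starting from several hypotheses guards against a single local search settling in a wrong local minimum, and skipping occluded joints means the visible end-effector can still be fitted when the elbow is hidden.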
- …