Evaluating Computer Vision Methods for Detection and Pose Estimation of Textureless Objects
Master's thesis in Automation and Signal Processing. Robotics, AI and automation: search for these words and two things become apparent. An era of automation is upon us, but even so there are still simple tasks that grind it to a halt, e.g. picking and placing objects. These tasks require coordination from a robot and object detection from a computer vision system. That is not to say that robots are incapable of picking up objects; the simple, organised cases were solved some time ago. The problems occur in cases where there is no order, in other words chaos. In these cases it is beneficial to detect the object and find its pose, so that it can be picked up and packed with full control over where it is placed. This thesis is written at the behest of Pickr.ai, a company looking to automate picking and packing for retail businesses.
The objective of this thesis is to evaluate available pose estimation methods and, if possible, single out the one best suited to the retail environment. Current state-of-the-art methods capable of estimating the pose of objects use convolutional neural networks for both detection and estimation. The leading methods can achieve accuracies in the high 90% range on pretrained objects. The issue with retail is that the volume of available wares may be so large that training on each item is prohibitive. The testing has therefore mostly targeted the methods' generalisability: whether they can detect objects without training specific to the object.
A few methods with varying approaches were examined, from simpler pure object detectors to two-stage 6D pose estimators. Unfortunately, none of the methods can be deemed appropriate for the task as it currently stands: they do not recognise new objects, and limited training does not improve their scores significantly. However, by combining the approaches incorporated in the individual methods, it may be possible to develop a new pose estimator capable of handling a retail environment.
PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking
6D object pose estimation plays an essential role in various fields, particularly in the grasping of industrial workpieces. Given challenges like rust, high reflectivity, and absent textures, this paper introduces a point-cloud-based pose estimation framework (PS6D). PS6D centers on slender and multi-symmetric objects. It extracts multi-scale features through an attention-guided feature extraction module, designs a symmetry-aware rotation loss and a center-distance-sensitive translation loss to regress the pose of each point to the centroid of the instance, and then uses a two-stage clustering method to complete instance segmentation and pose estimation. Objects from the Siléane and IPA datasets and typical workpieces from industrial practice are used to generate data and evaluate the algorithm. In comparison to the state-of-the-art approach, PS6D demonstrates an 11.5% improvement in F1 and a 14.8% improvement in Recall. The main part of PS6D has been deployed in Mech-Mind's software and achieves a 91.7% success rate in bin-picking experiments, marking its application in industrial pose estimation tasks.
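The symmetry-aware rotation loss is the load-bearing idea for multi-symmetric workpieces. As a rough illustration of the general technique (not PS6D's exact formulation, which regresses per-point poses toward the instance centroid), such a loss can take the minimum geodesic rotation error over the object's known symmetry transformations, so that any symmetry-equivalent pose is treated as correct; the function name and interface below are assumptions for illustration.

```python
# Minimal sketch of a symmetry-aware rotation error; names and the exact
# formulation are illustrative assumptions, not PS6D's implementation.
import numpy as np

def symmetry_aware_rotation_error(R_pred, R_gt, symmetries):
    """R_pred, R_gt: (3, 3) rotation matrices. symmetries: list of (3, 3)
    rotations that map the object onto itself, including the identity."""
    errors = []
    for S in symmetries:
        R_equiv = R_gt @ S  # a ground-truth pose equivalent under this symmetry
        cos = (np.trace(R_pred.T @ R_equiv) - 1.0) / 2.0  # geodesic distance
        errors.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return min(errors)  # penalise only the nearest symmetric equivalent

# Example: a part with 2-fold symmetry about its z-axis. A prediction "off"
# by exactly that symmetry incurs zero error.
Rz180 = np.diag([-1.0, -1.0, 1.0])
print(symmetry_aware_rotation_error(np.eye(3), Rz180, [np.eye(3), Rz180]))  # 0.0
```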
6DoF Pose Estimation for Complex Industrial Objects in a Bin-Picking Context
In the field of industrial assembly, this work aims to bring advanced computer vision to the assembly of the part under study, which is considered complex in nature. In a bin-picking setting, where several parts are dropped randomly into a container, this work consists of providing the robotic arm with a way to detect the part and estimate its 6DoF pose. A previous attempt with PoseCNN had already been made, but that algorithm failed to detect any instance in a real scene. This work therefore studies the reasons for that failure in more detail and proposes a 6DoF detection and pose estimation solution best suited to the part under study, a complex industrial object. Naturally, 6DoF pose estimation depends heavily on object detection. State-of-the-art 6DoF estimation approaches rely strongly on RGB images, and hence on 2D detection, which neglects some geometric details of the object. On the other hand, some approaches depend entirely on point clouds, imposing a detection that neglects colour and the 2D features of the RGB image. The idea is therefore to combine 2D and 3D detection to extract as many discriminative features as possible. In this context, PVN3D fuses 2D and 3D features and adds a deep 3D-voting neural network to determine 3D keypoints, in the manner of VoteNet introduced by Facebook. The contribution of this work is to integrate the T-LESS dataset of industrial objects into PVN3D. As a final result, when running PVN3D, the T-LESS object is correctly recognised and its 8 keypoints as well as its center are at the exact locations. In conclusion, this advanced perception for the robotic arm will allow companies to speed up part assembly and thereby multiply their productivity.
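To make the PVN3D-style pipeline above concrete, a minimal sketch of 3D keypoint voting followed by least-squares pose fitting is given below. The shapes, names, and the simple mean-based vote fusion are assumptions for illustration (PVN3D itself clusters votes with MeanShift and predicts the offsets with a deep network).

```python
# Illustrative sketch: per-point 3D keypoint voting, then Kabsch/Umeyama
# least-squares fitting of the 6DoF pose. Not PVN3D's actual interface.
import numpy as np

def vote_keypoints(points, offsets):
    """points: (N, 3) observed object points; offsets: (N, K, 3) predicted
    offsets from each point to each of K keypoints (network output)."""
    votes = points[:, None, :] + offsets  # every point votes for every keypoint
    return votes.mean(axis=0)             # (K, 3) keypoints in the camera frame

def fit_pose(model_kps, scene_kps):
    """Least-squares R, t with R @ model_kps[i] + t ≈ scene_kps[i]."""
    mu_m, mu_s = model_kps.mean(0), scene_kps.mean(0)
    H = (model_kps - mu_m).T @ (scene_kps - mu_s)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_s - R @ mu_m
```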
Robotic manipulation in clutter with object-level semantic mapping
To intelligently interact with environments and achieve useful tasks, robots need some level of understanding of a scene to plan sensible actions accordingly.
Semantic world models have been widely used in robotic manipulation, giving geometry and semantic information of objects that are vital to generating motions to complete tasks.
Using these models, typical traditional robotic systems generate motions with analysis-based motion planning, which often applies collision checks to generate a safe trajectory to execute.
It is crucial for robots to build such world models autonomously, ideally with flexible and low-cost sensors such as on-board cameras, and to generate motions with the succeeding planning pipelines.
With recent progress on deep neural networks, a growing body of research has explored end-to-end approaches to manipulation.
A typical end-to-end approach does not explicitly build world models; instead it generates motions by mapping directly from raw observations such as images, which introduces flexibility in handling novel objects and manipulation capabilities beyond analysis-based motion planning.
However, this approach struggles with long-horizon tasks that include several steps of grasping and placement, for which many action steps have to be inferred by learned models to generate a trajectory.
This difficulty motivated us to use a hybrid approach combining learned and traditional components to take advantage of both, as previous studies on robotic manipulation achieved long-horizon tasks with explicit world models.
This thesis develops a robotic system that manipulates objects to change their states as requested with highly successful, efficient, and safe maneuvers. In particular, we build an object-level semantic mapping pipeline able to build world models covering various objects in clutter, which is then integrated with various learned components to acquire manipulation skills. Our tight integration of explicit semantic mapping and learned motion generation enables the robot to accomplish long-horizon tasks with the extra manipulation capability introduced by learning.
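As a rough sketch of what an object-level semantic world model can look like, the structure below stores per-object semantics, geometry, and pose so that a traditional planner can run collision checks against it while learned components propose grasps. All class and field names are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical object-level semantic map; names and fields are assumptions.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ObjectEntry:
    object_id: int
    label: str            # semantic class, e.g. "mug"
    pose: np.ndarray      # 4x4 homogeneous transform in the world frame
    points: np.ndarray    # (N, 3) geometry for collision checking

@dataclass
class SemanticMap:
    objects: dict = field(default_factory=dict)

    def update(self, entry: ObjectEntry) -> None:
        """Insert or refresh an object estimate from a new observation."""
        self.objects[entry.object_id] = entry

    def query(self, label: str) -> list:
        """All mapped instances of a semantic class, e.g. the pick target."""
        return [o for o in self.objects.values() if o.label == label]

# A motion planner can collision-check trajectories against the `points` of
# every entry while a learned grasp model proposes grasps on the query result.
```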
A Multi-view Pixel-wise Voting Network for 6DoF Pose Estimation
6DoF pose estimation is an important task in the computer vision field with regard to robotic and automotive applications. Many recent approaches successfully perform pose estimation on monocular images, which lack depth information. In this work, the potential of extending such methods to a multi-view setting is explored, in order to recover depth information from the geometrical relations between the views. In particular, two different multi-view adaptations of a monocular pose estimator, PVNet, are developed: either combining monocular results from the individual views or modifying the original method to take the set of views directly as input. The new models are evaluated on the TOD transparent object dataset and compared against the original PVNet implementation, a depth-based pose estimation method called DenseFusion, and the method proposed by the authors of the dataset, called Keypose. Experimental results show that integrating multi-view information significantly increases test accuracy and that both models outperform DenseFusion, while still being slightly surpassed by Keypose.
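The first adaptation, combining monocular results from the individual views, can be illustrated with a small fusion step: each per-view estimate is mapped into a common world frame using the known camera extrinsics, and the resulting poses are averaged. This is only a plausible sketch under that assumption, not the thesis's exact fusion rule; PVNet itself produces each per-view pose via keypoint voting and PnP.

```python
# Hypothetical fusion of per-view monocular pose estimates.
import numpy as np
from scipy.spatial.transform import Rotation

def fuse_poses(T_world_cams, T_cam_objs):
    """Both arguments: lists of 4x4 homogeneous transforms (camera extrinsics
    and the per-view object pose estimates). Returns one fused world pose."""
    T_world_objs = [Twc @ Tco for Twc, Tco in zip(T_world_cams, T_cam_objs)]
    translations = np.stack([T[:3, 3] for T in T_world_objs])
    rotations = Rotation.from_matrix(np.stack([T[:3, :3] for T in T_world_objs]))
    T = np.eye(4)
    T[:3, :3] = rotations.mean().as_matrix()  # chordal mean of the rotations
    T[:3, 3] = translations.mean(axis=0)
    return T
```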
Reliable Object Pose Estimation
This thesis addresses estimation of an object's position and orientation, called the object's pose, from one or more images. Current methods aim to recover the "best" pose; however, in cases of symmetries, occlusions, etc., many poses may explain the observed image, and a single pose estimate cannot express this ambiguity. Instead of providing a point estimate, we aim to estimate the uncertainty to enable reliability for downstream robot systems.

First, we consider pose estimation for objects on a linear vibratory feeder. Erroneous pose estimates could lead to damaged products or equipment, so reliability is key. We show that the relevant pose uncertainties can be approximated by a distribution over a small discrete set of rotations, and that the uncertainty quantification allows us to reliably avoid failure.

Second, a common approach to pose estimation is establishing correspondences between points in the image and points on the object. Usually, it is assumed that an image point can correspond to only one point on the object, but that assumption breaks in cases of ambiguity such as those imposed by symmetries. We present SurfEmb, modeling continuous, unparameterized correspondence distributions, and we show how to use these distributions to obtain better pose estimates. Our method was at the top of the main pose estimation benchmark, BOP, for almost a year.

Third, single-view pose estimation inherently suffers from depth ambiguity and sensitivity to occlusions. We present EpiSurfEmb, which optimizes for the pose that maximizes correspondence likelihoods across views. For better pose hypothesis generation, we also combine the image-object correspondence distributions from SurfEmb with epipolar geometry to estimate scene-object correspondence distributions. Our results show that we can reduce errors by 80-91% when multiple images are available.

Fourth, we present Ki-Pode, which flips the correspondence distribution problem to estimating the projections of predefined object keypoints, and we show how these projection distributions can provide an estimate of the pose distribution. Due to the lack of a way to normalize over the large pose space, we only show distribution estimation on the rotation space, where Ki-Pode provides more reliable estimates across objects than other methods on YCBV.

Fifth, we present SpyroPose, which addresses how to scale unparameterized distribution models to pose space. The main idea is to learn distributions at multiple resolutions, allowing more efficient training and orders of magnitude fewer evaluations at test time. The method can be applied to both rotation and pose space. We present state-of-the-art results on rotation distribution estimation, and on pose distribution estimation we present the first qualitative results on real images and the first quantitative results altogether. We also show that the method extends readily to a multi-view version, presenting a principled way to fuse pose information from multiple images.
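The multi-resolution idea behind SpyroPose can be sketched as coarse-to-fine evaluation: score an unnormalized log-density on a coarse set of rotations, keep only the highest-scoring cells, and refine just those, so far fewer evaluations are needed than on a dense fine grid. The toy code below uses random perturbations instead of the paper's structured SO(3) grids, and the density is a placeholder; both are assumptions for illustration.

```python
# Toy coarse-to-fine search over SO(3); not SpyroPose's actual grids or model.
import numpy as np
from scipy.spatial.transform import Rotation

def refine(log_f, rotations, levels=3, top_k=16, n_children=8, sigma=0.5):
    for _ in range(levels):
        scores = np.array([log_f(R) for R in rotations])
        best = [rotations[i] for i in np.argsort(scores)[-top_k:]]
        # Subdivide: perturb each kept rotation with progressively smaller noise.
        rotations = [R * Rotation.from_rotvec(sigma * np.random.randn(3))
                     for R in best for _ in range(n_children)]
        sigma *= 0.5
    return rotations  # samples concentrated where the density is high

coarse = list(Rotation.random(512))            # stand-in for a level-0 grid
peaked_at_identity = lambda R: -R.magnitude()  # placeholder log-density
samples = refine(peaked_at_identity, coarse)
```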
