162 research outputs found
Active recognition and pose estimation of rigid and deformable objects in 3D space
Object recognition and pose estimation is a fundamental problem in computer vision and of utmost importance in robotic applications. Object recognition refers to recognizing specific object instances or categorizing objects into classes. Pose estimation deals with estimating the position and orientation of an object in 3D space, with the orientation usually expressed in Euler angles. Two types of objects generally require special care when designing solutions to these problems: rigid and deformable. Dealing with deformable objects is a much harder problem, and solutions that apply to rigid objects usually fail on deformable objects because of the inherent assumptions made during their design.
In this thesis we deal with object categorization, instance recognition and pose estimation of both rigid and deformable objects. In particular, we are interested in a special type of deformable object: clothes. We tackle the problem of autonomously recognizing and unfolding articles of clothing with a dual-arm manipulator. The task consists of grasping an article at a random point, recognizing it and then bringing it into an unfolded state. We propose a data-driven method for clothes recognition from depth images using Random Decision Forests, as well as a method for unfolding an article of clothing after estimating and grasping two key points using Hough Forests. Both methods are integrated into a POMDP framework that allows the robot to interact optimally with the garments, taking into account uncertainty in the recognition and point estimation process. This active recognition and unfolding makes our system robust to noisy observations. Our methods were tested on regular-sized clothes using a dual-arm manipulator and outperform state-of-the-art approaches in both accuracy and speed.
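The recognition step described above can be pictured with a minimal, hedged sketch: a Random Decision Forest trained on depth images treated as feature vectors. The synthetic data, the use of raw flattened pixels as features, and the forest settings are illustrative assumptions, not the thesis' implementation, which uses purpose-built depth features on real garments.

```python
# Hedged sketch of garment recognition from depth images with a
# Random Decision Forest. The synthetic "depth images" (noisy copies
# of class-specific templates) and the raw-pixel features are
# illustrative stand-ins for the thesis' engineered depth features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Three garment classes, each a noisy copy of a 32x32 depth template.
n, side, n_classes = 200, 32, 3
templates = rng.normal(size=(n_classes, side, side))
labels = rng.integers(0, n_classes, size=n)
images = templates[labels] * 0.3 + rng.normal(scale=0.1, size=(n, side, side))

X = images.reshape(n, -1)              # flatten each depth image
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X[:150], labels[:150])      # train on the first 150 images
accuracy = forest.score(X[150:], labels[150:])
```

With a clear class-specific signal like this, the forest separates the classes easily; the thesis' contribution lies in making this work on real, deformed garments.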
In order to take advantage of the robotic manipulator and increase the accuracy of our system, we developed a novel approach to generic active vision problems, called Active Random Forests. While the state of the art focuses on selecting the best viewing parameters with single-view classifiers, we propose a multi-view classifier in which the decision mechanism for optimally changing the viewing parameters is inherent to the classification process. This has several advantages: a) the classifier exploits the entire set of captured images rather than probabilistically aggregating per-view hypotheses; b) actions are based on disambiguating features learnt from all views and are optimally selected using the powerful voting scheme of Random Forests; and c) the classifier can take the costs of actions into account. The proposed framework was applied to the same task of autonomously unfolding clothes by a robot, addressing best viewpoint selection for classification, grasp point and pose estimation of garments. We show a significant performance improvement over state-of-the-art methods and our previous POMDP formulation.
Moving from deformable to rigid objects while keeping our interest in domestic robotic applications, we focus on object instance recognition and 3D pose estimation of household objects. We are particularly interested in realistic scenes that are very crowded and in which objects are perceived under severe occlusions. Single-shot 6D pose estimators with manually designed features are still unable to tackle such difficult scenarios for a variety of objects, motivating research towards unsupervised feature learning and next-best-view estimation. We present a complete framework for both single-shot 6D object pose estimation and next-best-view prediction based on Hough Forests, a state-of-the-art object pose estimator that performs classification and regression jointly. Rather than using manually designed features, we propose features learnt in an unsupervised manner from depth-invariant patches using a Sparse Autoencoder. Furthermore, taking advantage of the clustering performed in the leaf nodes of the Hough Forests, we learn to estimate the reduction of uncertainty achievable from other views, formulating the problem of selecting the next best view. To further improve 6D object pose estimation, we propose an improved joint registration and hypothesis verification module as a final refinement step to reject false detections. We provide two additional challenging datasets inspired by realistic scenarios to extensively evaluate the state of the art and our framework: one related to domestic environments and one depicting a bin-picking scenario mostly found in industrial settings. We show that our framework significantly outperforms the state of the art on both public datasets and our own.
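The next-best-view criterion can be reduced to a hedged one-liner: among candidate camera moves, pick the one whose predicted class posterior is least uncertain. The posteriors below are invented for illustration; in the framework they are predicted from the Hough Forest leaf-node statistics.

```python
# Hedged sketch of next-best-view selection: choose the candidate
# view whose predicted class posterior has minimum entropy. The
# posteriors here are made up; the framework estimates them from
# the leaf-node clustering of the Hough Forests.
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

view_posteriors = np.array([
    [0.40, 0.35, 0.25],   # view A: still ambiguous
    [0.85, 0.10, 0.05],   # view B: nearly disambiguated
    [0.55, 0.30, 0.15],   # view C: somewhere in between
])
best_view = int(np.argmin(entropy(view_posteriors)))   # view B wins
```

The interesting part in the thesis is not this argmin but learning to predict, before moving, how much each view would reduce uncertainty.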
Unsupervised feature learning, although efficient, might produce sub-optimal features for our particular task. Therefore, in our last work, we leverage the power of Convolutional Neural Networks to tackle the problem of estimating the pose of rigid objects with an end-to-end deep regression network. To improve the moderate performance of the standard regression objective, we introduce the Siamese Regression Network. For a given image pair, we enforce a similarity measure between the representations of the sample images in the feature space and the pose space respectively, which is shown to boost regression performance. Furthermore, we argue that pose-guided feature learning with our Siamese Regression Network generates more discriminative features that outperform the state of the art. Finally, our feature learning formulation yields features that perform well under severe occlusions, demonstrating high performance on our novel hand-object dataset.
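The pairwise coupling between feature space and pose space can be written down as a hedged sketch of the objective: a standard regression term plus a term tying feature-space distance to pose-space distance for each image pair. The exact loss used by the network may differ; this only conveys the idea, and the weight alpha is an illustrative assumption.

```python
# Hedged sketch of a Siamese regression objective: direct pose
# regression plus a pairwise term that encourages distances between
# feature embeddings to mirror distances between poses. The precise
# formulation in the network may differ; alpha is illustrative.
import numpy as np

def siamese_regression_loss(feat1, feat2, pose1, pose2,
                            pred1, pred2, alpha=1.0):
    regression = np.mean((pred1 - pose1) ** 2) + np.mean((pred2 - pose2) ** 2)
    pairwise = (np.linalg.norm(feat1 - feat2)
                - np.linalg.norm(pose1 - pose2)) ** 2
    return regression + alpha * pairwise
```

A pair with identical poses, identical embeddings, and perfect predictions yields zero loss, while mismatched feature and pose distances are penalised even when the direct regression is accurate.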
In conclusion, this work is a study of object detection and pose estimation in 3D space across a variety of object types. We further investigate how accuracy can be improved by applying active vision techniques that optimally move the camera view so as to minimize the detection error.
Deep Multicameral Decoding for Localizing Unoccluded Object Instances from a Single RGB Image
Occlusion-aware instance-sensitive segmentation is a complex task generally
split into region-based segmentations, by approximating instances as their
bounding box. We address the showcase scenario of dense homogeneous layouts in
which this approximation does not hold. In this scenario, outlining unoccluded
instances by decoding a deep encoder becomes difficult, due to the translation
invariance of convolutional layers and the lack of complexity in the decoder.
We therefore propose a multicameral design composed of subtask-specific
lightweight decoder and encoder-decoder units, coupled in cascade to encourage
subtask-specific feature reuse and enforce a learning path within the decoding
process. Furthermore, the state-of-the-art datasets for occlusion-aware
instance segmentation contain real images with few instances and occlusions
mostly due to objects occluding the background, unlike dense object layouts. We
thus also introduce a synthetic dataset of dense homogeneous object layouts,
namely Mikado, which extensibly contains more instances and inter-instance
occlusions per image than these public datasets. Our extensive experiments on
Mikado and public datasets show that ordinal multiscale units within the
decoding process prove more effective than state-of-the-art design patterns for
capturing position-sensitive representations. We also show that Mikado is
plausible with respect to real-world problems, in the sense that it enables the
learning of performance-enhancing representations transferable to real images,
while drastically reducing the need for hand-made annotations for finetuning.
The proposed dataset will be made publicly available.
Comment: International Journal of Computer Vision, Springer Verlag, 2020, Special Issue on Deep Learning for Robotic Vision
6-DoF Grasp Learning in Partially Observable Cluttered Scenes
The key element of an intelligent robot's efficient interaction with its immediate environment is object manipulation, a task that current data-driven methods decompose into object localization, classification, segmentation, and grasp pose estimation. This work is concerned with grasp pose estimation, namely with the implications of 6-DoF grasp pose estimation for partially visible cluttered scenes.
In this thesis, two methods are proposed to manage collisions between grasp proposals and the full target scene that arise from the partial visibility and cluttered nature of the scene. The first explores embedding the input data with differential-geometric shape information, namely a modified mean curvature measure, to improve the qualitative results of grasp estimation. The second proposes a supervisor network architecture, termed Collision-GraspNet, that classifies grasp proposals with respect to collisions with the scene, including its occluded parts, and improves invalid proposals via iterative pose sampling.
The first approach is tested on the Contact-GraspNet model and compared with the baseline performance of the GraspNet architecture. In turn, Collision-GraspNet is compared with the analytical proposal-filtering approach employed by GraspNet and is evaluated in three stages using various datasets.
The Collision-GraspNet grasp supervisor outperformed the analytical approach and showed high confidence-threshold flexibility. However, the curvature-embedded data failed to improve upon the baseline model's performance.
A real-time low-cost vision sensor for robotic bin picking
This thesis presents an integrated approach to a vision sensor for bin picking. The vision system consists of three major components. The first is a bifocal range sensor that estimates depth by measuring the relative blurring between two images captured with different focal settings. A key element in the success of this approach is that it overcomes some of the limitations of related implementations, and the experimental results indicate that the precision offered by the sensor is sufficient for a large variety of industrial applications. The second component is an edge-based segmentation technique applied to detect the boundaries of the objects that define the scene. An important issue for this technique is minimising the errors in the edge-detected output, which is done by analysing the information associated with singular edge points. The last component addresses object recognition and pose estimation using the output of the segmentation algorithm. The recognition stage matches primitives derived from the scene regions, while pose estimation uses an appearance-based approach augmented with range data analysis. The developed system is suitable for real-time operation and, to demonstrate the validity of the proposed approach, it has been examined under varying real-world scenes.
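The bifocal range idea can be conveyed with a hedged sketch: the relative blur between two differently focused captures tells us which focal plane a surface lies nearer to. A variance-of-Laplacian sharpness measure and the function name below are illustrative stand-ins for the thesis' actual blur estimator.

```python
# Hedged sketch of depth from defocus: compare the sharpness of two
# captures taken with different focal settings. Variance of a
# discrete Laplacian is a common focus measure, used here only as a
# stand-in for the thesis' relative-blur estimator.
import numpy as np

def laplacian_variance(img):
    lap = (-4 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    return lap.var()

def sharper_focal_setting(img_near, img_far):
    """Return 0 if the near-focused capture is sharper, else 1."""
    return int(laplacian_variance(img_far) > laplacian_variance(img_near))
```

In the actual sensor the relative blur is measured locally so that every image region, not the whole frame, is assigned a depth.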
Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments Using a Synthetic Benchmark
This paper presents Sim-Suction, a robust object-aware suction grasp policy
for mobile manipulation platforms with dynamic camera viewpoints, designed to
pick up unknown objects from cluttered environments. Suction grasp policies
typically employ data-driven approaches, necessitating large-scale,
accurately-annotated suction grasp datasets. However, the generation of suction
grasp datasets in cluttered environments remains underexplored, leaving
uncertainties about the relationship between the object of interest and its
surroundings. To address this, we propose a benchmark synthetic dataset,
Sim-Suction-Dataset, comprising 500 cluttered environments with 3.2 million
annotated suction grasp poses. The efficient Sim-Suction-Dataset generation
process provides novel insights by combining analytical models with dynamic
physical simulations to create fast and accurate suction grasp pose
annotations. We introduce Sim-Suction-Pointnet to generate robust 6D suction
grasp poses by learning point-wise affordances from the Sim-Suction-Dataset,
leveraging the synergy of zero-shot text-to-segmentation. Real-world
experiments for picking up all objects demonstrate that Sim-Suction-Pointnet
achieves success rates of 96.76%, 94.23%, and 92.39% on cluttered level 1
objects (prismatic shape), cluttered level 2 objects (more complex geometry),
and cluttered mixed objects, respectively. The Sim-Suction policies outperform the state-of-the-art benchmarks tested by approximately 21% in cluttered mixed scenes.
Comment: IEEE Transactions on Robotics
Local Features to a Global View: Recognition of Occluded Objects by Spectral Matching Using Pairwise Feature Relationships
Ph.D. (Doctor of Philosophy)