
    Active recognition and pose estimation of rigid and deformable objects in 3D space

    Object recognition and pose estimation is a fundamental problem in computer vision and of utmost importance in robotic applications. Object recognition refers to the problem of recognizing certain object instances, or of categorizing objects into specific classes. Pose estimation deals with estimating the exact position and orientation of the object in 3D space, with the orientation usually expressed in Euler angles. There are generally two types of objects that require special care when designing solutions to the aforementioned problems: rigid and deformable. Dealing with deformable objects has been a much harder problem, and solutions that apply to rigid objects usually fail when used for deformable objects, due to the inherent assumptions made during their design. In this thesis we deal with object categorization, instance recognition and pose estimation of both rigid and deformable objects.

    In particular, we are interested in a special type of deformable objects: clothes. We tackle the problem of autonomously recognizing and unfolding articles of clothing using a dual-arm manipulator. This problem consists of grasping an article from a random point, recognizing it and then bringing it into an unfolded state with a dual-arm robot. We propose a data-driven method for clothes recognition from depth images using Random Decision Forests. We also propose a method for unfolding an article of clothing after estimating and grasping two key-points, using Hough Forests. Both methods are integrated into a POMDP framework, allowing the robot to interact optimally with the garments while taking into account uncertainty in the recognition and point estimation process. This active recognition and unfolding makes our system very robust to noisy observations. Our methods were tested on regular-sized clothes using a dual-arm manipulator, and perform better in both accuracy and speed than state-of-the-art approaches.

    In order to take advantage of the robotic manipulator and increase the accuracy of our system, we developed a novel approach to generic active vision problems, called Active Random Forests. While the state of the art focuses on selecting the best viewing parameters using single-view classifiers, we propose a multi-view classifier in which the decision mechanism for optimally changing the viewing parameters is inherent to the classification process. This has many advantages: a) the classifier exploits the entire set of captured images and does not simply aggregate per-view hypotheses probabilistically; b) actions are based on disambiguating features learnt from all views and are optimally selected using the powerful voting scheme of Random Forests; and c) the classifier can take into account the costs of actions. The proposed framework was applied to the same task of autonomously unfolding clothes by a robot, addressing the problem of best viewpoint selection for classification, grasp point and pose estimation of garments. We show a great performance improvement compared to state-of-the-art methods and to our previous POMDP formulation.
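    As a rough illustration of the active recognition scheme described above, the following sketch shows a Bayesian belief update over garment classes combined with a simple action-selection rule. It is not the thesis implementation: the class list, actions, observation likelihoods and the greedy confidence threshold are hypothetical stand-ins for the learned observation model and the solved POMDP policy.

```python
import numpy as np

# Hypothetical garment classes and interaction actions (illustrative only).
CLASSES = ["shirt", "trousers", "towel"]

def update_belief(belief, likelihood):
    """Bayes update of the class belief given per-class observation likelihoods."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

def select_action(belief, confidence_threshold=0.9):
    """Greedy stand-in for a POMDP policy: gather information until the belief is confident."""
    if belief.max() >= confidence_threshold:
        return "declare_class", CLASSES[int(belief.argmax())]
    # Otherwise pick an information-gathering action (fixed rule, purely illustrative).
    action = "rotate_garment" if belief.max() < 0.6 else "regrasp_lowest_point"
    return action, None

# Example episode with made-up observation likelihoods from a depth-based classifier.
belief = np.full(len(CLASSES), 1.0 / len(CLASSES))
for likelihood in (np.array([0.5, 0.3, 0.2]), np.array([0.7, 0.2, 0.1])):
    belief = update_belief(belief, likelihood)
    action, label = select_action(belief)
    print(action, label, belief.round(3))
```

    A full POMDP solution would trade off the cost of further interactions against the expected reduction in uncertainty, rather than relying on a fixed confidence threshold as in this toy loop.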
    Moving from deformable to rigid objects, while keeping our interest in domestic robotic applications, we focus on object instance recognition and 3D pose estimation of household objects. We are particularly interested in realistic scenes that are very crowded and in which objects are perceived under severe occlusions. Single-shot 6D pose estimators with manually designed features are still unable to tackle such difficult scenarios for a variety of objects, motivating research towards unsupervised feature learning and next-best-view estimation. We present a complete framework for both single-shot 6D object pose estimation and next-best-view prediction based on Hough Forests, the state-of-the-art object pose estimator that performs classification and regression jointly. Rather than using manually designed features, we propose unsupervised features learnt from depth-invariant patches using a Sparse Autoencoder. Furthermore, taking advantage of the clustering performed in the leaf nodes of Hough Forests, we learn to estimate the reduction of uncertainty in other views, thus formulating the problem of selecting the next best view. To further improve 6D object pose estimation, we propose an improved joint registration and hypothesis verification module as a final refinement step to reject false detections. We provide two additional challenging datasets inspired by realistic scenarios to extensively evaluate the state of the art and our framework: one is related to domestic environments and the other depicts a bin-picking scenario mostly found in industrial settings. We show that our framework significantly outperforms the state of the art both on public datasets and on our own.

    Unsupervised feature learning, although efficient, might produce sub-optimal features for our particular task. Therefore, in our last work, we leverage the power of Convolutional Neural Networks to tackle the problem of estimating the pose of rigid objects with an end-to-end deep regression network. To improve the moderate performance of the standard regression objective function, we introduce the Siamese Regression Network. For a given image pair, we enforce a similarity measure between the representations of the sample images in the feature space and in the pose space respectively, which is shown to boost regression performance. Furthermore, we argue that pose-guided feature learning with our Siamese Regression Network generates more discriminative features that outperform the state of the art. Last, our feature learning formulation provides the ability to learn features that perform well under severe occlusions, demonstrating high performance on our novel hand-object dataset.

    In conclusion, this work is research in the area of object detection and pose estimation in 3D space for a variety of object types. Furthermore, we investigate how accuracy can be further improved by applying active vision techniques that optimally move the camera view to minimize the detection error.
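    The pose-guided feature learning idea behind the Siamese Regression Network can be sketched as a loss that ties the feature-space distance of an image pair to their pose-space distance, on top of the usual regression error. The exact similarity term, weighting and pose parameterization used in the thesis are not reproduced here, so the PyTorch snippet below, with hypothetical dimensions and loss weight, is a minimal illustration rather than the actual objective.

```python
import torch
import torch.nn as nn

class SiameseRegressionLoss(nn.Module):
    """Illustrative pose-guided loss: the distance between the feature embeddings of an
    image pair is encouraged to follow the distance between their ground-truth poses,
    in addition to the direct pose regression error of each branch."""

    def __init__(self, weight=0.1):
        super().__init__()
        self.weight = weight

    def forward(self, feat_a, feat_b, pred_a, pred_b, pose_a, pose_b):
        regression = ((pred_a - pose_a) ** 2).mean() + ((pred_b - pose_b) ** 2).mean()
        feat_dist = (feat_a - feat_b).pow(2).sum(dim=1)   # pairwise feature-space distance
        pose_dist = (pose_a - pose_b).pow(2).sum(dim=1)   # pairwise pose-space distance
        similarity = (feat_dist - pose_dist).pow(2).mean()
        return regression + self.weight * similarity

# Dummy usage: batch of 4 pairs, 128-D features, 6-D pose vectors (all assumed sizes).
loss_fn = SiameseRegressionLoss()
feat_a, feat_b = torch.randn(4, 128), torch.randn(4, 128)
pose_a, pose_b = torch.randn(4, 6), torch.randn(4, 6)
loss = loss_fn(feat_a, feat_b, pose_a + 0.05, pose_b + 0.05, pose_a, pose_b)
```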

    Deep Multicameral Decoding for Localizing Unoccluded Object Instances from a Single RGB Image

    Occlusion-aware instance-sensitive segmentation is a complex task generally split into region-based segmentations, by approximating instances as their bounding boxes. We address the showcase scenario of dense homogeneous layouts in which this approximation does not hold. In this scenario, outlining unoccluded instances by decoding a deep encoder becomes difficult, due to the translation invariance of convolutional layers and the lack of complexity in the decoder. We therefore propose a multicameral design composed of subtask-specific lightweight decoder and encoder-decoder units, coupled in cascade to encourage subtask-specific feature reuse and enforce a learning path within the decoding process. Furthermore, the state-of-the-art datasets for occlusion-aware instance segmentation contain real images with few instances and occlusions mostly due to objects occluding the background, unlike dense object layouts. We thus also introduce a synthetic dataset of dense homogeneous object layouts, namely Mikado, which extensibly contains more instances and inter-instance occlusions per image than these public datasets. Our extensive experiments on Mikado and public datasets show that ordinal multiscale units within the decoding process prove more effective than state-of-the-art design patterns for capturing position-sensitive representations. We also show that Mikado is plausible with respect to real-world problems, in the sense that it enables the learning of performance-enhancing representations transferable to real images, while drastically reducing the need for hand-made annotations for finetuning. The proposed dataset will be made publicly available.
    Comment: International Journal of Computer Vision, Springer Verlag, 2020, Special Issue on Deep Learning for Robotic Vision
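    To make the cascaded decoding idea concrete, the following sketch wires a shared encoder to a lightweight decoder for a first subtask and reuses that decoder's features in a second decoding unit for the next subtask. The paper's actual multicameral design, its subtasks, resolutions and channel widths are not reproduced here; every layer size, the two illustrative subtasks, and the omission of upsampling back to input resolution are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))

class CascadedDecoders(nn.Module):
    """Illustrative cascade: a shared encoder, a lightweight decoder for a first subtask
    (here: occlusion boundaries), and a second decoding unit that reuses the first unit's
    features for the next subtask (here: unoccluded-instance outlines)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(conv_block(3, 32), nn.MaxPool2d(2), conv_block(32, 64))
        self.decoder_boundaries = nn.Sequential(conv_block(64, 32), nn.Conv2d(32, 1, 1))
        self.decoder_instances = nn.Sequential(conv_block(64 + 32, 32), nn.Conv2d(32, 1, 1))

    def forward(self, x):
        feats = self.encoder(x)
        boundary_feats = self.decoder_boundaries[0](feats)       # subtask-specific features
        boundaries = self.decoder_boundaries[1](boundary_feats)  # first subtask output
        fused = torch.cat([feats, boundary_feats], dim=1)        # feature reuse along the cascade
        instances = self.decoder_instances(fused)                # second subtask output
        return boundaries, instances

# Dummy forward pass; outputs are at half the input resolution in this simplified sketch.
out_boundaries, out_instances = CascadedDecoders()(torch.randn(1, 3, 64, 64))
```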

    6-DoF Grasp Learning in Partially Observable Cluttered Scenes

    The key element of the efficient interaction of an intelligent robot with its immediate environment is object manipulation - a task that current data-driven methods decompose into object localization, classification, segmentation, and grasp pose estimation. This work is concerned with grasp pose estimation, namely with the implications of 6-DoF grasp pose estimation for partially visible cluttered scenes. In this thesis, two methods are proposed to address the problem of managing collisions between grasp proposals and the full target scene, a problem that arises from the partial visibility and cluttered nature of a scene. The first explores the possibility of embedding the input data with differential geometric shape information, namely the modified mean curvature measure, to improve the qualitative results of grasp estimation. The second method proposes a supervisor network architecture, termed Collision-GraspNet, that classifies grasp proposals with respect to collision with the scene, including its occluded parts, and improves invalid proposals via iterative pose sampling. The first proposed approach is tested on the Contact-GraspNet model and compared with the baseline performance of the GraspNet architecture. In turn, Collision-GraspNet is compared with the analytical proposal-filtering approach employed by GraspNet, and is evaluated in three stages using various datasets. The grasp supervisor architecture Collision-GraspNet outperformed the respective analytical approach and showed high flexibility with respect to the confidence threshold. However, curvature-embedded data failed to improve upon the baseline model performance.
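    The supervise-then-resample loop described for Collision-GraspNet can be illustrated with the sketch below: proposals scored above a collision-free threshold are kept, while rejected ones are perturbed and re-scored for a few iterations. The classifier here is a random-score placeholder, and the threshold, jitter magnitude and 4x4 homogeneous pose representation are assumptions for illustration, not the thesis settings.

```python
import numpy as np

def collision_score(grasp_pose, scene_features):
    """Stand-in for a learned collision classifier (higher = more likely collision-free)."""
    return float(np.random.rand())  # placeholder prediction

def refine_grasps(grasp_poses, scene_features, threshold=0.5, max_iters=5, noise=0.01):
    """Keep collision-free proposals; perturb and re-score rejected ones, as an
    illustrative stand-in for iterative pose sampling around invalid grasps."""
    refined = []
    for pose in grasp_poses:                  # each pose: 4x4 homogeneous transform
        candidate = pose.copy()
        for _ in range(max_iters):
            if collision_score(candidate, scene_features) >= threshold:
                refined.append(candidate)
                break
            # Resample: jitter the translation component and try again.
            candidate = candidate.copy()
            candidate[:3, 3] += np.random.normal(scale=noise, size=3)
    return refined

# Dummy usage with identity poses and no scene features.
kept = refine_grasps([np.eye(4) for _ in range(8)], scene_features=None)
```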

    A real-time low-cost vision sensor for robotic bin picking

    This thesis presents an integrated approach to a vision sensor for bin picking. The vision system that has been devised consists of three major components. The first addresses the implementation of a bifocal range sensor which estimates depth by measuring the relative blurring between two images captured with different focal settings. A key element in the success of this approach is that it overcomes some of the limitations associated with other related implementations, and the experimental results indicate that the precision offered by the sensor discussed in this thesis is sufficient for a large variety of industrial applications. The second component deals with the implementation of an edge-based segmentation technique, applied in order to detect the boundaries of the objects that define the scene. An important issue related to this segmentation technique consists of minimising the errors in the edge-detected output, an operation that is carried out by analysing the information associated with the singular edge points. The last component addresses object recognition and pose estimation using the information resulting from the application of the segmentation algorithm. The recognition stage consists of matching the primitives derived from the scene regions, while pose estimation is addressed using an appearance-based approach augmented with range data analysis. The developed system is suitable for real-time operation and, in order to demonstrate the validity of the proposed approach, it has been examined under varying real-world scenes.
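    A minimal worked example of the depth-from-defocus principle behind such a bifocal range sensor: under a thin-lens model, an object at depth Z seen through a lens of focal length f, aperture diameter D and sensor plane at distance v behind the lens produces a blur circle of diameter roughly D*v*|1/f - 1/v - 1/Z|, so measuring the blur under two focal settings constrains Z. The optics values, depth search range and brute-force inversion below are illustrative assumptions, not the thesis' calibration or estimation procedure.

```python
import numpy as np

def blur_diameter(depth, f, v, aperture):
    """Thin-lens defocus model: blur-circle diameter for an object at `depth`
    when the sensor plane sits at distance `v` behind a lens of focal length `f`."""
    return aperture * v * abs(1.0 / f - 1.0 / v - 1.0 / depth)

def depth_from_two_blurs(d1, d2, f, v1, v2, aperture, depths=np.linspace(0.2, 3.0, 2800)):
    """Brute-force inversion: pick the depth whose predicted blur pair best matches the
    two measured blur diameters (relative blur between two focal settings)."""
    residuals = [(blur_diameter(z, f, v1, aperture) - d1) ** 2 +
                 (blur_diameter(z, f, v2, aperture) - d2) ** 2 for z in depths]
    return depths[int(np.argmin(residuals))]

# Synthetic check with assumed optics: 25 mm lens, 10 mm aperture, sensor distances
# focusing roughly at 0.6 m and 1.0 m, and a true object depth of 0.8 m.
f, aperture = 0.025, 0.010
v1 = 1.0 / (1.0 / f - 1.0 / 0.6)
v2 = 1.0 / (1.0 / f - 1.0 / 1.0)
d1, d2 = blur_diameter(0.8, f, v1, aperture), blur_diameter(0.8, f, v2, aperture)
print(depth_from_two_blurs(d1, d2, f, v1, v2, aperture))  # ~0.8
```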

    Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments Using a Synthetic Benchmark

    This paper presents Sim-Suction, a robust object-aware suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints, designed to pick up unknown objects from cluttered environments. Suction grasp policies typically employ data-driven approaches, necessitating large-scale, accurately-annotated suction grasp datasets. However, the generation of suction grasp datasets in cluttered environments remains underexplored, leaving uncertainties about the relationship between the object of interest and its surroundings. To address this, we propose a benchmark synthetic dataset, Sim-Suction-Dataset, comprising 500 cluttered environments with 3.2 million annotated suction grasp poses. The efficient Sim-Suction-Dataset generation process provides novel insights by combining analytical models with dynamic physical simulations to create fast and accurate suction grasp pose annotations. We introduce Sim-Suction-Pointnet to generate robust 6D suction grasp poses by learning point-wise affordances from the Sim-Suction-Dataset, leveraging the synergy of zero-shot text-to-segmentation. Real-world experiments for picking up all objects demonstrate that Sim-Suction-Pointnet achieves success rates of 96.76%, 94.23%, and 92.39% on cluttered level 1 objects (prismatic shape), cluttered level 2 objects (more complex geometry), and cluttered mixed objects, respectively. The Sim-Suction policies outperform the state-of-the-art benchmarks tested by approximately 21% in cluttered mixed scenes.
    Comment: IEEE Transactions on Robotics
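    As an illustration of how point-wise suction affordances can be turned into 6D grasp poses, the sketch below ranks candidate points by an affordance score and builds, for each selected point, a pose whose approach axis opposes the estimated surface normal. This post-processing, the dummy point cloud and the random scores are assumptions for illustration and do not reproduce the Sim-Suction-Pointnet pipeline.

```python
import numpy as np

def suction_pose_from_point(point, normal):
    """Build a 4x4 suction pose whose z-axis (approach direction) opposes the surface
    normal, with an arbitrary but consistent choice of the remaining axes."""
    z = -normal / np.linalg.norm(normal)
    ref = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(ref, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    pose = np.eye(4)
    pose[:3, :3] = np.stack([x, y, z], axis=1)
    pose[:3, 3] = point
    return pose

def top_suction_poses(points, normals, scores, k=3):
    """Rank candidate points by a point-wise affordance score and convert the best k
    into 6D suction poses (illustrative post-processing only)."""
    order = np.argsort(scores)[::-1][:k]
    return [suction_pose_from_point(points[i], normals[i]) for i in order]

# Dummy point cloud with upward-facing normals and random affordance scores.
pts = np.random.rand(100, 3)
nrm = np.tile([0.0, 0.0, 1.0], (100, 1))
best_poses = top_suction_poses(pts, nrm, np.random.rand(100))
```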