    On the Calibration of Active Binocular and RGBD Vision Systems for Dual-Arm Robots

    This paper describes a camera and hand-eye calibration methodology for integrating an active binocular robot head within a dual-arm robot. For this purpose, we derive the forward kinematic model of our active robot head and describe our methodology for calibrating and integrating it. This rigid calibration provides a closed-form hand-to-eye solution. We then present an approach for dynamically updating the cameras' external parameters to achieve optimal 3D reconstruction, which is the foundation for robotic tasks such as grasping and manipulating rigid and deformable objects. Experimental results show that our robot head achieves an overall sub-millimetre accuracy of less than 0.3 millimetres when recovering the 3D structure of a scene. In addition, we report a comparative study between current RGBD cameras and our active stereo head within two dual-arm robotic testbeds that demonstrates the accuracy and portability of our proposed methodology.
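
The abstract does not give the head's kinematic parameters, so the following is only a minimal sketch of what dynamically updated external parameters can mean. It assumes a symmetric head whose two cameras sit at ±b/2 on a common baseline and each verge about its own vertical axis. Under that assumption, the left-to-right extrinsics (R, t) are recomputed from the current verge angles instead of being calibrated once. The head model, axis conventions, and function names are hypothetical.

```python
import math


def rot_y(theta):
    """3x3 rotation about the vertical (y) axis; positive theta turns the optical axis (+z) towards +x."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def stereo_extrinsics(baseline, verge_left, verge_right):
    """Return (R, t) such that X_right = R X_left + t for the current verge angles.

    Hypothetical head model: camera centres at x = -b/2 (left) and x = +b/2 (right)
    in the head frame, each camera rotated about its own y axis by its verge angle.
    """
    # Camera-to-head poses: X_head = R_i X_cam + c_i
    r_l, c_l = rot_y(verge_left), [-baseline / 2, 0.0, 0.0]
    r_r, c_r = rot_y(verge_right), [baseline / 2, 0.0, 0.0]
    # X_right = R_r^T (X_head - c_r) = R_r^T R_l X_left + R_r^T (c_l - c_r)
    r_r_t = transpose(r_r)
    rotation = matmul(r_r_t, r_l)
    translation = matvec(r_r_t, [c_l[i] - c_r[i] for i in range(3)])
    return rotation, translation
```

With both verge angles at zero, this reduces to the familiar rectified case R = I, t = (-b, 0, 0). Re-evaluating it whenever the verge joints move is one way to keep a triangulation step supplied with current extrinsics.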

    Autonomous vision-guided bi-manual grasping and manipulation

    This paper describes the implementation, demonstration and evaluation of a variety of autonomous, vision-guided manipulation capabilities using a dual-arm Baxter robot. Initially, symmetric coordinated bi-manual manipulation based on a kinematic tracking algorithm was implemented on the robot to enable a master-slave manipulation system. We demonstrate the efficacy of this approach with a human-robot collaboration experiment in which a human operator moves the master arm along arbitrary trajectories and the slave arm automatically follows it while maintaining a constant relative pose between the two end-effectors. Next, this concept was extended to perform dual-arm manipulation without human intervention. To this end, an image-based visual servoing scheme was developed to control the motion of the arms and position them at the desired grasp locations. We then combine this with a dynamic position controller to move the grasped object with both arms along a prescribed trajectory. The approach was validated by performing numerous symmetric and asymmetric bi-manual manipulations under different conditions. Our experiments demonstrated an 80% success rate for the symmetric dual-arm manipulation tasks and a 73% success rate for the asymmetric dual-arm manipulation tasks.
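
The master-slave behaviour described above amounts to recording the relative pose between the two end-effectors when coupling starts, then re-applying it at every control cycle. Below is a minimal sketch using 4x4 homogeneous transforms. It assumes both end-effector poses are expressed in the robot's base frame. The function names are hypothetical, and the inverse-kinematics step that would turn the slave target into joint commands is omitted.

```python
import math


def pose(yaw, x, y, z):
    """4x4 homogeneous transform: rotation by `yaw` about z, then translation (x, y, z)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x], [s, c, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform [R p; 0 1] as [R^T  -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    q = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [q[0]], r_t[1] + [q[1]], r_t[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]


def relative_pose(t_master, t_slave):
    """Slave end-effector pose in the master end-effector frame: T_rel = T_m^-1 T_s."""
    return mat_mul(invert_rigid(t_master), t_slave)


def slave_target(t_master, t_rel):
    """Slave pose that preserves the coupling recorded in T_rel: T_s = T_m T_rel."""
    return mat_mul(t_master, t_rel)
```

In a controller, `relative_pose` would be evaluated once when coupling is engaged. `slave_target` would then be evaluated every cycle from the measured master pose.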

    GelSlim: A High-Resolution, Compact, Robust, and Calibrated Tactile-sensing Finger

    This work describes the development of a high-resolution tactile-sensing finger for robot grasping. This finger, inspired by previous GelSight sensing techniques, features an integration that is slimmer, more robust, and more homogeneous in its output than previous vision-based tactile sensors. To achieve a compact integration, we redesign the optical path from illumination source to camera by combining light guides and an arrangement of mirror reflections. We parameterize the optical path with geometric design variables and describe the tradeoffs between the finger thickness, the depth of field of the camera, and the size of the tactile sensing area. The sensor withstands the wear of continuous use (and abuse) in grasping tasks by combining tougher materials for the compliant soft gel, a textured fabric skin, a structurally rigid body, and a calibration process that maintains homogeneous illumination and contrast of the tactile images during use. Finally, we evaluate the sensor's durability along four metrics that track the signal quality during more than 3000 grasping experiments. Comment: RA-L pre-print, 8 pages.

    A Practical Approach for Picking Items in an Online Shopping Warehouse

    Commercially viable automated picking in unstructured environments by a robot arm remains a difficult challenge. Robot grasp planning has been studied for a long time, but existing solutions tend to be limited when it comes to deploying them in open-ended, realistic scenarios. Practical picking systems are needed that can handle the different properties of the objects to be manipulated, as well as the problems arising from occlusions and constrained accessibility. This paper presents a practical solution to the problem of robot picking in an online shopping warehouse by means of a novel approach. The approach integrates a carefully selected method with a new strategy, the centroid normal approach (CNA). It runs on a cost-effective dual-arm robotic system with two grippers specifically designed for this purpose: a two-finger gripper and a vacuum gripper. Objects identified in the scene point cloud are matched to the grasping techniques and grippers to maximize success. Extensive experimentation provides insight into the reasons for success and failure. We chose as a benchmark the scenario proposed by the 2017 Amazon Robotics Challenge, since it is a realistic representation of a retail shopping warehouse. It includes many challenging constraints, such as a wide variety of product items with diverse properties, which are also presented with restricted visibility and accessibility. This paper describes research conducted at the UJI Robotic Intelligence Laboratory. Support for this laboratory is provided in part by Ministerio de Economía y Competitividad (DPI2015-69041-R, DPI2017-89910-R), by Universitat Jaume I (P1-1B2014-52) and by Generalitat Valenciana (PROMETEO/2020/034). The first author was recipient of an Erasmus Mundus scholarship by the European Commission for the EMARO+ Master Program.
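
The abstract names the centroid normal approach (CNA) without defining it. The sketch below rests on an assumption that is ours, not the paper's: CNA grasps an object's segmented point cloud at the observed point closest to its centroid, approaching along the local surface normal there. The normal is estimated with the usual PCA rule, taking the covariance eigenvector with the smallest eigenvalue, found here by power iteration on a shifted matrix. The neighbourhood size `k`, the sensor viewpoint used to orient the normal (assumed at the origin of the cloud's frame), and all function names are hypothetical.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def mean(points):
    return [sum(p[i] for p in points) / len(points) for i in range(3)]


def smallest_eigenvector(cov, iters=200):
    """Unit eigenvector for the smallest eigenvalue of a 3x3 covariance matrix.

    Covariance eigenvalues are non-negative and sum to trace(C), so the dominant
    eigenvector of (trace(C) I - C) is C's smallest-eigenvalue eigenvector.
    """
    tr = cov[0][0] + cov[1][1] + cov[2][2]
    shifted = [[(tr if i == j else 0.0) - cov[i][j] for j in range(3)] for i in range(3)]
    v = normalize([0.3, 0.5, 0.8])  # arbitrary start vector
    for _ in range(iters):
        v = normalize([dot(row, v) for row in shifted])
    return v


def cna_grasp(points, viewpoint=(0.0, 0.0, 0.0), k=10):
    """Hypothetical CNA sketch: grasp point near the centroid, approach along the inward normal."""
    c = mean(points)
    # The centroid may lie inside the object, so use the nearest observed surface point.
    grasp = min(points, key=lambda p: dot(sub(p, c), sub(p, c)))
    # Estimate the local surface normal from the k nearest neighbours (PCA).
    nbrs = sorted(points, key=lambda p: dot(sub(p, grasp), sub(p, grasp)))[:k]
    m = mean(nbrs)
    cov = [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in nbrs) / len(nbrs)
            for j in range(3)] for i in range(3)]
    normal = smallest_eigenvector(cov)
    # Orient the normal towards the sensor; the gripper approaches against it.
    if dot(normal, sub(viewpoint, grasp)) < 0:
        normal = [-x for x in normal]
    return grasp, [-x for x in normal]
```

For a flat face seen by the sensor, this returns the point in the middle of the face and an approach direction perpendicular to it, pointing away from the sensor.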

    Scene understanding by robotic interactive perception

    This thesis presents a novel and generic visual architecture for scene understanding by robotic interactive perception. The proposed visual architecture is fully integrated into autonomous systems performing object perception and manipulation tasks. It uses interaction with the scene to improve scene understanding substantially over non-interactive models. Specifically, this thesis presents two experimental validations of an autonomous system interacting with the scene. Firstly, an autonomous gaze control model is investigated, in which the vision sensor directs its gaze to satisfy a scene exploration task. Secondly, autonomous interactive perception is investigated, in which objects in the scene are repositioned by robotic manipulation. The proposed visual architecture for scene understanding involving perception and manipulation tasks has four components: 1) a reliable vision system; 2) camera and hand-eye calibration to integrate the vision system into an autonomous robot’s kinematic frame chain; 3) a visual model that performs perception tasks and provides the knowledge required to interact with the scene; and finally, 4) a manipulation model which, using knowledge received from the perception model, chooses an appropriate action (from a set of simple actions) to satisfy a manipulation task. This thesis presents contributions for each of these components. Firstly, a portable active binocular robot vision architecture that integrates a number of visual behaviours is presented. This active vision architecture has the ability to verge, localise, recognise and simultaneously identify multiple target object instances. The portability and functional accuracy of the proposed vision architecture are demonstrated by carrying out both qualitative and comparative analyses using different robot hardware configurations, feature extraction techniques and scene perspectives.
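
Verging on a target means rotating each camera so that its optical axis passes through that target. Below is a minimal geometric sketch under several assumptions: a symmetric head with camera centres at ±b/2 on the head's x axis, independent verge joints rotating about the vertical axis, and a target already expressed in the head frame (x right, z forward). These conventions and the function names are hypothetical, not taken from the thesis.

```python
import math


def verge_angles(target, baseline):
    """Return (left, right, vergence) angles in radians that fixate `target`.

    Hypothetical conventions: target = (x, y, z) in the head frame; camera
    centres at x = -b/2 (left) and x = +b/2 (right); positive angles turn a
    camera's optical axis towards +x. The vertical offset y is assumed to be
    handled by a shared tilt joint and is ignored here.
    """
    x, _, z = target
    left = math.atan2(x + baseline / 2, z)
    right = math.atan2(x - baseline / 2, z)
    return left, right, left - right


def depth_from_symmetric_vergence(vergence, baseline):
    """Distance to a fixated target straight ahead, recovered from the vergence angle alone."""
    return baseline / (2 * math.tan(vergence / 2))
```

The second function illustrates why verging also localises. For a fixated target straight ahead, depth follows from the baseline and the vergence angle, and the vergence angle shrinks as the target recedes.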
Secondly, a camera and hand-eye calibration methodology for integrating an active binocular robot head within a dual-arm robot is described. For this purpose, the forward kinematic model of the active robot head is derived and the methodology for calibrating and integrating the robot head is described in detail. A rigid calibration methodology has been implemented to provide a closed-form hand-to-eye calibration chain. This has been extended with a mechanism that allows the camera external parameters to be updated dynamically for optimal 3D reconstruction, meeting the requirements of robotic tasks such as grasping and manipulating rigid and deformable objects. Experimental results show that the robot head achieves an overall accuracy of less than 0.3 millimetres when recovering the 3D structure of a scene. In addition, a comparative study between current RGB-D cameras and our active stereo head within two dual-arm robotic test-beds is reported, demonstrating the accuracy and portability of our proposed methodology. Thirdly, this thesis proposes a visual perception model for the task of category-wise object sorting, based on Gaussian Process (GP) classification, that is capable of recognising object categories from point cloud data. In this approach, Fast Point Feature Histogram (FPFH) features are extracted from point clouds to describe the local 3D shape of objects, and a Bag-of-Words coding method is used to obtain an object-level vocabulary representation. Multi-class Gaussian Process classification is employed to provide a probability estimate of the identity of the object and serves the key role of modelling perception confidence in the interactive perception cycle. The interaction stage is responsible for invoking the appropriate action skills as required to confirm the identity of an observed object with high confidence as a result of executing multiple perception-action cycles.
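
The Bag-of-Words step turns a variable number of local FPFH descriptors into one fixed-length vector per object. The sketch below assumes the visual vocabulary (codebook) has already been learned, for instance by k-means over training descriptors. Each descriptor is hard-assigned to its nearest codeword by Euclidean distance, and the counts are L1-normalised. Real FPFH descriptors are 33-dimensional. The toy vectors, the hard assignment, and the normalisation choice are illustrative assumptions, not details from the thesis.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to `descriptor` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[j])))


def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of codeword assignments: one fixed-length vector per object."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_codeword(desc, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

For example, with the codebook `[[0, 0], [1, 1], [5, 5]]`, four descriptors falling near the three codewords in a 1:2:1 ratio give `[0.25, 0.5, 0.25]`. A vector of this kind is what a multi-class classifier such as the GP classifier above would take as input.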
The recognition accuracy of the proposed perception model has been validated on simulated input data using both Support Vector Machine (SVM) and GP-based multi-class classifiers. Results obtained during this investigation demonstrate that, using a GP-based classifier, it is possible to obtain true positive classification rates of up to 80%. Experimental validation of the above semi-autonomous object sorting system shows that the proposed GP-based interactive sorting approach outperforms random sorting by up to 30% when applied to scenes comprising configurations of household objects. Finally, a fully autonomous visual architecture is presented that accommodates manipulation skills, allowing an autonomous system to interact with the scene through object manipulation. This visual architecture consists mainly of two stages: 1) a perception stage, which is a modified version of the aforementioned visual interaction model; and 2) an interaction stage, which performs a set of ad-hoc actions relying on the information received from the perception stage. More specifically, the interaction stage simply reasons over the information (class label and associated probabilistic confidence score) received from the perception stage to choose one of the following two actions: 1) if an object class has been identified with high confidence, the object is removed from the scene and placed in the designated basket/bin for that class; 2) if an object class has been identified with lower probabilistic confidence, then, inspired by the human behaviour of inspecting doubtful objects, an action is chosen to investigate the object further and confirm its identity by capturing more images from different views in isolation. The perception stage then processes these views, so multiple perception-action/interaction cycles take place.
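
The two-way choice made by the interaction stage amounts to thresholding the classifier's confidence and repeating perception-action cycles until that confidence is high enough. The sketch below assumes the perception stage returns a {label: probability} mapping, such as a multi-class GP classifier would provide. The 0.8 threshold, the cycle budget, the action names, and the `observe` callback are hypothetical, not values or interfaces reported in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str        # "sort" (place in the class bin) or "inspect" (isolate and re-observe)
    label: str         # most probable class label
    confidence: float  # its probability


def choose_action(class_probs, threshold=0.8):
    """Map the perception output {label: probability} to one of the two interaction actions."""
    label = max(class_probs, key=class_probs.get)
    confidence = class_probs[label]
    if confidence >= threshold:
        return Decision("sort", label, confidence)
    return Decision("inspect", label, confidence)


def perception_action_loop(observe, threshold=0.8, max_cycles=5):
    """Run perception-action cycles until the object can be sorted or the budget is spent.

    `observe(cycle)` is a hypothetical callback returning {label: probability}
    for the views captured so far.
    """
    decision = None
    for cycle in range(max_cycles):
        decision = choose_action(observe(cycle), threshold)
        if decision.action == "sort":
            break
    return decision
```

Capping the number of cycles keeps a persistently ambiguous object from stalling the sorting task. What the real system does in that case is not stated in the abstract.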
From an application perspective, the task of autonomous category-based object sorting is performed and the experimental design for the task is described in detail.

    Learning to grasp in unstructured environments with deep convolutional neural networks using a Baxter Research Robot

    Recent advancements in deep learning have accelerated the capabilities of robotic systems in terms of visual perception, object manipulation, automated navigation, and human-robot collaboration. The capability of a robotic system to manipulate objects in unstructured environments is becoming an increasingly necessary skill. Due to the dynamic nature of these environments, traditional methods, which require expert human knowledge, fail to adapt automatically. After reviewing the relevant literature, a method was proposed that uses deep transfer learning techniques to detect object grasps from coloured depth images. A grasp describes how a robotic end-effector can be arranged to securely grasp an object and successfully lift it without slippage. In this study, a ResNet-50 convolutional neural network (CNN) model is trained on the Cornell grasp dataset. Training was completed within 30 hours on a workstation PC with GPU acceleration via an NVIDIA Titan X. The trained grasp detection model was further evaluated with a Baxter research robot and a Microsoft Kinect-v2, achieving a grasp detection accuracy of 93.91% on a diverse set of novel objects. Physical grasping trials were conducted on a set of 8 different objects. The overall system achieves an average grasp success rate of 65.0% while performing grasp detection in under 25 milliseconds. Analysis of the results showed that objects with reasonably straight edges and moderately pronounced heights above the table are easily detected and grasped by the system.
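
The abstract reports a grasp detection accuracy but does not say how a detection is judged correct. On the Cornell dataset this is commonly done with the rectangle metric. Under that metric, a predicted grasp rectangle counts as correct if its orientation is within 30° of a ground-truth rectangle and their Jaccard index (intersection over union) exceeds 25%. The sketch below assumes that metric and a five-parameter grasp (centre x, y, angle θ, and two side lengths w, h). It computes the rectangle overlap with Sutherland-Hodgman polygon clipping. Whether the 93.91% figure was computed exactly this way is not stated in the abstract, and the function names are hypothetical.

```python
import math


def corners(x, y, theta, w, h):
    """Corners (counter-clockwise) of a w-by-h rectangle centred at (x, y), rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x + dx * c - dy * s, y + dx * s + dy * c)
            for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))]


def area(poly):
    """Shoelace area of a simple polygon."""
    return abs(sum(poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1]
                   for i in range(len(poly)))) / 2


def _cross(a, b, p):
    """Positive when p lies to the left of the directed line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip(subject, clipper):
    """Sutherland-Hodgman: intersection of `subject` with a convex, counter-clockwise `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i - 1], clipper[i]
        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]
            dp, dq = _cross(a, b, p), _cross(a, b, q)
            if (dp >= 0) != (dq >= 0):  # edge p -> q crosses the clip line
                t = dp / (dp - dq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if dq >= 0:
                output.append(q)
        if not output:
            break
    return output


def jaccard(pred, truth):
    """Intersection over union of two (x, y, theta, w, h) grasp rectangles."""
    p, t = corners(*pred), corners(*truth)
    inter = clip(p, t)
    inter_area = area(inter) if len(inter) >= 3 else 0.0
    return inter_area / (area(p) + area(t) - inter_area)


def rectangle_metric(pred, truth, max_angle_deg=30.0, min_jaccard=0.25):
    """Correct if orientations differ by at most 30 degrees and IoU exceeds 25%."""
    # A grasp rectangle is unchanged by a 180-degree rotation, so compare angles modulo pi.
    d = abs(pred[2] - truth[2]) % math.pi
    d = min(d, math.pi - d)
    return math.degrees(d) <= max_angle_deg and jaccard(pred, truth) > min_jaccard
```

For instance, a 4 by 2 prediction shifted by one unit along its long side overlaps the ground truth by 6 out of a union of 10, an IoU of 0.6. It therefore passes, whereas the same rectangle rotated by 45° fails on orientation alone.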
