    Hierarchical Salient Object Detection for Assisted Grasping

    Visual scene decomposition into semantic entities is one of the major challenges when creating a reliable object grasping system. Recently, we introduced a bottom-up hierarchical clustering approach that can segment objects and parts in a scene. In this paper, we introduce a transform from such a segmentation into a corresponding hierarchical saliency function. In comprehensive experiments we demonstrate its ability to detect salient objects in a scene. Furthermore, this hierarchical saliency defines a most salient corresponding region (scale) for every point in an image. Based on this, an easy-to-use pick-and-place manipulation system was developed and tested on example scenes. Comment: Accepted for ICRA 201

    Non-Markov Policies to Reduce Sequential Failures in Robot Bin Picking

    A new generation of automated bin picking systems using deep learning is evolving to support increasing demand for e-commerce. To accommodate a wide variety of products, many automated systems include multiple gripper types and/or tool changers. However, for some objects, sequential grasp failures are common: when a computed grasp fails to lift and remove the object, the bin is often left unchanged; because the sensor input is then unchanged as well, the system retries the same grasp over and over, resulting in a significant reduction in mean successful picks per hour (MPPH). Based on an empirical study of sequential failures, we characterize a class of "sequential failure objects" (SFOs) -- objects prone to sequential failures -- using a novel taxonomy. We then propose three non-Markov picking policies that incorporate memory of past failures to modify subsequent actions. Simulation experiments on SFO models and the EGAD dataset suggest that the non-Markov policies significantly outperform the Markov policy in terms of the sequential failure rate and MPPH. In physical experiments on 50 heaps of 12 SFOs, the most effective non-Markov policy increased MPPH over the Dex-Net Markov policy by 107%. Comment: 2020 IEEE International Conference on Automation Science and Engineering (CASE)
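
    The retry loop described above can be illustrated with a minimal sketch. It shows one simple way a policy could use memory of past failures: candidates close to a previously failed grasp are skipped until a pick succeeds. This is not necessarily any of the three policies proposed in the paper; the names `Grasp`, `MemoryPolicy` and `exclusion_radius` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass(frozen=True)
class Grasp:
    """Hypothetical planar grasp candidate with a predicted quality score."""
    x: float
    y: float
    quality: float


def markov_select(candidates):
    """Memoryless baseline: always take the highest-quality candidate."""
    return max(candidates, key=lambda g: g.quality) if candidates else None


@dataclass
class MemoryPolicy:
    """Illustrative non-Markov policy that avoids recently failed grasps."""
    exclusion_radius: float = 0.02  # assumed distance threshold, in metres
    failures: list = field(default_factory=list)

    def select(self, candidates):
        # Keep only candidates far enough from every remembered failure.
        allowed = [
            g for g in candidates
            if all(hypot(g.x - f.x, g.y - f.y) > self.exclusion_radius
                   for f in self.failures)
        ]
        return markov_select(allowed)

    def record(self, grasp, success):
        if success:
            # The bin changed, so the old failures no longer apply.
            self.failures.clear()
        else:
            self.failures.append(grasp)


candidates = [Grasp(0.00, 0.00, 0.9), Grasp(0.10, 0.00, 0.8)]
policy = MemoryPolicy()
first = policy.select(candidates)
policy.record(first, success=False)
second = policy.select(candidates)
```

    With an unchanged bin, `markov_select` would return the same candidate on every attempt, whereas `MemoryPolicy.select` moves on to the next-best candidate after a recorded failure.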

    Improved Deep Neural Networks for Generative Robotic Grasping

    This thesis provides a thorough evaluation of current state-of-the-art robotic grasping methods and contributes to a subset of data-driven grasp estimation approaches, termed generative models. These models aim to directly generate grasp region proposals from a given image without the need for a separate analysis and ranking step, which can be computationally expensive. This approach allows for fully end-to-end training of a model and quick closed-loop operation of a robot arm. A number of limitations within these generative models are identified and addressed. Contributions are proposed that directly target each stage of the training pipeline, helping to form accurate grasp proposals and to generalise better to unseen objects. Firstly, inspired by theories of object manipulation within the mammalian visual system, the use of multi-task learning in existing generative architectures is evaluated. This aims to improve the performance of grasping algorithms when presented with impoverished colour (RGB) data by training models to perform simultaneous tasks such as object categorisation, saliency detection, and depth reconstruction. Secondly, a novel loss function is introduced which improves overall performance by rewarding the network for focusing only on learning grasps at suitable positions. This reduces overall training times and results in better performance on fewer training examples. The last contribution analyses the problems with the most common metric used for evaluating and comparing offline performance between different grasping models and algorithms. To this end, a Gaussian method of representing ground-truth labelled grasps is put forward, with optimal grasp locations tested in a simulated grasping environment. The combination of these novel additions to generative models results in improved grasp success, accuracy, and performance on common benchmark datasets compared to previous approaches.
Furthermore, the efficacy of these contributions is also tested when transferred to a physical robotic arm, demonstrating the ability to effectively grasp previously unseen 3D printed objects of varying complexity and difficulty without the need for domain adaptation. Finally, future directions are discussed for generative convolutional models within the overall field of robotic grasping.
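
    As a rough illustration of a Gaussian representation of a labelled grasp, the sketch below builds a per-pixel map that peaks at a labelled grasp centre and decays smoothly with distance. The abstract does not give the thesis's actual formulation, so this is only one plausible reading; the function name `gaussian_grasp_map` and the parameter `sigma` are assumptions.

```python
from math import exp


def gaussian_grasp_map(height, width, center_row, center_col, sigma=2.0):
    """Return a height x width map with value 1.0 at the grasp centre.

    Each pixel holds exp(-d^2 / (2 * sigma^2)), where d is its Euclidean
    distance to the labelled grasp centre (an assumed, isotropic form).
    """
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        [
            exp(-((r - center_row) ** 2 + (c - center_col) ** 2) / two_sigma_sq)
            for c in range(width)
        ]
        for r in range(height)
    ]


label = gaussian_grasp_map(5, 5, center_row=2, center_col=2)
```

    Compared with a binary grasp rectangle, a map of this kind gives predictions that land near, but not exactly on, the labelled centre a graded rather than all-or-nothing score.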

    RGB-D Scene Representations for Prosthetic Vision

    This thesis presents a new approach to scene representation for prosthetic vision. Structurally salient information from the scene is conveyed through the prosthetic vision display. Given the low resolution and dynamic range of the display, this enables robust identification and reliable interpretation of key structural features that are missed when using standard appearance-based scene representations. Specifically, two different types of salient structure are investigated: salient edge structure, for depiction of scene shape to the user; and salient object structure, for emulation of biological attention deployment when viewing a scene. This thesis proposes and evaluates novel computer vision algorithms for extracting salient edge and salient object structure from RGB-D input. Extraction of salient edge structure from the scene is first investigated through low-level analysis of surface shape. Our approach is based on the observation that regions of irregular surface shape, such as the boundary between the wall and the floor, tend to be more informative of scene structure than uniformly shaped regions. We detect these surface irregularities through multi-scale analysis of iso-disparity contour orientations, providing a real-time method that robustly identifies important scene structure. This approach is then extended by using a deep CNN to learn high-level information for distinguishing salient edges from structural texture. A novel depth input encoding called the depth surface descriptor (DSD) is presented, which better captures scene geometry that corresponds to salient edges, improving the learned model. These methods provide robust detection of salient edge structure in the scene. The detection of salient object structure is first achieved by noting that salient objects often have contrasting shape from their surroundings. Contrasting shape in the depth image is captured through the proposed histogram of surface orientations (HOSO) feature.
This feature is used to modulate depth and colour contrast in a saliency detection framework, improving the precision of saliency seed regions and, through this, the accuracy of the final detection. After this, a novel formulation of structural saliency is introduced based on the angular measure of local background enclosure (LBE). This formulation addresses fundamental limitations of depth contrast methods and is not reliant on foreground depth contrast in the scene. Saliency is instead measured through the degree to which a candidate patch exhibits foreground structure. The effectiveness of the proposed approach is evaluated on standard datasets and in user studies that measure the contribution of structure-based representations. Our methods are found to measure salient structure in the scene more effectively than existing methods. Our approach results in improved performance compared to standard methods during practical use of an implant display.
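
    To make the idea of an angular enclosure measure concrete, the following sketch computes, for a single pixel of a depth map, the fraction of sampled directions in which a farther (background) pixel is found within a fixed radius. It is a simplified illustration loosely inspired by the description above, not the thesis's LBE formulation; the function name `angular_enclosure` and the parameters `radius`, `threshold` and `directions` are assumptions.

```python
from math import cos, sin, pi


def angular_enclosure(depth, row, col, radius=3, threshold=0.1, directions=16):
    """Fraction of directions in which background lies within `radius` pixels.

    A neighbour counts as background when its depth exceeds the depth at
    (row, col) by more than `threshold`. Values near 1.0 suggest the pixel
    sits in front of its surroundings on all sides.
    """
    height, width = len(depth), len(depth[0])
    centre = depth[row][col]
    enclosed = 0
    for k in range(directions):
        angle = 2.0 * pi * k / directions
        # Walk outward along this direction until background is found.
        for step in range(1, radius + 1):
            r = row + int(round(step * sin(angle)))
            c = col + int(round(step * cos(angle)))
            inside = 0 <= r < height and 0 <= c < width
            if inside and depth[r][c] > centre + threshold:
                enclosed += 1
                break
    return enclosed / directions


# Toy depth map: a near 3x3 block (depth 1.0) in front of a far wall (2.0).
scene = [[2.0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        scene[r][c] = 1.0

score_object = angular_enclosure(scene, 3, 3)
score_wall = angular_enclosure(scene, 0, 0)
```

    In this toy scene the block centre is surrounded by background in every sampled direction, while a wall pixel has nothing farther behind it, so the score depends on the angular spread of the background rather than on the size of the depth difference alone.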

    Integration of Action and Language Knowledge: A Roadmap for Developmental Robotics

    This position paper proposes that the study of embodied cognitive agents, such as humanoid robots, can advance our understanding of the cognitive development of complex sensorimotor, linguistic, and social learning skills. This in turn will benefit the design of cognitive robots capable of learning to handle and manipulate objects and tools autonomously, to cooperate and communicate with other robots and humans, and to adapt their abilities to changing internal, environmental, and social conditions. Four key areas of research challenges are discussed, specifically for the issues related to the understanding of: 1) how agents learn and represent compositional actions; 2) how agents learn and represent compositional lexica; 3) the dynamics of social interaction and learning; and 4) how compositional action and language representations are integrated to bootstrap the cognitive system. The review of specific issues and progress in these areas is then translated into a practical roadmap based on a series of milestones. These milestones provide a possible set of cognitive robotics goals and test scenarios, thus acting as a research roadmap for future work on cognitive developmental robotics.
