87 research outputs found

    A Robotic System for Learning Visually-Driven Grasp Planning (Dissertation Proposal)

    We use findings in machine learning, developmental psychology, and neurophysiology to guide a robotic learning system's level of representation both for actions and for percepts. Visually-driven grasping is chosen as the experimental task since it has general applicability and has been extensively researched from several perspectives. An implementation of a robotic system with a gripper, compliant instrumented wrist, arm, and vision is used to test these ideas. Several sensorimotor primitives (vision segmentation and manipulatory reflexes) are implemented in this system and may be thought of as the innate perceptual and motor abilities of the system. Applying empirical learning techniques to real situations raises important issues such as observation sparsity in high-dimensional spaces, arbitrary underlying functional forms of the reinforcement distribution, and robustness to noise in exemplars. The well-established technique of non-parametric projection pursuit regression (PPR) is used to accomplish reinforcement learning by searching for projections of high-dimensional data sets that capture task invariants. We also pursue the following problem: how can we use human expertise and insight into grasping to train a system to select both appropriate hand preshapes and approaches for a wide variety of objects, and then have it verify and refine its skills through trial and error? To accomplish this learning we propose a new class of Density Adaptive reinforcement learning algorithms. These algorithms use statistical tests to identify possibly interesting regions of the attribute space in which the dynamics of the task change. They automatically concentrate the building of high-resolution descriptions of the reinforcement in those areas, and build low-resolution representations in regions that are either not populated in the given task or are highly uniform in outcome. Additionally, the use of any learning process generally implies failures along the way. Therefore, the mechanics of the untrained robotic system must be able to tolerate mistakes during learning and not damage itself. We address this through the use of an instrumented, compliant robot wrist that controls impact forces.
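
    As a rough, hedged illustration of the Density Adaptive idea described above, the sketch below uses a variance test as the statistical test and recursive bisection of a single attribute as the resolution mechanism: cells where reward outcomes vary are split further, while sparse or uniform cells stay coarse. The function names, thresholds, and the toy task are illustrative assumptions, not the dissertation's algorithm.

    import numpy as np

    def adaptive_partition(x, r, lo, hi, min_samples=20, var_thresh=0.01, depth=0, max_depth=6):
        """Recursively split a 1-D attribute interval wherever reward outcomes vary,
        keeping coarse cells where outcomes are uniform or data are sparse."""
        rewards = r[(x >= lo) & (x < hi)]
        if depth >= max_depth or rewards.size < min_samples or rewards.var() < var_thresh:
            return [(lo, hi, float(rewards.mean()) if rewards.size else 0.0)]
        mid = 0.5 * (lo + hi)
        return (adaptive_partition(x, r, lo, mid, min_samples, var_thresh, depth + 1, max_depth)
                + adaptive_partition(x, r, mid, hi, min_samples, var_thresh, depth + 1, max_depth))

    # Hypothetical task: reward changes sharply near an approach angle of 0.3.
    rng = np.random.default_rng(0)
    angle = rng.uniform(0.0, 1.0, 2000)
    reward = (np.abs(angle - 0.3) < 0.05).astype(float) + rng.normal(0.0, 0.05, angle.size)
    for lo, hi, mean_r in adaptive_partition(angle, reward, 0.0, 1.0):
        print(f"[{lo:.3f}, {hi:.3f})  mean reward ~ {mean_r:.2f}")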

    Characterizing Objects in Images using Human Context

    Humans have an unmatched capability of interpreting detailed information about existing objects just by looking at an image. In particular, they can effortlessly perform the following tasks: 1) localizing various objects in the image and 2) assigning functionalities to the parts of localized objects. This dissertation addresses the problem of helping vision systems accomplish these two goals. The first part of the dissertation concerns object detection in a Hough-based framework. To this end, the independence assumption between features is addressed by grouping them in a local neighborhood. We study the complementary nature of individual and grouped features and combine them to achieve improved performance. Further, we consider the challenging case of detecting small and medium-sized household objects under human-object interactions. We first evaluate appearance-based star and tree models. While the tree model is slightly better, appearance-based methods continue to suffer from deficiencies caused by human interactions. To this end, we successfully incorporate automatically extracted human pose as a form of context for object detection. The second part of the dissertation addresses the tedious process of manually annotating objects to train fully supervised detectors. We observe that videos of human-object interactions with activity labels can serve as weakly annotated examples of household objects. Since such objects cannot be localized through appearance or motion alone, we propose a framework that includes human-centric functionality to retrieve the common object. Designed to maximize data utility by detecting multiple instances of an object per video, the framework achieves performance comparable to its fully supervised counterpart. The final part of the dissertation concerns localizing functional regions or affordances within objects by casting the problem as one of semantic image segmentation. To this end, we introduce a dataset involving human-object interactions with strong (pixel-level) and weak (click-point and image-level) affordance annotations. We propose a framework that utilizes both forms of weak labels and demonstrate that the effort spent on weak annotation can be further optimized using human context.
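
    As a rough sketch of Hough-style voting with grouped features (for illustration only; the data, radius, and weighting below are assumptions rather than the dissertation's grouping scheme), each feature casts a vote for an object centre via a learned offset, and votes are up-weighted when nearby features agree on the same centre.

    import numpy as np

    def hough_votes(features, offsets, shape, weights=None):
        """Accumulate centre votes from feature positions plus learned offsets."""
        acc = np.zeros(shape, dtype=float)
        weights = np.ones(len(features)) if weights is None else weights
        for (x, y), (dx, dy), w in zip(features, offsets, weights):
            cx, cy = int(round(x + dx)), int(round(y + dy))
            if 0 <= cy < shape[0] and 0 <= cx < shape[1]:
                acc[cy, cx] += w
        return acc

    def grouped_weights(features, offsets, radius=8.0):
        """Weight each vote by how many neighbouring features predict a nearby centre,
        a crude stand-in for grouping features in a local neighbourhood."""
        centres = np.asarray(features, float) + np.asarray(offsets, float)
        d = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
        return (d < radius).sum(axis=1).astype(float)

    features = [(20, 30), (22, 28), (60, 40), (21, 31)]   # (x, y) feature locations
    offsets = [(5, 5), (3, 7), (-4, -2), (4, 4)]          # learned offsets to the object centre
    votes = hough_votes(features, offsets, (100, 100), grouped_weights(features, offsets))
    print(np.unravel_index(votes.argmax(), votes.shape), votes.max())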

    Aggressive Aerial Grasping using a Soft Drone with Onboard Perception

    In contrast to the stunning feats observed in birds of prey, aerial manipulation and grasping with flying robots still lack versatility and agility. Conventional approaches using rigid manipulators require precise positioning and are subject to large reaction forces at grasp, which limit performance at high speeds. The few reported examples of aggressive aerial grasping rely on motion capture systems, or fail to generalize across environments and grasp targets. We describe the first example of a soft aerial manipulator equipped with a fully onboard perception pipeline, capable of robustly localizing and grasping visually and morphologically varied objects. The proposed system features a novel passively closing tendon-actuated soft gripper that enables fast closure at grasp, while compensating for position errors, complying with the target-object morphology, and damping reaction forces. The system includes an onboard perception pipeline that combines a neural-network-based semantic keypoint detector with a state-of-the-art robust 3D object pose estimator, whose estimate is further refined using a fixed-lag smoother. The resulting pose estimate is passed to a minimum-snap trajectory planner, tracked by an adaptive controller that fully compensates for the added mass of the grasped object. Finally, a finite-element-based controller determines optimal gripper configurations for grasping. Rigorous experiments confirm that our approach enables dynamic, aggressive, and versatile grasping. We demonstrate fully onboard vision-based grasps of a variety of objects, in both indoor and outdoor environments, and at speeds of up to 2.0 m/s -- the fastest vision-based grasp reported in the literature. Finally, we take a major step in expanding the utility of our platform beyond stationary targets, by demonstrating motion-capture-based grasps of targets moving at up to 0.3 m/s, with relative speeds of up to 1.5 m/s.
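
    To make the minimum-snap planning step concrete, here is a minimal single-axis sketch under stated assumptions: with position, velocity, acceleration, and jerk pinned at both ends of one segment, the snap-minimising trajectory is a degree-7 polynomial uniquely determined by those eight boundary conditions, so it reduces to a linear solve. Times, endpoints, and function names are illustrative; this is not the paper's planner.

    import numpy as np

    def min_snap_segment(p0, pT, T, v0=0.0, vT=0.0, a0=0.0, aT=0.0, j0=0.0, jT=0.0):
        """Coefficients (lowest order first) of the degree-7 polynomial that meets the
        eight endpoint constraints; for a single fully-constrained segment this is the
        minimum-snap solution."""
        def row(t, d):
            # d-th derivative of [1, t, t^2, ..., t^7] evaluated at time t
            out = np.zeros(8)
            for k in range(d, 8):
                c = 1.0
                for m in range(d):
                    c *= (k - m)
                out[k] = c * t ** (k - d)
            return out
        A = np.array([row(0.0, 0), row(0.0, 1), row(0.0, 2), row(0.0, 3),
                      row(T, 0), row(T, 1), row(T, 2), row(T, 3)])
        b = np.array([p0, v0, a0, j0, pT, vT, aT, jT])
        return np.linalg.solve(A, b)

    coeffs = min_snap_segment(p0=0.0, pT=1.5, T=2.0)           # hypothetical 1.5 m reach in 2 s
    print(np.polyval(coeffs[::-1], np.linspace(0.0, 2.0, 5)))  # sampled positions along the segment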

    Grounded Semantic Reasoning for Robotic Interaction with Real-World Objects

    Robots are increasingly transitioning from specialized, single-task machines to general-purpose systems that operate in unstructured environments, such as homes, offices, and warehouses. In these real-world domains, robots need to manipulate novel objects while adapting to changes in environments and goals. Semantic knowledge, which concisely describes target domains with symbols, can potentially reveal the meaningful patterns shared between problems and environments. However, existing robots have yet to effectively reason about semantic data encoding complex relational knowledge, or to jointly reason about symbolic semantic data and the multimodal data pertinent to robotic manipulation (e.g., object point clouds, 6-DoF poses, and attributes detected with multimodal sensing). This dissertation develops semantic reasoning frameworks capable of modeling complex semantic knowledge grounded in robot perception and action. We show that grounded semantic reasoning enables robots to more effectively perceive, model, and interact with objects in real-world environments. Specifically, this dissertation makes the following contributions: (1) a survey providing a unified view of the diversity of works in the field by formulating semantic reasoning as the integration of knowledge sources, computational frameworks, and world representations; (2) a method for predicting missing relations in large-scale knowledge graphs by leveraging type hierarchies of entities, effectively avoiding ambiguity while maintaining the generalization of multi-hop reasoning patterns; (3) a method for predicting unknown properties of objects in various environmental contexts, outperforming prior knowledge graph and statistical relational learning methods due to the use of n-ary relations for modeling object properties; (4) a method for purposeful robotic grasping that accounts for a broad range of contexts (including object visual affordance, material, state, and task constraint), outperforming existing approaches in novel contexts and for unknown objects; (5) a systematic investigation into the generalization of task-oriented grasping that includes a benchmark dataset of 250k grasps, and a novel graph neural network that incorporates semantic relations into end-to-end learning of 6-DoF grasps; (6) a method for rearranging novel objects into semantically meaningful spatial structures based on high-level language instructions, more effectively capturing multi-object spatial constraints than existing pairwise spatial representations; and (7) a novel planning-inspired approach that iteratively optimizes placements of partially observed objects subject to both physical constraints and semantic constraints inferred from language instructions.
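
    To make the n-ary-relation point concrete, the short sketch below contrasts a flat binary triple with a record that keeps an object property together with its environmental context; the names and fields are illustrative assumptions, not the dissertation's schema.

    from dataclasses import dataclass

    # A binary triple drops the context in which the property was observed.
    triple = ("mug", "hasProperty", "full")

    # An n-ary relation keeps object, property, value, and context together, which is
    # the kind of structure the abstract credits for modelling object properties.
    @dataclass(frozen=True)
    class PropertyFact:
        obj: str
        prop: str
        value: str
        context: str

    fact = PropertyFact(obj="mug", prop="contents", value="full", context="office_kitchen")
    print(triple, fact)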

    Learning-based robotic manipulation for dynamic object handling : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Mechatronic Engineering at the School of Food and Advanced Technology, Massey University, Turitea Campus, Palmerston North, New Zealand

    Figures are re-used in this thesis with permission of their respective publishers or under a Creative Commons licence. Recent trends have shown that the lifecycles and production volumes of modern products are shortening. Consequently, many manufacturers subject to frequent change prefer flexible and reconfigurable production systems. Such schemes are often achieved by means of manual assembly, as conventional automated systems are perceived as lacking flexibility. Production lines that incorporate human workers are particularly common within consumer electronics and small appliances. Artificial intelligence (AI) is a possible avenue to achieve smart robotic automation in this context. In this research it is argued that a robust, autonomous object handling process plays a crucial role in future manufacturing systems that incorporate robotics—key to further closing the gap between manual and fully automated production. Novel object grasping is a difficult task, confounded by many factors including object geometry, weight distribution, friction coefficients and deformation characteristics. Sensing and actuation accuracy can also significantly impact manipulation quality. Another challenge is understanding the relationship between these factors, a specific grasping strategy, the robotic arm and the employed end-effector. Manipulation has been a central research topic within robotics for many years. Some works focus on design, i.e. specifying a gripper-object interface such that the effects of imprecise gripper placement and other confounding control-related factors are mitigated. Many universal robotic gripper designs have been considered, including 3-fingered grippers, anthropomorphic grippers, granular jamming end-effectors and underactuated mechanisms. While such approaches have maintained some interest, contemporary works predominantly utilise machine learning in conjunction with imaging technologies and generic force-closure end-effectors. Neural networks that utilise supervised and unsupervised learning schemes with an RGB or RGB-D input make up the bulk of publications within this field. Though many solutions have been studied, automatically generating a robust grasp configuration for objects not known a priori remains an open problem. An element of this issue relates to a lack of objective performance metrics to quantify the effectiveness of a solution—metrics that have traditionally driven the direction of community focus by highlighting gaps in the state of the art. This research employs monocular vision and deep learning to generate—and select from—a set of hypothesis grasps. A significant portion of this research relates to the process by which a final grasp is selected. Grasp synthesis is achieved by sampling the workspace using convolutional neural networks trained to recognise prospective grasp areas. Each potential pose is evaluated by the proposed method in conjunction with other input modalities—such as load cells and an alternate perspective. To overcome human bias and build upon traditional metrics, scores are established to objectively quantify the quality of an executed grasp trial. Learning frameworks that aim to maximise these scores are employed in the selection process to improve performance. The proposed methodology and associated metrics are empirically evaluated. A physical prototype system was constructed, employing a Dobot Magician robotic manipulator, vision enclosure, imaging system, conveyor, sensing unit and control system. Over 4,000 trials were conducted utilising 100 objects. Experimentation showed that robotic manipulation quality could be improved by 10.3% when selecting grasps to optimise for the proposed metrics—quantified by a metric related to translational error. Trials further demonstrated a grasp success rate of 99.3% for known objects and 98.9% for objects for which a priori information is unavailable. For unknown objects, this equated to an improvement of approximately 10% relative to other similar methodologies in the literature. A 5.3% reduction in grasp rate was observed when the metrics were removed as selection criteria for the prototype system. The system operated at approximately 1 Hz when contemporary hardware was employed. Experimentation demonstrated that selecting a grasp pose based on the proposed metrics improved grasp rates by up to 4.6% for known objects and 2.5% for unknown objects, compared to selecting for grasp rate alone. This project was sponsored by the Richard and Mary Earle Technology Trust, the Ken and Elizabeth Powell Bursary and the Massey University Foundation. Without the financial support provided by these entities, it would not have been possible to construct the physical robotic system used for testing and experimentation. This research adds to the field of robotic manipulation, contributing to topics on grasp-induced error analysis, post-grasp error minimisation, grasp synthesis framework design and general grasp synthesis. Three journal publications and one IEEE Xplore paper have been published as a result of this research.
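
    As a toy sketch of the selection step described above (the candidate values, weighting, and units are made up for illustration, not the thesis's metrics), hypothesis grasps can be ranked by a composite score that trades predicted success against expected translational error.

    import numpy as np

    def select_grasp(candidates, w_error=0.5):
        """Pick the hypothesis grasp with the best composite score.
        Each candidate is (predicted_success_probability, expected_translational_error_mm)."""
        scores = [p - w_error * (err_mm / 10.0) for p, err_mm in candidates]
        return int(np.argmax(scores)), scores

    candidates = [(0.92, 6.0), (0.88, 1.5), (0.95, 12.0)]   # hypothetical grasp hypotheses
    best, scores = select_grasp(candidates)
    print("selected grasp:", best, "scores:", [round(s, 3) for s in scores])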

    LEARNING VISUAL FEATURES FOR GRASP SELECTION AND CONTROL

    J. J. Gibson suggested that objects in our environment can be represented by an agent in terms of the types of actions that the agent may perform on or with that object. This affordance representation allows the agent to make the connection between the perception of key properties of an object and these actions. In this dissertation, I explore the automatic construction of visual representations that are associated with components of objects that afford certain types of grasping actions. I propose that the type of grasp used on a class of objects should form the basis of these visual representations. The visual categories are driven by grasp types. A grasp type is defined as a cluster of grasp samples in the 6D hand position and orientation space relative to the object. Specifically, for each grasp type, a set of view-dependent visual operators can be learned that match the appearance of the part of the object that is to be grasped. By focusing on object parts, as opposed to entire objects, the resulting visual operators can generalize across different object types that exhibit some similarities in 3D shape. In this dissertation, the training/testing data set is composed of a large set of example grasps made by a human teacher, and includes a set of fifty unique objects. Each grasp example consists of a stereo image pair of the object alone, a stereo image pair of the object being grasped, and information about the 3D pose of the hand relative to the object. The grasp regions in a training/testing image that correspond to locations at which certain grasp types could be applied to the object are automatically estimated. First, I show that classes of objects can be formed on the basis of how the individual objects are grasped. Second, I show that visual models based on Pair of Adjacent Segments (PAS) features can capture view-dependent similarities in object part appearance for different objects of the same class. Third, I show that these visual operators can suggest grasp types and hand locations and orientations for novel objects in novel scenarios. Given a novel image of a novel object, the proposed algorithm matches the learned shape models to this image. A match of the shape model in a novel image is interpreted as indicating that the corresponding component of the image affords a particular grasp action. Experimental results show that the proposed algorithm is capable of identifying the occurrence of learned grasp options in images containing novel objects.
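
    The definition of a grasp type as a cluster of grasp samples in the 6D hand pose space can be sketched with an off-the-shelf clustering step; the synthetic "top" and "side" grasp samples and the use of k-means below are illustrative assumptions, not the dissertation's clustering method.

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical grasp samples: [x, y, z, roll, pitch, yaw] of the hand relative to the object.
    rng = np.random.default_rng(1)
    top_grasps = rng.normal([0.0, 0.0, 0.15, 0.0, np.pi, 0.0], 0.02, size=(40, 6))
    side_grasps = rng.normal([0.10, 0.0, 0.05, 0.0, np.pi / 2, 0.0], 0.02, size=(40, 6))
    samples = np.vstack([top_grasps, side_grasps])

    # Each cluster stands in for one grasp type; its centre is a representative hand pose.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(samples)
    print("grasp-type labels:", kmeans.labels_[:3], "...", kmeans.labels_[-3:])
    print("cluster centres (x, y, z, roll, pitch, yaw):")
    print(np.round(kmeans.cluster_centers_, 2))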

    Scaled Autonomy for Networked Humanoids

    Humanoid robots have been developed with the intention of aiding in environments designed for humans. As such, the control of humanoid morphology and the effectiveness of human-robot interaction form the two principal research issues for deploying these robots in the real world. In this thesis work, the issue of humanoid control is coupled with human-robot interaction under the framework of scaled autonomy, where the human and robot exchange levels of control depending on the environment and task at hand. This scaled autonomy is approached with control algorithms for reactive stabilization of human commands, and with planned trajectories that encode semantically meaningful motion preferences in a sequential convex optimization framework. The control and planning algorithms have been extensively tested in the field for robustness and system verification. The RoboCup competition provides a benchmark for autonomous agents that are trained with a human supervisor. The kid-sized and adult-sized humanoid robots coordinate over a noisy network in a known environment with adversarial opponents, and the software and routines developed in this work allowed for five consecutive championships. Furthermore, the motion planning and user interfaces developed in this work have been tested over the noisy network of the DARPA Robotics Challenge (DRC) Trials and Finals in an unknown environment. Overall, the ability to extend simplified locomotion models to aid in semi-autonomous manipulation allows untrained humans to operate complex, high-dimensional robots. This represents another step on the path to deploying humanoids in the real world, based on low-dimensional motion abstractions and proven performance in real-world tasks like RoboCup and the DRC.
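
    As a minimal illustration of exchanging levels of control (the command format and blending rule are assumptions for illustration, not the thesis's controller), a human velocity command can be blended with an autonomous one according to an autonomy level.

    def blend_command(human_cmd, auto_cmd, autonomy):
        """Blend operator and autonomous velocity commands by an autonomy level in [0, 1]."""
        assert 0.0 <= autonomy <= 1.0
        return tuple(h * (1.0 - autonomy) + a * autonomy for h, a in zip(human_cmd, auto_cmd))

    # Hypothetical walk command (vx, vy, yaw_rate): the operator asks for forward speed,
    # while the reactive stabiliser prefers to slow down and correct heading.
    print(blend_command((0.30, 0.0, 0.0), (0.10, 0.0, 0.05), autonomy=0.7))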