
    Time-Contrastive Networks: Self-Supervised Learning from Video

    We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitat
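    The metric-learning objective described above is, at its core, a triplet loss: frames captured at the same instant from different cameras are pulled together in the embedding space, while temporally nearby frames from the same camera are pushed apart. Below is a minimal PyTorch sketch of that idea, assuming an embedding network and two time-aligned frame tensors; it illustrates the training signal only and is not the authors' released code (which is linked above).

```python
import torch
import torch.nn.functional as F

def time_contrastive_loss(embed, view1, view2, margin=1.0, min_gap=10):
    """One triplet from two time-synchronized camera streams.

    view1, view2: (T, C, H, W) tensors of time-aligned frames.
    embed:        network mapping an (N, C, H, W) batch to (N, D) embeddings.
    Sketch only: real training would draw many triplets per batch.
    """
    T = view1.shape[0]
    t = torch.randint(0, T - min_gap, (1,)).item()      # anchor time
    t_neg = torch.randint(t + min_gap, T, (1,)).item()  # same view, later frame

    anchor = embed(view1[t:t + 1])            # view 1 at time t
    positive = embed(view2[t:t + 1])          # view 2 at the same instant
    negative = embed(view1[t_neg:t_neg + 1])  # visually similar, functionally different

    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)
```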

    Semi-Autonomous Control of an Exoskeleton using Computer Vision


    Robotic Crop Interaction in Agriculture for Soft Fruit Harvesting

    Autonomous tree crop harvesting has been a seemingly attainable, but elusive, robotics goal for the past several decades. Limiting grower reliance on uncertain seasonal labour is an economic driver of this, but the ability of robotic systems to treat each plant individually also has environmental benefits, such as reduced emissions and fertiliser use. Over the same time period, effective grasping and manipulation (G&M) solutions to warehouse product handling, and more general robotic interaction, have been demonstrated. Despite research progress in general robotic interaction and in harvesting some specific crop types, a commercially successful robotic harvester has yet to be demonstrated, and most crop varieties, including soft-skinned fruit, have not yet been addressed. Soft fruit, such as plums, present problems for many of the techniques employed for their more robust relatives and require special focus when developing autonomous harvesters. Adapting existing robotics tools and techniques to new fruit types, including soft-skinned varieties, is not well explored.

    This thesis aims to bridge that gap by examining the challenges of autonomous crop interaction for the harvesting of soft fruit. Aspects known to be challenging include mixed obstacle planning with both hard and soft obstacles present, poor outdoor sensing conditions, and the lack of proven picking motion strategies. Positioning an actuator for harvesting requires solving these problems and others specific to soft-skinned fruit; doing so effectively means addressing them in the sensing, planning and actuation areas of a robotic system. These areas are also highly interdependent for grasping and manipulation tasks, so solutions need to be developed at the system level.

    In this thesis, soft robotics actuators, with simplifying assumptions about hard obstacle planes, are used to solve mixed obstacle planning. Persistent target tracking and filtering is used to overcome challenging object detection conditions, while multiple stages of object detection are applied to refine the initial position estimates. Several picking motions are developed and tested for plums, with varying degrees of effectiveness. These techniques are integrated into a prototype system which is validated in lab testing and extensive field trials on a commercial plum crop.

    Key contributions of this thesis include:
    I. The examination of grasping & manipulation tools, algorithms, techniques and challenges for harvesting soft-skinned fruit.
    II. Design, development and field-trial evaluation of a harvester prototype to validate these concepts in practice, with specific design studies of the gripper type, object detector architecture and picking motion.
    III. Investigation of specific G&M module improvements, including:
       o Application of the autocovariance least squares (ALS) method to noise covariance matrix estimation for visual servoing tasks, where both simulated and real experiments demonstrated a 30% improvement in state estimation error using this technique.
       o Theory and experimentation showing that a single range measurement is sufficient for disambiguating scene scale in monocular depth estimation for some datasets (see the sketch after this abstract).
       o Preliminary investigations of stochastic object completion and sampling for grasping, active perception for visual-servoing-based harvesting, and multi-stage fruit localisation from RGB-Depth data.

    Several field trials were carried out with the plum harvesting prototype. Testing on an unmodified commercial plum crop, in all weather conditions, showed promising results with a harvest success rate of 42%. While a significant gap between prototype performance and commercial viability remains, the use of soft robotics with carefully chosen sensing and planning approaches allows for robust grasping & manipulation under challenging conditions, with both hard and soft obstacles.
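    The single-range-measurement result in contribution III has a compact core: a monocular depth network typically recovers scene structure only up to an unknown global scale, and one metric range reading at a known pixel pins that scale down. Below is a minimal numpy sketch of this idea, assuming the ambiguity really is a single multiplicative scale (which the thesis reports holds only for some datasets); the function and variable names are illustrative, not the thesis code.

```python
import numpy as np

def rescale_monocular_depth(pred_depth, pixel, measured_range):
    """Recover metric depth from a scale-ambiguous prediction.

    pred_depth:     (H, W) predicted depth map, correct up to a global scale
    pixel:          (row, col) where a real range measurement is available
    measured_range: metric distance measured at that pixel

    Assumption: the ambiguity is one multiplicative scale for the whole
    scene, which holds only for some datasets.
    """
    r, c = pixel
    scale = measured_range / pred_depth[r, c]
    return scale * pred_depth

# Usage: a single sonar/laser reading at the image centre fixes the whole map.
pred = np.random.rand(480, 640) + 0.5  # stand-in for network output
metric_depth = rescale_monocular_depth(pred, (240, 320), measured_range=1.8)
```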

    Generative and predictive models for robust manipulation

    Probabilistic modelling of manipulation skills, perception and uncertainty poses many challenges at different stages of a typical robot manipulation pipeline. This thesis is about devising algorithms and strategies for improving robustness in object manipulation skills acquired from demonstration and derived from learnt physical models in non-prehensile tasks such as pushing. Manipulation skills can be made robust in different ways: first, by improving time performance for grasp synthesis; second, by employing active perceptual strategies that exploit generated grasp hypotheses to gather task-relevant information for grasp generation more efficiently; and finally, by exploiting predictive uncertainty in learnt physical models. Robust manipulation skills thus emerge from the interplay of a triad of capabilities: generative modelling for action synthesis, active perception, and learning and exploiting uncertainty in physical interactions.

    This thesis addresses these problems by:
    • Showing how parametric models for approximating multimodal distributions can be used as a computationally faster method for generative grasp synthesis.
    • Exploiting generative methods for dexterous grasp synthesis and investigating how active vision strategies can improve grasp execution safety and success rate while using fewer camera views of an object for grasp generation.
    • Outlining methods to model and exploit predictive uncertainty from learnt forward models to achieve robust, uncertainty-averse non-prehensile manipulation, such as push manipulation (sketched after this abstract).

    In particular, the thesis: (i) presents a framework for generative grasp synthesis with applications for real-time grasp synthesis suitable for multi-fingered robot hands; (ii) describes a sensorisation method for under-actuated hands, such as the Pisa/IIT SoftHand, which allows the aforementioned grasp synthesis framework to be deployed on this type of robotic hand; (iii) provides an active vision approach for view selection that uses generative grasp synthesis methods to perform perceptual predictions, improving grasp performance while taking into account grasp execution safety and contact information; and (iv), going beyond prehensile skills, provides an approach to model and exploit predictive uncertainty from learnt physics applied to push manipulation. Experimental results are presented in simulation and on real robot platforms to validate the proposed methods.
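    To picture the uncertainty-averse push manipulation in the final bullet: if the learnt forward model is an ensemble, the spread of the ensemble's predictions can stand in for predictive uncertainty, and candidate pushes can be penalised by it. The following numpy sketch assumes such an ensemble and a simple distance-to-goal cost; the names are mine, and the thesis may model uncertainty differently.

```python
import numpy as np

def select_push(state, candidate_pushes, ensemble, goal, beta=1.0):
    """Choose the push minimising predicted cost plus an uncertainty penalty.

    ensemble: list of learnt forward models f(state, push) -> next_state
    beta:     strength of aversion to pushes the models disagree on
    """
    best_push, best_score = None, np.inf
    for push in candidate_pushes:
        preds = np.stack([f(state, push) for f in ensemble])
        mean_next = preds.mean(axis=0)
        # Ensemble disagreement as a proxy for predictive uncertainty.
        uncertainty = preds.std(axis=0).sum()
        score = np.linalg.norm(mean_next - goal) + beta * uncertainty
        if score < best_score:
            best_push, best_score = push, score
    return best_push
```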

    Exploring Natural User Abstractions For Shared Perceptual Manipulator Task Modeling & Recovery

    State-of-the-art domestic robot assistants are essentially autonomous mobile manipulators capable of executing human-scale precision grasps. To maximize utility and economy, non-technical end-users would need to be nearly as efficient as trained roboticists at controlling and collaborating on manipulation task behaviors. This remains a significant challenge, however, given that many WIMP-style tools require at least superficial proficiency in robotics, 3D graphics, and computer science for rapid task modeling and recovery. Research on robot-centric collaboration has gathered momentum in recent years: robots now plan in partially observable environments while maintaining geometric and semantic maps, presenting opportunities for non-experts to cooperatively control task behavior with autonomous-planning agents that exploit this knowledge. However, since autonomous systems are not immune to errors under perceptual difficulty, a human-in-the-loop is needed to bias autonomous planning towards recovery conditions that resume the task and avoid similar errors.

    In this work, we explore interactive techniques that allow non-technical users to model task behaviors and perceive cooperatively with a service robot under robot-centric collaboration. We evaluate stylus and touch modalities with which users can intuitively and effectively convey natural abstractions of high-level tasks, semantic revisions, and geometries about the world. Experiments are conducted with 'pick-and-place' tasks in an ideal 'Blocks World' environment using a Kinova JACO six degree-of-freedom manipulator. Possibilities for the architecture and interface are demonstrated with the following features: (1) semantic 'Object' and 'Location' grounding that describes function and ambiguous geometries; (2) task specification with an unordered list of goal predicates (sketched after this abstract); and (3) guiding task recovery with implied scene geometries and trajectories via symmetry cues and configuration-space abstraction. Empirical results from four user studies show that our interface was strongly preferred over the control condition, demonstrating high learnability and ease of use that enabled our non-technical participants to model complex tasks, provide effective recovery assistance, and exercise teleoperative control.
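    Feature (2), task specification as an unordered list of goal predicates, can be pictured as a set-of-tuples representation: the task is complete once every goal predicate holds in the perceived scene, in whatever order the user stated them. A small illustrative Python sketch follows; the predicate names and scene encoding are hypothetical, not the paper's interface.

```python
# Goal predicates as an unordered set of (relation, *arguments) tuples.
goal = {
    ("on", "red_block", "blue_block"),
    ("at", "blue_block", "table_zone_2"),
}

def satisfied(goal, scene):
    """True when every goal predicate holds in the perceived scene.

    scene: set of ground predicates produced by the perception system.
    Order never matters, so users may state goals in any sequence.
    """
    return goal <= scene  # subset test

scene = {
    ("at", "blue_block", "table_zone_2"),
    ("on", "red_block", "blue_block"),
    ("clear", "red_block"),
}
assert satisfied(goal, scene)
```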

    Planning dextrous robot hand grasps from range data, using preshapes and digit trajectories

    Dextrous robot hands have many degrees of freedom. This enables the manipulation of objects between the digits of the hand but makes grasp planning substantially more complex than for parallel-jaw grippers. Much of the work that addresses grasp planning for dextrous hands concentrates on selecting contact sites to optimise stability criteria and ignores the kinematics of the hand. In more complete systems, the paradigm of preshaping has emerged as dominant. However, the criteria for the formation and placement of the preshapes have not been adequately examined, and the usefulness of these systems is therefore limited to grasping simple objects for which preshapes can be formed using coarse heuristics.

    In this thesis a grasp metric based on stability and kinematic feasibility is introduced. The preshaping paradigm is extended to include consideration of the trajectories that the digits take during closure from preshape to final grasp. The resulting grasp family depends on task requirements and is designed for a set of "ideal" object-hand configurations. The grasp family couples the degrees of freedom of the dextrous hand in an anthropomorphic manner; the resulting reduction in freedom makes grasp planning less complex. Grasp families are fitted to real objects by optimisation of the grasp metric; this corresponds to bringing the real object-hand configuration as close to the ideal as possible. First, the preshape aperture, which defines the positions of the fingertips in the preshape, is found by optimising an approximation to the grasp metric (which makes simplifying assumptions about the digit trajectories and hand kinematics). Second, the full preshape kinematics and digit closure trajectories are calculated to optimise the full grasp metric.

    Grasps are planned on object models built from laser striper range data from two viewpoints. A surface description of the object is used to prune the space of possible contact sites and to allow accurate estimation of normals, which the grasp metric needs in order to estimate the friction required. A voxel description, built by ray-casting, is used to check for collisions between the object and the robot hand using an approximation to the Euclidean distance transform (see the sketch after this abstract).

    Results are shown in simulation for a 3-digit hand model, designed to be like a simplified human hand in terms of its size and functionality. The method extends naturally to any dextrous hand with a single thumb opposing multiple fingers, and several different hand models that could be used are described. Grasps are planned on a wide variety of curved and polyhedral objects.
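    The collision check described above (a ray-cast voxel model queried through an approximate Euclidean distance transform) can be sketched with scipy: transform the occupancy grid once, then test sample points on the hand model for clearance. A minimal sketch only; the grid layout, clearance threshold and names are illustrative, not the thesis implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def build_clearance_field(occupied, voxel_size):
    """Distance (metres) from each free voxel to the nearest occupied one.

    occupied: boolean (X, Y, Z) voxel grid of the object, built by ray-casting.
    """
    # For free voxels, the EDT gives the distance to the nearest occupied voxel.
    return distance_transform_edt(~occupied, sampling=voxel_size)

def hand_collides(clearance, hand_points, voxel_size, min_clearance=0.005):
    """Check sample points on the hand (in grid frame, metres) for clearance.

    Assumes all points fall inside the voxel grid bounds.
    """
    idx = np.floor(hand_points / voxel_size).astype(int)
    return bool(np.any(clearance[idx[:, 0], idx[:, 1], idx[:, 2]] < min_clearance))
```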