
    Learning Dexterous Manipulation from Exemplar Object Trajectories and Pre-Grasps

    Learning diverse dexterous manipulation behaviors with assorted objects remains an open grand challenge. While policy-learning methods offer a powerful avenue to attack this problem, they require extensive per-task engineering and algorithmic tuning. This paper seeks to escape these constraints by developing a Pre-Grasp informed Dexterous Manipulation (PGDM) framework that generates diverse dexterous manipulation behaviors without any task-specific reasoning or hyper-parameter tuning. At the core of PGDM is a well-known robotics construct, the pre-grasp (i.e., the hand pose preparing for object interaction). This simple primitive is enough to induce efficient exploration strategies for acquiring complex dexterous manipulation behaviors. To exhaustively verify these claims, we introduce TCDM, a benchmark of 50 diverse manipulation tasks defined over multiple objects and dexterous manipulators. Tasks for TCDM are defined automatically using exemplar object trajectories from various sources (animators, human behaviors, etc.), without any per-task engineering or supervision. Our experiments validate that PGDM's exploration strategy, induced by a surprisingly simple ingredient (a single pre-grasp pose), matches the performance of prior methods that require expensive per-task feature/reward engineering, expert supervision, and hyper-parameter tuning. For animated visualizations, trained policies, and project code, please refer to: https://pregrasps.github.io

    MyoDex: A Generalizable Prior for Dexterous Manipulation

    Human dexterity is a hallmark of motor control. Our hands can rapidly synthesize new behaviors despite the complexity (multi-articular and multi-joint, with 23 joints controlled by more than 40 muscles) of musculoskeletal sensory-motor circuits. In this work, we take inspiration from how human dexterity builds on a diversity of prior experiences, instead of being acquired through a single task. Motivated by this observation, we set out to develop agents that can build upon their previous experience to quickly acquire new (previously unattainable) behaviors. Specifically, our approach leverages multi-task learning to implicitly capture task-agnostic behavioral priors (MyoDex) for human-like dexterity, using a physiologically realistic human hand model, MyoHand. We demonstrate MyoDex's effectiveness in few-shot generalization as well as positive transfer to a large repertoire of unseen dexterous manipulation tasks. Agents leveraging MyoDex can solve approximately 3x more tasks, 4x faster, than a distillation baseline. While prior work has synthesized single musculoskeletal control behaviors, MyoDex is the first generalizable manipulation prior that catalyzes the learning of dexterous physiological control across a large variety of contact-rich behaviors. We also demonstrate the effectiveness of our paradigm beyond musculoskeletal control, toward the acquisition of dexterity in the 24-DoF Adroit Hand. Website: https://sites.google.com/view/myodex
    Comment: Accepted to the 40th International Conference on Machine Learning (2023)

    An Unbiased Look at Datasets for Visuo-Motor Pre-Training

    Visual representation learning holds great promise for robotics, but is severely hampered by the scarcity and homogeneity of robotics datasets. Recent works address this problem by pre-training visual representations on large-scale but out-of-domain data (e.g., videos of egocentric interactions) and then transferring them to target robotics tasks. While the field is heavily focused on developing better pre-training algorithms, we find that dataset choice is just as important to this paradigm's success. After all, the representation can only learn the structures or priors present in the pre-training dataset. To this end, we flip the focus from algorithms and instead conduct a dataset-centric analysis of robotic pre-training. Our findings call into question some common wisdom in the field. We observe that traditional vision datasets (like ImageNet, Kinetics, and 100 Days of Hands) are surprisingly competitive options for visuo-motor representation learning, and that the pre-training dataset's image distribution matters more than its size. Finally, we show that common simulation benchmarks are not a reliable proxy for real-world performance and that simple regularization strategies can dramatically improve real-world policy learning. https://data4robotics.github.io
    Comment: Accepted to CoRL 202

    Manipulate by Seeing: Creating Manipulation Controllers from Pre-Trained Representations

    The field of visual representation learning has seen explosive growth in recent years, but its benefits in robotics have been surprisingly limited so far. Prior work uses generic visual representations as a basis to learn (task-specific) robot action policies (e.g., via behavior cloning). While the visual representations do accelerate learning, they are primarily used to encode visual observations. Thus, action information has to be derived purely from robot data, which is expensive to collect! In this work, we present a scalable alternative where the visual representations can help directly infer robot actions. We observe that vision encoders express relationships between image observations as distances (e.g., via embedding dot product) that could be used to efficiently plan robot behavior. We operationalize this insight and develop a simple algorithm for acquiring a distance function and dynamics predictor, by fine-tuning a pre-trained representation on human-collected video sequences. The final method substantially outperforms traditional robot learning baselines (e.g., 70% success vs. 50% for behavior cloning on pick-place) on a suite of diverse real-world manipulation tasks. It can also generalize to novel objects, without using any robot demonstrations at training time. For visualizations of the learned policies, see: https://agi-labs.github.io/manipulate-by-seeing/
    Comment: Oral Presentation at the International Conference on Computer Vision (ICCV), 202
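    The core idea above, scoring candidate actions by how close a learned dynamics predictor lands to the goal in embedding space, can be sketched as follows. This is a toy NumPy illustration, not the paper's implementation: `phi`, `dynamics`, and `greedy_action` are hypothetical stand-ins for the fine-tuned encoder, the learned dynamics predictor, and the planner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a fine-tuned vision encoder: a fixed random
# linear map, followed by normalization so dot products rank closeness.
W = rng.normal(size=(4, 4))

def phi(obs):
    z = W @ obs
    return z / np.linalg.norm(z)

def dynamics(z, action):
    """Toy learned dynamics: predict the next embedding from (z, action)."""
    z_next = z + action
    return z_next / np.linalg.norm(z_next)

def greedy_action(obs, goal_obs, candidate_actions):
    """Pick the action whose predicted next embedding scores highest
    against the goal embedding (embedding dot product as the distance)."""
    z, z_goal = phi(obs), phi(goal_obs)
    scores = [dynamics(z, a) @ z_goal for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]

obs = rng.normal(size=4)
goal = rng.normal(size=4)
actions = [rng.normal(size=4) * 0.1 for _ in range(16)]
best = greedy_action(obs, goal, actions)  # action closest to the goal
```

    In the actual method, the encoder and dynamics predictor are fine-tuned on human video; the sketch only shows how a distance in embedding space turns a representation into a one-step planner.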

    A framework for the identification of full-field structural dynamics using sequences of images in the presence of non-ideal operating conditions

    Recent developments in the ability to automatically and efficiently extract natural frequencies, damping ratios, and full-field mode shapes from video of vibrating structures have great potential for reducing the resources and time required to perform experimental and operational modal analysis at very high spatial resolution. Furthermore, these techniques have the added advantage that they can be implemented remotely and in a non-contact fashion. Emerging full-field imaging techniques therefore have the potential to allow the identification of the modal properties of structures in regimes that used to be challenging. For instance, these techniques suggest that high-spatial-resolution structural identification could be performed on an aircraft during flight using a ground- or aircraft-based imager. They also have the potential to identify the dynamics of microscopic systems. In order to realize this capability, it will be necessary to develop techniques that can extract full-field structural dynamics in the presence of non-ideal operating conditions. In this work, we develop a framework for the deployment of emerging algorithms that allow the automatic extraction of high-resolution, full-field modal parameters in the presence of non-ideal operating conditions. One of the most notable non-ideal operating conditions is rigid body motion, of both the structure being measured and the imager performing the measurement. We demonstrate an instantiation of the framework by showing how it can be used to address in-plane, translational rigid body motion. A frame-to-frame, keypoint-based technique for identifying full-field structural dynamics in the presence of rigid body motion is presented and demonstrated in the context of the framework.
    It is expected that this framework will ultimately help enable the collection of full-field structural dynamics from measurement platforms including unmanned aerial vehicles, robotic telescopes, satellites, imagers mounted in high-vibration environments (seismic, industrial, harsh weather), and human-carried imagers, as well as the characterization of microscopic structures. If imager-based structural identification techniques mature to the point that they can be used in non-ideal field conditions, the structural health monitoring community will be able to move beyond monitoring individual structures to full-field structural integrity monitoring at the city scale.
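    The frame-to-frame compensation for in-plane translational rigid body motion described above can be sketched as follows. This is a minimal NumPy illustration under strong simplifying assumptions: keypoints are already detected and matched across frames (a real pipeline would use feature tracking or optical flow), the rigid motion is pure translation, and the synthetic "mode shape" has zero mean across keypoints so rigid and structural motion separate cleanly.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_translation(kp_prev, kp_curr):
    """Estimate the in-plane rigid translation between two frames from
    matched keypoints; for pure translation the least-squares solution
    is simply the mean keypoint displacement."""
    return (kp_curr - kp_prev).mean(axis=0)

def compensate(track):
    """Subtract the accumulated frame-to-frame rigid translation from a
    keypoint track, leaving the residual (structural) motion."""
    shift = np.zeros(2)
    out = [track[0]]
    for prev, curr in zip(track, track[1:]):
        shift = shift + estimate_translation(prev, curr)
        out.append(curr - shift)
    return out

# Synthetic demo: 30 keypoints drifting rigidly (imager motion) while
# vibrating in a zero-mean mode shape (structural motion).
base = rng.uniform(0.0, 100.0, size=(30, 2))
mode = rng.normal(size=(30, 1))
mode -= mode.mean()                          # zero net motion across keypoints
track = [base
         + t * np.array([1.0, 0.5])          # rigid drift, pixels per frame
         + np.sin(0.7 * t) * mode * np.array([0.2, 0.1])
         for t in range(20)]
rec = compensate(track)                      # drift removed, vibration kept
```

    The recovered track `rec` contains only the vibration about `base`; modal parameters (frequencies, damping, mode shapes) would then be extracted from these residual displacement time series. For general in-plane rigid motion (translation plus rotation), a robust fit such as OpenCV's `estimateAffinePartial2D` would replace the mean-displacement estimate.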