RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools
Humans excel in complex long-horizon soft body manipulation tasks via
flexible tool use: bread baking requires a knife to slice the dough and a
rolling pin to flatten it. Often regarded as a hallmark of human cognition,
tool use in autonomous robots remains limited due to challenges in
understanding tool-object interactions. Here we develop an intelligent robotic
system, RoboCook, which perceives, models, and manipulates elasto-plastic
objects with various tools. RoboCook uses point cloud scene representations,
models tool-object interactions with Graph Neural Networks (GNNs), and combines
tool classification with self-supervised policy learning to devise manipulation
plans. We demonstrate that from just 20 minutes of real-world interaction data
per tool, a general-purpose robot arm can learn complex long-horizon soft
object manipulation tasks, such as making dumplings and alphabet letter
cookies. Extensive evaluations show that RoboCook substantially outperforms
state-of-the-art approaches, exhibits robustness against severe external
disturbances, and demonstrates adaptability to different materials. Project page: https://hshi74.github.io/robocook
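As a rough illustration of the GNN-based interaction modeling the abstract describes, the sketch below runs one message-passing step over a particle graph built from a point cloud. The class name, layer sizes, and radius-graph construction are assumptions made for illustration, not RoboCook's actual implementation.

```python
# Minimal sketch of one GNN message-passing step over a particle-based
# scene representation (illustrative; not RoboCook's actual code).
import torch
import torch.nn as nn

class ParticleDynamicsGNN(nn.Module):
    def __init__(self, dim=3, hidden=64):
        super().__init__()
        # Edge network: encodes relative displacement between particle pairs.
        self.edge_mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        # Node network: maps aggregated messages to a per-particle motion delta.
        self.node_mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, dim))

    def forward(self, pos, radius=0.05):
        # pos: (N, 3) particle positions sampled from the point cloud.
        rel = pos[None, :, :] - pos[:, None, :]             # (N, N, 3)
        dist = rel.norm(dim=-1)
        adj = (dist < radius) & (dist > 0)                  # neighborhood graph
        msgs = self.edge_mlp(rel) * adj[..., None].float()  # mask non-edges
        return pos + self.node_mlp(msgs.sum(dim=1))         # predicted next positions
```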
Learning and leveraging kinematics for robot motion planning under uncertainty
Service robots that assist humans in day-to-day tasks will need to be general-purpose robots that can perform a wide array of tasks without much supervision from end-users. As they will operate in unstructured and ever-changing human environments, they will need to adapt to their work environments quickly and learn to perform novel tasks within a few trials. However, current robots fall short of these requirements: they are generally highly specialized, can only perform fixed, predefined tasks reliably, and need to operate in controlled environments. One of the main reasons behind this gap is that current robots require complete and accurate information about their surroundings to function effectively, whereas in human environments they will only have access to limited information about their tasks and surroundings.
With incomplete information about its surroundings, a robot using pre-programmed or pre-learned motion policies will fail to adapt to the novel situations encountered during operation and fall short of completing its tasks. Online motion generation methods that do not reason about this lack of information will not suffice either, as the resulting policies may be unreliable under incomplete information. Reasoning about the lack of information becomes especially critical for the manipulation tasks a service robot must perform. These tasks often require interacting with multiple objects that make or break contacts, and a contact between objects can significantly alter their subsequent motion and lead to sudden transitions in their dynamics. Under such sudden transitions, even minor errors in estimating object poses can cause drastic deviations from the robot's initial motion plan and lead the robot to fail at its task. Hence, service robots need methods that generate motion policies for manipulation tasks efficiently while accounting for the uncertainty due to incomplete or partial information.
The Partially Observable Markov Decision Process (POMDP) is one such mathematical framework that can model and plan for tasks where the agent lacks complete information about the task. However, POMDPs incur computational costs that grow exponentially with the planning time horizon, which restricts current POMDP-based planning methods to problems with short time horizons. Another challenge for planning-based approaches is that they require a state transition function for the world they operate in to develop motion plans, and such a function may not always be available to the robot. In control-theory terms, a state transition function for the world is analogous to its system plant. In this dissertation, we propose to address these challenges by developing methods that can learn state transition functions for robot manipulation tasks directly from observations and later use them to generate long-horizon motion plans that complete the task under uncertainty.
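For concreteness, the belief update at the heart of POMDP planning can be sketched for a discrete-state toy problem as below. The dissertation works with richer hybrid models, so the arrays and names here are purely illustrative.

```python
# Toy discrete POMDP belief update (Bayes filter); illustrative only.
import numpy as np

def belief_update(belief, action, obs, T, O):
    # belief: (S,) probability over states
    # T: (A, S, S) learned transition model, T[a, s, s'] = P(s' | s, a)
    # O: (S, Obs) observation model, O[s', o] = P(o | s')
    predicted = belief @ T[action]      # predict: sum_s P(s'|s,a) b(s)
    updated = predicted * O[:, obs]     # correct with observation likelihood
    return updated / updated.sum()      # renormalize

# Two-state toy example: one action, two possible observations.
b = np.array([0.5, 0.5])
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[0.7, 0.3], [0.1, 0.9]])
print(belief_update(b, action=0, obs=1, T=T, O=O))  # ~[0.29, 0.71]
```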
We first model the world state transition functions for robot manipulation tasks involving sudden transitions, such as those due to contacts, using hybrid models, and we develop a novel hierarchical POMDP planner that leverages the representational power of hybrid models to produce motion plans for long-horizon tasks under uncertainty. Next, we address the requirement of planning-based methods to have access to world state transition functions. We introduce three novel methods for learning kinematic models of articulated objects directly from observations and present an algorithm to construct state transition functions from the learned kinematic models for manipulating these objects. We focus on learning models of articulated objects because they form one of the largest classes of household objects that service robots will frequently interact with. The first method, MICAH, learns kinematic models for articulated objects that exhibit configuration-dependent articulation properties, such as a refrigerator door that stays closed magnetically, from unsegmented sequences of observations of object part poses. Next, we introduce ScrewNet, which removes MICAH's requirement for object pose estimation and learns the articulation properties of objects directly from the raw sensory data available to the robot (depth images), without knowing their articulation model category a priori. Extending this further, we introduce DUST-net, which learns distributions over articulation model parameters directly from raw depth images, indicating the network's confidence in the estimated parameters. Combining these methods, this dissertation introduces a unified framework that enables a robot to learn state transition functions for manipulation tasks from observations and later use them to develop long-horizon plans even under uncertainty.
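Articulation models of the kind ScrewNet and DUST-net estimate are commonly expressed as screws. As a representation-level sketch (not the papers' code), an axis, rotation angle, and pitch determine the rigid transform of the moving part:

```python
# Screw parameterization of an articulated-joint displacement (illustrative).
import numpy as np

def screw_to_transform(axis_dir, axis_point, theta, pitch):
    # axis_dir: direction of the screw axis; axis_point: a point on the axis
    # theta: rotation about the axis; pitch: translation along the axis per radian
    w = axis_dir / np.linalg.norm(axis_dir)
    K = np.array([[0, -w[2], w[1]],
                  [w[2], 0, -w[0]],
                  [-w[1], w[0], 0]])
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K  # Rodrigues
    t = (np.eye(3) - R) @ axis_point + pitch * theta * w
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T  # 4x4 pose of the moving part relative to its zero configuration

# A pure revolute joint (pitch = 0), e.g. a door rotated by 30 degrees.
print(screw_to_transform(np.array([0., 0., 1.]), np.array([1., 0., 0.]),
                         np.deg2rad(30), pitch=0.0))
```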
Improving Robotic Manipulation via Reachability, Tactile, and Spatial Awareness
Robotic grasping and manipulation remains an active area of research despite significant progress over the past decades. Many existing solutions still struggle to robustly handle difficult situations that a robot might encounter even in non-contrived settings. For example, grasping systems perform poorly when the object is not centrally located in the robot's workspace. Grasping in dynamic environments also presents a unique set of challenges: a stable and feasible grasp can become infeasible as the object moves, and the problem becomes more pronounced when there are obstacles in the scene.
This research is inspired by the observation that object-manipulation tasks like grasping, pick-and-place, or insertion require different forms of awareness. These include reachability awareness -- being aware of regions that can be reached without self-collision or collision with surrounding objects; tactile awareness -- the ability to feel objects and grasp them just tightly enough to prevent slippage without crushing them; and 3D awareness -- the ability to perceive size and depth in ways that make object manipulation possible. Humans use these capabilities to achieve the high level of coordination needed for object manipulation. In this work, we develop techniques that equip robots with similar sensitivities, working towards a reliable and capable home-assistant robot.
In this thesis, we demonstrate the importance of reasoning about the robot's workspace to enable grasping systems to handle more difficult settings, such as picking up moving objects while avoiding surrounding obstacles. Our method encodes the notion of reachability and uses it to generate not just stable grasps but ones that are also achievable by the robot. This reachability-aware formulation effectively expands the usable workspace of the robot, enabling it to pick up objects from difficult-to-reach locations. While recent vision-based grasping systems work reliably well, achieving pickup success rates higher than 90% in cluttered scenes, failure cases due to calibration error, slippage, and occlusion remain challenging. To address this, we develop a closed-loop, tactile-based improvement that uses additional tactile sensing to deal with self-occlusion (a limitation of vision-based systems) and adaptively tighten the robot's grip on the object -- making the grasping system tactile-aware and more reliable. This can be used as an add-on to existing grasping systems.
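The adaptive grip adjustment described above amounts to a simple feedback loop. A minimal sketch follows, where `gripper`, `read_tactile`, and `detect_slip` are hypothetical interfaces standing in for the thesis's actual system.

```python
# Hypothetical closed-loop grip tightening driven by tactile slip detection.
def adaptive_grip(gripper, read_tactile, detect_slip,
                  force=5.0, step=1.0, max_force=40.0):
    gripper.close(force)
    while force < max_force and detect_slip(read_tactile()):
        force += step            # tighten a little each time slip is felt
        gripper.close(force)
    return force                 # final force: just tight enough, not crushing
```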
This adaptive tactile-based approach demonstrates the effectiveness of closed-loop feedback in the final phase of the grasping process. To achieve closed-loop control throughout the manipulation process, we study the value of multi-view camera systems for improving learning-based manipulation systems.
Using a multi-view Q-learning formulation, we develop a learned closed-loop manipulation algorithm for precise manipulation tasks that integrates inputs from multiple static RGB cameras to overcome self-occlusion and improve 3D understanding.
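One plausible way to realize such a multi-view Q-function is to encode each static camera view with a shared encoder and fuse the features before the Q head. The architecture below is an illustrative assumption, not the thesis's exact network.

```python
# Sketch of a multi-view Q-network fusing several static RGB cameras.
import torch
import torch.nn as nn

class MultiViewQNet(nn.Module):
    def __init__(self, n_actions=8, feat=128):
        super().__init__()
        # Shared per-view image encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat))
        self.q_head = nn.Linear(feat, n_actions)

    def forward(self, views):
        # views: (B, V, 3, H, W) RGB frames from V static cameras.
        feats = [self.encoder(views[:, i]) for i in range(views.size(1))]
        fused = torch.stack(feats, dim=1).mean(dim=1)  # average-fuse views
        return self.q_head(fused)                      # Q-value per action
```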
To conclude, we discuss opportunities and directions for future work.
Grounded Semantic Reasoning for Robotic Interaction with Real-World Objects
Robots are increasingly transitioning from specialized, single-task machines to general-purpose systems that operate in unstructured environments, such as homes, offices, and warehouses. In these real-world domains, robots need to manipulate novel objects while adapting to changes in environments and goals. Semantic knowledge, which concisely describes target domains with symbols, can potentially reveal the meaningful patterns shared between problems and environments. However, existing robots are yet to effectively reason about semantic data encoding complex relational knowledge or jointly reason about symbolic semantic data and multimodal data pertinent to robotic manipulation (e.g., object point clouds, 6-DoF poses, and attributes detected with multimodal sensing).
This dissertation develops semantic reasoning frameworks capable of modeling complex semantic knowledge grounded in robot perception and action. We show that grounded semantic reasoning enables robots to more effectively perceive, model, and interact with objects in real-world environments. Specifically, this dissertation makes the following contributions: (1) a survey providing a unified view for the diversity of works in the field by formulating semantic reasoning as the integration of knowledge sources, computational frameworks, and world representations; (2) a method for predicting missing relations in large-scale knowledge graphs by leveraging type hierarchies of entities, effectively avoiding ambiguity while maintaining generalization of multi-hop reasoning patterns; (3) a method for predicting unknown properties of objects in various environmental contexts, outperforming prior knowledge graph and statistical relational learning methods due to the use of n-ary relations for modeling object properties; (4) a method for purposeful robotic grasping that accounts for a broad range of contexts (including object visual affordance, material, state, and task constraint), outperforming existing approaches in novel contexts and for unknown objects; (5) a systematic investigation into the generalization of task-oriented grasping that includes a benchmark dataset of 250k grasps, and a novel graph neural network that incorporates semantic relations into end-to-end learning of 6-DoF grasps; (6) a method for rearranging novel objects into semantically meaningful spatial structures based on high-level language instructions, more effectively capturing multi-object spatial constraints than existing pairwise spatial representations; (7) a novel planning-inspired approach that iteratively optimizes placements of partially observed objects subject to both physical constraints and semantic constraints inferred from language instructions.
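As a toy illustration of the knowledge-graph link prediction in contribution (2), a translation-based embedding scorer can be sketched as below. The dissertation's method additionally exploits entity type hierarchies and multi-hop reasoning patterns, which this generic TransE-style sketch omits.

```python
# Generic TransE-style triple scorer (illustrative, not the dissertation's model).
import numpy as np

rng = np.random.default_rng(0)
emb = {n: rng.normal(size=16) for n in ["mug", "kitchen", "located_in"]}

def score(head, rel, tail):
    # Smaller distance between head + relation and tail = more plausible triple.
    return -np.linalg.norm(emb[head] + emb[rel] - emb[tail])

print(score("mug", "located_in", "kitchen"))
```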