Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments Using a Synthetic Benchmark
This paper presents Sim-Suction, a robust object-aware suction grasp policy
for mobile manipulation platforms with dynamic camera viewpoints, designed to
pick up unknown objects from cluttered environments. Suction grasp policies
typically employ data-driven approaches, necessitating large-scale,
accurately-annotated suction grasp datasets. However, the generation of suction
grasp datasets in cluttered environments remains underexplored, leaving
uncertainties about the relationship between the object of interest and its
surroundings. To address this, we propose a benchmark synthetic dataset,
Sim-Suction-Dataset, comprising 500 cluttered environments with 3.2 million
annotated suction grasp poses. The efficient Sim-Suction-Dataset generation
process provides novel insights by combining analytical models with dynamic
physical simulations to create fast and accurate suction grasp pose
annotations. We introduce Sim-Suction-Pointnet to generate robust 6D suction
grasp poses by learning point-wise affordances from the Sim-Suction-Dataset,
leveraging the synergy of zero-shot text-to-segmentation. Real-world
experiments for picking up all objects demonstrate that Sim-Suction-Pointnet
achieves success rates of 96.76%, 94.23%, and 92.39% on cluttered level 1
objects (prismatic shape), cluttered level 2 objects (more complex geometry),
and cluttered mixed objects, respectively. The Sim-Suction policies outperform
state-of-the-art benchmarks tested by approximately 21% in cluttered mixed
scenes.
Comment: IEEE Transactions on Robotics
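As a rough sketch of point-wise suction affordance learning (an illustration, not Sim-Suction-Pointnet itself), a PointNet-style network can score each point of a cloud for suction quality; the top-scoring point plus its surface normal then defines a 6D suction pose. All names and layer sizes below are assumptions.

```python
import torch
import torch.nn as nn

class SuctionAffordanceNet(nn.Module):
    """Hypothetical PointNet-style head: one suction-affordance score per point."""
    def __init__(self):
        super().__init__()
        # Shared per-point MLP, implemented as 1x1 convolutions over the cloud.
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1, 1),
        )

    def forward(self, points):  # points: (B, 3, N)
        return torch.sigmoid(self.mlp(points)).squeeze(1)  # scores: (B, N)

def best_suction_pose(points, normals, scores):
    """Pair the top-scoring point with its (separately estimated) surface
    normal to form a 6D pose: contact position plus approach direction."""
    b = torch.arange(points.shape[0])
    i = scores.argmax(dim=1)
    return points[b, :, i], -normals[b, :, i]
```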
PointNet++ Grasping: Learning An End-to-end Spatial Grasp Generation Algorithm from Sparse Point Clouds
Grasping novel objects is important for robot manipulation in
unstructured environments. Most current works require a grasp sampling
process to obtain grasp candidates, combined with a local feature extractor
based on deep learning. This pipeline is time-consuming, especially when grasp
points are sparse, such as at the edge of a bowl. In this paper, we propose an end-to-end
approach to directly predict the poses, categories and scores (qualities) of
all the grasps. It takes the whole sparse point clouds as the input and
requires no sampling or search process. Moreover, to generate training data for
multi-object scenes, we propose a fast multi-object grasp detection algorithm
based on the Ferrari-Canny metric. A single-object dataset (79 objects from the YCB
object set, 23.7k grasps) and a multi-object dataset (20k point clouds with
annotations and masks) are generated. A PointNet++ based network combined with
multi-mask loss is introduced to deal with different training points. The total
weight size of our network is only about 11.6M, and a whole prediction takes
about 102 ms on a GeForce 840M GPU. Our experiments show that our method
achieves a 71.43% success rate and a 91.60% completion rate, performing better
than current state-of-the-art works.
Comment: Accepted at the International Conference on Robotics and Automation (ICRA) 2020
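The labeling step above relies on the Ferrari-Canny grasp metric; as a self-contained illustration (not the paper's code), the classic epsilon-quality can be computed by discretizing each contact's friction cone into unit forces, mapping them to 6D wrenches, and measuring the largest origin-centered ball inside their convex hull. The friction coefficient, cone resolution, and torque scaling below are assumed values.

```python
import numpy as np
from scipy.spatial import ConvexHull

def contact_wrenches(p, n, mu=0.5, n_edges=8):
    """Discretize the friction cone at contact point p (unit normal n pointing
    into the object) into unit force directions and their 6D wrenches.
    Torque is left unscaled here for simplicity."""
    n = n / np.linalg.norm(n)
    t1 = np.cross(n, [1.0, 0.0, 0.0])          # build two tangent directions
    if np.linalg.norm(t1) < 1e-6:
        t1 = np.cross(n, [0.0, 1.0, 0.0])
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    wrenches = []
    for k in range(n_edges):
        a = 2 * np.pi * k / n_edges
        f = n + mu * (np.cos(a) * t1 + np.sin(a) * t2)  # cone edge direction
        f /= np.linalg.norm(f)
        wrenches.append(np.concatenate([f, np.cross(p, f)]))
    return wrenches

def ferrari_canny_epsilon(contacts):
    """contacts: list of (point, inward_normal) pairs; needs at least two
    non-degenerate contacts to span a 6D hull. Returns the epsilon quality:
    distance from the wrench-space origin to the hull boundary, or 0 if the
    grasp is not force closure."""
    ws = [w for p, n in contacts
          for w in contact_wrenches(np.asarray(p, float), np.asarray(n, float))]
    hull = ConvexHull(np.array(ws))
    # hull.equations rows are [a, b] with a.x + b <= 0 inside the hull;
    # the origin lies inside iff every offset b is negative.
    offsets = hull.equations[:, -1]
    return max(0.0, -offsets.max())
```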
EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation
In this paper, we explore the dynamic grasping of moving objects through
active pose tracking and reinforcement learning for hand-eye coordination
systems. Most existing vision-based robotic grasping methods implicitly assume
target objects are stationary or moving predictably. Performing grasping of
unpredictably moving objects presents a unique set of challenges. For example,
a pre-computed robust grasp can become unreachable or unstable as the target
object moves, and motion planning must also be adaptive. In this work, we
present a new approach, Eye-on-hAnd Reinforcement Learner (EARL), for enabling
coupled Eye-on-Hand (EoH) robotic manipulation systems to perform real-time
active pose tracking and dynamic grasping of novel objects without explicit
motion prediction. EARL readily addresses many thorny issues in automated
hand-eye coordination, including fast-tracking of 6D object pose from vision,
learning control policy for a robotic arm to track a moving object while
keeping the object in the camera's field of view, and performing dynamic
grasping. We demonstrate the effectiveness of our approach in extensive
experiments validated on multiple commercial robotic arms in both simulations
and complex real-world tasks.
Comment: Presented at IROS 2023. Corresponding author: Siddarth Jain
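Purely to illustrate the hand-eye trade-off EARL must balance (the paper's actual reward is not given here), a toy reward might combine progress toward the moving object with keeping it centered in the wrist camera's image; every term and weight below is a made-up assumption.

```python
import numpy as np

def tracking_grasp_reward(gripper_pos, object_pos, obj_pixel,
                          image_size=(640, 480), w_dist=1.0, w_center=0.5):
    """Illustrative reward for eye-on-hand dynamic grasping: approach the
    moving object while keeping it near the image center (field of view)."""
    # Distance term: closer gripper -> higher reward.
    dist = np.linalg.norm(np.asarray(gripper_pos) - np.asarray(object_pos))
    # Field-of-view term: offset of the object's pixel from the image center,
    # normalized to [0, 1] by the half-diagonal.
    center = np.asarray(image_size) / 2.0
    off = np.linalg.norm(np.asarray(obj_pixel) - center) / np.linalg.norm(center)
    return -w_dist * dist - w_center * off
```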
Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
This paper presents a robotic pick-and-place system that is capable of
grasping and recognizing both known and novel objects in cluttered
environments. The key new feature of the system is that it handles a wide range
of object categories without needing any task-specific training data for novel
objects. To achieve this, it first uses a category-agnostic affordance
prediction algorithm to select and execute among four different grasping
primitive behaviors. It then recognizes picked objects with a cross-domain
image classification framework that matches observed images to product images.
Since product images are readily available for a wide range of objects (e.g.,
from the web), the system works out-of-the-box for novel objects without
requiring any additional training data. Exhaustive experimental results
demonstrate that our multi-affordance grasping achieves high success rates for
a wide variety of objects in clutter, and our recognition algorithm achieves
high accuracy for both known and novel grasped objects. The approach was part
of the MIT-Princeton Team system that took 1st place in the stowing task at the
2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are
available online at http://arc.cs.princeton.edu
Comment: Project webpage: http://arc.cs.princeton.edu Summary video:
https://youtu.be/6fG7zwGfIk
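To make the primitive-selection step concrete, a minimal sketch: given one dense affordance map per grasping primitive, the system can execute the primitive and pixel with the highest predicted score. The map names follow the four primitive behaviors reported for this system, but the data and selection code here are illustrative, not the team's implementation.

```python
import numpy as np

# Hypothetical dense affordance maps (H x W), one per grasping primitive.
affordances = {
    "suction-down": np.random.rand(480, 640),
    "suction-side": np.random.rand(480, 640),
    "grasp-down":   np.random.rand(480, 640),
    "flush-grasp":  np.random.rand(480, 640),
}

def select_primitive(affordances):
    """Pick the primitive and pixel location with the highest affordance."""
    best = max(affordances, key=lambda k: affordances[k].max())
    v, u = np.unravel_index(affordances[best].argmax(),
                            affordances[best].shape)
    return best, (u, v), affordances[best].max()

primitive, pixel, score = select_primitive(affordances)
```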
Combining Shape Completion and Grasp Prediction for Fast and Versatile Grasping with a Multi-Fingered Hand
Grasping objects with limited or no prior knowledge about them is a highly
relevant skill in assistive robotics. Still, in this general setting, it has
remained an open problem, especially when it comes to partial
observability and versatile grasping with multi-fingered hands. We present a
novel, fast, and high-fidelity deep learning pipeline consisting of a shape
completion module based on a single depth image, followed by a
grasp predictor based on the predicted object shape. The shape
completion network is based on VQDIF and predicts spatial occupancy values at
arbitrary query points. As grasp predictor, we use our two-stage architecture
that first generates hand poses using an autoregressive model and then
regresses finger joint configurations per pose. Critical factors turn out to be
sufficient data realism and augmentation, as well as special attention to
difficult cases during training. Experiments on a physical robot platform
demonstrate successful grasping of a wide range of household objects based on a
depth image from a single viewpoint. The whole pipeline is fast, taking only
about 1 s for completing the object's shape (0.7 s) and generating 1000 grasps
(0.3 s).
Comment: 8 pages, 10 figures, 3 tables, 1 algorithm. 2023 IEEE-RAS
International Conference on Humanoid Robots (Humanoids). Project page:
https://dlr-alr.github.io/2023-humanoids-completio
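To make the implicit-completion interface concrete: VQDIF-style networks are queried with arbitrary 3D points and return occupancy values. The sketch below substitutes a soft analytic sphere for the learned network purely for illustration; the completed shape would then feed the two-stage grasp predictor (autoregressive hand poses, then per-pose joint regression).

```python
import numpy as np

def occupancy(points):
    """Stand-in for the learned completion network, which maps a depth image
    plus query points to occupancy; here, a soft sphere of radius 5 cm."""
    r = np.linalg.norm(points, axis=-1)
    return 1.0 / (1.0 + np.exp((r - 0.05) / 0.005))

# Query occupancy at arbitrary points, e.g. a coarse grid around the object.
g = np.linspace(-0.15, 0.15, 48)  # 30 cm cube, in meters
queries = np.stack(np.meshgrid(g, g, g), axis=-1).reshape(-1, 3)
occ = occupancy(queries)

# Points with occupancy > 0.5 approximate the completed object volume; its
# surface is what a downstream grasp predictor would consume.
occupied = queries[occ > 0.5]
```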