A novel integrated method of detection-grasping for specific object based on the box coordinate matching
To better care for the elderly and disabled, service robots need an
effective way to fuse object detection with grasp estimation. However,
this combination has received limited research attention. To address this
gap, a novel integrated detection-grasping method for specific objects
based on box coordinate matching is proposed in this paper. First, the
SOLOv2 instance segmentation model is improved by adding a channel
attention module (CAM) and a spatial attention module (SAM). Then, atrous
spatial pyramid pooling (ASPP) and CAM are added to the generative
residual convolutional neural network (GR-CNN) model to optimize grasp
estimation. Furthermore, a detection-grasping integrated algorithm based
on box coordinate matching (DG-BCM) is proposed to fuse object detection
and grasp estimation into a single pipeline. For verification,
experiments on object detection and grasp estimation are conducted
separately to show the superiority of the improved models. Additionally,
grasping tasks for several specific objects are carried out on a
simulation platform, demonstrating the feasibility and effectiveness of
the proposed DG-BCM algorithm.
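The attention modules named above follow a common pattern: channel
attention reweights feature maps using pooled global context, while
spatial attention reweights locations. Below is a minimal PyTorch sketch
of CBAM-style CAM and SAM blocks of the kind the abstract describes; the
reduction ratio, kernel size, and layer layout are illustrative
assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM-style channel attention: reweights feature channels using
    globally pooled context (reduction ratio is an assumption)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale                     # channel-wise reweighting

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: reweights spatial locations using
    channel-pooled descriptors."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)    # (b, 1, h, w)
        mx = x.amax(dim=1, keepdim=True)     # (b, 1, h, w)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale                     # location-wise reweighting
```

In a CBAM-style arrangement the two modules are applied in sequence,
channel attention first and spatial attention second.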
The RGB-D Triathlon: Towards Agile Visual Toolboxes for Robots
Deep networks have brought significant advances in robot perception,
improving the capabilities of robots in several visual tasks, ranging
from object detection and recognition to pose estimation, semantic scene
segmentation and many others. Still, most approaches typically address
visual tasks in isolation, resulting in overspecialized models which
achieve strong performance in specific applications but work poorly in
other (often related) tasks. This is clearly sub-optimal for a robot,
which is often required to perform multiple visual recognition tasks
simultaneously in order to properly act and interact with the
environment. This problem is exacerbated by the limited computational and
memory resources typically available onboard a robotic platform. The
problem of learning flexible models which can handle multiple tasks in a
lightweight manner has recently gained attention in the computer vision
community, and benchmarks supporting this research have been proposed. In
this work we study this problem in the robot vision context, proposing a
new benchmark, the RGB-D Triathlon, and evaluating state-of-the-art
algorithms in this novel challenging scenario. We also define a new
evaluation protocol, better suited to the robot vision setting. Results
shed light on the strengths and weaknesses of existing approaches and on
open issues, suggesting directions for future research.
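The abstract argues that a robot should handle several visual tasks with
one compact model rather than a set of overspecialized networks. One
standard way to realize this, sketched below, is a shared backbone with
small task-specific heads; the task names, class counts, backbone choice,
and RGB-only input are placeholder assumptions, not the benchmark's
reference architecture.

```python
import torch
import torch.nn as nn
import torchvision

class MultiTaskNet(nn.Module):
    """Shared backbone with per-task heads: one set of features serves
    several recognition tasks, keeping memory close to that of a single
    single-task model. Tasks and class counts are hypothetical; depth
    input is omitted for brevity."""
    def __init__(self, num_objects: int = 51, num_scenes: int = 10):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Drop the final fc layer, keep conv stages + global avg pool.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.object_head = nn.Linear(512, num_objects)  # object recognition
        self.scene_head = nn.Linear(512, num_scenes)    # scene classification

    def forward(self, rgb: torch.Tensor) -> dict:
        z = self.features(rgb).flatten(1)               # (b, 512)
        return {"object": self.object_head(z),
                "scene": self.scene_head(z)}
```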
Vision-Based Intelligent Robot Grasping Using Sparse Neural Network
In the modern era of deep learning, network parameters play a vital role
in model efficiency, but large parameter counts bring limitations such as
extensive computation and memory requirements, which may not be suitable
for real-time intelligent robot grasping tasks. This research focuses on
how model efficiency can be maintained in the robot grasping domain by
introducing sparsity without compromising the accuracy of the model. More
specifically, two lightweight neural networks are introduced, namely
Sparse-GRConvNet and Sparse-GINNet, which leverage sparsity for grasp
pose generation by integrating the Edge-PopUp algorithm. This algorithm
identifies the top K% of edges according to their score values. Both the
Sparse-GRConvNet and Sparse-GINNet models are designed to generate
high-quality grasp poses in real time at every pixel location, enabling
robots to effectively manipulate unfamiliar objects. We extensively
trained our models using two benchmark datasets: the Cornell Grasping
Dataset (CGD) and the Jacquard Grasping Dataset (JGD). Both
Sparse-GRConvNet and Sparse-GINNet outperform the current
state-of-the-art methods, achieving an impressive accuracy of 97.75% on
CGD with only 10% of the weights of GR-ConvNet and 50% of the weights of
GI-NNet, respectively. Additionally, on JGD, Sparse-GRConvNet achieves an
accuracy of 85.77% with 30% of the weights of GR-ConvNet, and
Sparse-GINNet achieves an accuracy of 81.11% with 10% of the weights of
GI-NNet. To validate the performance of our proposed models, we conducted
extensive experiments using the Anukul (Baxter) hardware cobot.
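The Edge-PopUp algorithm mentioned above assigns a learnable score to
every edge (weight) and keeps only the top K% of edges by score active,
leaving the underlying weights themselves frozen. The PyTorch sketch
below shows the general idea using a straight-through top-k mask; the
class names and the k_frac=0.1 default (echoing the 10% weight budget
quoted for CGD) are illustrative assumptions, not the paper's exact
implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMask(torch.autograd.Function):
    """Straight-through top-k mask: forward keeps the top fraction of
    scores, backward passes gradients to all scores unchanged."""
    @staticmethod
    def forward(ctx, scores, k_frac):
        flat = scores.flatten()
        k = max(1, int(k_frac * flat.numel()))
        threshold = flat.topk(k).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None  # no gradient for k_frac

class SparseConv2d(nn.Conv2d):
    """Edge-PopUp-style conv layer: weights stay fixed at their initial
    values; only per-weight scores are trained, and the top k_frac of
    edges by score participate in the forward pass."""
    def __init__(self, *args, k_frac: float = 0.1, **kwargs):
        super().__init__(*args, **kwargs)
        self.k_frac = k_frac
        self.scores = nn.Parameter(torch.randn_like(self.weight) * 0.01)
        self.weight.requires_grad_(False)  # weights are frozen

    def forward(self, x):
        mask = TopKMask.apply(self.scores.abs(), self.k_frac)
        return F.conv2d(x, self.weight * mask, self.bias,
                        self.stride, self.padding, self.dilation,
                        self.groups)
```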
Learning Multi-step Robotic Manipulation Tasks through Visual Planning
Multi-step manipulation tasks in unstructured environments are extremely
challenging for a robot to learn. Such tasks interlace high-level
reasoning about the intermediate states that must be attained to achieve
the overall task with low-level reasoning about which actions will yield
those states. A model-free deep reinforcement learning method is proposed
to learn multi-step manipulation tasks. This work introduces a novel
Generative Residual Convolutional Neural Network (GR-ConvNet) model that
can generate robust antipodal grasps from n-channel image input at
real-time speeds (20 ms). The proposed model architecture achieved
state-of-the-art accuracy on three standard grasping datasets. The
adaptability of the approach is demonstrated by directly transferring the
trained model to a 7 DoF robotic manipulator, with grasp success rates of
95.4% and 93.0% on novel household and adversarial objects, respectively.
A novel Robotic Manipulation Network (RoManNet), a vision-based model
architecture, is introduced to learn action-value functions and predict
manipulation action candidates. A Task Progress based Gaussian (TPG)
reward function is defined to compute the reward based on actions that
lead to successful motion primitives and progress towards the overall
task goal. To balance the exploration/exploitation ratio, this research
introduces a Loss Adjusted Exploration (LAE) policy that selects actions
from the action candidates according to the Boltzmann distribution of
loss estimates. The effectiveness of the proposed approach is
demonstrated by training RoManNet to learn several challenging multi-step
robotic manipulation tasks in both simulation and the real world.
Experimental results show that the proposed method outperforms existing
methods and achieves state-of-the-art performance in terms of success
rate and action efficiency. The ablation studies show that TPG and LAE
are especially beneficial for tasks like multiple block stacking.
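The Loss Adjusted Exploration (LAE) policy described above selects
actions from a Boltzmann (softmax) distribution over loss estimates, so
candidates whose value estimates are still poorly fit are explored more
often. A minimal sketch follows; the sign convention, temperature value,
and function name are assumptions, since the abstract does not give the
exact formulation.

```python
import torch

def lae_select(loss_estimates: torch.Tensor,
               temperature: float = 1.0) -> int:
    """Sample an action index from a Boltzmann distribution over loss
    estimates: higher estimated loss (more uncertain value fit) means a
    higher probability of being explored."""
    probs = torch.softmax(loss_estimates / temperature, dim=0)
    return torch.multinomial(probs, 1).item()

# Hypothetical usage: pick among three manipulation action candidates.
losses = torch.tensor([0.8, 0.2, 0.5])  # per-candidate loss estimates
action = lae_select(losses, temperature=0.5)
```

Lowering the temperature concentrates probability on the highest-loss
candidate (more exploration of uncertain actions); raising it flattens
the distribution toward uniform sampling.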
- …