
    TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation

    Transparent objects present multiple distinct challenges to visual perception systems. First, their lack of distinguishing visual features makes transparent objects harder to detect and localize than opaque objects. Even humans find certain transparent surfaces with little specular reflection or refraction, like glass doors, difficult to perceive. A second challenge is that depth sensors typically used for opaque object perception cannot obtain accurate depth measurements on transparent surfaces due to their unique reflective properties. Stemming from these challenges, we observe that transparent object instances within the same category, such as cups, look more similar to each other than to ordinary opaque objects of that same category. Given this observation, the present paper explores the possibility of category-level transparent object pose estimation rather than instance-level pose estimation. We propose TransNet, a two-stage pipeline that estimates category-level transparent object pose using localized depth completion and surface normal estimation. TransNet is evaluated in terms of pose estimation accuracy on a large-scale transparent object dataset and compared to a state-of-the-art category-level pose estimation approach. Results from this comparison demonstrate that TransNet achieves improved pose estimation accuracy on transparent objects. Moreover, we use TransNet to build an autonomous transparent object manipulation system for robotic pick-and-place and pouring tasks.
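
The two-stage idea can be sketched as follows. This is a toy stand-in, not the authors' implementation: all function and class names here are hypothetical, and the learned networks are replaced with trivial stubs. Stage one fills in the depth readings that the sensor loses on the transparent surface; stage two regresses a category-level pose from the completed geometry.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    rotation: tuple      # 3x3 rotation matrix as row tuples
    translation: tuple   # (x, y, z) in the camera frame
    scale: tuple         # category-level object dimensions

def complete_depth(raw_depth, mask):
    # Stage 1 (toy stand-in): depth sensors return invalid (zero) readings
    # on transparent surfaces; replace them with the mean of the valid
    # readings inside the object mask. The real pipeline would use a
    # learned depth-completion network here.
    valid = [d for d, m in zip(raw_depth, mask) if m and d > 0]
    mean = sum(valid) / len(valid) if valid else 0.0
    return [d if d > 0 else mean for d in raw_depth]

def estimate_pose(depth, normals):
    # Stage 2 (stub): regress rotation, translation, and size from the
    # completed depth and predicted surface normals.
    centroid = sum(depth) / len(depth)
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    return Pose(rotation=identity,
                translation=(0.0, 0.0, centroid),
                scale=(1.0, 1.0, 1.0))

def transnet_style_pipeline(raw_depth, mask, normals=None):
    return estimate_pose(complete_depth(raw_depth, mask), normals)

# Two of four pixels have invalid (zero) depth on the transparent surface.
pose = transnet_style_pipeline([0.5, 0.0, 0.7, 0.0], [1, 1, 1, 1])
```

The point of the sketch is the data flow: pose estimation never sees the raw, hole-ridden depth, only the completed version.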

    Challenges for Monocular 6D Object Pose Estimation in Robotics

    Object pose estimation is a core perception task that enables, for example, object grasping and scene understanding. The widely available, inexpensive, high-resolution RGB sensors, together with CNNs that allow for fast inference on this modality, make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state of the art for varying modalities, single- and multi-view settings, and datasets and metrics that consider a multitude of applications. We argue, however, that those works' broad scope hinders the identification of open challenges that are specific to monocular approaches and the derivation of promising future directions for their application in robotics. By providing a unified view of recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, large object sets, novel objects, refractive materials, and uncertainty estimates are central, largely unsolved open challenges. To address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved. Comment: arXiv admin note: substantial text overlap with arXiv:2302.1182

    Transparent Object Tracking with Enhanced Fusion Module

    Accurate tracking of transparent objects, such as glasses, plays a critical role in many robotic tasks such as robot-assisted living. Due to the adaptive and often reflective texture of such objects, traditional tracking algorithms that rely on general-purpose learned features suffer from reduced performance. Recent research has proposed to instill transparency awareness into existing general object trackers by fusing purpose-built features. However, with existing fusion techniques, the addition of new features changes the latent space, making it impossible to add transparency awareness to trackers with fixed latent spaces. For example, many of today's transformer-based trackers are fully pre-trained and sensitive to any latent space perturbations. In this paper, we present a new feature fusion technique that integrates transparency information into a fixed feature space, enabling its use in a broader range of trackers. Our proposed fusion module, composed of a transformer encoder and an MLP module, leverages key-query-based transformations to embed the transparency information into the tracking pipeline. We also present a new two-step training strategy for our fusion module to effectively merge transparency features. We propose a new tracker architecture that uses our fusion technique to achieve superior results for transparent object tracking. Our proposed method achieves competitive results against state-of-the-art trackers on TOTB, the largest recently released transparent object tracking benchmark. Our results and code will be made publicly available at https://github.com/kalyan0510/TOTEM. Comment: IEEE IROS 202
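
The key constraint described above, fusing new cues without changing the latent space, can be illustrated with a minimal attention sketch. This is not the paper's module: names and dimensions are hypothetical, and the transformer encoder and MLP are omitted. The tracker feature acts as the query, transparency features as keys and values, and the attended output is added residually, so the fused feature keeps the tracker's original dimensionality.

```python
import math

def attend(query, keys, values):
    # Single-query scaled dot-product attention in pure Python.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

def fuse(tracker_feat, transparency_feats):
    # Inject transparency cues into a fixed latent space: attend from the
    # tracker feature over the transparency features, then add the result
    # residually so the output dimensionality matches the input exactly.
    attended = attend(tracker_feat, transparency_feats, transparency_feats)
    return [t + a for t, a in zip(tracker_feat, attended)]

fused = fuse([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]])
```

Because the fusion is residual, a pre-trained tracker head downstream sees a vector of the same size it was trained on, which is the property the abstract argues existing fusion techniques lack.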

    MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy

    Recently, there has been tremendous interest in Industry 4.0 infrastructure to address labor shortages in global supply chains. Deploying artificial-intelligence-enabled robotic bin picking systems in the real world has become particularly important for reducing the stress and physical demands on workers while increasing the speed and efficiency of warehouses. To this end, such systems may be used to automate order picking, but with the risk of causing expensive damage during an abnormal event such as a sensor failure. As such, reliability becomes a critical factor for translating artificial intelligence research into real-world applications and products. In this paper, we propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet) for tackling object detection and segmentation for robotic bin picking using data from different modalities. This is the first system that introduces the concept of multimodal redundancy to address sensor failure during deployment. In particular, we realize the multimodal redundancy framework with a gate fusion module and dynamic ensemble learning. Finally, we present a new label-free multimodal consistency (MC) score that uses the outputs from all modalities to measure the overall reliability and uncertainty of the system's output. Through experiments, we demonstrate that in the event of a missing modality, our system provides much more reliable performance than baseline models. We also demonstrate that during inference, our MC score is a more reliable indicator of output quality than the model-generated confidence scores, which are often over-confident.
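
One way to picture a label-free consistency score of this kind (this is a simplified guess at the idea, not the paper's formula) is the mean pairwise IoU of the segmentation masks predicted from each modality: when all modalities agree the score is high, and a failed sensor drags it down without needing any ground-truth labels.

```python
def iou(a, b):
    # Intersection-over-union of two flat binary masks.
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0

def mc_score(masks):
    # Hypothetical multimodal-consistency score: mean pairwise IoU of the
    # per-modality masks. No labels are needed; only cross-modal agreement.
    pairs = [(i, j) for i in range(len(masks)) for j in range(i + 1, len(masks))]
    return sum(iou(masks[i], masks[j]) for i, j in pairs) / len(pairs)

rgb    = [1, 1, 0, 0]   # mask predicted from the RGB stream
depth  = [1, 1, 0, 0]   # mask predicted from the depth stream
broken = [0, 0, 0, 0]   # e.g. depth sensor failure: empty prediction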

    Material perception and action : The role of material properties in object handling

    This dissertation is about visual perception of material properties and their role in preparation for object handling. Usually, before an object is touched or picked up, we estimate its size and shape based on visual features to plan the grip size of our hand. After we have touched the object, the grip size is adjusted according to the haptic feedback, and the object is handled safely. Similarly, we anticipate the grip force required to handle the object without slippage, based on its visual features and prior experience with similar objects. Previous studies on object handling have mostly examined object characteristics that are typical for object recognition, e.g., size, shape, and weight, but in recent years there has been a growing interest in characteristics that are more typical of the material the object is made from. In a series of studies, we therefore investigated the role of perceived material properties in decision-making and object handling, in which both digitally rendered materials and real objects made of different types of materials were presented to human subjects and a humanoid robot. Paper I is a reach-to-grasp study in which human subjects were examined using motion capture technology. In this study, participants grasped and lifted paper cups that varied in appearance (i.e., matte vs. glossy) and weight. Here we were interested in both the temporal and spatial components of prehension, to examine the role of material properties in grip preparation and how visual features contribute to inferred hardness before haptic feedback becomes available. We found that the temporal and spatial components were not exclusively governed by the expected weight of the paper cups; glossiness and expected hardness had a significant role as well. In Paper II, which is a follow-up on Paper I, we investigated the grip force component of prehension using the same experimental stimuli.
In a similar experimental setup, using force sensors, we examined the early grip force magnitudes applied by human subjects when grasping and lifting the same paper cups as used in Paper I. Here we found that early grip force scaling was guided not only by object weight but also by the visual characteristics of the material (i.e., matte vs. glossy). Moreover, the results suggest that grip force scaling during the initial object lifts is guided by expected hardness that is to some extent based on visual material properties. Paper III is a visual judgment task in which psychophysical measurements were used to examine how the material properties roughness and glossiness influence perceived bounce height and, consequently, perceived hardness. In a paired-comparison task, human subjects watched a ball bounce on various surface planes and judged its bounce height. Here we investigated which combination of surface properties, i.e., roughness or glossiness, makes a surface plane appear bounceable. The results demonstrate that surface planes with rough properties are believed to afford higher bounce heights than surface planes with smooth properties. Interestingly, adding shiny properties to the rough and smooth surface planes reduced the judged difference, as if glossy surface planes are believed to afford higher bounce heights irrespective of how smooth or rough the surface underneath is. This suggests that perceived bounce height involves not only the physical elements of the bounce but also the visual material properties of the surface planes the ball bounces on. In Paper IV we investigated the development of material knowledge using a robotic system: a humanoid robot explored real objects made of different types of materials, using both camera and haptic systems.
The objects varied in visual appearance (e.g., texture, color, shape, size), weight, and hardness, and in two experiments the robot picked up and placed the experimental objects several times using its arm. We used the haptic signals from the servos controlling the robot's arm and shoulder to obtain measurements of the objects' weight and hardness, and the camera system to collect data on their visual features. After the robot had repeatedly explored the objects, an associative learning model was trained on the collected data to demonstrate how the robotic system could produce a multi-modal mapping between the visual and haptic features of the objects. In sum, this thesis shows that visual material properties, and prior knowledge of how materials look and behave, play a significant role in action planning.
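
The associative visual-to-haptic mapping from Paper IV can be caricatured in a few lines. This is purely illustrative (the dissertation's model and features are not specified here): the robot stores (visual, haptic) feature pairs from its exploration and later predicts haptic properties for a new object from its nearest stored visual feature.

```python
def train(pairs):
    # Toy associative model: simply memorize (visual, haptic) feature pairs
    # gathered during the robot's exploration phase.
    return list(pairs)

def predict_haptic(model, visual):
    # Predict haptic properties from vision alone: return the haptic
    # features of the nearest stored visual feature (squared Euclidean).
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda pair: dist(pair[0], visual))[1]

# Hypothetical features: visual = (glossiness, size), haptic = (weight_g, hardness).
model = train([((0.9, 1.0), (120, 0.8)),   # glossy cup: heavier, harder
               ((0.1, 1.0), (40, 0.3))])   # matte cup: lighter, softer

prediction = predict_haptic(model, (0.8, 1.0))  # a new, fairly glossy cup
```

This mirrors the behavioural finding in Papers I and II: expectations about weight and hardness are read off visual material cues before any haptic feedback is available.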

    Robotic Defect Inspection with Visual and Tactile Perception for Large-scale Components

    In manufacturing processes, surface inspection is a key requirement for quality assessment and damage localization. Automated surface anomaly detection has therefore become a promising area of research in various industrial inspection systems. A particular challenge in industries with large-scale components, such as aircraft and heavy machinery, is inspecting large parts for defects of very small dimensions. Moreover, these parts can be curved. To address this challenge, we present a two-stage multi-modal inspection pipeline with visual and tactile sensing. Our approach combines the best of both senses by identifying and localizing defects from a global view (vision) and then tactilely scanning the localized areas to identify remaining defects. To benchmark our approach, we propose a novel real-world dataset with multiple metallic defect types per image, collected in production environments on real aerospace manufacturing parts, as well as online robot experiments in two environments. Our approach identifies 85% of defects using Stage I and 100% of defects after Stage II. The dataset is publicly available at https://zenodo.org/record/8327713. Comment: This is a pre-print for an International Conference on Intelligent Robots and Systems 2023 publication.
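
The coarse-to-fine control flow can be sketched abstractly. Everything here is a hypothetical stand-in for the paper's components: vision proposes candidate regions over the whole part, and tactile scanning then checks only those regions, catching the small defects that vision alone misses.

```python
def inspect(part, vision_detect, tactile_scan):
    # Hypothetical two-stage inspection loop: Stage I uses the global
    # camera view to localize candidate regions; Stage II runs the slow,
    # high-resolution tactile scan only inside those regions.
    candidates = vision_detect(part)                      # Stage I
    confirmed = []
    for region in candidates:
        confirmed.extend(tactile_scan(part, region))      # Stage II
    return confirmed

# Toy part model: defect sizes in mm, grouped by surface region.
part = {"wing": [0.8, 0.1], "panel": []}

def vision_detect(p):
    # Vision flags only regions containing a large (>= 0.5 mm) defect.
    return [r for r, defects in p.items() if any(d >= 0.5 for d in defects)]

def tactile_scan(p, region):
    # Tactile sensing resolves every defect inside the localized region.
    return p[region]

found = inspect(part, vision_detect, tactile_scan)
```

In this toy run, vision alone would report only the 0.8 mm defect; the tactile stage also recovers the 0.1 mm defect in the same region, which is the division of labour the abstract describes.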

    Generation of GelSight Tactile Images for Sim2Real Learning

    Most current works on Sim2Real learning for robotic manipulation tasks leverage camera vision, which may be significantly occluded by robot hands during manipulation. Tactile sensing offers complementary information to vision and can compensate for the information lost to occlusion. However, the use of tactile sensing in Sim2Real research has been restricted because no simulated tactile sensors were available. To close this gap, we introduce a novel approach for simulating a GelSight tactile sensor in the commonly used Gazebo simulator. Like the real GelSight sensor, the simulated sensor produces high-resolution images via an optical sensor observing the interaction between the touched object and an opaque soft membrane. It can indirectly sense forces, geometry, texture, and other properties of the object, and it enables Sim2Real learning with tactile sensing. Preliminary experimental results show that the simulated sensor generates realistic outputs similar to those captured by a real GelSight sensor. All the materials used in this paper are available at https://danfergo.github.io/gelsight-simulation.
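
The core rendering step in any GelSight-style simulation is mapping membrane deformation to an image. The following is a deliberately crude sketch of that idea, not the paper's renderer: shade each pixel from the local surface gradient of the indented membrane, so edges of the pressed shape light up while flat regions stay at a neutral gray.

```python
def tactile_image(height_map):
    # Hypothetical GelSight-style shading: approximate the surface gradient
    # with forward differences and map it to intensity, brighter where the
    # membrane slopes toward a virtual light, clamped to [0, 1]. A real
    # simulator would use the sensor's photometric model instead.
    rows, cols = len(height_map), len(height_map[0])
    img = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dx = height_map[i][min(j + 1, cols - 1)] - height_map[i][j]
            dy = height_map[min(i + 1, rows - 1)][j] - height_map[i][j]
            img[i][j] = max(0.0, min(1.0, 0.5 + dx + dy))
    return img

# A single bump pressed into the center of a 3x3 membrane patch.
bump = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
img = tactile_image(bump)
```

Even this crude gradient shading reproduces the qualitative behaviour the abstract describes: the image encodes contact geometry (and, via membrane stiffness, indirectly force) rather than object appearance.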