
    TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation

    Transparent objects present multiple distinct challenges to visual perception systems. First, their lack of distinguishing visual features makes transparent objects harder to detect and localize than opaque objects. Even humans find certain transparent surfaces with little specular reflection or refraction, like glass doors, difficult to perceive. A second challenge is that depth sensors typically used for opaque object perception cannot obtain accurate depth measurements on transparent surfaces due to their unique reflective properties. Stemming from these challenges, we observe that transparent object instances within the same category, such as cups, look more similar to each other than to ordinary opaque objects of that same category. Given this observation, the present paper explores the possibility of category-level transparent object pose estimation rather than instance-level pose estimation. We propose TransNet, a two-stage pipeline that estimates category-level transparent object pose using localized depth completion and surface normal estimation. TransNet is evaluated in terms of pose estimation accuracy on a large-scale transparent object dataset and compared to a state-of-the-art category-level pose estimation approach. Results from this comparison demonstrate that TransNet achieves improved pose estimation accuracy on transparent objects. Moreover, we use TransNet to build an autonomous transparent object manipulation system for robotic pick-and-place and pouring tasks.
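
The two-stage idea can be sketched as follows. This is a toy stand-in, not the authors' implementation: all function and class names here are hypothetical, and the learned networks are replaced with trivial stubs. Stage one fills in the depth readings that the sensor loses on the transparent surface; stage two regresses a category-level pose from the completed geometry.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    rotation: tuple      # 3x3 rotation matrix as row tuples
    translation: tuple   # (x, y, z) in the camera frame
    scale: tuple         # category-level object dimensions

def complete_depth(raw_depth, mask):
    # Stage 1 (toy stand-in): depth sensors return invalid (zero) readings
    # on transparent surfaces; replace them with the mean of the valid
    # readings inside the object mask. The real pipeline would use a
    # learned depth-completion network here.
    valid = [d for d, m in zip(raw_depth, mask) if m and d > 0]
    mean = sum(valid) / len(valid) if valid else 0.0
    return [d if d > 0 else mean for d in raw_depth]

def estimate_pose(depth, normals):
    # Stage 2 (stub): regress rotation, translation, and size from the
    # completed depth and predicted surface normals.
    centroid = sum(depth) / len(depth)
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    return Pose(rotation=identity,
                translation=(0.0, 0.0, centroid),
                scale=(1.0, 1.0, 1.0))

def transnet_style_pipeline(raw_depth, mask, normals=None):
    return estimate_pose(complete_depth(raw_depth, mask), normals)

# Two of four pixels have invalid (zero) depth on the transparent surface.
pose = transnet_style_pipeline([0.5, 0.0, 0.7, 0.0], [1, 1, 1, 1])
```

The point of the sketch is the data flow: pose estimation never sees the raw, hole-ridden depth, only the completed version.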

    Challenges for Monocular 6D Object Pose Estimation in Robotics

    Object pose estimation is a core perception task that enables, for example, object grasping and scene understanding. The widely available, inexpensive, high-resolution RGB sensors, together with CNNs that allow for fast inference on this modality, make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state of the art for varying modalities, single- and multi-view settings, and datasets and metrics that consider a multitude of applications. We argue, however, that those works' broad scope hinders the identification of open challenges that are specific to monocular approaches and the derivation of promising future directions for their application in robotics. By providing a unified view of recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, large object sets, novel objects, refractive materials, and uncertainty estimates are central, largely unsolved open challenges. To address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved. Comment: arXiv admin note: substantial text overlap with arXiv:2302.1182

    Transparent Object Tracking with Enhanced Fusion Module

    Accurate tracking of transparent objects, such as glasses, plays a critical role in many robotic tasks such as robot-assisted living. Due to the adaptive and often reflective texture of such objects, traditional tracking algorithms that rely on general-purpose learned features suffer from reduced performance. Recent research has proposed to instill transparency awareness into existing general object trackers by fusing purpose-built features. However, with existing fusion techniques, the addition of new features changes the latent space, making it impossible to add transparency awareness to trackers with fixed latent spaces. For example, many of today's transformer-based trackers are fully pre-trained and sensitive to any latent space perturbations. In this paper, we present a new feature fusion technique that integrates transparency information into a fixed feature space, enabling its use in a broader range of trackers. Our proposed fusion module, composed of a transformer encoder and an MLP module, leverages key-query-based transformations to embed the transparency information into the tracking pipeline. We also present a new two-step training strategy for our fusion module to effectively merge transparency features. We propose a new tracker architecture that uses our fusion technique to achieve superior results for transparent object tracking. Our proposed method achieves competitive results against state-of-the-art trackers on TOTB, the largest recently released transparent object tracking benchmark. Our results and code will be made publicly available at https://github.com/kalyan0510/TOTEM. Comment: IEEE IROS 202
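
The key constraint described above, fusing new cues without changing the latent space, can be illustrated with a minimal attention sketch. This is not the paper's module: names and dimensions are hypothetical, and the transformer encoder and MLP are omitted. The tracker feature acts as the query, transparency features as keys and values, and the attended output is added residually, so the fused feature keeps the tracker's original dimensionality.

```python
import math

def attend(query, keys, values):
    # Single-query scaled dot-product attention in pure Python.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

def fuse(tracker_feat, transparency_feats):
    # Inject transparency cues into a fixed latent space: attend from the
    # tracker feature over the transparency features, then add the result
    # residually so the output dimensionality matches the input exactly.
    attended = attend(tracker_feat, transparency_feats, transparency_feats)
    return [t + a for t, a in zip(tracker_feat, attended)]

fused = fuse([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]])
```

Because the fusion is residual, a pre-trained tracker head downstream sees a vector of the same size it was trained on, which is the property the abstract argues existing fusion techniques lack.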

    MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy

    Recently, there has been tremendous interest in Industry 4.0 infrastructure to address labor shortages in global supply chains. Deploying artificial-intelligence-enabled robotic bin picking systems in the real world has become particularly important for reducing the stress and physical demands on workers while increasing the speed and efficiency of warehouses. To this end, such systems may be used to automate order picking, but with the risk of causing expensive damage during an abnormal event such as a sensor failure. As such, reliability becomes a critical factor for translating artificial intelligence research into real-world applications and products. In this paper, we propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet) for tackling object detection and segmentation for robotic bin picking using data from different modalities. This is the first system that introduces the concept of multimodal redundancy to address sensor failure during deployment. In particular, we realize the multimodal redundancy framework with a gate fusion module and dynamic ensemble learning. Finally, we present a new label-free multimodal consistency (MC) score that uses the outputs from all modalities to measure the overall reliability and uncertainty of the system's output. Through experiments, we demonstrate that in the event of a missing modality, our system provides much more reliable performance than baseline models. We also demonstrate that during inference, our MC score is a more reliable indicator of output quality than the model-generated confidence scores, which are often over-confident.
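
One way to picture a label-free consistency score of this kind (this is a simplified guess at the idea, not the paper's formula) is the mean pairwise IoU of the segmentation masks predicted from each modality: when all modalities agree the score is high, and a failed sensor drags it down without needing any ground-truth labels.

```python
def iou(a, b):
    # Intersection-over-union of two flat binary masks.
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0

def mc_score(masks):
    # Hypothetical multimodal-consistency score: mean pairwise IoU of the
    # per-modality masks. No labels are needed; only cross-modal agreement.
    pairs = [(i, j) for i in range(len(masks)) for j in range(i + 1, len(masks))]
    return sum(iou(masks[i], masks[j]) for i, j in pairs) / len(pairs)

rgb    = [1, 1, 0, 0]   # mask predicted from the RGB stream
depth  = [1, 1, 0, 0]   # mask predicted from the depth stream
broken = [0, 0, 0, 0]   # e.g. depth sensor failure: empty prediction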

    Material perception and action : The role of material properties in object handling

    This dissertation is about visual perception of material properties and their role in preparation for object handling. Usually, before an object is touched or picked up, we estimate its size and shape based on visual features to plan the grip size of our hand. After we have touched the object, the grip size is adjusted according to the haptic feedback, and the object is handled safely. Similarly, we anticipate the grip force required to handle the object without slippage, based on its visual features and prior experience with similar objects. Previous studies on object handling have mostly examined object characteristics that are typical for object recognition, e.g., size, shape, and weight, but in recent years there has been a growing interest in characteristics that are more typical of the material the object is made from. In a series of studies, we therefore investigated the role of perceived material properties in decision-making and object handling, in which both digitally rendered materials and real objects made of different types of materials were presented to human subjects and a humanoid robot. Paper I is a reach-to-grasp study in which human subjects were examined using motion capture technology. In this study, participants grasped and lifted paper cups that varied in appearance (i.e., matte vs. glossy) and weight. Here we were interested in both the temporal and spatial components of prehension, to examine the role of material properties in grip preparation and how visual features contribute to inferred hardness before haptic feedback becomes available. We found that the temporal and spatial components were not exclusively governed by the expected weight of the paper cups; glossiness and expected hardness had a significant role as well. In Paper II, which is a follow-up on Paper I, we investigated the grip force component of prehension using the same experimental stimuli.
In a similar experimental setup, using force sensors, we examined the early grip force magnitudes applied by human subjects when grasping and lifting the same paper cups as used in Paper I. Here we found that early grip force scaling was guided not only by object weight but also by the visual characteristics of the material (i.e., matte vs. glossy). Moreover, the results suggest that grip force scaling during the initial object lifts is guided by expected hardness that is to some extent based on visual material properties. Paper III is a visual judgment task in which psychophysical measurements were used to examine how the material properties roughness and glossiness influence perceived bounce height and, consequently, perceived hardness. In a paired-comparison task, human subjects watched a ball bounce on various surface planes and judged its bounce height. Here we investigated which combination of surface properties, i.e., roughness or glossiness, makes a surface plane appear bounceable. The results demonstrate that surface planes with rough properties are believed to afford higher bounce heights than surface planes with smooth properties. Interestingly, adding shiny properties to the rough and smooth surface planes reduced the judged difference, as if glossy surface planes are believed to afford higher bounce heights irrespective of how smooth or rough the surface underneath is. This suggests that perceived bounce height involves not only the physical elements of the bounce but also the visual material properties of the surface planes the ball bounces on. In Paper IV we investigated the development of material knowledge using a robotic system: a humanoid robot explored real objects made of different types of materials, using both camera and haptic systems.
The objects varied in visual appearance (e.g., texture, color, shape, size), weight, and hardness, and in two experiments the robot picked up and placed the experimental objects several times using its arm. We used the haptic signals from the servos controlling the robot's arm and shoulder to obtain measurements of the objects' weight and hardness, and the camera system to collect data on their visual features. After the robot had repeatedly explored the objects, an associative learning model was trained on the collected data to demonstrate how the robotic system could produce a multi-modal mapping between the visual and haptic features of the objects. In sum, this thesis shows that visual material properties, and prior knowledge of how materials look and behave, play a significant role in action planning.
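
The associative visual-to-haptic mapping from Paper IV can be caricatured in a few lines. This is purely illustrative (the dissertation's model and features are not specified here): the robot stores (visual, haptic) feature pairs from its exploration and later predicts haptic properties for a new object from its nearest stored visual feature.

```python
def train(pairs):
    # Toy associative model: simply memorize (visual, haptic) feature pairs
    # gathered during the robot's exploration phase.
    return list(pairs)

def predict_haptic(model, visual):
    # Predict haptic properties from vision alone: return the haptic
    # features of the nearest stored visual feature (squared Euclidean).
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda pair: dist(pair[0], visual))[1]

# Hypothetical features: visual = (glossiness, size), haptic = (weight_g, hardness).
model = train([((0.9, 1.0), (120, 0.8)),   # glossy cup: heavier, harder
               ((0.1, 1.0), (40, 0.3))])   # matte cup: lighter, softer

prediction = predict_haptic(model, (0.8, 1.0))  # a new, fairly glossy cup
```

This mirrors the behavioural finding in Papers I and II: expectations about weight and hardness are read off visual material cues before any haptic feedback is available.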

    Robotic Defect Inspection with Visual and Tactile Perception for Large-scale Components

    In manufacturing processes, surface inspection is a key requirement for quality assessment and damage localization. Automated surface anomaly detection has therefore become a promising area of research in various industrial inspection systems. A particular challenge in industries with large-scale components, such as aircraft and heavy machinery, is inspecting large parts for defects of very small dimensions. Moreover, these parts can be curved. To address this challenge, we present a two-stage multi-modal inspection pipeline with visual and tactile sensing. Our approach combines the best of both senses by identifying and localizing defects from a global view (vision) and then tactilely scanning the localized areas to identify remaining defects. To benchmark our approach, we propose a novel real-world dataset with multiple metallic defect types per image, collected in production environments on real aerospace manufacturing parts, as well as online robot experiments in two environments. Our approach identifies 85% of defects using Stage I and 100% of defects after Stage II. The dataset is publicly available at https://zenodo.org/record/8327713. Comment: This is a pre-print for an International Conference on Intelligent Robots and Systems 2023 publication.
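
The coarse-to-fine control flow can be sketched abstractly. Everything here is a hypothetical stand-in for the paper's components: vision proposes candidate regions over the whole part, and tactile scanning then checks only those regions, catching the small defects that vision alone misses.

```python
def inspect(part, vision_detect, tactile_scan):
    # Hypothetical two-stage inspection loop: Stage I uses the global
    # camera view to localize candidate regions; Stage II runs the slow,
    # high-resolution tactile scan only inside those regions.
    candidates = vision_detect(part)                      # Stage I
    confirmed = []
    for region in candidates:
        confirmed.extend(tactile_scan(part, region))      # Stage II
    return confirmed

# Toy part model: defect sizes in mm, grouped by surface region.
part = {"wing": [0.8, 0.1], "panel": []}

def vision_detect(p):
    # Vision flags only regions containing a large (>= 0.5 mm) defect.
    return [r for r, defects in p.items() if any(d >= 0.5 for d in defects)]

def tactile_scan(p, region):
    # Tactile sensing resolves every defect inside the localized region.
    return p[region]

found = inspect(part, vision_detect, tactile_scan)
```

In this toy run, vision alone would report only the 0.8 mm defect; the tactile stage also recovers the 0.1 mm defect in the same region, which is the division of labour the abstract describes.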

    Generation of GelSight Tactile Images for Sim2Real Learning

    Most current works on Sim2Real learning for robotic manipulation tasks leverage camera vision, which may be significantly occluded by robot hands during manipulation. Tactile sensing offers complementary information to vision and can compensate for the information lost to occlusion. However, the use of tactile sensing in Sim2Real research has been restricted because no simulated tactile sensors were available. To close this gap, we introduce a novel approach for simulating a GelSight tactile sensor in the commonly used Gazebo simulator. Like the real GelSight sensor, the simulated sensor produces high-resolution images via an optical sensor observing the interaction between the touched object and an opaque soft membrane. It can indirectly sense forces, geometry, texture, and other properties of the object, and it enables Sim2Real learning with tactile sensing. Preliminary experimental results show that the simulated sensor generates realistic outputs similar to those captured by a real GelSight sensor. All the materials used in this paper are available at https://danfergo.github.io/gelsight-simulation.
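
The core rendering step in any GelSight-style simulation is mapping membrane deformation to an image. The following is a deliberately crude sketch of that idea, not the paper's renderer: shade each pixel from the local surface gradient of the indented membrane, so edges of the pressed shape light up while flat regions stay at a neutral gray.

```python
def tactile_image(height_map):
    # Hypothetical GelSight-style shading: approximate the surface gradient
    # with forward differences and map it to intensity, brighter where the
    # membrane slopes toward a virtual light, clamped to [0, 1]. A real
    # simulator would use the sensor's photometric model instead.
    rows, cols = len(height_map), len(height_map[0])
    img = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dx = height_map[i][min(j + 1, cols - 1)] - height_map[i][j]
            dy = height_map[min(i + 1, rows - 1)][j] - height_map[i][j]
            img[i][j] = max(0.0, min(1.0, 0.5 + dx + dy))
    return img

# A single bump pressed into the center of a 3x3 membrane patch.
bump = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
img = tactile_image(bump)
```

Even this crude gradient shading reproduces the qualitative behaviour the abstract describes: the image encodes contact geometry (and, via membrane stiffness, indirectly force) rather than object appearance.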