25 research outputs found

    Improving Generalization of Deep Networks for Estimating Physical Properties of Containers and Fillings

    We present methods to estimate the physical properties of household containers and their fillings as they are manipulated by humans. We use a lightweight, pre-trained convolutional neural network with coordinate attention as the backbone of our pipelines to accurately locate the object of interest and estimate its physical properties in the CORSMAL Containers Manipulation (CCM) dataset. We address filling type classification with audio data, and then combine the audio information with the video modalities to address filling level classification. For container capacity, dimension, and mass estimation, we present a data augmentation and a consistency measurement that alleviate the over-fitting caused by the limited number of containers in the CCM dataset. We augment the training data with an object-of-interest-based re-scaling that increases the variety of the containers' physical values. We then apply the consistency measurement to choose a model with low prediction variance for the same containers under different scenes, which ensures the generalization ability of the model. Our method improves the generalization ability of the models when estimating the properties of containers not seen during training.
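    A minimal sketch of the consistency-measurement idea described above, in Python: among candidate models, prefer the one whose predictions vary least across different scenes of the same container. The data layout and the predict_fn hook are illustrative assumptions, not the authors' implementation.

        import numpy as np

        def consistency_score(predictions_by_container):
            """Mean per-container variance of predictions across scenes.

            predictions_by_container: dict mapping a container id to a list
            of predictions of the same physical quantity (e.g., capacity in
            mL) obtained under different scenes/backgrounds/lighting.
            """
            variances = [np.var(preds) for preds in predictions_by_container.values()]
            return float(np.mean(variances))

        def select_consistent_model(models, predict_fn, scenes_by_container):
            """Pick the model with the lowest mean prediction variance.

            predict_fn(model, scene) -> scalar prediction (hypothetical hook).
            """
            scores = []
            for model in models:
                preds = {cid: [predict_fn(model, scene) for scene in scenes]
                         for cid, scenes in scenes_by_container.items()}
                scores.append(consistency_score(preds))
            return models[int(np.argmin(scores))]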

    CONTAINER LOCALISATION AND MASS ESTIMATION WITH AN RGB-D CAMERA

    In the research area of human-robot interaction, automatically estimating the mass of a container manipulated by a person from visual information alone is a challenging task. The main challenges are occlusions, different filling materials, and varying lighting conditions. The mass of an object is key information for the robot to correctly regulate the force required to grasp the container. We propose a method based on a single RGB-D camera to locate a manipulated container and estimate its empty mass, i.e., independently of the presence of content. The method first automatically selects a number of candidate containers based on their distance from the fixed frontal view, and then averages the mass predictions of a lightweight model to provide the final estimate. Results on the CORSMAL Containers Manipulation dataset show that the proposed method estimates the empty container mass with a score of 71.08% under different lighting and filling conditions.
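    A minimal sketch of the selection-then-average step described above, assuming hypothetical candidate detections with 3D centroids from the RGB-D camera; this is not the authors' code.

        import numpy as np

        def estimate_empty_mass(candidates, mass_model, frontal_point, k=5):
            """candidates: list of (centroid_xyz, crop) candidate containers.
            mass_model(crop) -> predicted empty mass in grams (lightweight model).
            frontal_point: 3D reference point of the fixed frontal view.
            """
            # Rank candidates by distance to the fixed frontal viewpoint.
            ranked = sorted(candidates,
                            key=lambda c: np.linalg.norm(np.asarray(c[0]) - frontal_point))
            # Average the lightweight model's predictions over the top-k candidates.
            return float(np.mean([mass_model(crop) for _, crop in ranked[:k]]))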

    Towards safe human-to-robot handovers of unknown containers

    Safe human-to-robot handovers of unknown objects require accurate estimation of hand poses and object properties, such as shape, trajectory, and weight. Accurately estimating these properties requires scanned 3D object models, expensive equipment such as motion-capture systems and markers, or both. However, testing handover algorithms with real robots may be dangerous for the human and, when the object is an open container with liquid, for the robot. In this paper, we propose a real-to-simulation framework for developing safe human-to-robot handovers that estimates the physical properties of unknown cups or drinking glasses, as well as the poses of the human hands, from videos of a human manipulating the container. We complete the handover in simulation and estimate a region of the container that is not occluded by the hand of the human holding it. We also quantify the safety of the human and of the object in simulation. We validate the framework using public recordings of containers manipulated before a handover, and show that the handover is safe when using noisy estimates from a range of perceptual algorithms.
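    As a toy illustration of the occlusion-free region mentioned above, a minimal sketch that intersects segmentation masks; the mask sources and shapes are assumptions, not the framework's actual interface.

        import numpy as np

        def graspable_region(container_mask, hand_mask):
            """Both masks: boolean HxW arrays from a segmentation model.
            Returns the container pixels not occluded by the hand, and the
            fraction of the container that remains visible."""
            free = container_mask & ~hand_mask
            visible_fraction = free.sum() / max(int(container_mask.sum()), 1)
            return free, float(visible_fraction)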

    The CORSMAL benchmark for the prediction of the properties of containers

    Authors' post-print accepted for publication in IEEE Access (https://doi.org/10.1109/ACCESS.2022.3166906); 14 pages, 6 tables, 7 figures. Contactless estimation of the weight of a container and of the amount of its content manipulated by a person is a key prerequisite for safe human-to-robot handovers. However, the opaqueness and transparency of the container and of the content, and the variability of materials, shapes, and sizes, make this estimation difficult. In this paper, we present a range of methods and an open framework to benchmark acoustic and visual perception for estimating the capacity of a container and the type, mass, and amount of its content. The framework includes a dataset, specific tasks, and performance measures. We conduct an in-depth comparative analysis of methods that used this framework and of audio-only or vision-only baselines designed from related works. Based on this analysis, we conclude that audio-only and audio-visual classifiers are suitable for estimating the type and amount of the content, using different types of convolutional neural networks combined with either recurrent neural networks or a majority-voting strategy, whereas computer vision methods are suitable for determining the capacity of the container using regression and geometric approaches. Classifying the content type and level using only audio achieves weighted average F1-scores of up to 81% and 97%, respectively. Estimating the container capacity with vision-only approaches and estimating the filling mass with audio-visual multi-stage approaches reach weighted average capacity and mass scores of up to 65%. These results show that there is still room for improvement in the design of new methods, which can be ranked and compared on the individual leaderboards provided by our open framework.
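    For reference, the weighted average F1-score used above weights each class's F1 by its support. A minimal sketch with scikit-learn and made-up labels:

        from sklearn.metrics import f1_score

        # Illustrative filling-type labels (not taken from the dataset).
        y_true = ["water", "rice", "pasta", "water", "empty", "rice"]
        y_pred = ["water", "rice", "water", "water", "empty", "pasta"]

        # Per-class F1, averaged with each class weighted by its support.
        print(f1_score(y_true, y_pred, average="weighted"))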

    Top-1 CORSMAL Challenge 2020 Submission: Filling Mass Estimation Using Multi-modal Observations of Human-robot Handovers

    Human-robot object handover is a key skill for the future of human-robot collaboration. The CORSMAL 2020 Challenge focuses on the perception part of this problem: the robot needs to estimate the filling mass of a container held by a human. Although there are powerful methods for image processing and for audio processing individually, answering this problem requires processing data from multiple sensors together. The appearance of the container, the sound of the filling, and the depth data all provide essential information. We propose a multi-modal method to predict three key indicators of the filling mass: the filling type, the filling level, and the container capacity. These indicators are then combined to estimate the filling mass of the container. Our method obtained the top-1 overall performance among all submissions to the CORSMAL 2020 Challenge, on both the public and private subsets, while showing no evidence of overfitting. Our source code is publicly available at https://github.com/v-iashin/CORSMAL (Docker image: https://hub.docker.com/r/iashin/corsma).
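    One natural way to combine the three indicators, shown as a minimal sketch: filling mass = container capacity × filling level × density of the filling type. The density values below are illustrative assumptions, not the challenge specification.

        # Approximate densities in g/mL; illustrative values only.
        DENSITY_G_PER_ML = {"water": 1.00, "rice": 0.85, "pasta": 0.41, "empty": 0.0}

        def filling_mass(capacity_ml, filling_level, filling_type):
            """capacity_ml: estimated container capacity in millilitres.
            filling_level: estimated filled fraction of the container (0-1).
            filling_type: one of the keys of DENSITY_G_PER_ML.
            Returns the estimated filling mass in grams."""
            return capacity_ml * filling_level * DENSITY_G_PER_ML[filling_type]

        print(filling_mass(500.0, 0.5, "water"))  # 250.0 g for a half-full 500 mL cup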

    Visualizing Robot Intent for Object Handovers with Augmented Reality

    Humans are very skillful at communicating their intent for when and where a handover will occur. In contrast, even state-of-the-art robotic implementations of handovers display a general lack of communication skills. We propose visualizing the internal state and intent of robots for human-to-robot handovers using augmented reality. Specifically, we visualize 3D models of the object and of the robotic gripper to communicate the robot's estimate of where the object is and the pose with which the robot intends to grasp it. We conduct a user study with 16 participants, in which each participant handed a cube-shaped object over to the robot 12 times. Results show that visualizing robot intent using augmented reality substantially improves the subjective experience of the users and decreases the time to transfer the object. Results also indicate that the benefits of augmented reality persist even when the robot makes errors in localizing the object.

    Object localisation, dimensions estimation and tracking.

    PhD thesis. Localising, estimating the physical properties of, and tracking objects from audio and video signals are the basis for a large variety of applications, such as surveillance, search and rescue, extraction of objects' patterns, and robotic applications. These tasks are challenging due to low signal-to-noise ratios, multiple moving objects, occlusions, and changes in the objects' appearance, and they become even more challenging when real-time performance is required and when the sensor is mounted on a moving platform such as a robot, which introduces further problems due to potentially quick sensor motions and noisy observations. In this thesis, we consider algorithms for single- and multiple-object tracking from static microphones and cameras and from moving cameras, without relying on additional sensors or making strong assumptions about the objects or the scene, as well as the localisation and estimation of the 3D physical properties of unseen objects. We propose an online multi-object tracker that addresses noisy observations by exploiting the confidence of object observations, and that addresses the challenges of object and camera motion by introducing a real-time motion predictor that forecasts the future locations of objects with uncalibrated cameras. The proposed method enables real-time tracking by avoiding computationally expensive labelling procedures, such as clustering, for data association. Moreover, we propose a novel multi-view algorithm for jointly localising and estimating the 3D physical properties of objects via semantic segmentation and projective geometry, without the need for additional sensors or markers. We validate the proposed methods on three standard benchmarks, two self-collected datasets, and two real robotic applications involving an unmanned aerial vehicle and a robotic arm. Experimental results show that the proposed methods improve on existing alternatives in terms of accuracy and speed.
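    As a toy illustration of the real-time motion prediction described in the summary, a constant-velocity extrapolation of an object's recent image track; the thesis's actual predictor is not reproduced here.

        import numpy as np

        def predict_location(track, horizon=1):
            """track: Tx2 array of past (x, y) image positions, oldest first.
            Returns the position predicted `horizon` frames ahead."""
            if len(track) < 2:
                return track[-1]
            velocity = track[-1] - track[-2]       # per-frame displacement
            return track[-1] + horizon * velocity  # linear extrapolation

        print(predict_location(np.array([[10., 10.], [12., 11.], [14., 12.]]), horizon=3))
        # -> [20. 15.]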