
    Multi-view Human Parsing for Human-Robot Collaboration

    In human-robot collaboration, perception plays a major role in enabling the robot to understand the surrounding environment and the position of humans inside the working area, which is a key element for effective and safe collaboration. Human pose estimators based on skeletal models are among the most popular approaches to monitor the position of humans around the robot, but they do not take into account information such as body volume, which the robot needs for effective collision avoidance. In this paper, we propose a novel 3D human representation derived from body parts segmentation which combines high-level semantic information (i.e., human body parts) and volume information. To compute such a body parts segmentation, also known as human parsing in the literature, we propose a multi-view system based on a camera network. Human body parts are segmented in the frames acquired by each camera, projected into 3D world coordinates, and then aggregated to build a 3D representation of the human that is robust to occlusions. A further step of 3D data filtering has been implemented to improve robustness to outliers and segmentation accuracy. The proposed multi-view human parsing approach was tested in a real environment and its performance was measured in terms of global and class accuracy on a dedicated dataset, acquired to thoroughly test the system under various conditions. The experimental results demonstrate the performance improvements that can be achieved thanks to the proposed multi-view approach.
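    The per-camera segmentation, 3D projection, and aggregation steps described above lend themselves to a compact sketch. The following is a minimal illustration, not the paper's implementation: it assumes pinhole cameras with known intrinsics K and camera-to-world extrinsics, and uses a simple per-voxel majority vote as the aggregation strategy (voxel size and all other parameters are illustrative assumptions).

```python
# Minimal sketch of the projection/aggregation idea: per-camera 2D body-part
# labels are back-projected to 3D using depth and camera intrinsics/extrinsics,
# then fused across views by majority vote on a coarse voxel grid.
import numpy as np

def backproject(labels, depth, K, T_world_cam):
    """Lift a labelled depth frame into a labelled world-frame point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = (depth > 0) & (labels > 0)          # keep labelled pixels with valid depth
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]      # pinhole back-projection
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)
    pts_world = (T_world_cam @ pts_cam)[:3].T   # camera -> world coordinates
    return pts_world, labels[valid]

def fuse_views(views, voxel=0.02):
    """Aggregate per-camera labelled clouds by majority vote per voxel."""
    pts = np.vstack([p for p, _ in views])
    lbl = np.concatenate([l for _, l in views])
    keys = np.floor(pts / voxel).astype(np.int64)
    fused = {}
    for k, l in zip(map(tuple, keys), lbl):
        fused.setdefault(k, []).append(int(l))
    centers = np.array([np.array(k) * voxel + voxel / 2 for k in fused])
    labels = np.array([np.bincount(v).argmax() for v in fused.values()])
    return centers, labels
```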

    Real-time Object Detection using Deep Learning for helping People with Visual Impairments

    Object detection plays a crucial role in the development of Electronic Travel Aids (ETAs), capable of guiding a person with visual impairments towards a target object in an unknown indoor environment. In such a scenario, the object detector runs on a mobile device (e.g. a smartphone) and needs to be fast, accurate, and, most importantly, lightweight. Nowadays, Deep Neural Networks (DNNs) have become the state-of-the-art solution for object detection tasks, with many works improving speed and accuracy by proposing new architectures or extending existing ones. A common strategy is to use deeper networks to get higher performance, but that leads to a higher computational cost which makes it impractical to integrate them on mobile devices with limited computational power. In this work, we compare different object detectors to find a suitable candidate to be implemented on ETAs, focusing on lightweight models capable of working in real time on mobile devices with good accuracy. In particular, we select two models: SSD Lite with MobileNet V2 and Tiny-DSOD. Both models have been tested on the popular OpenImage dataset and on a new dataset, called the Office dataset, collected to further test the models' performance and robustness in a real scenario inspired by the actual perception challenges of a user with visual impairments.
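    As a rough illustration of the kind of on-device measurement involved, the sketch below loads a readily available lightweight detector from torchvision and times a single forward pass. It uses an SSD Lite model with a MobileNetV3 backbone purely as a stand-in (torchvision does not ship the exact SSD Lite + MobileNet V2 or Tiny-DSOD models compared in the paper), and the 0.5 confidence threshold is an arbitrary choice.

```python
# Minimal sketch: run a lightweight off-the-shelf detector once and time it.
import time
import torch
from torchvision.models.detection import ssdlite320_mobilenet_v3_large

model = ssdlite320_mobilenet_v3_large(weights="DEFAULT").eval()

image = torch.rand(3, 320, 320)                 # stand-in for a camera frame
with torch.no_grad():
    start = time.perf_counter()
    detections = model([image])[0]              # dict with boxes, labels, scores
    elapsed = time.perf_counter() - start

keep = detections["scores"] > 0.5               # arbitrary confidence threshold
print(f"{keep.sum().item()} objects detected in {elapsed * 1000:.1f} ms")
```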

    From Human Perception and Action Recognition to Causal Understanding of Human-Robot Interaction in Industrial Environments

    Human-robot collaboration is migrating from lightweight robots in laboratory environments to industrial applications, where heavy tasks and powerful robots are more common. In this scenario, a reliable perception of the humans involved in the process, as well as of their intentions and behaviors, is fundamental. This paper presents two projects investigating the use of robots in relevant industrial scenarios, providing an overview of how industrial human-robot collaborative tasks can be successfully addressed.

    Supplement: "Localization and broadband follow-up of the gravitational-wave transient GW150914" (2016, ApJL, 826, L13)

    This Supplement provides supporting material for Abbott et al. (2016a). We briefly summarize past electromagnetic (EM) follow-up efforts as well as the organization and policy of the current EM follow-up program. We compare the four probability sky maps produced for the gravitational-wave transient GW150914, and provide additional details of the EM follow-up observations that were performed in the different bands.

    Semantic Segmentation for Flexible and Autonomous Manufacturing

    Customized mass production of boats and other vehicles requires highly complex manufacturing processes that involve a high degree of automation. Vision and sensing are key elements for enhancing the efficiency of such systems, as they provide robots with detailed information about the working environment. In this paper, we focus on the sanding process of boat molding tools by means of a robot, proposing the use of semantic segmentation to detect the key elements involved in production and increase the automation of the production process. We demonstrate the potential of semantic segmentation in an industrial environment which differs from the domestic scenes typically considered in the literature: it features a lower degree of variability with respect to domestic scenarios, but higher performance is required in the production environment to address challenging manufacturing operations successfully. Our segmentation algorithm has been thoroughly validated on a purpose-built industrial dataset, whose acquisition and annotation were sped up thanks to our optimized pipeline.

    Skeleton-Based Action and Gesture Recognition for Human-Robot Collaboration

    Human action recognition plays a major role in enabling effective and safe collaboration between humans and robots. Considering, for example, a collaborative assembly task, the human worker can use gestures to communicate with the robot, while the robot can exploit the recognized actions to anticipate the next steps in the assembly process, improving safety and overall productivity. In this work, we propose a novel framework for human action recognition based on 3D pose estimation and ensemble techniques. In this framework, we first estimate the 3D coordinates of the human hands and body joints by means of OpenPose and RGB-D data. The estimated joints are then fed to a set of graph convolutional networks derived from Shift-GCN, one network for each set of joints (i.e., body, left hand, and right hand). Finally, using an ensemble approach, we average the output scores of all the networks to predict the final human action. The proposed framework was evaluated on a dedicated dataset, named the IAS-Lab Collaborative HAR dataset, which includes both actions and gestures commonly used in human-robot collaboration tasks. The experimental results demonstrate how the ensemble of the different action recognition models helps improve the accuracy and robustness of the overall system.
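    The final ensemble step can be sketched compactly. In the snippet below, small placeholder networks stand in for the per-joint-set Shift-GCN models, the number of classes is illustrative, and the joint counts simply follow OpenPose's 25-joint body and 21-joint hand layouts; only the score-averaging logic reflects the approach described above.

```python
# Minimal sketch of the ensemble step: each joint set (body, left hand, right
# hand) is scored by its own network and the per-class scores are averaged.
import torch
import torch.nn as nn

NUM_CLASSES = 10                                  # illustrative

class DummyStream(nn.Module):
    """Placeholder for a per-joint-set recognition network (e.g. Shift-GCN)."""
    def __init__(self, num_joints):
        super().__init__()
        self.head = nn.Linear(num_joints * 3, NUM_CLASSES)
    def forward(self, joints):                    # joints: (batch, num_joints, 3)
        return self.head(joints.flatten(1))

streams = {"body": DummyStream(25), "left_hand": DummyStream(21), "right_hand": DummyStream(21)}

def predict(joint_sets):
    """Average softmax scores over all streams and pick the top class."""
    scores = [torch.softmax(streams[name](joints), dim=-1)
              for name, joints in joint_sets.items()]
    return torch.stack(scores).mean(dim=0).argmax(dim=-1)

batch = {"body": torch.rand(1, 25, 3), "left_hand": torch.rand(1, 21, 3), "right_hand": torch.rand(1, 21, 3)}
print(predict(batch))                             # predicted action index
```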

    A general skeleton-based action and gesture recognition framework for human-robot collaboration

    Recognizing human actions is crucial for effective and safe collaboration between humans and robots. For example, in a collaborative assembly task, human workers can use gestures to communicate with the robot, and the robot can use the recognized actions to anticipate the next steps in the assembly process, leading to improved safety and productivity. In this work, we propose a general framework for human action recognition based on 3D pose estimation and ensemble techniques, which can recognize both body actions and hand gestures. The framework relies on OpenPose and 2D-to-3D lifting methods to estimate 3D joints for the human body and the hands, and then feeds these joints into a set of graph convolutional networks based on the Shift-GCN architecture. The output scores of all networks are combined using an ensemble approach to predict the final human action. The proposed framework was evaluated on a custom dataset designed for human-robot collaboration tasks, named the IAS-Lab Collaborative HAR dataset. The results showed that using an ensemble of action recognition models improves the accuracy and robustness of the overall system; moreover, the proposed framework can easily be specialized to different scenarios and achieves state-of-the-art results on the HRI30 dataset when coupled with an object detector or classifier.

    Enhancing Deep Semantic Segmentation of RGB-D Data with Entangled Forests

    Semantic segmentation is a problem receiving more and more attention in the computer vision community. Nowadays, deep learning methods represent the state of the art for this problem, and the trend is to use deeper networks to get higher performance. The drawback of such models is a higher computational cost, which makes it difficult to integrate them on mobile robot platforms. In this work, we explore how to obtain lighter deep learning models without compromising performance. To do so, we consider the features used in the 3D Entangled Forests algorithm and study the best strategies to integrate them within the FuseNet deep network. These new features allow us to shrink the network size without losing performance, hence obtaining a lighter model which achieves state-of-the-art performance on the semantic segmentation task and represents an interesting alternative for mobile robotics applications, where computational power and energy are limited.
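    One simple way to inject extra per-pixel handcrafted features into an RGB-D encoder-decoder network is early fusion: concatenating the feature maps with one of the input branches and widening the first convolution. The sketch below illustrates this idea only; it is not the integration strategy studied in the paper, and the channel counts and resolutions are arbitrary.

```python
# Minimal sketch: feed N extra handcrafted feature maps (e.g. 3D Entangled
# Forests features) into the depth branch of a FuseNet-style encoder by
# concatenating them along the channel dimension.
import torch
import torch.nn as nn

class DepthPlusFeaturesStem(nn.Module):
    """First encoder block of the depth branch, widened to accept extra channels."""
    def __init__(self, extra_channels, out_channels=64):
        super().__init__()
        self.conv = nn.Conv2d(1 + extra_channels, out_channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)
    def forward(self, depth, features):
        x = torch.cat([depth, features], dim=1)   # early fusion along channels
        return torch.relu(self.bn(self.conv(x)))

stem = DepthPlusFeaturesStem(extra_channels=8)    # e.g. 8 handcrafted feature maps
depth = torch.rand(1, 1, 240, 320)
feats = torch.rand(1, 8, 240, 320)
print(stem(depth, feats).shape)                   # torch.Size([1, 64, 240, 320])
```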

    METRIC—Multi-Eye to Robot Indoor Calibration Dataset

    Multi-camera systems are an effective solution for perceiving large areas or complex scenarios with many occlusions. In such a setup, an accurate camera network calibration is crucial in order to localize scene elements with respect to a single reference frame shared by all the viewpoints of the network. This is particularly important in applications such as object detection and people tracking. Multi-camera calibration is a critical requirement also in several robotics scenarios, particularly those involving a robotic workcell equipped with a manipulator surrounded by multiple sensors. Within this scenario, the robot-world hand-eye calibration is an additional crucial element for determining the exact position of each camera with respect to the robot, in order to provide information about the surrounding workspace directly to the manipulator. Despite the importance of the calibration process in the two scenarios outlined above, namely (i) a camera network, and (ii) a camera network with a robot, there is a lack of standard datasets available in the literature to evaluate and compare calibration methods. Moreover, the two scenarios are usually treated separately and tested on dedicated setups. In this paper, we propose a general standard dataset acquired in a robotic workcell where calibration methods can be evaluated in two use cases: camera network calibration and robot-world hand-eye calibration. The Multi-Eye To Robot Indoor Calibration (METRIC) dataset consists of over 10,000 synthetic and real images of ChArUco and checkerboard patterns, each one rigidly attached to the robot end-effector, which was moved in front of four cameras surrounding the manipulator from different viewpoints during the image acquisition. The real images in the dataset include several multi-view image sets captured by three different types of sensor networks: Microsoft Kinect V2, Intel RealSense Depth D455, and Intel RealSense Lidar L515, to evaluate their advantages and disadvantages for calibration. Furthermore, in order to accurately analyze the effect of camera-robot distance on calibration, we acquired a comprehensive synthetic dataset, with related ground truth, with three different camera network setups corresponding to three levels of calibration difficulty depending on the cell size. An additional contribution of this work is a comprehensive evaluation of state-of-the-art calibration methods using our dataset, highlighting their strengths and weaknesses, in order to outline two benchmarks for the two aforementioned use cases.
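    For readers unfamiliar with the second use case, the robot-world hand-eye problem can be posed with off-the-shelf tools. The sketch below uses OpenCV's solver (available since OpenCV 4.5) on random placeholder poses; in practice the pattern poses would come from detecting the ChArUco or checkerboard target in each image and the gripper poses from the robot's forward kinematics. This is only an illustration of the problem setup, not an evaluation protocol from the paper.

```python
# Minimal sketch of robot-world hand-eye calibration with OpenCV.
import cv2
import numpy as np

rng = np.random.default_rng(0)
n_poses = 10

# Pattern (world) pose seen by the camera for each robot configuration ...
R_world2cam = [cv2.Rodrigues(rng.normal(size=(3, 1)))[0] for _ in range(n_poses)]
t_world2cam = [rng.normal(size=(3, 1)) for _ in range(n_poses)]
# ... and the corresponding end-effector pose reported by the robot controller.
R_base2gripper = [cv2.Rodrigues(rng.normal(size=(3, 1)))[0] for _ in range(n_poses)]
t_base2gripper = [rng.normal(size=(3, 1)) for _ in range(n_poses)]

# Solve simultaneously for the base-to-world and gripper-to-camera transforms.
R_base2world, t_base2world, R_gripper2cam, t_gripper2cam = cv2.calibrateRobotWorldHandEye(
    R_world2cam, t_world2cam, R_base2gripper, t_base2gripper)
print(R_gripper2cam, t_gripper2cam)
```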

    Clustering-based refinement for 3D human body parts segmentation

    A common approach to address human body parts segmentation on 3D data involves the use of a 2D segmentation network and 3D projection. Following this approach, several errors can be introduced in the final 3D segmentation output, such as segmentation errors and reprojection errors. Such errors are even more significant when considering very small body parts such as the hands. In this paper, we propose a new algorithm that aims to reduce such errors and improve the 3D segmentation of human body parts. The algorithm detects noise points and wrong clusters using the DBSCAN algorithm, and changes the labels of the points by exploiting the shape and position of the clusters. We evaluated the proposed algorithm on the 3DPeople synthetic dataset and on a real dataset, highlighting how it can greatly improve the 3D segmentation of small body parts like hands. With our algorithm, we achieved an improvement of up to 4.68% IoU on the synthetic dataset and up to 2.30% IoU in the real scenario.
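    The general idea of a DBSCAN-based refinement can be sketched as follows. This is not the authors' exact algorithm: the thresholds are arbitrary, and wrongly clustered or noisy points are simply reassigned to the label of the nearest trusted point, whereas the paper exploits cluster shape and position in a more refined way.

```python
# Minimal sketch: cluster each predicted body part with DBSCAN, flag noise
# points and tiny clusters as likely errors, and relabel them from the nearest
# point belonging to a large (trusted) cluster.
import numpy as np
from sklearn.cluster import DBSCAN
from scipy.spatial import cKDTree

def refine_labels(points, labels, eps=0.05, min_samples=10, min_cluster=50):
    refined = labels.copy()
    trusted_pts, trusted_lbl, suspect_idx = [], [], []
    for part in np.unique(labels):
        idx = np.where(labels == part)[0]
        clustering = DBSCAN(eps=eps, min_samples=min_samples).fit(points[idx])
        for cid in np.unique(clustering.labels_):
            members = idx[clustering.labels_ == cid]
            if cid == -1 or len(members) < min_cluster:   # noise or tiny cluster
                suspect_idx.extend(members)
            else:                                         # keep as trusted evidence
                trusted_pts.append(points[members])
                trusted_lbl.append(np.full(len(members), part))
    if suspect_idx and trusted_pts:
        tree = cKDTree(np.vstack(trusted_pts))
        trusted_lbl = np.concatenate(trusted_lbl)
        _, nearest = tree.query(points[suspect_idx])
        refined[suspect_idx] = trusted_lbl[nearest]       # relabel suspicious points
    return refined
```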