
    Flexible human-robot cooperation models for assisted shop-floor tasks

    The Industry 4.0 paradigm emphasizes the crucial benefits that collaborative robots, i.e., robots able to work alongside and together with humans, could bring to the whole production process. In this context, a still missing enabling technology is the design of flexible robots able to cope, at all levels, with humans' intrinsic variability, which is not only necessary for a comfortable working experience for the person but also a valuable capability for efficiently dealing with unexpected events. In this paper, a sensing, representation, planning and control architecture for flexible human-robot cooperation, referred to as FlexHRC, is proposed. FlexHRC relies on wearable sensors for human action recognition, AND/OR graphs for the representation of and reasoning upon cooperation models, and a Task Priority framework to decouple action planning from robot motion planning and control. Comment: Submitted to Mechatronics (Elsevier).
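
    As a rough illustration of the kind of cooperation model the abstract describes, the following Python sketch encodes a task as an AND/OR graph, where an AND node requires all children to be achieved and an OR node requires at least one. The Node class, the node names, and the example task are illustrative assumptions and are not taken from the FlexHRC implementation.

# Minimal sketch of an AND/OR graph cooperation model (illustrative, not FlexHRC code).
class Node:
    """A task node whose children are combined with AND or OR semantics."""
    def __init__(self, name, kind="AND", children=None):
        self.name = name
        self.kind = kind            # "AND": all children needed, "OR": any one suffices
        self.children = children or []
        self.done = False           # set True when the human or robot completes a leaf

    def achieved(self):
        """A node is achieved if it is a completed leaf or its children satisfy its semantics."""
        if not self.children:
            return self.done
        results = [c.achieved() for c in self.children]
        return all(results) if self.kind == "AND" else any(results)

# Illustrative task: the part is either fetched by the robot OR handed over by the
# human (OR branch), and must then be fastened (AND with the acquisition step).
fetch = Node("robot_fetch_part")
handover = Node("human_handover_part")
fasten = Node("fasten_part")
acquire = Node("acquire_part", kind="OR", children=[fetch, handover])
assembly = Node("assembly", kind="AND", children=[acquire, fasten])

handover.done = True   # e.g., human action recognized from wearable sensors
fasten.done = True
print(assembly.achieved())  # True: the OR branch is satisfied via the human handover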

    Detection of bimanual gestures everywhere: why it matters, what we need and what is missing

    Bimanual gestures are of the utmost importance for the study of motor coordination in humans and in everyday activities. A reliable detection of bimanual gestures in unconstrained environments is fundamental for their clinical study and for assessing common activities of daily living. This paper investigates techniques for a reliable, unconstrained detection and classification of bimanual gestures. It assumes the availability of inertial data originating from the two hands/arms, builds upon a previously developed technique for gesture modelling based on Gaussian Mixture Modelling (GMM) and Gaussian Mixture Regression (GMR), and compares different modelling and classification techniques based on a number of assumptions inspired by the literature on how bimanual gestures are represented and modelled in the brain. Experiments report results for 5 everyday bimanual activities, selected on the basis of three main parameters: whether the two hands are constrained by a physical tool, whether a specific sequence of single-hand gestures is required, and whether the activity is recursive. In the best-performing combination of modelling approach and classification technique, five out of five activities are recognized with up to 97% accuracy, 82% precision and 100% recall. Comment: Submitted to Robotics and Autonomous Systems (Elsevier).
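
    To make the modelling step more concrete, the sketch below classifies windows of inertial data by fitting one Gaussian Mixture Model per gesture class and picking the class with the highest log-likelihood, assuming scikit-learn is available. The window length, the 6-D feature layout, and the class names are illustrative assumptions; the paper's full GMM/GMR pipeline is not reproduced here.

# Minimal sketch of per-class GMM gesture classification (illustrative, not the paper's pipeline).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(windows_by_class, n_components=3):
    """Fit one GMM per gesture class on stacked feature windows (frames x features)."""
    models = {}
    for label, windows in windows_by_class.items():
        data = np.vstack(windows)                       # pool all frames of this class
        models[label] = GaussianMixture(n_components=n_components, random_state=0).fit(data)
    return models

def classify(window, models):
    """Assign the class whose GMM gives the highest mean per-frame log-likelihood."""
    scores = {label: gmm.score(window) for label, gmm in models.items()}
    return max(scores, key=scores.get)

# Synthetic example: two "gestures" with 6-D inertial features (e.g., acc + gyro per hand).
rng = np.random.default_rng(0)
train = {
    "open_jar": [rng.normal(0.0, 1.0, (50, 6)) for _ in range(5)],
    "sweep":    [rng.normal(2.0, 1.0, (50, 6)) for _ in range(5)],
}
models = train_models(train)
print(classify(rng.normal(2.0, 1.0, (50, 6)), models))  # expected: "sweep"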

    RGB-D-based Action Recognition Datasets: A Survey

    Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it to provide a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action recognition related RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets provide a useful resource for guiding an insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-à-vis limitations of the available datasets and evaluation protocols are also highlighted, resulting in a number of recommendations for the collection of new datasets and the use of evaluation protocols.
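
    One protocol issue that such surveys highlight is subject overlap between training and test data. The following sketch shows a plain cross-subject split of the kind commonly used with RGB-D action datasets; the subject IDs and the sample tuple layout are invented for illustration and do not correspond to any particular dataset or to this survey's recommendations.

# Minimal sketch of a cross-subject evaluation split (illustrative data layout).
def cross_subject_split(samples, train_subjects):
    """Split (subject_id, clip_id, label) samples so no subject appears in both sets."""
    train = [s for s in samples if s[0] in train_subjects]
    test = [s for s in samples if s[0] not in train_subjects]
    return train, test

# Synthetic catalogue: 10 subjects, 4 clips each, 3 action labels.
samples = [(s, f"clip_{s}_{i}", i % 3) for s in range(1, 11) for i in range(4)]
train, test = cross_subject_split(samples, train_subjects={1, 2, 3, 4, 5})
print(len(train), len(test))  # 20 20: subjects 1-5 for training, 6-10 held out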

    End-to-End Learning of Representations for Asynchronous Event-Based Data

    Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it with a standard vision pipeline, e.g., a Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations through a sequence of differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that our approach to learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods. Comment: To appear at ICCV 2019.
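
    For context, one common grid-based representation of the kind this framework generalizes is a voxel grid that accumulates event polarities into temporal bins. The NumPy sketch below uses fixed nearest-bin accumulation, whereas the paper's contribution is to make this conversion differentiable and learn it end-to-end; the function name and the synthetic event stream are illustrative assumptions.

# Minimal sketch of a fixed (non-learned) event-to-voxel-grid conversion.
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """events: (N, 4) array of (x, y, t, polarity in {-1, +1}) -> (num_bins, H, W) grid."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    # Normalize timestamps to [0, num_bins - 1] and accumulate polarity in the nearest bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    b = np.round(t_norm).astype(int)
    np.add.at(grid, (b, y, x), p)
    return grid

# Tiny synthetic stream: 1000 events on a 64x64 sensor over 30 ms.
rng = np.random.default_rng(0)
events = np.column_stack([
    rng.integers(0, 64, 1000),              # x coordinates
    rng.integers(0, 64, 1000),              # y coordinates
    np.sort(rng.uniform(0.0, 0.03, 1000)),  # timestamps in seconds
    rng.choice([-1.0, 1.0], 1000),          # polarities
])
print(events_to_voxel_grid(events, num_bins=5, height=64, width=64).shape)  # (5, 64, 64)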