7 research outputs found

    RGB-D-based Action Recognition Datasets: A Survey

    Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it to provide a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action-recognition-related RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets is a useful resource for guiding the insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-à-vis the limitations of the available datasets and evaluation protocols are also highlighted, resulting in a number of recommendations for the collection of new datasets and the use of evaluation protocols.

    The KIMORE dataset: KInematic assessment of MOvement and clinical scores for remote monitoring of physical REhabilitation

    The paper proposes a free dataset, available at the following link, named KIMORE, concerning different rehabilitation exercises collected by an RGB-D sensor. Three data inputs, including RGB, depth videos, and skeleton joint positions, were recorded during five physical exercises specific to low back pain and accurately selected by physicians. For each exercise, the dataset also provides a set of features, specifically defined by the physicians and relevant to describing its scope. These features, validated with respect to a stereophotogrammetric system, can be analyzed to compute a score for the subject's performance. The dataset also contains an evaluation of the same performance provided by clinicians through a clinical questionnaire. The impact of KIMORE has been analyzed by comparing the outputs obtained by examples of rule-based and template-based approaches with the clinical score. The dataset presented is intended to be used as a benchmark for human movement assessment in a rehabilitation scenario, in order to test the effectiveness and reliability of different computational approaches. Unlike other existing datasets, KIMORE merges a large heterogeneous population of 78 subjects, divided into two groups: 44 healthy subjects and 34 with motor dysfunctions. It provides the most clinically relevant features and the clinical score for each exercise.

    Three-Dimensional Integral Imaging for Gesture Recognition Under Occlusions

    In recent years, three-dimensional (3-D) imaging has been applied to human action and gesture recognition, usually in the form of depth maps from RGB-D sensors. An alternative that has not been explored is 3-D integral imaging, aside from a recent preliminary study showing that it can be an effective sensory modality with some advantages over conventional monocular imaging. Since integral imaging has also been shown to be a powerful tool in other visual tasks (e.g., object reconstruction and recognition) under challenging conditions (e.g., low illumination, occlusions), and its passive long-range operation brings benefits over active close-range devices, a natural question is whether these advantages also hold for gesture recognition. Furthermore, occlusions are present in many real-world gesture recognition scenarios, but they remain an elusive problem that has scarcely been addressed. As far as we know, this letter analyzes for the first time the potential of integral imaging for gesture recognition under occlusions, comparing it to monocular imaging and to RGB-D sensory data. Empirical results corroborate the benefits of 3-D integral imaging for gesture recognition, mainly under occlusions.

    A study of the state of the art of 3D human pose estimation methods

    3D modelling based on RGB-D cameras, such as the popular Kinect, is a discipline of intense research activity whose results are beginning to consolidate, offering high potential from the point of view of research transfer. This project proposes the use of RGB-D cameras (Kinect) for 3D modelling of the complete human body. The objective is to extract three-dimensional representations of the human body that are as precise and versatile as possible for the proposed technology. The analysis of the temporal evolution of the 3D representations is also considered. The study targets multiple applications in the medical field, such as analysing the growth of children or the evolution of patients undergoing dietary treatment.

    A framework for digitisation of manual manufacturing task knowledge using gaming interface technology

    Intense market competition and the global skill supply crunch are hurting the manufacturing industry, which is heavily dependent on skilled labour. Companies must look for innovative ways to acquire manufacturing skills from their experts and transfer them to novices and eventually to machines to remain competitive. There is a lack of systematic processes in the manufacturing industry and research for cost-effective capture and transfer of human skills. Therefore, the aim of this research is to develop a framework for digitisation of manual manufacturing task knowledge, a major constituent of which is human skill. The proposed digitisation framework is based on the theory of human-workpiece interactions that is developed in this research. The unique aspect of the framework is the use of consumer-grade gaming interface technology to capture and record manual manufacturing tasks in digital form to enable the extraction, decoding and transfer of manufacturing knowledge constituents that are associated with the task. The framework is implemented, tested and refined using 5 case studies, including 1 toy assembly task, 2 real-life-like assembly tasks, 1 simulated assembly task and 1 real-life composite layup task. It is successfully validated based on the outcomes of the case studies and a benchmarking exercise that was conducted to evaluate its performance. 
This research contributes to knowledge in five main areas, namely: (1) the theory of human-workpiece interactions to decipher human behaviour in manual manufacturing tasks; (2) a cohesive and holistic framework to digitise manual manufacturing task knowledge, especially tacit knowledge such as human action and reaction skills; (3) the use of low-cost gaming interface technology to capture human actions and the effect of those actions on workpieces during a manufacturing task; (4) a new way to use hidden Markov modelling to produce digital skill models that represent human ability to perform complex tasks; and (5) extraction and decoding of manufacturing knowledge constituents from the digital skill models.
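The abstract names hidden Markov modelling as the mechanism behind the digital skill models but gives no implementation detail. As a hedged illustration only (the two-state model, the symbol alphabet, and the function below are hypothetical, not the framework's actual design), a learned skill model could score a discretised action sequence with the standard scaled forward algorithm:

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete HMM.

    pi  -- initial state distribution, shape (S,)
    A   -- state transition matrix, shape (S, S)
    B   -- emission matrix, shape (S, V)
    obs -- sequence of discrete observation symbol indices
    """
    # Initialise with the first observation, then rescale at each step
    # so the recursion stays numerically stable on long sequences.
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha /= c
    log_lik = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        alpha /= c
        log_lik += np.log(c)
    return log_lik
```

Under such a sketch, a motion sequence that deviates from an expert's modelled behaviour would receive a lower log-likelihood than one that follows it, which is one plausible way a "digital skill model" could grade task performance.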

    ReadingAct RGB-D action dataset and human action recognition from local features

    For general home monitoring, a system should automatically interpret people's actions. The system should be non-intrusive and able to deal with cluttered backgrounds and loose clothing. An approach based on spatio-temporal local features and a Bag-of-Words (BoW) model is proposed for single-person action recognition from combined intensity and depth images. To restore the temporal structure lost in the traditional BoW method, a dynamic time alignment technique with temporal binning is applied in this work, which has not previously been used in the literature for human action recognition on depth imagery. A novel human action dataset with depth data has been created using two Microsoft Kinect sensors. The ReadingAct dataset contains 20 subjects and 19 actions for a total of 2340 videos. To investigate the effect of using depth images and the proposed method, testing was conducted on three depth datasets, and the proposed method was compared to traditional Bag-of-Words methods. Results showed that the proposed method improves recognition accuracy when depth is added to the conventional intensity data, and has advantages when dealing with long actions.
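The pipeline summarised above (local features quantised into visual words, per-bin histograms that retain coarse temporal order, and dynamic time alignment between videos) could be sketched as follows. All function names, the binning scheme, and the L1 ground cost are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def binned_bow(word_ids, times, n_words, n_bins):
    """Temporal binning: split a video's visual-word occurrences into
    n_bins equal time intervals and build one BoW histogram per bin."""
    edges = np.linspace(times.min(), times.max() + 1e-9, n_bins + 1)
    idx = np.clip(np.digitize(times, edges) - 1, 0, n_bins - 1)
    hists = np.zeros((n_bins, n_words))
    for b, w in zip(idx, word_ids):
        hists[b, w] += 1
    # L1-normalise each non-empty bin so bins are comparable.
    s = hists.sum(axis=1, keepdims=True)
    s[s == 0] = 1
    return hists / s

def dtw_distance(x, y):
    """Dynamic time warping distance between two histogram sequences,
    using the L1 distance between per-bin histograms as the ground cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.abs(x[i - 1] - y[j - 1]).sum()
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Two videos would then be compared by `dtw_distance` over their binned histogram sequences rather than by a single global histogram, which is what lets this family of methods preserve temporal ordering that a plain BoW model discards.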