2,359 research outputs found
A Comparative Review of Recent Kinect-based Action Recognition Algorithms
Video-based human action recognition is currently one of the most active
research areas in computer vision. Various research studies indicate that the
performance of action recognition is highly dependent on the type of features
being extracted and how the actions are represented. Since the release of the
Kinect camera, a large number of Kinect-based human action recognition
techniques have been proposed in the literature. However, there still does not
exist a thorough comparison of these Kinect-based techniques under the grouping
of feature types, such as handcrafted versus deep learning features and
depth-based versus skeleton-based features. In this paper, we analyze and
compare ten recent Kinect-based algorithms for both cross-subject action
recognition and cross-view action recognition using six benchmark datasets. In
addition, we have implemented and improved some of these techniques and
included their variants in the comparison. Our experiments show that the
majority of methods perform better on cross-subject action recognition than
cross-view action recognition, that skeleton-based features are more robust for
cross-view recognition than depth-based features, and that deep learning
features are suitable for large datasets.Comment: Accepted by the IEEE Transactions on Image Processin
RGB-D-based Action Recognition Datasets: A Survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has
attracted increasing attention since the first work reported in 2010. Over this
period, many benchmark datasets have been created to facilitate the development
and evaluation of new algorithms. This raises the question of which dataset to
select and how to use it in providing a fair and objective comparative
evaluation against state-of-the-art methods. To address this issue, this paper
provides a comprehensive review of the most commonly used action recognition
related RGB-D video datasets, including 27 single-view datasets, 10 multi-view
datasets, and 7 multi-person datasets. The detailed information and analysis of
these datasets is a useful resource in guiding insightful selection of datasets
for future research. In addition, the issues with current algorithm evaluation
vis-\'{a}-vis limitations of the available datasets and evaluation protocols
are also highlighted; resulting in a number of recommendations for collection
of new datasets and use of evaluation protocols
A review of computer vision-based approaches for physical rehabilitation and assessment
The computer vision community has extensively researched the area of human motion analysis, which primarily focuses on pose estimation, activity recognition, pose or gesture recognition and so on. However for many applications, like monitoring of functional rehabilitation of patients with musculo skeletal or physical impairments, the requirement is to comparatively evaluate human motion. In this survey, we capture important literature on vision-based monitoring and physical rehabilitation that focuses on comparative evaluation of human motion during the past two decades and discuss the state of current research in this area. Unlike other reviews in this area, which are written from a clinical objective, this article presents research in this area from a computer vision application perspective. We propose our own taxonomy of computer vision-based rehabilitation and assessment research which are further divided into sub-categories to capture novelties of each research. The review discusses the challenges of this domain due to the wide ranging human motion abnormalities and difficulty in automatically assessing those abnormalities. Finally, suggestions on the future direction of research are offered
Deep Affordance-grounded Sensorimotor Object Recognition
It is well-established by cognitive neuroscience that human perception of
objects constitutes a complex process, where object appearance information is
combined with evidence about the so-called object "affordances", namely the
types of actions that humans typically perform when interacting with them. This
fact has recently motivated the "sensorimotor" approach to the challenging task
of automatic object recognition, where both information sources are fused to
improve robustness. In this work, the aforementioned paradigm is adopted,
surpassing current limitations of sensorimotor object recognition research.
Specifically, the deep learning paradigm is introduced to the problem for the
first time, developing a number of novel neuro-biologically and
neuro-physiologically inspired architectures that utilize state-of-the-art
neural networks for fusing the available information sources in multiple ways.
The proposed methods are evaluated using a large RGB-D corpus, which is
specifically collected for the task of sensorimotor object recognition and is
made publicly available. Experimental results demonstrate the utility of
affordance information to object recognition, achieving an up to 29% relative
error reduction by its inclusion.Comment: 9 pages, 7 figures, dataset link included, accepted to CVPR 201
User evaluation of an interactive learning framework for single-arm and dual-arm robots
The final publication is available at link.springer.comSocial robots are expected to adapt to their users and, like their human counterparts, learn from the interaction. In our previous work, we proposed an interactive learning framework that enables a user to intervene and modify a segment of the robot arm trajectory. The framework uses gesture teleoperation and reinforcement learning to learn new motions. In the current work, we compared the user experience with the proposed framework implemented on the single-arm and dual-arm Barrett’s 7-DOF WAM robots equipped with a Microsoft Kinect camera for user tracking and gesture recognition. User performance and workload were measured in a series of trials with two groups of 6 participants using two robot settings in different order for counterbalancing. The experimental results showed that, for the same task, users required less time and produced shorter robot trajectories with the single-arm robot than with the dual-arm robot. The results also showed that the users who performed the task with the single-arm robot first experienced considerably less workload in performing the task with the dual-arm robot while achieving a higher task success rate in a shorter time.Peer ReviewedPostprint (author's final draft
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition, and video surveillance.
In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided.
The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements.
All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods
- …