46,352 research outputs found

    A Deep Learning Approach for Hand Posture Recognition from Depth Data

    Get PDF
    International audienceGiven the success of convolutional neural networks (CNNs) during recent years in numerous object recognition tasks, it seems logical to further extend their applicability to the treatment of three-dimensional data such as point clouds provided by depth sensors. To this end, we present an approach exploiting the CNN's ability of automated feature generation and combine it with a novel 3D feature computation technique , preserving local information contained in the data. Experiments are conducted on a large data set of 600.000 samples of hand postures obtained via ToF (time-of-flight) sensors from 20 different persons, after an extensive parameter search in order to optimize network structure. Generalization performance, measured by a leave-one-person-out scheme, exceeds that of any other method presented for this specific task, bringing the error for some persons down to 1.5%

    An original framework for understanding human actions and body language by using deep neural networks

    Get PDF
    The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition, and video surveillance. In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided. The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements. All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods

    Driver Distraction Identification with an Ensemble of Convolutional Neural Networks

    Full text link
    The World Health Organization (WHO) reported 1.25 million deaths yearly due to road traffic accidents worldwide and the number has been continuously increasing over the last few years. Nearly fifth of these accidents are caused by distracted drivers. Existing work of distracted driver detection is concerned with a small set of distractions (mostly, cell phone usage). Unreliable ad-hoc methods are often used.In this paper, we present the first publicly available dataset for driver distraction identification with more distraction postures than existing alternatives. In addition, we propose a reliable deep learning-based solution that achieves a 90% accuracy. The system consists of a genetically-weighted ensemble of convolutional neural networks, we show that a weighted ensemble of classifiers using a genetic algorithm yields in a better classification confidence. We also study the effect of different visual elements in distraction detection by means of face and hand localizations, and skin segmentation. Finally, we present a thinned version of our ensemble that could achieve 84.64% classification accuracy and operate in a real-time environment.Comment: arXiv admin note: substantial text overlap with arXiv:1706.0949

    Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks

    Full text link
    This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI). These dynamic images are constructed from a sequence of depth maps using bidirectional rank pooling to effectively capture the spatial-temporal information. Such image-based representations enable us to fine-tune the existing ConvNets models trained on image data for classification of depth sequences, without introducing large parameters to learn. Upon the proposed representations, a convolutional Neural networks (ConvNets) based method is developed for gesture recognition and evaluated on the Large-scale Isolated Gesture Recognition at the ChaLearn Looking at People (LAP) challenge 2016. The method achieved 55.57\% classification accuracy and ranked 2nd2^{nd} place in this challenge but was very close to the best performance even though we only used depth data.Comment: arXiv admin note: text overlap with arXiv:1608.0633

    Multi-set canonical correlation analysis for 3D abnormal gait behaviour recognition based on virtual sample generation

    Get PDF
    Small sample dataset and two-dimensional (2D) approach are challenges to vision-based abnormal gait behaviour recognition (AGBR). The lack of three-dimensional (3D) structure of the human body causes 2D based methods to be limited in abnormal gait virtual sample generation (VSG). In this paper, 3D AGBR based on VSG and multi-set canonical correlation analysis (3D-AGRBMCCA) is proposed. First, the unstructured point cloud data of gait are obtained by using a structured light sensor. A 3D parametric body model is then deformed to fit the point cloud data, both in shape and posture. The features of point cloud data are then converted to a high-level structured representation of the body. The parametric body model is used for VSG based on the estimated body pose and shape data. Symmetry virtual samples, pose-perturbation virtual samples and various body-shape virtual samples with multi-views are generated to extend the training samples. The spatial-temporal features of the abnormal gait behaviour from different views, body pose and shape parameters are then extracted by convolutional neural network based Long Short-Term Memory model network. These are projected onto a uniform pattern space using deep learning based multi-set canonical correlation analysis. Experiments on four publicly available datasets show the proposed system performs well under various conditions
    • …
    corecore