
    Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks

    This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI), and Dynamic Depth Motion Normal Images (DDMNI). These dynamic images are constructed from a sequence of depth maps using bidirectional rank pooling to effectively capture the spatial-temporal information. Such image-based representations enable us to fine-tune existing ConvNet models trained on image data for the classification of depth sequences without introducing a large number of parameters to learn. Upon the proposed representations, a Convolutional Neural Network (ConvNet) based method is developed for gesture recognition and evaluated on the Large-scale Isolated Gesture Recognition track of the ChaLearn Looking at People (LAP) challenge 2016. The method achieved 55.57% classification accuracy and ranked 2nd in the challenge, coming very close to the best performance even though only depth data was used. Comment: arXiv admin note: text overlap with arXiv:1608.0633
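    The dynamic-image construction described above can be illustrated with the approximate rank pooling of Bilen et al., in which frame t of a T-frame sequence receives weight 2t - T - 1. Whether this paper uses that approximation or exact RankSVM-based rank pooling is not stated in the abstract, so the following minimal NumPy sketch (with hypothetical function names) is an assumption-laden illustration rather than the authors' method.

    import numpy as np

    def approximate_rank_pool(frames):
        """Collapse a (T, H, W) depth sequence into one dynamic image.

        Each frame t (1-indexed) receives weight 2t - T - 1, so later
        frames contribute positively and earlier frames negatively,
        approximating the ranking function a RankSVM would learn.
        """
        T = frames.shape[0]
        weights = 2.0 * np.arange(1, T + 1) - T - 1      # shape (T,)
        dynamic = np.tensordot(weights, frames, axes=1)  # (H, W)
        # Rescale to [0, 255] so the result can feed an image ConvNet.
        dynamic -= dynamic.min()
        if dynamic.max() > 0:
            dynamic *= 255.0 / dynamic.max()
        return dynamic.astype(np.uint8)

    def bidirectional_ddi(frames):
        """Forward and backward dynamic depth images, stacked as channels."""
        return np.stack([approximate_rank_pool(frames),
                         approximate_rank_pool(frames[::-1])])

    # Toy sequence of 16 random 64x64 depth maps.
    depth_seq = np.random.rand(16, 64, 64).astype(np.float32)
    ddi_pair = bidirectional_ddi(depth_seq)
    print(ddi_pair.shape)  # (2, 64, 64)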

    Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks

    This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI), and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition. These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to effectively capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time, while DDNI and DDMNI exploit the 3D structural information captured by depth maps. Upon the proposed representations, a ConvNet-based method is developed for action recognition. The image-based representations enable us to fine-tune existing Convolutional Neural Network (ConvNet) models trained on image data without training a large number of parameters from scratch. The proposed method achieved state-of-the-art results on three large datasets, namely the Large-scale Continuous Gesture Recognition Dataset (mean Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset (59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22% cross-view), even though only the depth modality was used. Comment: arXiv admin note: text overlap with arXiv:1701.01814, arXiv:1608.0633
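    Beyond the posture dynamics captured by DDI, the DDNI/DDMNI representations rely on surface normals derived from depth maps. The sketch below estimates per-pixel normals by treating the depth image as a height field and differencing it; this finite-difference scheme is an assumption, not necessarily the paper's exact procedure.

    import numpy as np

    def depth_to_normals(depth):
        """Estimate per-pixel unit normals from an (H, W) depth map."""
        dz_dy, dz_dx = np.gradient(depth.astype(np.float32))
        # The surface z = f(x, y) has (unnormalized) normal (-df/dx, -df/dy, 1).
        normals = np.dstack([-dz_dx, -dz_dy,
                             np.ones_like(depth, np.float32)])
        norm = np.linalg.norm(normals, axis=2, keepdims=True)
        return normals / np.maximum(norm, 1e-8)

    depth = np.random.rand(64, 64).astype(np.float32)
    n = depth_to_normals(depth)
    print(n.shape)  # (64, 64, 3)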

    Gesture Recognition Using Hidden Markov Models Augmented with Active Difference Signatures

    With the recent advent of depth sensors, human gesture recognition has gained significant interest in the fields of computer vision and human-computer interaction. Robust gesture recognition is a difficult problem because of the spatiotemporal variations in gesture formation, subject size, subject location, image fidelity, and subject occlusion. Gesture boundary detection, or the automatic detection of the onset and offset of a gesture in a sequence of gestures, is critical to achieving robust gesture recognition. Existing gesture recognition methods perform the task of gesture segmentation either using resting frames in a gesture sequence or by using additional information such as audio, depth images, or RGB images. This ancillary information introduces high latency in gesture segmentation and recognition, making it inappropriate for real-time applications. This thesis proposes a novel method to recognize time-varying human gestures from continuous video streams. The proposed method passes skeleton joint information into a Hidden Markov Model augmented with active difference signatures to achieve state-of-the-art gesture segmentation and recognition. Active body parts are used to calculate the likelihood of previously unseen data to facilitate gesture segmentation. Active difference signatures are used to describe temporal motion as well as static differences from a canonical resting position. Geometric features, such as joint angles and joint topological distances, are used along with active difference signatures as salient feature descriptors. These feature descriptors serve as unique signatures which identify hidden states in a Hidden Markov Model. The Hidden Markov Model is able to identify gestures in a robust fashion that is tolerant to spatiotemporal and human-to-human variation in gesture articulation. The proposed method is evaluated on both isolated and continuous datasets. An accuracy of 80.7% is achieved on the isolated MSR3D dataset, and a mean Jaccard index of 0.58 is achieved on the continuous ChaLearn dataset. These results improve upon existing gesture recognition methods, which achieve a Jaccard index of 0.43 on the ChaLearn dataset. Comprehensive experiments investigate feature selection, parameter optimization, and algorithmic methods to help understand the contributions of the proposed method.
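    As a rough illustration of what an "active difference signature" might look like in code, the sketch below combines per-joint temporal motion with displacement from a canonical resting pose and zeroes out inactive joints. The exact definition, thresholds, and normalization used in the thesis are not reproduced here; the function name and motion threshold are hypothetical.

    import numpy as np

    def active_difference_signature(joints_t, joints_prev, rest_pose,
                                    motion_thresh=0.02):
        """joints_*: (J, 3) arrays of skeleton joint positions (meters)."""
        temporal = joints_t - joints_prev   # frame-to-frame motion
        static = joints_t - rest_pose       # offset from the resting pose
        # "Active" joints move faster than a small threshold; inactive
        # joints are zeroed so the signature focuses on actual motion.
        active = np.linalg.norm(temporal, axis=1) > motion_thresh
        temporal[~active] = 0.0
        return np.concatenate([temporal.ravel(), static.ravel()])

    J = 20  # e.g., Kinect v1 skeleton
    sig = active_difference_signature(np.random.rand(J, 3),
                                      np.random.rand(J, 3),
                                      np.random.rand(J, 3))
    print(sig.shape)  # (120,)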

    ChaLearn Looking at People Challenge 2014: Dataset and Results

    This paper summarizes the ChaLearn Looking at People 2014 challenge data and the results obtained by the participants. The competition was split into three independent tracks: human pose recovery from RGB data, action and interaction recognition from RGB data sequences, and multi-modal gesture recognition from RGB-Depth sequences. For all the tracks, the goal was to perform user-independent recognition in sequences of continuous images using the overlapping Jaccard index as the evaluation measure. In this edition of the ChaLearn challenge, two large novel data sets were made publicly available, and the Microsoft CodaLab platform was used to manage the competition. Outstanding results were achieved in the three challenge tracks, with scores of 0.20, 0.50, and 0.85 for pose recovery, action/interaction recognition, and multi-modal gesture recognition, respectively.
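    For a single gesture instance, the overlapping Jaccard index used as the evaluation measure reduces to intersection over union of the predicted and ground-truth frame intervals, as in the sketch below; the averaging across instances and classes prescribed by the challenge protocol is omitted.

    def jaccard_index(pred, gt):
        """pred, gt: (start, end) frame intervals, end exclusive."""
        inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
        union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
        return inter / union if union > 0 else 0.0

    print(jaccard_index((10, 50), (20, 60)))  # 30 / 50 = 0.6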

    Edges detection in depth images for a gesture recognition application using a Kinect WSN

    The detection of persons in an image has been the subject of several studies. Most of this work was done on images taken by cameras in visible light (RGB). In this paper, we are interested in detecting people's contours in Kinect 3D images. We investigate the application of the gradient approach and optimal filters to depth images. We also use this detection to monitor a person via their gestures. Results show that Canny edge detection works well for people in both lighting conditions, but the Sobel algorithm performed better on the depth images taken in the dark.
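    The Canny/Sobel comparison described above can be reproduced in outline with OpenCV; the depth scaling and thresholds below are illustrative assumptions, not values from the paper.

    import cv2
    import numpy as np

    depth16 = np.random.randint(0, 4096, (480, 640), np.uint16)  # stand-in Kinect frame
    # Scale the ~12-bit Kinect depth range down to 8 bits for the detectors.
    depth8 = cv2.convertScaleAbs(depth16, alpha=255.0 / 4095.0)

    edges_canny = cv2.Canny(depth8, 50, 150)

    grad_x = cv2.Sobel(depth8, cv2.CV_16S, 1, 0, ksize=3)
    grad_y = cv2.Sobel(depth8, cv2.CV_16S, 0, 1, ksize=3)
    edges_sobel = cv2.addWeighted(cv2.convertScaleAbs(grad_x), 0.5,
                                  cv2.convertScaleAbs(grad_y), 0.5, 0)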

    Real-time sign language recognition using a consumer depth camera

    Gesture recognition remains a very challenging task in the field of computer vision and human-computer interaction (HCI). A decade ago the task seemed to be almost unsolvable with the data provided by a single RGB camera. Due to recent advances in sensing technologies, such as time-of-flight and structured light cameras, there are new data sources available which make hand gesture recognition more feasible. In this work, we propose a highly precise method to recognize static gestures from depth data provided by one of the above-mentioned devices. The depth images are used to derive rotation-, translation-, and scale-invariant features. A multi-layered random forest (MLRF) is then trained to classify the feature vectors, yielding the recognition of the hand signs. The training time and memory required by the MLRF are much smaller compared to a simple random forest with equivalent precision, which allows the MLRF training procedure to be repeated without significant effort. To show the advantages of our technique, we evaluate our algorithm on synthetic data, on a publicly available dataset containing 24 signs from American Sign Language (ASL), and on a new dataset collected using the recently released Intel Creative Gesture Camera.
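    A two-layer forest in the spirit of the MLRF can be sketched with scikit-learn: a router forest assigns a feature vector to a coarse group of similar hand shapes, and a per-group expert forest resolves the final sign. Grouping classes by k-means over their mean feature vectors is an assumption, and the data here is synthetic; the paper's layering scheme may differ.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2400, 64)).astype(np.float32)  # invariant features
    y = rng.integers(0, 24, size=2400)                  # 24 ASL signs (toy labels)

    # Layer 1: cluster the 24 class means into 6 coarse groups.
    class_means = np.stack([X[y == c].mean(axis=0) for c in range(24)])
    group_of_class = KMeans(n_clusters=6, n_init=10,
                            random_state=0).fit_predict(class_means)
    router = RandomForestClassifier(n_estimators=50, random_state=0)
    router.fit(X, group_of_class[y])

    # Layer 2: one small forest per group, trained only on its classes.
    experts = {}
    for g in range(6):
        mask = group_of_class[y] == g
        experts[g] = RandomForestClassifier(n_estimators=50, random_state=0)
        experts[g].fit(X[mask], y[mask])

    def predict(x):
        g = router.predict(x.reshape(1, -1))[0]
        return experts[g].predict(x.reshape(1, -1))[0]

    print(predict(X[0]), y[0])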

    Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition

    This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth images, and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatiotemporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, opening the door to the use of deep learning techniques to further explore multimodal time series data.
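    The core hybrid idea, neural networks supplying the HMM's emission probabilities, can be sketched as follows: per-frame network scores (here a random placeholder) feed a standard Viterbi decoder. The transition structure and toy dimensions are assumptions, not the paper's configuration.

    import numpy as np

    def viterbi(emission_log_probs, log_trans, log_prior):
        """emission_log_probs: (T, S) per-frame log state scores from the net."""
        T, S = emission_log_probs.shape
        delta = log_prior + emission_log_probs[0]
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = delta[:, None] + log_trans   # (S, S): prev -> next
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + emission_log_probs[t]
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    # Toy example: 3 HMM states, 10 frames, softmax-like "network" outputs.
    T, S = 10, 3
    net_scores = np.log(np.random.dirichlet(np.ones(S), size=T))
    log_trans = np.log(np.full((S, S), 0.1) + 0.7 * np.eye(S))  # sticky states
    states = viterbi(net_scores, log_trans, np.log(np.full(S, 1.0 / S)))
    print(states)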

    Real-time motion-based hand gestures recognition from time-of-flight video

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11265-015-1090-5. This paper presents an innovative solution based on Time-of-Flight (TOF) video technology for detecting motion patterns for real-time dynamic hand gesture recognition. The resulting system is able to detect motion-based hand gestures from depth images given as input. The recognizable motion patterns are modeled on the basis of the human arm's anatomy and its degrees of freedom, generating a collection of synthetic motion patterns that is compared with the captured input patterns in order to classify the input gesture. For the evaluation of our system, a significant collection of gestures has been compiled, producing results for 3D pattern classification as well as a comparison with the results using only 2D information.
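    The matching step, comparing a captured trajectory against a bank of synthetic motion patterns, can be sketched with dynamic time warping and nearest-match classification; DTW is an assumed stand-in here for whatever distance the paper actually uses, and the pattern names are invented.

    import numpy as np

    def dtw_distance(a, b):
        """a: (n, d), b: (m, d) trajectories of d-dimensional points."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def classify(trajectory, synthetic_patterns):
        """synthetic_patterns: dict mapping gesture name -> (m, d) array."""
        return min(synthetic_patterns,
                   key=lambda g: dtw_distance(trajectory, synthetic_patterns[g]))

    patterns = {"wave": np.random.rand(30, 3), "push": np.random.rand(30, 3)}
    print(classify(np.random.rand(25, 3), patterns))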