Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks
This paper proposes three simple, compact yet effective representations of
depth sequences, referred to respectively as Dynamic Depth Images (DDI),
Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images
(DDMNI). These dynamic images are constructed from a sequence of depth maps
using bidirectional rank pooling to effectively capture the spatial-temporal
information. Such image-based representations enable us to fine-tune the
existing ConvNets models trained on image data for classification of depth
sequences, without introducing large parameters to learn. Upon the proposed
representations, a convolutional Neural networks (ConvNets) based method is
developed for gesture recognition and evaluated on the Large-scale Isolated
Gesture Recognition at the ChaLearn Looking at People (LAP) challenge 2016. The
method achieved 55.57\% classification accuracy and ranked place in
this challenge but was very close to the best performance even though we only
used depth data.Comment: arXiv admin note: text overlap with arXiv:1608.0633
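The dynamic images above are built with bidirectional rank pooling. As a rough illustration only (not the authors' exact implementation), the widely used approximate form of rank pooling collapses a depth sequence into a single image using fixed temporal weights, and the backward image is obtained by pooling the reversed sequence; the toy sequence below is an assumption for demonstration:

```python
import numpy as np

def approximate_rank_pooling(frames):
    """Collapse a (T, H, W) sequence into one dynamic image.

    Uses the fixed weights alpha_t = 2t - T - 1 (t = 1..T) of approximate
    rank pooling: early frames get negative weight, late frames positive,
    so the result encodes the temporal evolution of the sequence.
    """
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2.0 * t - T - 1.0          # e.g. T=4 -> [-3, -1, 1, 3]
    return np.tensordot(alpha, frames, axes=1)

def bidirectional_dynamic_images(depth_seq):
    """Forward and backward dynamic images, as in bidirectional rank pooling."""
    fwd = approximate_rank_pooling(depth_seq)
    bwd = approximate_rank_pooling(depth_seq[::-1])
    return fwd, bwd

# toy depth sequence: four 2x2 depth maps with linearly increasing depth
seq = np.stack([np.full((2, 2), float(v)) for v in (1, 2, 3, 4)])
fwd, bwd = bidirectional_dynamic_images(seq)
```

For this linearly increasing toy sequence the forward image is uniformly positive and the backward image is its negation; the three DDI/DDNI/DDMNI variants differ in what is pooled (raw depth, depth normals, or motion normals), not in the pooling itself.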
Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks
This paper proposes three simple, compact yet effective representations of
depth sequences, referred to respectively as Dynamic Depth Images (DDI),
Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images
(DDMNI), for both isolated and continuous action recognition. These dynamic
images are constructed from a segmented sequence of depth maps using
hierarchical bidirectional rank pooling to effectively capture the
spatial-temporal information. Specifically, DDI exploits the dynamics of
postures over time and DDNI and DDMNI exploit the 3D structural information
captured by depth maps. Upon the proposed representations, a ConvNet based
method is developed for action recognition. The image-based representations
enable us to fine-tune the existing Convolutional Neural Network (ConvNet)
models trained on image data without training a large number of parameters from
scratch. The proposed method achieved state-of-the-art results on three large
datasets, namely, the Large-scale Continuous Gesture Recognition Dataset (mean
Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset
(59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22%
cross-view), even though only the depth modality was used.
Gesture Recognition Using Hidden Markov Models Augmented with Active Difference Signatures
With the recent advent of depth sensors, human gesture recognition has gained significant interest in the fields of computer vision and human-computer interaction. Robust gesture recognition is a difficult problem because of the spatiotemporal variations in gesture formation, subject size, subject location, image fidelity, and subject occlusion. Gesture boundary detection, or the automatic detection of the onset and offset of a gesture in a sequence of gestures, is critical to achieving robust gesture recognition. Existing gesture recognition methods perform the task of gesture segmentation either by using resting frames in a gesture sequence or by using additional information such as audio, depth images, or RGB images. This ancillary information introduces high latency in gesture segmentation and recognition, making it inappropriate for real-time applications. This thesis proposes a novel method to recognize time-varying human gestures from continuous video streams. The proposed method passes skeleton joint information into a Hidden Markov Model augmented with active difference signatures to achieve state-of-the-art gesture segmentation and recognition.
Active body parts are used to calculate the likelihood of previously unseen data to facilitate gesture segmentation. Active difference signatures are used to describe temporal motion as well as static differences from a canonical resting position. Geometric features, such as joint angles, and joint topological distances are used along with active difference signatures as salient feature descriptors. These feature descriptors serve as unique signatures which identify hidden states in a Hidden Markov Model. The Hidden Markov Model is able to identify gestures in a robust fashion which is tolerant to spatiotemporal and human-to-human variation in gesture articulation.
The proposed method is evaluated on both isolated and continuous datasets. An accuracy of 80.7% is achieved on the isolated MSR3D dataset and a mean Jaccard index of 0.58 is achieved on the continuous ChaLearn dataset. These results improve upon existing gesture recognition methods, which achieve a Jaccard index of 0.43 on the ChaLearn dataset. Comprehensive experiments investigate the feature selection, parameter optimization, and algorithmic methods to help understand the contributions of the proposed method.
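Once per-frame feature likelihoods are available (from the active difference signatures described above), identifying gesture states in an HMM reduces to standard Viterbi decoding. The sketch below is a generic log-space Viterbi, with a toy two-state (rest/gesture) model whose probabilities are invented for illustration, not taken from the thesis:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely hidden-state path for one observation sequence.

    log_pi : (S,)    log initial-state probabilities
    log_A  : (S, S)  log transition probabilities (prev -> cur)
    log_B  : (T, S)  per-frame log emission likelihoods (in the thesis these
                     would come from the active-difference-signature features)
    """
    T, S = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # (S, S): prev state -> cur state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy model: state 0 = resting pose, state 1 = active gesture
pi = np.array([0.9, 0.1])
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])
# 5 frames: two rest-like frames, then three gesture-like frames
B = np.array([[0.9, 0.1]] * 2 + [[0.1, 0.9]] * 3)
path = viterbi(np.log(pi), np.log(A), np.log(B))
```

The decoded path segments the sequence into rest and gesture spans, which is exactly the gesture-boundary-detection problem the thesis addresses without relying on ancillary audio or RGB data.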
ChaLearn Looking at People Challenge 2014: Dataset and Results
This paper summarizes the ChaLearn Looking at People 2014 challenge data and the results obtained by the participants. The competition was split into three independent tracks: human pose recovery from RGB data, action and interaction recognition from RGB data sequences, and multi-modal gesture recognition from RGB-Depth sequences. For all the tracks, the goal was to perform user-independent recognition in sequences of continuous images using the overlapping Jaccard index as the evaluation measure. In this edition of the ChaLearn challenge, two large novel data sets were made publicly available and the Microsoft Codalab platform was used to manage the competition. Outstanding results were achieved in the three challenge tracks, with results of 0.20, 0.50, and 0.85 for pose recovery, action/interaction recognition, and multi-modal gesture recognition, respectively.
Edges detection in depth images for a gesture recognition application using a Kinect WSN
The detection of persons in an image has been the subject of several studies, most of which were done on images taken by cameras in visible light (RGB). In this paper, we are interested in detecting the contours of people in Kinect 3D images. We investigate the application of the gradient approach and optimal filters to depth images, and we also use this detection to monitor a person via her gestures. Results show that the Canny edge detector performs well on people in both lighting conditions, but the Sobel algorithm performed better on depth images taken in the dark.
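The gradient approach mentioned above amounts to convolving the depth image with derivative kernels and thresholding the gradient magnitude. A minimal sketch using the 3x3 Sobel kernels on a synthetic depth map (the person/background depths below are assumed values, not from the paper):

```python
import numpy as np

def sobel_edges(depth, thresh):
    """Edge map of a depth image: threshold the Sobel gradient magnitude."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)
    ky = kx.T
    H, W = depth.shape
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for i in range(1, H - 1):           # skip the 1-pixel border
        for j in range(1, W - 1):
            patch = depth[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(kx * patch)
            gy[i, j] = np.sum(ky * patch)
    mag = np.hypot(gx, gy)
    return mag > thresh

# toy depth map: a person-shaped region at 1 m against a 3 m background
depth = np.full((6, 8), 3.0)
depth[1:5, 2:4] = 1.0
edges = sobel_edges(depth, thresh=1.0)
```

Because depth is unaffected by illumination, the silhouette boundary produces a sharp depth discontinuity that survives in the dark, which is consistent with the paper's observation about Sobel on dark scenes.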
Real-time sign language recognition using a consumer depth camera
Gesture recognition remains a very challenging task in the field of computer vision and human-computer interaction (HCI). A decade ago the task seemed almost unsolvable with the data provided by a single RGB camera. Due to recent advances in sensing technologies, such as time-of-flight and structured-light cameras, new data sources are available which make hand gesture recognition more feasible. In this work, we propose a highly precise method to recognize static gestures from depth data provided by one of the above-mentioned devices. The depth images are used to derive rotation-, translation- and scale-invariant features. A multi-layered random forest (MLRF) is then trained to classify the feature vectors, yielding the recognition of the hand signs. The training time and memory required by the MLRF are much smaller than those of a simple random forest with equivalent precision. This makes it possible to repeat the training procedure of the MLRF without significant effort. To show the advantages of our technique, we evaluate our algorithm on synthetic data, on a publicly available dataset containing 24 signs from American Sign Language (ASL), and on a new dataset collected using the recently released Intel Creative Gesture Camera.
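One classical way to obtain translation- and scale-invariant shape features like those the paper derives (the paper does not specify this exact construction, so treat it as an illustrative stand-in) is the normalized central moments of a segmented hand mask: centering removes translation, and dividing by a power of the area removes scale:

```python
import numpy as np

def normalized_moment(mask, p, q):
    """Translation- and scale-invariant normalized central moment eta_pq
    of a binary segmentation mask (e.g. a hand segmented from a depth image).
    """
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))                     # zeroth moment = area
    cx, cy = xs.mean(), ys.mean()            # centroid (removes translation)
    mu = np.sum((xs - cx) ** p * (ys - cy) ** q)
    return mu / m00 ** ((p + q) / 2 + 1)     # area power removes scale

# a 2x4 blob (a), the same blob translated (c), and scaled by 2 (b)
a = np.zeros((16, 16)); a[2:4, 3:7] = 1
c = np.zeros((16, 16)); c[6:8, 9:13] = 1
b = np.zeros((32, 32)); b[10:14, 8:16] = 1
```

Translation invariance is exact; scale invariance is only approximate on a pixel grid because of discretization, which is why the scaled blob matches the original closely but not perfectly.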
Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition
This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth, and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatiotemporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, thereby opening the door to the use of deep learning techniques to further explore multimodal time-series data.
Real-time motion-based hand gestures recognition from time-of-flight video
The final publication is available at Springer via http://dx.doi.org/10.1007/s11265-015-1090-5
This paper presents an innovative solution based on Time-Of-Flight (TOF) video technology for motion-pattern detection in real-time dynamic hand gesture recognition. The resulting system is able to detect motion-based hand gestures taking depth images as input. The recognizable motion patterns are modeled on the basis of the human arm anatomy and its degrees of freedom, generating a collection of synthetic motion patterns that is compared with the captured input patterns in order to finally classify the input gesture. For the evaluation of our system a significant collection of gestures has been compiled, obtaining results for 3D pattern classification as well as a comparison with the results using only 2D information.
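Comparing captured patterns against a bank of synthetic templates, as described above, must tolerate differences in execution speed. Dynamic time warping is a standard choice for that comparison (the paper does not name its matching algorithm, so this is an illustrative sketch with invented one-joint templates):

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D motion trajectories,
    tolerant to differences in execution speed."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a frame of a
                                 D[i][j - 1],      # skip a frame of b
                                 D[i - 1][j - 1])  # match frames
    return D[n][m]

def classify(captured, templates):
    """Label of the synthetic template closest to the captured pattern."""
    return min(templates, key=lambda lbl: dtw_distance(captured, templates[lbl]))

# hypothetical synthetic templates for two one-joint motions
templates = {
    "wave":  [0, 1, 0, 1, 0],
    "raise": [0, 1, 2, 3, 4],
}
captured = [0, 0, 1, 2, 2, 3, 4]   # a slower execution of "raise"
```

Because the warping path may repeat frames, a slower execution of the same motion still matches its template with zero cost, which is the property a speed-invariant gesture classifier needs.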