3,518 research outputs found

    Sign Language Fingerspelling Classification from Depth and Color Images using a Deep Belief Network

    Full text link
    Automatic sign language recognition is an open problem that has received considerable attention recently, not only because of its usefulness to signers, but also due to the numerous applications a sign classifier can have. In this article, we present a new feature extraction technique for hand pose recognition using depth and intensity images captured from a Microsoft Kinect sensor. We applied our technique to American Sign Language fingerspelling classification using a Deep Belief Network, for which our feature extraction technique is tailored. We evaluated our results on a multi-user data set with two scenarios: one with all known users and one with an unseen user. We achieved 99% recall and precision on the first, and 77% recall and 79% precision on the second. Our method is also capable of real-time sign classification and is adaptive to any environment or lighting intensity. Comment: Published in the 2014 Canadian Conference on Computer and Robot Vision
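    The abstract does not spell out the feature extraction itself, but a minimal sketch of the usual Kinect preprocessing for such a pipeline, where the hand is assumed to be the object nearest the sensor and the cropped depth and intensity patches are flattened into the input vector of a Deep Belief Network, might look as follows; the function name, patch size, and 150 mm hand-span threshold are illustrative assumptions, not the authors' values.

```python
# Hedged sketch (not the paper's actual feature extraction): crop the region
# nearest to the Kinect, assumed to be the hand, from the depth and intensity
# frames and flatten it into a fixed-size vector for a DBN input layer.
import numpy as np
import cv2

def extract_hand_patch(depth_mm, intensity, patch_size=32, hand_span_mm=150):
    valid = depth_mm[depth_mm > 0]
    if valid.size == 0:
        raise ValueError("empty depth frame")
    near = float(valid.min())
    # binary mask of pixels within hand_span_mm of the nearest surface
    mask = ((depth_mm > 0) & (depth_mm < near + hand_span_mm)).astype(np.uint8)
    x, y, w, h = cv2.boundingRect(mask)
    d = cv2.resize(depth_mm[y:y+h, x:x+w].astype(np.float32), (patch_size, patch_size))
    g = cv2.resize(intensity[y:y+h, x:x+w].astype(np.float32), (patch_size, patch_size))
    d = (d - d.min()) / (float(d.max() - d.min()) + 1e-6)  # per-patch depth scaling
    g = g / 255.0                                          # intensity to [0, 1]
    return np.concatenate([d.ravel(), g.ravel()])          # DBN input vector
```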

    An original framework for understanding human actions and body language by using deep neural networks

    Get PDF
    The evolution of the fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information in a non-verbal way; these gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields: the former seeks to understand how people act in an environment, while the latter tries to interpret people's emotions from their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: the first proposes a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures; the second presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; the last provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs). The performance of LSTM-RNNs is explored in depth, since their ability to model the long-term contextual information of temporal sequences makes them suitable for analysing body movements. All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current literature methods.
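    As an illustration of the general idea behind the second module, here is a minimal PyTorch sketch of a two-branch stacked LSTM over 2D skeleton sequences, with one branch reading joint positions and the other frame-to-frame joint displacements; the layer sizes, the choice of branch inputs, and the late-fusion classifier are assumptions, not the thesis architecture.

```python
# Minimal sketch of a two-branch stacked LSTM over 2D skeleton sequences.
# Branch inputs (positions vs. displacements), layer sizes, and the fusion
# step are illustrative assumptions, not the thesis' actual design.
import torch
import torch.nn as nn

class TwoBranchLSTM(nn.Module):
    def __init__(self, n_joints=18, hidden=128, n_classes=10):
        super().__init__()
        in_dim = n_joints * 2                       # (x, y) per joint
        self.pose_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.motion_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, poses):                       # poses: (batch, time, n_joints * 2)
        motion = poses[:, 1:] - poses[:, :-1]       # frame-to-frame joint displacements
        _, (h_pose, _) = self.pose_branch(poses)
        _, (h_motion, _) = self.motion_branch(motion)
        fused = torch.cat([h_pose[-1], h_motion[-1]], dim=1)  # top-layer final states
        return self.classifier(fused)               # per-class action scores
```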

    An image processing technique for the improvement of Sign2 using a dual camera approach

    Get PDF
    A non-intrusive translation system that transforms American Sign Language into digital text is the focus of this thesis. With many techniques being introduced for the same purpose, this study takes the relatively less trodden path of developing an unobtrusive, user-friendly and straightforward solution. Phase 1 of the Sign2 project used a single-camera approach to build such a translation system, and the present investigation aims to improve the accuracy of its results while following the methodology pursued in Phase 1. The study is restricted to spelling out the American Sign Language alphabet, so the only area of concentration is the signer's hand, as opposed to the entire ASL vocabulary, which involves a more complex range of physical movement and intricate gesticulation. The investigation involved 3 subjects signing the ASL alphabet repetitively; these recordings were later used as a reference to recognize the letters in words signed by the same subjects. Although the subject matter does not differ much from Phase 1, an additional camera is employed as a means to achieve better accuracy. The reasoning behind this approach is to imitate human depth perception more closely: the best and most convincing information about the three-dimensional world is obtained through binocular vision, and this theory is exploited here. For the purpose of this study, only one aspect of binocular vision, binocular disparity, is emulated. The analysis shows improved precision in identifying the ‘fist’ letters. Owing to the small number of subjects and technical snags, the body of data is limited, but this thesis provides a basic foundation on which to build future study and lays out guidelines for achieving a more complete and successful translation system.
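    To make the binocular-disparity idea concrete, the sketch below uses OpenCV's stereo block matcher to estimate the per-pixel horizontal shift between two rectified views, where larger disparity indicates a closer hand; the matcher parameters are assumptions, and this is not the thesis' own dual-camera processing.

```python
# Illustrative disparity computation from a rectified stereo pair using
# OpenCV's block matcher. numDisparities/blockSize values are assumptions.
import cv2

def disparity_map(left_gray, right_gray, num_disparities=64, block_size=15):
    matcher = cv2.StereoBM_create(numDisparities=num_disparities, blockSize=block_size)
    disp = matcher.compute(left_gray, right_gray)   # int16, disparity scaled by 16
    return disp.astype("float32") / 16.0            # larger values = nearer surfaces
```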

    Hand gesture recognition using Kinect.

    Get PDF
    Hand gesture recognition (HGR) is an important research topic because some situations require silent communication with sign languages. Computational HGR systems assist silent communication and help people learn a sign language. In this thesis, a novel method for contact-less HGR using Microsoft Kinect for Xbox is described, and a real-time HGR system is implemented with Microsoft Visual Studio 2010. Two different scenarios for HGR are provided: the Popular Gesture scenario with nine gestures, and the Numbers scenario with nine gestures. The system allows users to select a scenario, and it is able to detect hand gestures made by users, to identify fingers, to recognize the meanings of gestures, and to display those meanings and pictures on screen. The accuracy of the HGR system ranges from 84% to 99% with single-hand gestures, and from 90% to 100% if both hands perform the same gesture at the same time. Because the depth sensor of the Kinect is an infrared camera, lighting conditions, signers' skin colors and clothing, and the background have little impact on the performance of this system. The accuracy and robustness make this system a versatile component that can be integrated into a variety of applications in daily life.
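    One common Kinect-era recipe for the detection and finger-identification steps described here is to depth-threshold the nearest object, take its largest contour, and count deep convexity defects as gaps between extended fingers; the sketch below follows that recipe with assumed thresholds and is not the thesis' implementation.

```python
# Hedged sketch of depth-based hand segmentation and finger counting via
# convexity defects. The depth band and defect-depth threshold are assumptions.
import numpy as np
import cv2

def count_fingers(depth_mm, hand_band_mm=120, min_defect_depth=20.0):
    near = float(depth_mm[depth_mm > 0].min())
    mask = ((depth_mm > 0) & (depth_mm < near + hand_band_mm)).astype(np.uint8) * 255
    # [-2] keeps this compatible with both OpenCV 3 and 4 return conventions
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not contours:
        return 0
    hand = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return 0
    # defect depth is stored fixed-point, scaled by 256
    gaps = sum(1 for d in defects[:, 0] if d[3] / 256.0 > min_defect_depth)
    return gaps + 1 if gaps else 0                  # n gaps between n+1 extended fingers
```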