
    Sign Language Fingerspelling Classification from Depth and Color Images using a Deep Belief Network

    Automatic sign language recognition is an open problem that has received a lot of attention recently, not only because of its usefulness to signers, but also due to the numerous applications a sign classifier can have. In this article, we present a new feature extraction technique for hand pose recognition using depth and intensity images captured from a Microsoft Kinect sensor. We applied our technique to American Sign Language fingerspelling classification using a Deep Belief Network, for which our feature extraction technique is tailored. We evaluated our results on a multi-user data set with two scenarios: one with all known users and one with an unseen user. We achieved 99% recall and precision on the first, and 77% recall and 79% precision on the second. Our method is also capable of real-time sign classification and is adaptive to any environment or lighting intensity. Comment: Published in the 2014 Canadian Conference on Computer and Robot Vision.
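
    As a rough illustration of the kind of pipeline this abstract describes, the sketch below feeds flattened depth/intensity hand crops into a stacked-RBM plus logistic-regression classifier (scikit-learn). It is a stand-in for the paper's Deep Belief Network, not the authors' implementation; the crop size, layer widths, and 24-class setup are assumptions.

        # Sketch: greedy RBM stacking + logistic regression as a stand-in for a
        # Deep Belief Network over depth/intensity hand crops (illustrative only).
        import numpy as np
        from sklearn.neural_network import BernoulliRBM
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import Pipeline

        # Placeholder data: N flattened 32x32 crops, depth and intensity channels
        # concatenated and scaled to [0, 1]; one label per static fingerspelled letter.
        N = 1000
        X = np.random.rand(N, 2 * 32 * 32)
        y = np.random.randint(0, 24, size=N)  # 24 static ASL letters (J and Z need motion)

        dbn_like = Pipeline([
            ("rbm1", BernoulliRBM(n_components=500, learning_rate=0.01, n_iter=10)),
            ("rbm2", BernoulliRBM(n_components=200, learning_rate=0.01, n_iter=10)),
            ("clf", LogisticRegression(max_iter=1000)),
        ])
        dbn_like.fit(X, y)
        print("train accuracy:", dbn_like.score(X, y))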

    Hand Gesture Recognition for Sign Language Transcription

    Sign Language is a language that allows mute people to communicate with other mute or non-mute people. The benefits provided by this language, however, disappear when one of the members of a group does not know Sign Language and a conversation starts using that language. In this document, I present a system that takes advantage of Convolutional Neural Networks to recognize hand letter and number gestures from American Sign Language based on depth images captured by the Kinect camera. In addition, as a byproduct of these research efforts, I collected a new dataset of depth images of American Sign Language letters and numbers, and I compared the presented image recognition method against a similar dataset for Vietnamese Sign Language. Finally, I present how this work supports my ideas for future work on a complete system for Sign Language transcription.
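
    A minimal sketch of the CNN-on-depth-images approach this abstract describes, written with Keras; the 64x64 single-channel input, layer sizes, and 36-class (26 letters + 10 digits) output are assumptions rather than the author's actual network.

        # Sketch: small CNN for classifying single-hand depth crops into
        # letters and numbers. Architecture and sizes are illustrative only.
        import tensorflow as tf
        from tensorflow.keras import layers, models

        num_classes = 36  # assumed: 26 letters + 10 digits

        model = models.Sequential([
            layers.Input(shape=(64, 64, 1)),          # single-channel depth crop
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.summary()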

    Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features

    Recognizing sign language is a very challenging task in computer vision. One of the more popular approaches, Dynamic Time Warping (DTW), utilizes hand trajectory information to compare a query sign with those in a database of examples. In this work, we conducted an American Sign Language (ASL) recognition experiment on Kinect sign data using DTW for sign trajectory similarity and Histogram of Oriented Gradient (HOG) features for hand shape distance.
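
    To make the two similarity measures concrete, the sketch below computes a DTW distance between two 2D hand trajectories and a HOG-descriptor distance between two hand-shape crops (scikit-image). It is not the paper's code; the trajectory format, crop size, and the simple additive combination of the two scores are assumptions.

        # Sketch: DTW over 2D hand trajectories + HOG distance over hand crops.
        import numpy as np
        from skimage.feature import hog

        def dtw_distance(a, b):
            """Classic O(len(a)*len(b)) DTW between two sequences of 2D points."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        def hog_distance(crop_a, crop_b):
            """Euclidean distance between HOG descriptors of two hand crops."""
            h_a = hog(crop_a, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
            h_b = hog(crop_b, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
            return np.linalg.norm(h_a - h_b)

        # Placeholder inputs: two (x, y) hand trajectories and two 64x64 hand crops.
        traj_q, traj_db = np.random.rand(40, 2), np.random.rand(55, 2)
        crop_q, crop_db = np.random.rand(64, 64), np.random.rand(64, 64)
        score = dtw_distance(traj_q, traj_db) + hog_distance(crop_q, crop_db)
        print("combined dissimilarity:", score)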

    Leveraging 2D pose estimators for American Sign Language Recognition

    Most deaf children born to hearing parents do not have continuous access to language, leading to weaker short-term memory compared to deaf children born to deaf parents. This weakened short-term memory has serious consequences for their mental health and employment rate. To this end, prior work has explored CopyCat, a game where children interact with virtual actors using sign language. While CopyCat has been shown to improve language generation, reception, and repetition, it uses expensive hardware for sign language recognition. This thesis explores the feasibility of using 2D off-the-shelf camera-based pose estimators such as MediaPipe for complementing sign language recognition and moving towards a ubiquitous recognition framework. We compare MediaPipe with 3D pose estimators such as Azure Kinect to determine the feasibility of using off-the-shelf cameras. Furthermore, we develop and compare Hidden Markov Models (HMMs) with state-of-the-art recognition models such as Transformers to determine which model is best suited for American Sign Language recognition in a constrained environment. We find that MediaPipe marginally outperforms Kinect in various experimental settings. Additionally, HMMs outperform Transformers by 17.0% on average in recognition accuracy. Given these results, we believe that a widely deployable game using only a 2D camera is feasible. (Undergraduate thesis)
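
    A minimal sketch of the kind of 2D feature extraction the thesis builds on, using MediaPipe's off-the-shelf Hands solution to pull per-frame hand landmarks from a video; the file name and the flat feature layout are assumptions, and the downstream HMM/Transformer recognizer is omitted.

        # Sketch: extract 2D hand landmarks per frame with MediaPipe (legacy
        # "solutions" API) as input features for a downstream recognizer.
        import cv2
        import mediapipe as mp

        mp_hands = mp.solutions.hands

        def landmarks_per_frame(video_path):
            """Yield a flat [x0, y0, x1, y1, ...] landmark list for each frame."""
            cap = cv2.VideoCapture(video_path)
            with mp_hands.Hands(static_image_mode=False, max_num_hands=2) as hands:
                while True:
                    ok, frame = cap.read()
                    if not ok:
                        break
                    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                    coords = []
                    if results.multi_hand_landmarks:
                        for hand in results.multi_hand_landmarks:
                            for lm in hand.landmark:
                                coords.extend([lm.x, lm.y])  # normalized image coords
                    yield coords
            cap.release()

        # Usage (hypothetical file name):
        # features = list(landmarks_per_frame("copycat_sign.mp4"))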

    DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation

    There is an undeniable communication barrier between deaf people and people with normal hearing ability. Although innovations in sign language translation technology aim to tear down this communication barrier, the majority of existing sign language translation systems are either intrusive or constrained by resolution or ambient lighting conditions. Moreover, these existing systems can only perform single-sign ASL translation rather than sentence-level translation, making them much less useful in daily-life communication scenarios. In this work, we fill this critical gap by presenting DeepASL, a transformative deep learning-based sign language translation technology that enables ubiquitous and non-intrusive American Sign Language (ASL) translation at both word and sentence levels. DeepASL uses infrared light as its sensing mechanism to non-intrusively capture the ASL signs. It incorporates a novel hierarchical bidirectional deep recurrent neural network (HB-RNN) and a probabilistic framework based on Connectionist Temporal Classification (CTC) for word-level and sentence-level ASL translation, respectively. To evaluate its performance, we collected 7,306 samples from 11 participants, covering 56 commonly used ASL words and 100 ASL sentences. DeepASL achieves an average 94.5% word-level translation accuracy and an average 8.2% word error rate when translating unseen ASL sentences. Given its promising performance, we believe DeepASL represents a significant step towards breaking the communication barrier between deaf people and the hearing majority, and thus has significant potential to fundamentally change deaf people's lives.
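
    To illustrate how a sentence-level recognizer can be trained with CTC, the PyTorch sketch below pairs a generic bidirectional LSTM encoder with nn.CTCLoss. It is a stand-in for DeepASL's hierarchical HB-RNN rather than the authors' model; the feature dimension, vocabulary size, and sequence lengths are assumptions.

        # Sketch: generic bidirectional LSTM encoder + CTC loss for sentence-level
        # sign recognition (stand-in for DeepASL's hierarchical HB-RNN + CTC).
        import torch
        import torch.nn as nn

        vocab_size = 56 + 1     # 56 words + 1 CTC blank (blank index 0), assumed
        feat_dim = 63           # e.g., flattened per-frame skeleton features, assumed

        class BiLSTMCTC(nn.Module):
            def __init__(self):
                super().__init__()
                self.rnn = nn.LSTM(feat_dim, 128, num_layers=2,
                                   bidirectional=True, batch_first=True)
                self.proj = nn.Linear(2 * 128, vocab_size)

            def forward(self, x):                     # x: (batch, time, feat_dim)
                h, _ = self.rnn(x)
                return self.proj(h).log_softmax(-1)   # (batch, time, vocab)

        model = BiLSTMCTC()
        ctc = nn.CTCLoss(blank=0)

        # Toy batch: 4 sequences of 80 frames; targets are word-index sequences.
        x = torch.randn(4, 80, feat_dim)
        targets = torch.randint(1, vocab_size, (4, 6))
        input_lengths = torch.full((4,), 80, dtype=torch.long)
        target_lengths = torch.full((4,), 6, dtype=torch.long)

        log_probs = model(x).permute(1, 0, 2)         # CTCLoss expects (time, batch, vocab)
        loss = ctc(log_probs, targets, input_lengths, target_lengths)
        loss.backward()
        print("ctc loss:", loss.item())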

    RGBD Datasets: Past, Present and Future

    Since the launch of the Microsoft Kinect, scores of RGBD datasets have been released. These have propelled advances in areas from reconstruction to gesture recognition. In this paper we explore the field, reviewing datasets across eight categories: semantics, object pose estimation, camera tracking, scene reconstruction, object tracking, human actions, faces and identification. By extracting relevant information in each category we help researchers to find appropriate data for their needs, and we consider which datasets have succeeded in driving computer vision forward and why. Finally, we examine the future of RGBD datasets. We identify key areas which are currently underexplored, and suggest that future directions may include synthetic data and dense reconstructions of static and dynamic scenes. Comment: 8 pages excluding references (CVPR style).