
    Visual recognition of American sign language using hidden Markov models

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 48-52). By Thad Eugene Starner.

    Zero-Shot Sign Language Recognition: Can Textual Data Uncover Sign Languages?

    We introduce the problem of zero-shot sign language recognition (ZSSLR), where the goal is to leverage models learned over the seen sign class examples to recognize the instances of unseen signs. To this end, we propose to utilize the readily available descriptions in sign language dictionaries as an intermediate-level semantic representation for knowledge transfer. We introduce a new benchmark dataset called ASL-Text that consists of 250 sign language classes and their accompanying textual descriptions. Compared to the ZSL datasets in other domains (such as object recognition), our dataset consists of a limited number of training examples for a large number of classes, which imposes a significant challenge. We propose a framework that operates over the body and hand regions by means of 3D-CNNs, and models longer temporal relationships via bidirectional LSTMs. By leveraging the descriptive text embeddings along with these spatio-temporal representations within a zero-shot learning framework, we show that textual data can indeed be useful in uncovering sign languages. We anticipate that the introduced approach and the accompanying dataset will provide a basis for further exploration of this new zero-shot learning problem. Comment: To appear in British Machine Vision Conference (BMVC) 201
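    The zero-shot idea in this abstract can be illustrated with a minimal sketch. It assumes that a video embedding and the text embeddings of the unseen class descriptions already live in one shared vector space, and it simply picks the class whose description is most similar by cosine similarity. This is not the authors' framework (which uses 3D-CNNs and bidirectional LSTMs); the function names and the toy vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def zero_shot_predict(video_embedding, class_text_embeddings):
    """Return the unseen class whose description embedding is closest to the video.

    class_text_embeddings maps a class name to the embedding of its textual
    description; no video examples of these classes are needed.
    """
    return max(
        class_text_embeddings,
        key=lambda name: cosine(video_embedding, class_text_embeddings[name]),
    )


# Toy, made-up embeddings for two unseen sign classes.
unseen_classes = {
    "BOOK": [1.0, 0.0, 0.2],
    "DRINK": [0.0, 1.0, 0.1],
}
prediction = zero_shot_predict([0.9, 0.1, 0.3], unseen_classes)
```

    In a real system the mapping into the shared space would be learned from the seen classes; here it is assumed to be given.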

    Joint Morphological and Syntactic Disambiguation

    In morphologically rich languages, should morphological and syntactic disambiguation be treated sequentially or as a single problem? We describe several efficient, probabilistically interpretable ways to apply joint inference to morphological and syntactic disambiguation using lattice parsing. Joint inference is shown to compare favorably to pipeline parsing methods across a variety of component models. State-of-the-art performance on Hebrew Treebank parsing is demonstrated using the new method. The benefits of joint inference are modest with the current component models, but appear to increase as components themselves improve.

    Vision-based hand shape identification for sign language recognition

    This thesis introduces an approach to obtain image-based hand features that accurately describe hand shapes commonly found in American Sign Language. A hand recognition system capable of identifying 31 hand shapes from American Sign Language was developed to identify hand shapes in a given input image or video sequence. An appearance-based approach with a single camera is used to recognize the hand shape. A region-based shape descriptor, the generic Fourier descriptor, which is invariant to translation, scale, and orientation, has been implemented to describe the shape of the hand. A wrist detection algorithm has been developed to remove the forearm from the hand region before the features are extracted. The recognition of the hand shapes is performed with a multi-class Support Vector Machine. Testing provided a recognition rate of approximately 84% based on a widely varying testing set of approximately 1,500 images and a training set of about 2,400 images. With a larger training set of approximately 2,700 images and a testing set of approximately 1,200 images, the recognition rate increased to about 88%.

    A Survey of Applications and Human Motion Recognition with Microsoft Kinect

    Microsoft Kinect, a low-cost motion sensing device, enables users to interact with computers or game consoles naturally through gestures and spoken commands without any other peripheral equipment. As such, it has attracted intense interest in research and development on Kinect technology. In this paper, we present a comprehensive survey of Kinect applications and of the latest research and development on motion recognition using data captured by the Kinect sensor. On the applications front, we review the applications of Kinect technology in a variety of areas, including healthcare, education and performing arts, robotics, sign language recognition, retail services, workplace safety training, as well as 3D reconstruction. On the technology front, we provide an overview of the main features of both versions of the Kinect sensor together with the depth sensing technologies used, and review the literature on human motion recognition techniques used in Kinect applications. We provide a classification of motion recognition techniques to highlight the different approaches used in human motion recognition. Furthermore, we compile a list of publicly available Kinect datasets. These datasets are valuable resources for researchers to investigate better methods for human motion recognition and lower-level computer vision tasks such as segmentation, object detection and human pose estimation.

    Gesture Recognition Using Hidden Markov Models Augmented with Active Difference Signatures

    With the recent invention of depth sensors, human gesture recognition has gained significant interest in the fields of computer vision and human-computer interaction. Robust gesture recognition is a difficult problem because of the spatiotemporal variations in gesture formation, subject size, subject location, image fidelity, and subject occlusion. Gesture boundary detection, or the automatic detection of the onset and offset of a gesture in a sequence of gestures, is critical to achieving robust gesture recognition. Existing gesture recognition methods perform the task of gesture segmentation either using resting frames in a gesture sequence or by using additional information such as audio, depth images, or RGB images. This ancillary information introduces high latency in gesture segmentation and recognition, thus making it inappropriate for real-time applications. This thesis proposes a novel method to recognize time-varying human gestures from continuous video streams. The proposed method passes skeleton joint information into a Hidden Markov Model augmented with active difference signatures to achieve state-of-the-art gesture segmentation and recognition. Active body parts are used to calculate the likelihood of previously unseen data to facilitate gesture segmentation. Active difference signatures are used to describe temporal motion as well as static differences from a canonical resting position. Geometric features, such as joint angles and joint topological distances, are used along with active difference signatures as salient feature descriptors. These feature descriptors serve as unique signatures which identify hidden states in a Hidden Markov Model. The Hidden Markov Model is able to identify gestures in a robust fashion that is tolerant to spatiotemporal and human-to-human variation in gesture articulation. The proposed method is evaluated on both isolated and continuous datasets. An accuracy of 80.7% is achieved on the isolated MSR3D dataset and a mean Jaccard index of 0.58 is achieved on the continuous ChaLearn dataset. Results improve upon existing gesture recognition methods, which achieve a Jaccard index of 0.43 on the ChaLearn dataset. Comprehensive experiments investigate the feature selection, parameter optimization, and algorithmic methods to help understand the contributions of the proposed method.
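    The mean Jaccard index quoted above measures how well predicted gesture segments overlap the annotated ones. The following is a minimal sketch of the generic metric over sets of frame indices, averaged across gesture labels. The exact ChaLearn evaluation protocol may differ in its details, and the function names and example frame ranges are hypothetical.

```python
def jaccard_index(pred_frames, true_frames):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of frame indices."""
    pred, truth = set(pred_frames), set(true_frames)
    union = pred | truth
    if not union:
        return 0.0  # assumed convention when both segments are empty
    return len(pred & truth) / len(union)


def mean_jaccard(pred_by_label, true_by_label):
    """Average the per-label Jaccard index over every label seen in either input.

    A label that is predicted but not annotated (or the reverse) scores 0.
    """
    labels = set(pred_by_label) | set(true_by_label)
    if not labels:
        return 0.0
    scores = [
        jaccard_index(pred_by_label.get(label, ()), true_by_label.get(label, ()))
        for label in labels
    ]
    return sum(scores) / len(scores)


# Hypothetical example: frames 10-29 predicted, frames 20-39 annotated.
overlap = jaccard_index(range(10, 30), range(20, 40))  # 10 shared / 30 total frames
```

    A score of 1.0 means the predicted and annotated frames coincide exactly, while 0.0 means they do not overlap at all.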

    Real-time American sign language recognition using desk and wearable computer based video
