
    Automatic recognition of fingerspelled words in British Sign Language

    We investigate the problem of recognizing words from video, fingerspelled using the British Sign Language (BSL) fingerspelling alphabet. This is a challenging task since the BSL alphabet involves both hands occluding each other, and contains signs which are ambiguous from the observer’s viewpoint. The main contributions of our work include: (i) recognition based on hand shape alone, not requiring motion cues; (ii) robust visual features for hand shape recognition; (iii) scalability to large lexicon recognition with no re-training. We report results on a dataset of 1,000 low-quality webcam videos of 100 words. The proposed method achieves a word recognition accuracy of 98.9%.

    Continuous Action Recognition Based on Sequence Alignment

    Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be carried out simultaneously. We build on the well-known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on a per-frame representation of videos and on aligning a test sequence with a model sequence. Moreover, we propose two extensions that enable recognition to be performed concurrently with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous speech recognition and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performance of the proposed isolated and continuous recognition algorithms with several recently published methods.
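For orientation, the classic DTW alignment that the abstract above builds on can be sketched as follows. This is textbook DTW on scalar sequences, not the paper's DFW variant; the function name `dtw_distance` and the absolute-difference frame cost are assumptions made for illustration only.

```python
from math import inf


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping cost between sequences a and b.

    Illustrative sketch only: `dist` is an assumed per-element cost,
    standing in for whatever per-frame distance a real system would use.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a only, advance in b only, advance in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

In an isolated-recognition setting, a test sequence would be compared against one model sequence per class and assigned the class with the lowest alignment cost; the warping lets sequences of different lengths or speeds still match.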

    Video-based Sign Language Recognition without Temporal Segmentation

    Millions of hearing-impaired people around the world routinely use some variant of sign language to communicate, so the automatic translation of sign language is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR): isolated SLR, which recognizes word by word, and continuous SLR, which translates entire sentences. Existing continuous SLR methods typically utilize isolated SLRs as building blocks, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation itself is non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data. To address these challenges, we propose a novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the preprocessing of temporal segmentation. The proposed LS-HAN consists of three components: a two-stream Convolutional Neural Network (CNN) for video feature representation generation, a Latent Space (LS) for semantic gap bridging, and a Hierarchical Attention Network (HAN) for latent space based recognition. Experiments are carried out on two large-scale datasets. Experimental results demonstrate the effectiveness of the proposed framework. Comment: 32nd AAAI Conference on Artificial Intelligence (AAAI-18), Feb. 2-7, 2018, New Orleans, Louisiana, US

    New Method for Optimization of License Plate Recognition System with Use of Edge Detection and Connected Component

    License plate recognition plays an important role in traffic monitoring and parking management systems. In this paper, a fast, real-time method is proposed that is well suited to finding tilted and poor-quality plates. In the proposed method, the image is first converted to binary using an adaptive threshold. Then, the plate location is identified using edge detection and morphological operations. Finally, if the plate is tilted, the tilt is corrected. The method was tested on a data set from another paper that contains images with different backgrounds, distances, and angles of view, and the correct plate extraction rate reached 98.66%. Comment: 3rd IEEE International Conference on Computer and Knowledge Engineering (ICCKE 2013), October 31 & November 1, 2013, Ferdowsi Universit Mashha
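The first step described above, binarization with an adaptive threshold, can be illustrated with a minimal local-mean sketch. The abstract does not say which adaptive thresholding variant the authors use, so the local-mean rule, the function name `adaptive_threshold`, and the `radius` and `offset` parameters are all assumptions.

```python
def adaptive_threshold(img, radius=1, offset=0):
    """Binarize a grayscale image (list of rows) against a local mean.

    Illustrative sketch only: a pixel becomes 1 when it is brighter than
    the mean of its (2*radius+1)-wide neighborhood minus `offset`.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image borders
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [img[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] > local_mean - offset else 0
    return out
```

Because the threshold is computed per neighborhood instead of once for the whole image, uneven lighting across a plate has less effect on the binary result than it would with a single global threshold.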