    Video-based Sign Language Recognition without Temporal Segmentation

    Millions of hearing-impaired people around the world routinely use some variant of sign language to communicate, so automatic sign language translation is both meaningful and important. Sign Language Recognition (SLR) currently comprises two sub-problems: isolated SLR, which recognizes signs word by word, and continuous SLR, which translates entire sentences. Existing continuous SLR methods typically use isolated SLR as a building block, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation is itself non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word in a sentence separately, severely limiting the amount of attainable training data. To address these challenges, we propose a novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the temporal segmentation preprocessing step. The proposed LS-HAN consists of three components: a two-stream Convolutional Neural Network (CNN) for video feature representation, a Latent Space (LS) for bridging the semantic gap, and a Hierarchical Attention Network (HAN) for latent-space-based recognition. Experiments on two large-scale datasets demonstrate the effectiveness of the proposed framework.
    Comment: 32nd AAAI Conference on Artificial Intelligence (AAAI-18), Feb. 2-7, 2018, New Orleans, Louisiana, US
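    A minimal sketch of how the three components might fit together, assuming PyTorch; the stand-in backbones, layer sizes, attention pooling, and the cosine-similarity ranking at the end are all invented for illustration and are not the paper's actual architecture:

```python
# Hypothetical LS-HAN-style sketch: two-stream features -> shared latent
# space -> hierarchical attention -> video/sentence similarity score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnPool(nn.Module):
    """Simple additive attention pooling over a sequence dimension."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                      # x: (batch, seq, dim)
        w = F.softmax(self.score(x), dim=1)    # attention weights over seq
        return (w * x).sum(dim=1)              # (batch, dim)

class LSHANSketch(nn.Module):
    def __init__(self, feat_dim=256, latent_dim=128, emb_dim=300):
        super().__init__()
        # Two-stream "CNN": tiny stand-ins for full-frame and hand-crop backbones.
        self.global_cnn = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.hand_cnn = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fuse = nn.Linear(32, feat_dim)
        # Latent space: joint embedding for video clips and sentence words.
        self.video_map = nn.Linear(feat_dim, latent_dim)
        self.text_map = nn.Linear(emb_dim, latent_dim)
        # Hierarchical attention: frames -> clip, clips -> video; words -> sentence.
        self.frame_attn = AttnPool(latent_dim)
        self.clip_attn = AttnPool(latent_dim)
        self.word_attn = AttnPool(latent_dim)

    def forward(self, frames, hands, word_embs):
        # frames, hands: (B, clips, T, 3, H, W); word_embs: (B, words, emb_dim)
        b, c, t = frames.shape[:3]
        g = self.global_cnn(frames.flatten(0, 2))
        h = self.hand_cnn(hands.flatten(0, 2))
        v = self.video_map(self.fuse(torch.cat([g, h], -1)))  # (B*c*T, latent)
        v = self.frame_attn(v.view(b * c, t, -1))             # pool frames
        v = self.clip_attn(v.view(b, c, -1))                  # pool clips
        s = self.word_attn(self.text_map(word_embs))          # pool words
        # Ranking videos against candidate sentences in the latent space is
        # what lets training proceed without explicit temporal segmentation.
        return F.cosine_similarity(v, s, dim=-1)

model = LSHANSketch()
score = model(torch.randn(2, 4, 8, 3, 32, 32),
              torch.randn(2, 4, 8, 3, 32, 32),
              torch.randn(2, 10, 300))
print(score.shape)  # torch.Size([2])
```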

    Sign segmentation with changepoint-modulated pseudo-labelling

    The objective of this work is to find temporal boundaries between signs in continuous sign language. Motivated by the paucity of annotation available for this task, we propose a simple yet effective algorithm to improve segmentation performance on unlabelled signing footage from a domain of interest. We make the following contributions: (1) We motivate and introduce the task of source-free domain adaptation for sign language segmentation, in which labelled source data is available for an initial training phase but not during adaptation. (2) We propose the Changepoint-Modulated Pseudo-Labelling (CMPL) algorithm to leverage cues from abrupt changes in a motion-sensitive feature space to improve pseudo-labelling quality for adaptation. (3) We showcase the effectiveness of our approach for category-agnostic sign segmentation, transferring from the BSLCORPUS to the BSL-1K and RWTH-PHOENIX-Weather 2014 datasets, where we outperform the prior state of the art.
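    The abstract suggests the following shape for the algorithm. This sketch is an assumption-laden illustration, not the authors' implementation: the changepoint detector, the convex-combination fusion rule, and all names are invented.

```python
# Sketch: fuse a source model's per-frame boundary probabilities with
# changepoint evidence from motion-sensitive features, then threshold
# to obtain pseudo-labels for the adaptation phase.
import numpy as np

def changepoint_score(feats, sigma=2.0):
    """Score abrupt changes as the smoothed L2 norm of the feature derivative."""
    diff = np.linalg.norm(np.diff(feats, axis=0), axis=1)
    xs = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
    k = np.exp(-xs**2 / (2 * sigma**2))
    k /= k.sum()                                   # Gaussian smoothing kernel
    smoothed = np.convolve(diff, k, mode="same")
    return np.concatenate([[0.0], smoothed])       # align to frame count

def cmpl_pseudo_labels(boundary_prob, feats, alpha=0.5, thresh=0.5):
    """Modulate model boundary probabilities with changepoint evidence."""
    cp = changepoint_score(feats)
    cp = cp / (cp.max() + 1e-8)                       # normalise to [0, 1]
    fused = (1 - alpha) * boundary_prob + alpha * cp  # assumed fusion rule
    return (fused > thresh).astype(np.int64)

T = 100
labels = cmpl_pseudo_labels(np.random.rand(T), np.random.randn(T, 64))
print(labels.shape)  # (100,)
```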

    Gloss Alignment Using Word Embeddings

    Capturing and annotating sign language datasets is a time-consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained Sign Language Translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitle. However, the lack of sign language annotation limits the usability of this data and has led to the development of automatic annotation techniques such as sign spotting. These spottings are aligned to the video rather than the subtitle, which often results in a misalignment between the subtitle and the spotted signs. In this paper we propose a method for aligning spottings with their corresponding subtitles using large spoken language models. Using a single modality means our method is computationally inexpensive and can be utilized in conjunction with existing alignment techniques. We quantitatively demonstrate the effectiveness of our method on the MeineDGS (mDGS) and BBC-Oxford British Sign Language (BOBSL) datasets, recovering up to a 33.22 BLEU-1 score in word alignment.
    Comment: 4 pages, 4 figures, 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
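    As an illustration of the single-modality idea, the sketch below reassigns each spotted gloss to the nearest-matching subtitle in embedding space. The `embed` function is a toy hashing bag-of-words stand-in for the large spoken language models the paper actually uses, and the windowed search is an assumption:

```python
# Sketch: move each spotted sign to the neighbouring subtitle whose text
# best matches the gloss in embedding space.
import numpy as np

def embed(text, dim=64):
    """Toy hashing bag-of-words embedding; a placeholder for an LM encoder."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def align_spottings(spottings, subtitles, window=2):
    """For each (gloss, subtitle_idx) spotting, search nearby subtitles
    and reassign the gloss to the most similar one."""
    sub_vecs = [embed(s) for s in subtitles]
    aligned = []
    for gloss, idx in spottings:
        g = embed(gloss)
        lo, hi = max(0, idx - window), min(len(subtitles), idx + window + 1)
        best = max(range(lo, hi), key=lambda j: float(g @ sub_vecs[j]))
        aligned.append((gloss, best))
    return aligned

subs = ["the weather will be cold", "snow is expected tomorrow", "drive carefully"]
spots = [("snow", 0), ("cold", 1)]
print(align_spottings(spots, subs))  # e.g. [('snow', 1), ('cold', 0)]
```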

    A Study on Techniques and Challenges in Sign Language Translation

    Sign Language Translation (SLT) plays a pivotal role in enabling effective communication for the Deaf and Hard of Hearing (DHH) community. This review delves into state-of-the-art techniques and methodologies in SLT, focusing on its significance, challenges, and recent advancements. It provides a comprehensive analysis of SLT approaches, ranging from rule-based systems to deep learning models, highlighting their strengths and limitations. Datasets specifically tailored for SLT research are explored, shedding light on the diversity and complexity of sign languages across the globe. The review also addresses critical issues in SLT, such as the expressiveness of generated signs, facial expressions, and non-manual signals. Furthermore, it discusses the integration of SLT into assistive technologies and educational tools, emphasizing their transformative potential for enhancing accessibility and inclusivity. Finally, the review outlines future directions, including the incorporation of multimodal inputs and the imperative need for co-creation with the Deaf community, paving the way for more accurate, expressive, and culturally sensitive sign language generation systems.

    FluentSigners-50: A signer independent benchmark dataset for sign language processing

    This paper presents a new large-scale signer-independent dataset for Kazakh-Russian Sign Language (KRSL) for the purposes of Sign Language Processing. We envision it serving as a new benchmark dataset for performance evaluations of Continuous Sign Language Recognition (CSLR) and Translation (CSLT) tasks. The proposed FluentSigners-50 dataset consists of 173 sentences performed by 50 KRSL signers, resulting in 43,250 video samples. Dataset contributors recorded videos in real-life settings against a wide variety of backgrounds, using various devices such as smartphones and web cameras. Therefore, the distance to the camera, camera angles and aspect ratio, video quality, and frame rates varied for each dataset contributor. Additionally, the proposed dataset contains a high degree of linguistic and inter-signer variability and is thus a better training set for recognizing real-life sign language. A FluentSigners-50 baseline is established using two state-of-the-art methods, Stochastic CSLR and TSPNet. To this end, we carefully prepared three benchmark train-test splits for model evaluation in terms of signer independence, age independence, and unseen sentences. FluentSigners-50 is publicly available at https://krslproject.github.io/FluentSigners-50/
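    For readers building on the dataset, here is a sketch of how a signer-independent split could be constructed; the released benchmark splits should be taken from the project page above, and the record fields here are assumptions. Age-independent and unseen-sentence splits would group by an age band or sentence id in the same way.

```python
# Sketch: hold out whole signers so no test signer appears in training.
import random

def signer_independent_split(samples, test_fraction=0.2, seed=0):
    """Split a list of sample dicts by signer id, not by sample."""
    signers = sorted({s["signer"] for s in samples})
    random.Random(seed).shuffle(signers)
    held_out = set(signers[:max(1, int(len(signers) * test_fraction))])
    train = [s for s in samples if s["signer"] not in held_out]
    test = [s for s in samples if s["signer"] in held_out]
    return train, test

# Toy stand-in records: 50 signers, 173 sentences, one take each.
samples = [{"signer": i, "sentence": j, "path": f"vid_{i}_{j}.mp4"}
           for i in range(50) for j in range(173)]
train, test = signer_independent_split(samples)
print(len(train), len(test))  # 6920 1730 with 10 held-out signers
```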

    Linguistically Motivated Sign Language Segmentation

    Sign language segmentation is a crucial task in sign language processing systems. It enables downstream tasks such as sign recognition, transcription, and machine translation. In this work, we consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases, larger units comprising several signs. We propose a novel approach to jointly model these two tasks. Our method is motivated by linguistic cues observed in sign language corpora. We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing. Given that prosody plays a significant role in phrase boundaries, we explore the use of optical flow features. We also provide an extensive analysis of hand shapes and 3D hand normalization. We find that introducing BIO tagging is necessary to model sign boundaries. Explicitly encoding prosody via optical flow improves segmentation in shallow models, but its contribution is negligible in deeper models. Careful tuning of the decoding algorithm atop the models further improves segmentation quality. We demonstrate that our final models generalize to out-of-domain video content in a different signed language, even under a zero-shot setting. We observe that including optical flow and 3D hand normalization enhances the robustness of the model in this context.
    Comment: Accepted at EMNLP 2023 (Findings)
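    To make the IO-versus-BIO point concrete, here is a hedged sketch of greedy BIO decoding over per-frame tag probabilities; it is not the paper's tuned decoding algorithm. B marks a sign's first frame, I its continuation, and O non-signing, so back-to-back signs stay separable because each new sign re-opens with a B, whereas IO tagging would merge them.

```python
# Sketch: greedy argmax decoding of per-frame BIO probabilities into
# (start, end) sign segments; end is exclusive.
import numpy as np

B, I, O = 0, 1, 2

def decode_segments(tag_probs):
    tags = tag_probs.argmax(axis=1)
    segments, start = [], None
    for t, tag in enumerate(tags):
        if tag == B:                       # a new sign begins: close the old one
            if start is not None:
                segments.append((start, t))
            start = t
        elif tag == O and start is not None:
            segments.append((start, t))    # signing ends at a non-signing frame
            start = None
    if start is not None:
        segments.append((start, len(tags)))
    return segments

# Two adjacent signs with no O gap: BIO recovers both boundaries.
probs = np.eye(3)[[O, B, I, I, B, I, O]]
print(decode_segments(probs))  # [(1, 4), (4, 6)]
```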

    Visual identification by signature tracking

    We propose a new camera-based biometric: visual signature identification. We discuss the importance of the parameterization of the signatures in order to achieve good classification results, independently of variations in the position of the camera with respect to the writing surface. We show that affine arc-length parameterization performs better than conventional time and Euclidean arc-length parameterizations. We find that the system's verification performance is better than 4 percent error on skilled forgeries and 1 percent error on random forgeries, and that its recognition performance is better than a 1 percent error rate, comparable to the best camera-based biometrics.
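    For concreteness, the equi-affine arc-length element of a planar curve (x(t), y(t)) is d_sigma = |x' y'' - x'' y'|^(1/3) dt, which is invariant to area-preserving affine transformations and hence robust to moderate changes in camera viewpoint. Below is a small sketch of resampling a pen trajectory at equal affine arc-length steps; it is an illustration under these standard definitions, not the paper's code.

```python
# Sketch: reparameterize a (T, 2) trajectory by accumulated affine arc length.
import numpy as np

def affine_arclength_resample(xy, n_samples=128):
    x, y = xy[:, 0], xy[:, 1]
    dx, dy = np.gradient(x), np.gradient(y)          # first derivatives
    ddx, ddy = np.gradient(dx), np.gradient(dy)      # second derivatives
    dsigma = np.abs(dx * ddy - ddx * dy) ** (1.0 / 3.0)
    sigma = np.concatenate([[0.0], np.cumsum(dsigma[1:])])  # accumulated length
    targets = np.linspace(0.0, sigma[-1], n_samples)  # assumes sigma increases
    xs = np.interp(targets, sigma, x)
    ys = np.interp(targets, sigma, y)
    return np.stack([xs, ys], axis=1)

t = np.linspace(0, 2 * np.pi, 400)
loop = np.stack([np.cos(t), np.sin(2 * t)], axis=1)  # a toy "signature" stroke
print(affine_arclength_resample(loop).shape)         # (128, 2)
```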