ArabSign: A Multi-modality Dataset and Benchmark for Continuous Arabic Sign Language Recognition
Sign language recognition has attracted the interest of researchers in recent
years. While numerous approaches have been proposed for the recognition of
European and Asian sign languages, very few attempts have been made to develop similar
systems for the Arabic sign language (ArSL). This can be attributed partly to
the lack of a dataset at the sentence level. In this paper, we aim to make a
significant contribution by proposing ArabSign, a continuous ArSL dataset. The
proposed dataset consists of 9,335 samples performed by 6 signers. The total
time of the recorded sentences is around 10 hours, and the average sentence
length is 3.1 signs. The ArabSign dataset was recorded using a Kinect V2 camera
that provides three types of information (color, depth, and skeleton joint
points) recorded simultaneously for each sentence. In addition, we provide the
annotation of the dataset according to ArSL and Arabic language structures that
can help in studying the linguistic characteristics of ArSL. To benchmark this
dataset, we propose an encoder-decoder model for Continuous ArSL recognition.
The model was evaluated on the proposed dataset, and the results
show that the encoder-decoder model outperformed an attention-based model,
achieving an average word error rate (WER) of 0.50 compared with 0.62 for the
attention mechanism. The data and code are available at github.com/Hamzah-Luqman/ArabSign
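The WER figures above follow the standard definition: the word-level edit distance between hypothesis and reference, normalised by reference length. As a rough illustration (not the authors' evaluation code), a minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, against a four-sign reference, one substitution plus one deletion yields a WER of 0.50, the figure the encoder-decoder model reports on average.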
Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges
Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. The applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometrics Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), etc. Fusions of modalities such as hand gesture and facial expression, or lip and hand position, are the sensory combinations most often used in developing multimodal systems for the hearing impaired. This paper presents an overview of the multimodal systems reported in the literature for hearing-impaired studies, and also discusses work on hearing-impaired acoustic analysis. It is observed that far fewer algorithms have been developed for hearing-impaired AVSR than for normal-hearing AVSR. Audio-visual speech recognition systems for the hearing impaired are therefore in high demand for people communicating in spoken languages. Finally, this paper highlights state-of-the-art AVSR techniques and the challenges researchers face in developing AVSR systems.
DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation
There is an undeniable communication barrier between deaf people and people
with normal hearing ability. Although innovations in sign language translation
technology aim to tear down this communication barrier, the majority of
existing sign language translation systems are either intrusive or constrained
by resolution or ambient lighting conditions. Moreover, these existing systems
can only perform single-sign ASL translation rather than sentence-level
translation, making them much less useful in daily-life communication
scenarios. In this work, we fill this critical gap by presenting DeepASL, a
transformative deep learning-based sign language translation technology that
enables ubiquitous and non-intrusive American Sign Language (ASL) translation
at both word and sentence levels. DeepASL uses infrared light as its sensing
mechanism to non-intrusively capture the ASL signs. It incorporates a novel
hierarchical bidirectional deep recurrent neural network (HB-RNN) and a
probabilistic framework based on Connectionist Temporal Classification (CTC)
for word-level and sentence-level ASL translation respectively. To evaluate its
performance, we have collected 7,306 samples from 11 participants, covering 56
commonly used ASL words and 100 ASL sentences. DeepASL achieves an average
94.5% word-level translation accuracy and an average 8.2% word error rate on
translating unseen ASL sentences. Given its promising performance, we believe
DeepASL represents a significant step towards breaking the communication
barrier between deaf people and the hearing majority, and thus has significant
potential to fundamentally change deaf people's lives.
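DeepASL's sentence-level stage rests on Connectionist Temporal Classification, which maps per-frame network outputs to a sign sequence by collapsing repeated labels and dropping a reserved "blank" symbol. As a minimal sketch of that collapse rule only (assuming per-frame class probabilities and label 0 as the blank; the paper's actual decoder is a full probabilistic CTC framework):

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks."""
    # Pick the most likely label at each frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for label in best:
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

A blank frame between two identical labels keeps a genuinely repeated sign from being collapsed away, which is why CTC reserves that symbol.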
British Sign Language Recognition via Late Fusion of Computer Vision and Leap Motion with Transfer Learning to American Sign Language
In this work, we show that a late fusion approach to multimodality in sign language recognition improves the overall ability of the model compared with the singular approaches of image classification (88.14%) and Leap Motion data classification (72.73%). With a large synchronous dataset of 18 BSL gestures collected from multiple subjects, two deep neural networks are benchmarked and compared to derive the best topology for each. The vision model is implemented by a Convolutional Neural Network and an optimised Artificial Neural Network, and the Leap Motion model is implemented by an evolutionary search of Artificial Neural Network topology. Next, the two best networks are fused for synchronised processing, which yields a better overall result (94.44%), as complementary features are learnt in addition to the original task. The hypothesis is further supported by applying the three models to a set of completely unseen data, where the multimodality approach achieves the best results relative to the single-sensor methods. When transfer learning with weights trained on British Sign Language, all three models outperform standard random weight initialisation when classifying American Sign Language (ASL), and the best model overall for ASL classification was the transfer-learning multimodality approach, which scored 82.55% accuracy.
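The paper fuses the learned features of the two modality networks inside a joint network; a simpler decision-level variant conveys the idea. The sketch below (an assumption for illustration, not the paper's architecture, with a hypothetical weighting parameter w_vision) averages the per-class probabilities produced by the vision and Leap Motion models:

```python
def late_fusion(p_vision, p_leap, w_vision=0.6):
    """Weighted average of per-class probabilities from two modality models."""
    assert len(p_vision) == len(p_leap)
    fused = [w_vision * a + (1.0 - w_vision) * b
             for a, b in zip(p_vision, p_leap)]
    total = sum(fused)                  # renormalise so the result sums to 1
    return [x / total for x in fused]

def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)
```

When the two modalities disagree, the fused distribution lets a confident modality outvote an uncertain one, which is one intuition for why the fused model (94.44%) beats either single-sensor model.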
Automatic recognition of Arabic alphabets sign language using deep learning
Technological advancements are helping people with special needs overcome many communication obstacles. Deep learning and computer vision models now enable previously infeasible tasks in human interaction. The Arabic language remains a rich research area. In this paper, different deep learning models are applied to test the accuracy and efficiency obtained in automatic Arabic sign language recognition. We provide a novel framework for the automatic detection of Arabic sign language, based on transfer learning applied to popular deep learning models for image processing: AlexNet, VGGNet, and GoogleNet/Inception are trained, and the efficiency of shallow learning approaches based on support vector machines (SVM) and nearest-neighbour algorithms is tested as a baseline. As a result, we propose a novel approach for the automatic recognition of Arabic alphabets in sign language based on the VGGNet architecture, which outperformed the other trained models, recognising Arabic sign language with an accuracy score of 97%. The models are evaluated on a recent fully labelled dataset of Arabic sign language images containing 54,049 images, which, to the best of our knowledge, is the first large and comprehensive real dataset of Arabic sign language.
Method and evidence: Gesture and iconicity in the evolution of language
The aim of this paper is to mount a challenge to gesture-first hypotheses about the evolution of language by identifying constraints on the emergence of symbol use. Current debates focus on a range of pre-conditions for the emergence of language, including co-operation and related mentalising capacities, imitation and tool use, episodic memory, and vocal physiology, but little specifically on the ability to learn and understand symbols. It is argued here that such a focus raises new questions about the plausibility of gesture-first hypotheses, and so about the evolution of language in general. After a brief review of the methodology used in the paper, it is argued that existing uses of gesture in hominid communities may have prohibited the emergence of symbol use, rather than ‘bootstrapped’ symbolic capacities as is usually assumed, and that the vocal channel offers other advantages in both learning and using language. In this case, the vocal channel offers a more promising platform for the evolution of language than is often assumed
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research work seeking to automatically process facsimiles and extract
information thereby are multiplying with, as a first essential step, document
layout analysis. If the identification and categorization of segments of
interest in document images have seen significant progress over the last years
thanks to deep learning techniques, many challenges remain with, among others,
the use of finer-grained segmentation typologies and the consideration of
complex, heterogeneous documents such as historical newspapers. Besides, most
approaches consider visual features only, ignoring textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvement of multimodal models in comparison to a strong visual
baseline, as well as better robustness to high material variance
An Avatar Based Natural Arabic Sign Language Generation System for Deaf People
Research demonstrates that deaf individuals are significantly disadvantaged in education. A contributing factor to this difference is the difficulty deaf children have in acquiring learning concepts early in life. This paper presents an idea for highly interactive software that uses avatars (three-dimensional character models) to process and translate free-form Arabic input into ARSL (ARabic Sign Language). A prototype for teaching maths and dictation in elementary schools is discussed. This research could be valuable as a teaching tool for increasing: (1) the opportunity for deaf children to learn maths and dictation via interactive media; and (2) the effectiveness of ARSL teachers.
Keywords: Avatar, ARSL, Finger Spelling, Hand Shape