ArabSign: A Multi-modality Dataset and Benchmark for Continuous Arabic Sign Language Recognition
Sign language recognition has attracted the interest of researchers in recent
years. While numerous approaches have been proposed for the recognition of
European and Asian sign languages, very few attempts have been made to develop similar
systems for the Arabic sign language (ArSL). This can be attributed partly to
the lack of a dataset at the sentence level. In this paper, we aim to make a
significant contribution by proposing ArabSign, a continuous ArSL dataset. The
proposed dataset consists of 9,335 samples performed by 6 signers. The total
time of the recorded sentences is around 10 hours, and the average sentence
length is 3.1 signs. The ArabSign dataset was recorded using a Kinect V2 camera
that provides three types of information (color, depth, and skeleton joint
points) recorded simultaneously for each sentence. In addition, we provide the
annotation of the dataset according to ArSL and Arabic language structures that
can help in studying the linguistic characteristics of ArSL. To benchmark this
dataset, we propose an encoder-decoder model for Continuous ArSL recognition.
The model was evaluated on the proposed dataset, and the results
show that the encoder-decoder model outperformed an attention-based model,
achieving an average word error rate (WER) of 0.50 compared with 0.62 for the
attention mechanism. The data and code are available at github.com/Hamzah-Luqman/ArabSign
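The WER figures above follow the standard definition: the word-level edit distance between hypothesis and reference, normalised by reference length. As a rough illustration (not the authors' evaluation code), a minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, against a four-sign reference, one substitution plus one deletion yields a WER of 0.50, the figure the encoder-decoder model reports on average.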
Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges
Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. The applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometrics Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), etc. Fusions of modalities such as hand gesture and facial expression, or lip and hand position, are the sensory combinations most often used in developing multimodal systems for the hearing impaired. This paper presents an overview of the multimodal systems reported in the literature for hearing-impaired studies, and also discusses work on hearing-impaired acoustic analysis. It is observed that far fewer algorithms have been developed for hearing-impaired AVSR than for normal-hearing AVSR. Audio-visual speech recognition systems for the hearing impaired are therefore in high demand for people communicating in spoken languages. Finally, this paper highlights state-of-the-art AVSR techniques and the challenges researchers face in developing AVSR systems.
DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation
There is an undeniable communication barrier between deaf people and people
with normal hearing ability. Although innovations in sign language translation
technology aim to tear down this communication barrier, the majority of
existing sign language translation systems are either intrusive or constrained
by resolution or ambient lighting conditions. Moreover, these existing systems
can only perform single-sign ASL translation rather than sentence-level
translation, making them much less useful in daily-life communication
scenarios. In this work, we fill this critical gap by presenting DeepASL, a
transformative deep learning-based sign language translation technology that
enables ubiquitous and non-intrusive American Sign Language (ASL) translation
at both word and sentence levels. DeepASL uses infrared light as its sensing
mechanism to non-intrusively capture the ASL signs. It incorporates a novel
hierarchical bidirectional deep recurrent neural network (HB-RNN) and a
probabilistic framework based on Connectionist Temporal Classification (CTC)
for word-level and sentence-level ASL translation respectively. To evaluate its
performance, we have collected 7,306 samples from 11 participants, covering 56
commonly used ASL words and 100 ASL sentences. DeepASL achieves an average
94.5% word-level translation accuracy and an average 8.2% word error rate on
translating unseen ASL sentences. Given its promising performance, we believe
DeepASL represents a significant step towards breaking the communication
barrier between deaf people and the hearing majority, and thus has significant
potential to fundamentally change deaf people's lives.
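DeepASL's sentence-level stage rests on Connectionist Temporal Classification, which maps per-frame network outputs to a sign sequence by collapsing repeated labels and dropping a reserved "blank" symbol. As a minimal sketch of that collapse rule only (assuming per-frame class probabilities and label 0 as the blank; the paper's actual decoder is a full probabilistic CTC framework):

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks."""
    # Pick the most likely label at each frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for label in best:
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

A blank frame between two identical labels keeps a genuinely repeated sign from being collapsed away, which is why CTC reserves that symbol.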
British Sign Language Recognition via Late Fusion of Computer Vision and Leap Motion with Transfer Learning to American Sign Language
In this work, we show that a late fusion approach to multimodality in sign language recognition improves the overall ability of the model compared with the singular approaches of image classification (88.14%) and Leap Motion data classification (72.73%). With a large synchronous dataset of 18 BSL gestures collected from multiple subjects, two deep neural networks are benchmarked and compared to derive the best topology for each. The vision model is implemented by a Convolutional Neural Network and an optimised Artificial Neural Network, and the Leap Motion model is implemented by an evolutionary search of Artificial Neural Network topology. Next, the two best networks are fused for synchronised processing, which yields a better overall result (94.44%), as complementary features are learnt in addition to the original task. The hypothesis is further supported by applying the three models to a set of completely unseen data, where the multimodality approach achieves the best results relative to the single-sensor methods. When transfer learning with weights trained on British Sign Language, all three models outperform standard random weight initialisation when classifying American Sign Language (ASL), and the best model overall for ASL classification was the transfer-learning multimodality approach, which scored 82.55% accuracy.
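The paper fuses the learned features of the two modality networks inside a joint network; a simpler decision-level variant conveys the idea. The sketch below (an assumption for illustration, not the paper's architecture, with a hypothetical weighting parameter w_vision) averages the per-class probabilities produced by the vision and Leap Motion models:

```python
def late_fusion(p_vision, p_leap, w_vision=0.6):
    """Weighted average of per-class probabilities from two modality models."""
    assert len(p_vision) == len(p_leap)
    fused = [w_vision * a + (1.0 - w_vision) * b
             for a, b in zip(p_vision, p_leap)]
    total = sum(fused)                  # renormalise so the result sums to 1
    return [x / total for x in fused]

def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)
```

When the two modalities disagree, the fused distribution lets a confident modality outvote an uncertain one, which is one intuition for why the fused model (94.44%) beats either single-sensor model.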
Automatic recognition of Arabic alphabets sign language using deep learning
Technological advancements are helping people with special needs overcome many communication obstacles. Deep learning and computer vision models now enable previously infeasible tasks in human interaction. The Arabic language remains a rich research area. In this paper, different deep learning models are applied to test the accuracy and efficiency obtained in automatic Arabic sign language recognition. We provide a novel framework for the automatic detection of Arabic sign language, based on transfer learning applied to popular deep learning models for image processing: AlexNet, VGGNet, and GoogleNet/Inception are trained, and the efficiency of shallow learning approaches based on support vector machines (SVM) and nearest-neighbour algorithms is tested as a baseline. As a result, we propose a novel approach for the automatic recognition of Arabic alphabets in sign language based on the VGGNet architecture, which outperformed the other trained models, recognising Arabic sign language with an accuracy score of 97%. The models are evaluated on a recent fully labelled dataset of Arabic sign language images containing 54,049 images, which, to the best of our knowledge, is the first large and comprehensive real dataset of Arabic sign language.
Method and evidence: Gesture and iconicity in the evolution of language
The aim of this paper is to mount a challenge to gesture-first hypotheses about the evolution of language by identifying constraints on the emergence of symbol use. Current debates focus on a range of pre-conditions for the emergence of language, including co-operation and related mentalising capacities, imitation and tool use, episodic memory, and vocal physiology, but little specifically on the ability to learn and understand symbols. It is argued here that such a focus raises new questions about the plausibility of gesture-first hypotheses, and so about the evolution of language in general. After a brief review of the methodology used in the paper, it is argued that existing uses of gesture in hominid communities may have prohibited the emergence of symbol use, rather than ‘bootstrapped’ symbolic capacities as is usually assumed, and that the vocal channel offers other advantages in both learning and using language. In this case, the vocal channel offers a more promising platform for the evolution of language than is often assumed
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research work seeking to automatically process facsimiles and extract
information thereby are multiplying with, as a first essential step, document
layout analysis. If the identification and categorization of segments of
interest in document images have seen significant progress over the last years
thanks to deep learning techniques, many challenges remain with, among others,
the use of finer-grained segmentation typologies and the consideration of
complex, heterogeneous documents such as historical newspapers. Besides, most
approaches consider visual features only, ignoring textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvement of multimodal models in comparison to a strong visual
baseline, as well as better robustness to high material variance
An Avatar Based Natural Arabic Sign Language Generation System for Deaf People
Research demonstrates that deaf individuals are significantly disadvantaged in education. A contributing factor to this difference is the difficulty deaf children have in acquiring learning concepts early in life. This paper presents an idea for highly interactive software that uses avatars (three-dimensional character models) to process and translate free-form Arabic input into ARSL (ARabic Sign Language). A prototype for teaching maths and dictation in elementary schools is discussed. This research could be valuable as a teaching tool for increasing: (1) the opportunity for deaf children to learn maths and dictation via interactive media; and (2) the effectiveness of ARSL teachers.
Keywords: Avatar, ARSL, Finger Spelling, Hand Shape