7 research outputs found

    Automatic recognition of Arabic alphabets sign language using deep learning

    Technological advancements are helping people with special needs overcome many communication obstacles. Deep learning and computer vision models represent major recent leaps in facilitating previously unprecedented tasks in human interaction. The Arabic language remains a rich research area. In this paper, different deep learning models were applied to test the accuracy and efficiency attainable in automatic Arabic sign language recognition. We provide a novel framework for the automatic detection of Arabic sign language based on transfer learning applied to popular deep learning models for image processing, specifically by training AlexNet, VGGNet, and GoogleNet/Inception models, alongside shallow learning baselines based on support vector machines (SVM) and nearest-neighbor algorithms. As a result, we propose a novel approach for the automatic recognition of Arabic alphabets in sign language based on the VGGNet architecture, which outperformed the other trained models. The proposed model achieves promising results, recognizing Arabic sign language with an accuracy score of 97%. The suggested models are tested against a recent fully labeled dataset of Arabic sign language images. The dataset contains 54,049 images and is, to the best of our knowledge, the first large and comprehensive real dataset of Arabic sign language.
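    As a rough illustration of the transfer-learning setup described above, the sketch below loads an ImageNet-pretrained VGG16 backbone in Keras and attaches a small classification head. The class count (32 alphabet signs), input size, and head layers are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of transfer learning with VGG16 for sign-alphabet images.
# NUM_CLASSES, image size and head layers are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 32           # assumed number of Arabic sign-alphabet classes
IMG_SHAPE = (224, 224, 3)  # VGG16's default input resolution

# Load VGG16 pretrained on ImageNet and freeze its convolutional base.
base = tf.keras.applications.VGG16(weights="imagenet",
                                   include_top=False,
                                   input_shape=IMG_SHAPE)
base.trainable = False

# Attach a small classification head for the sign-alphabet task.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_data=..., epochs=...)
```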

    Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges

    Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. The applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometric Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), and more. Fusions of modalities such as hand gesture and face, or lip and hand position, are the sensory combinations most commonly used in developing multimodal systems for the hearing impaired. This paper provides an overview of the multimodal systems available in the literature for hearing-impaired studies, and also discusses some studies related to hearing-impaired acoustic analysis. It is observed that far fewer algorithms have been developed for hearing-impaired AVSR than for normal hearing. The study of audio-visual speech recognition systems for the hearing impaired is therefore in high demand, particularly for people trying to communicate in natively spoken languages. This paper also highlights state-of-the-art techniques in AVSR and the challenges researchers face in developing AVSR systems.
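    To make the fusion idea behind AVSR concrete, here is a minimal, illustrative sketch of score-level (late) fusion of an audio and a visual recognizer; the per-class scores and the fusion weight are placeholders rather than any specific system from the survey.

```python
# Minimal sketch of late (score-level) fusion: combine per-class scores from
# independent audio and visual recognizers. Weights and scores are placeholders.
import numpy as np

def late_fusion(audio_scores: np.ndarray,
                visual_scores: np.ndarray,
                audio_weight: float = 0.6) -> int:
    """Weighted score-level fusion of two modality-specific classifiers.

    Both inputs are per-class probability vectors of the same length;
    the audio stream is weighted more heavily here purely as an example.
    """
    fused = audio_weight * audio_scores + (1.0 - audio_weight) * visual_scores
    return int(np.argmax(fused))

# Example: 4 candidate words, audio is uncertain, lip reading disambiguates.
audio = np.array([0.30, 0.28, 0.22, 0.20])
visual = np.array([0.05, 0.70, 0.15, 0.10])
print(late_fusion(audio, visual))  # -> 1
```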

    A multimodal human-robot sign language interaction framework applied in social robots

    Deaf-mutes face many difficulties in daily interactions with hearing people through spoken language. Sign language is an important means of expression and communication for deaf-mutes. Therefore, breaking the communication barrier between the deaf-mute and hearing communities is significant for facilitating their integration into society. To help them integrate into social life better, we propose a multimodal Chinese sign language (CSL) gesture interaction framework based on social robots. CSL gesture information, including both static and dynamic gestures, is captured from two different modal sensors: a wearable Myo armband and a Leap Motion sensor, which collect human arm surface electromyography (sEMG) signals and 3D hand vectors, respectively. The two modalities of gesture data are preprocessed and fused before being sent to the classifier, to improve recognition accuracy and reduce the network's processing time. Since the input data of the proposed framework are temporal gesture sequences, a long short-term memory recurrent neural network is used to classify them. Comparative experiments are performed on an NAO robot to test our method. The results show that our method effectively improves CSL gesture recognition accuracy, with potential applications not only in social robots but also in a variety of other gesture interaction scenarios.
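    A minimal sketch of the sequence-classification step, assuming early (feature-level) fusion of per-frame sEMG channels and Leap Motion hand vectors fed to a stacked LSTM in Keras; the sequence length, hand-feature dimension, gesture-vocabulary size, and layer sizes are illustrative assumptions rather than the authors' configuration.

```python
# Minimal sketch: an LSTM classifier over fused per-frame features
# (sEMG channels concatenated with hand vectors). Dimensions are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN = 100        # assumed frames per gesture sample
SEMG_DIM = 8         # the Myo armband provides 8 sEMG channels
HAND_DIM = 15        # assumed flattened Leap Motion hand-vector features
NUM_GESTURES = 20    # assumed size of the CSL gesture vocabulary

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, SEMG_DIM + HAND_DIM)),  # early (feature) fusion
    layers.Masking(mask_value=0.0),                      # tolerate padded frames
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(64),
    layers.Dense(NUM_GESTURES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```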

    Analysis of Sign Language Facial Expressions and Deaf Students' Retention Using Machine Learning and Agent-based Modeling

    There are currently about 466 million people worldwide who have a hearing disability, and that number is expected to increase to 900 million by 2050. About 15% of adult Americans have hearing disabilities, and about three in every 1,000 U.S. children are born with hearing loss in one or both ears. The World Health Organization (WHO) estimates that unaddressed hearing loss poses an annual global cost of $980 billion, including the cost of educational support, loss of productivity, and societal costs. All of this is evidence that people with hearing loss experience difficulties of several kinds and levels. In this dissertation, we address two main challenges faced by hearing-impaired people: sign language recognition and post-secondary education. Both sign language recognition and reliable education systems that properly support the deaf community are essential global needs, and in this dissertation we aim to attack these exact problems. For the first part, we introduce a novel dataset and methodology using machine learning, while for the second part, a novel agent-based modeling framework is proposed. Facial expressions are important parts of both gesture and sign language recognition systems. Despite recent advances in both fields, annotated facial expression datasets in the context of sign language are still scarce resources. In this dissertation, we introduce an annotated, sequenced facial expression dataset in the context of sign language, comprising over 3,000 facial images extracted from the daily news and weather forecasts of the public TV station PHOENIX. Unlike the majority of existing facial expression datasets, FePh provides sequenced, semi-blurry facial images with different head poses, orientations, and movements. In addition, in the majority of images the identities are mouthing the words, which makes the data more challenging. To annotate this dataset we consider primary, secondary, and tertiary dyads of the seven basic emotions: sad, surprise, fear, angry, neutral, disgust, and happy. We also include a None class for images whose facial expression cannot be described by any of these emotions. Although we provide FePh as a facial expression dataset of signers in sign language, it has wider application in gesture recognition and Human Computer Interaction (HCI) systems. In addition, post-secondary education persistence is the likelihood of a student remaining in post-secondary education. Although statistics show that post-secondary persistence for deaf students has increased recently, there are still many obstacles preventing students from completing their post-secondary degree goals. Therefore, increasing the persistence rate is crucial to advancing the education and work goals of deaf students. In this work, we present an agent-based model, built in the NetLogo software, of the persistence phenomenon among deaf students. We consider four non-cognitive factors that influence the departure decision of deaf students: having clear goals, social integration, social skills, and academic experience. The progress and results of this work suggest that agent-based modeling approaches promise to give a better understanding of what will increase persistence.
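    The agent-based part can be pictured with a toy Python analogue of the NetLogo model described above: each student agent carries the four non-cognitive factors and decides each semester whether to depart. The factor dynamics, weights, and thresholds here are illustrative assumptions, not calibrated values from the dissertation.

```python
# Toy Python analogue of an agent-based persistence model (the original is in
# NetLogo). Factor drift and departure probabilities are illustrative only.
import random

FACTORS = ["clear_goals", "social_integration", "social_skills", "academic_experience"]

class StudentAgent:
    def __init__(self):
        # Each non-cognitive factor starts as a score in [0, 1].
        self.factors = {f: random.random() for f in FACTORS}
        self.enrolled = True

    def step(self):
        """One semester: factors drift slightly, then a departure decision is made."""
        if not self.enrolled:
            return
        for f in FACTORS:
            self.factors[f] = min(1.0, max(0.0, self.factors[f] + random.uniform(-0.05, 0.05)))
        persistence = sum(self.factors.values()) / len(FACTORS)
        # Departure becomes more likely as the combined factor score drops.
        if random.random() < 0.3 * (1.0 - persistence):
            self.enrolled = False

def run(num_students=1000, semesters=8):
    students = [StudentAgent() for _ in range(num_students)]
    for _ in range(semesters):
        for s in students:
            s.step()
    return sum(s.enrolled for s in students) / num_students

print(f"Simulated persistence rate: {run():.2%}")
```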

    Review of three-dimensional human-computer interaction with focus on the leap motion controller

    Modern hardware and software development has led to an evolution of user interfaces, from command-line interfaces to natural user interfaces for virtual immersive environments. Gestures imitating real-world interaction tasks increasingly replace classical two-dimensional interfaces based on Windows/Icons/Menus/Pointers (WIMP) or touch metaphors. The purpose of this paper is therefore to survey state-of-the-art Human-Computer Interaction (HCI) techniques, with a focus on the special field of three-dimensional interaction. This includes an overview of currently available interaction devices, their application areas, and the underlying methods for gesture design and recognition. The focus is on interfaces based on the Leap Motion Controller (LMC) and corresponding methods of gesture design and recognition. Further, a review of evaluation methods for the proposed natural user interfaces is given.
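    As a concrete example of the kind of gesture check such 3D interfaces rely on, the sketch below detects a pinch from tracked fingertip positions; the Hand structure, coordinate values, and threshold are hypothetical stand-ins, not the actual Leap Motion Controller API.

```python
# Illustrative 3D gesture check: detect a pinch from fingertip positions.
# The Hand dataclass is a hypothetical stand-in for a tracker's hand data.
import math
from dataclasses import dataclass

@dataclass
class Hand:
    thumb_tip: tuple   # (x, y, z) in millimetres
    index_tip: tuple

def is_pinching(hand: Hand, threshold_mm: float = 25.0) -> bool:
    """Report a pinch when thumb and index fingertips are close together."""
    return math.dist(hand.thumb_tip, hand.index_tip) < threshold_mm

hand = Hand(thumb_tip=(10.0, 200.0, 30.0), index_tip=(22.0, 210.0, 35.0))
print(is_pinching(hand))  # True: fingertips are roughly 16 mm apart
```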

    Artificial Intelligence for Multimedia Signal Processing

    Artificial intelligence technologies are being actively applied to broadcasting and multimedia processing. A great deal of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years these efforts have aimed to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies for media creation, processing, editing, and scenario generation are very important research areas in multimedia processing and engineering. This book contains a collection of topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.

    A multiple optical tracking based approach for enhancing hand-based interaction in virtual reality simulations

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Research exploring natural virtual reality interaction has seen significant success in optical tracker-based approaches, enabling users to interact freely using their hands. Optical trackers can provide users with real-time, high-fidelity virtual hand representations for natural interaction and an immersive experience. However, work in this area has identified four issues: occlusion, field of view, stability, and accuracy. To overcome these four key issues, researchers have investigated approaches such as using multiple sensors. Research has shown multi-sensor approaches to be effective in improving recognition accuracy; however, such approaches typically use statically positioned sensors, which introduce body-occlusion issues that make tracking hands challenging. Machine learning approaches have also been explored to improve gesture recognition, but they typically require a pre-set gesture vocabulary that limits user actions, with larger vocabularies hindering real-time performance. This thesis presents a multiple optical tracking (MOT) hand-based interaction system comprising two Leap Motion sensors mounted onto a VR headset at different orientations. Novel approaches to the aggregation and validation of sensor data are presented. A machine learning sub-system is developed to validate the hand data received by the sensors. Occlusion-detection, stability-detection, inferred-hands, and hand-interpolation sub-systems are also developed to ensure that valid hand representations are always shown to the user. In addition, a mesh conformation sub-system ensures that 3D objects are appropriately held in a user's virtual hand. The presented system addresses the four key issues of optical sensors to provide a smooth and consistent user experience. The MOT system is evaluated against traditional interaction approaches: gloves, motion controllers, and a single front-facing sensor configuration. The comparative sensor evaluation analysed the validity and availability of tracking data, along with each sensor's effect on the MOT system. The results show that the MOT system provides a more stable experience than the front-facing configuration and produces significantly more valid tracking data. The results also demonstrate the effectiveness of a 45-degree sensor configuration compared to a front-facing one, and the effectiveness of the MOT system's solutions in handling the four key issues of optical trackers.
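    A minimal sketch of the aggregation idea, assuming each sensor reports a per-frame hand-joint array and a confidence value: estimates are combined by confidence-weighted averaging, with a fallback when one view is occluded. The data layout and weighting scheme are assumptions for illustration, not the thesis's actual sub-systems.

```python
# Minimal sketch: combine hand estimates from two head-mounted sensors by
# confidence-weighted averaging, falling back when one view is occluded.
import numpy as np
from typing import Optional

def aggregate_hand(front: Optional[np.ndarray], front_conf: float,
                   angled: Optional[np.ndarray], angled_conf: float) -> Optional[np.ndarray]:
    """Confidence-weighted average of two (num_joints, 3) hand-joint arrays.

    An input of None means that sensor has lost tracking (e.g. occlusion);
    confidences are assumed to lie in [0, 1].
    """
    if front is None and angled is None:
        return None                      # no valid data this frame
    if front is None:
        return angled                    # fall back to the remaining sensor
    if angled is None:
        return front
    total = front_conf + angled_conf
    if total == 0:
        return (front + angled) / 2.0
    return (front_conf * front + angled_conf * angled) / total

# Example: the angled sensor sees the hand more reliably this frame.
f = np.zeros((21, 3))
a = np.ones((21, 3))
print(aggregate_hand(f, 0.2, a, 0.8)[0])  # -> [0.8 0.8 0.8]
```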