260 research outputs found

    Efficient Kinect Sensor-based Kurdish Sign Language Recognition Using Echo System Network

    Sign language assists in building communication and bridging gaps in understanding. Automatic sign language recognition (ASLR) has recently been studied for various sign languages. However, work on Kurdish sign language (KuSL) is relatively new, and therefore research and purpose-built datasets for it are limited. This paper proposes a model to translate KuSL into text and presents a dataset recorded with a Kinect V2 sensor. The computational complexity of the feature extraction and classification steps, which is a serious problem for ASLR, is investigated. The paper proposes a feature engineering approach based on the skeleton position alone, which provides a better representation of the features and avoids using all of the image information. In addition, the proposed model makes use of recurrent neural network (RNN)-based models. Training RNNs is inherently difficult, which motivates the investigation of alternatives. Besides the trainable long short-term memory (LSTM), this study proposes the untrained, low-complexity echo system network (ESN) classifier. The accuracy of both LSTM and ESN indicates that they can outperform the models in state-of-the-art studies. In addition, ESN, which has not been proposed thus far for ASLR, exhibits accuracy comparable to that of the LSTM with a significantly lower training time.

    AUTOMATIC DETECTION AND RECOGNITION OF 3D MANUAL GESTURES FOR HUMAN-MACHINE INTERACTION

    In this paper, we propose an approach to detect and recognize 3D one-handed gestures for human-machine interaction. The logical structure of the modules of the system for recording a gestural database is described. The logical structure of the database of 3D gestures is presented. Examples of frames showing gestures in Full High Definition format, in depth-map mode, and in infrared are illustrated. Models of a deep convolutional network for detecting faces and hand shapes are described. The results of automatic detection of the face region and the hand shape are given. The distinctive features of a gesture at a given point in time are identified. The process of recognizing 3D one-handed gestures is described. Due to its versatility, this method can be used in tasks of biometrics, computer vision, machine learning, automatic face recognition systems, and sign languages.

    Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications

    The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques in the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering, as well as for those who are interested in how these domains are connected in real-life situations.

    Data mining and modelling for sign language

    Sign languages have received significantly less attention than spoken languages in the research areas of corpus analysis, machine translation, recognition, synthesis and social signal processing, amongst others. This is mainly due to signers being in a clear minority and there being a strong prior belief that sign languages are simply arbitrary gestures. To date, this manifests in the insufficiency of sign language resources available for computational modelling and analysis, with no agreed standards and relatively stagnant progress compared to spoken language interaction research. Fortunately, the machine learning community has developed methods, such as transfer learning, for dealing with sparse resources, while data mining techniques, such as clustering, can provide insights into the data. The work described here utilises such transfer learning techniques to apply a neural language model to signed utterances and to compare sign language phonemes, which allows for clustering of similar signs, leading to automated annotation of sign language resources. This thesis promotes the idea that sign language research in computing should rely less on hand-annotated data, thus opening up the prospect of using readily available online data (e.g. signed song videos) through the computational modelling and automated annotation techniques presented in this thesis.

    Hand gesture recognition through capacitive sensing : a thesis presented in partial fulfilment of the requirements for the degree of Master of Engineering in Electronics & Computer Engineering at Massey University, School of Food and Advanced Technology (SF&AT), Auckland, New Zealand

    Figures 1.1, 1.2, 1.3, 2.1, 2.3 & 2.4 are re-used with permission. Figure 2.2 (=Smith, 1996 Fig 1) ©1996 by International Business Machines Corporation was removed. This thesis investigated capacitive sensing-based hand gesture recognition by developing and validating custom-built hardware. We attempted to discover whether massed arrays of capacitance sensors can produce a robust system capable of simple hand gesture detection and recognition. The first stage of this research was to build the hardware that performed capacitance sensing. This hardware needed to be sensitive enough to capture minor variations in capacitance values, while also reducing stray capacitance to a minimum. The hardware designed in this stage formed the basis of all the data captured and utilised for subsequent training and testing of machine learning-based classifiers. The second stage used massed arrays of capacitance sensor pads to capture frames of hand gestures in the form of low-resolution 2D images. The raw data was then processed to account for random variations and noise naturally present in the surrounding environment. Five different gestures were captured from several test participants and used to train, validate and test the classifiers. Different methods were explored in the recognition and classification stage: initially, simple probabilistic classifiers were used; afterwards, neural networks were used. Two types of neural networks were explored, namely Multilayer Perceptron (MLP) and Convolutional Neural Network (CNN), which are capable of achieving upwards of 92.34% classification accuracy.

    A Machine Learning Based Full Duplex System Supporting Multiple Sign Languages for the Deaf and Mute

    This manuscript presents a full duplex communication system for the Deaf and Mute (D-M) based on Machine Learning (ML). These individuals, who generally communicate through sign language, are an integral part of our society, and their contribution is vital. They face communication difficulties mainly because others, who generally do not know sign language, are unable to communicate with them. The work presents a solution to this problem through a system enabling the non-deaf and mute (ND-M) to communicate with D-M individuals without the need to learn sign language. The system is low-cost, reliable, easy to use, and based on a commercial-off-the-shelf (COTS) Leap Motion Device (LMD). The hand gesture data of D-M individuals is acquired using an LMD and processed using a Convolutional Neural Network (CNN) algorithm. A supervised ML algorithm completes the processing and converts the hand gesture data into speech. A new dataset for the ML-based algorithm is created and presented in this manuscript. This dataset includes three sign language datasets, i.e., American Sign Language (ASL), Pakistani Sign Language (PSL), and Spanish Sign Language (SSL). The proposed system automatically detects the sign language and converts it into an audio message for the ND-M. Similarities between the three sign languages are also explored, and further research can be carried out to help create more datasets that combine multiple sign languages. The ND-M can communicate by recording their speech, which is then converted into text and hand gesture images. The system can be upgraded in the future to support more sign language datasets. The system also provides a training mode that can help D-M individuals improve their hand gestures and understand how accurately the system is detecting these gestures. The proposed system has been validated through a series of experiments resulting in hand gesture detection accuracy exceeding 95%. Funding for open access charge: Universidad de Málag

    Viseme-based Lip-Reading using Deep Learning

    Research in Automated Lip Reading is an incredibly rich discipline with many facets that have been the subject of investigation, including audio-visual data, feature extraction, classification networks and classification schemas. The most advanced and up-to-date lip-reading systems can predict entire sentences with thousands of different words, and the majority of them use ASCII characters as the classification schema. The classification performance of such systems, however, has been insufficient, and the need to cover an ever-expanding range of vocabulary using as few classes as possible is a challenge. The work in this thesis contributes to the area concerning classification schemas by proposing an automated lip-reading model that predicts sentences using visemes as a classification schema. This is an alternative schema to using ASCII characters, which is the conventional class system used to predict sentences. This thesis provides a review of the current trends in deep learning-based automated lip reading and addresses a gap in the research on automated lip reading by contributing to work on classification schemas. A whole new line of research is opened up whereby an alternative way to do lip reading is explored, and in doing so, lip-reading performance results for predicting sentences from a benchmark dataset are attained which improve upon the current state-of-the-art. In this thesis, a neural network-based lip-reading system is proposed. The system is lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to recognise, the system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training. The lip-reading system predicts sentences in a two-stage procedure, with visemes being recognised in the first stage and words being classified in the second stage. The second stage therefore has to overcome both the one-to-many mapping problem posed in lip reading, where one set of visemes can map to several words, and the problem of visemes being confused or misclassified to begin with. To develop the proposed lip-reading system, a number of tasks have been performed in this thesis. These include the classification of continuous sequences of visemes, and the proposal of viseme-to-word conversion models that are both effective at predicting words and robust to the possibility of viseme confusion or misclassification. The initial system reported has been tested on the challenging BBC Lip Reading Sentences 2 (LRS2) benchmark dataset, attaining a word accuracy rate of 64.6%. Compared with the state-of-the-art works in lip reading sentences reported at the time, the system achieved a significantly improved performance. The lip-reading system is further improved upon by using a language model that has been demonstrated to be effective at discriminating between homopheme words and robust to incorrectly classified visemes. An improved performance in predicting spoken sentences from the LRS2 dataset is obtained, with a word accuracy rate of 79.6%. This is still better than another lip-reading system trained and evaluated on the same dataset, which attained a word accuracy rate of 77.4% and is, to the best of our knowledge, the next best observed result attained on LRS2.