1,993 research outputs found

    Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine

    Get PDF
    In this paper, a deep learning approach, the Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how the RBM, as a deep generative model, is capable of modeling the distribution of the input data for enhanced recognition of unseen data. Two modalities, RGB and depth, are fed to the model in three forms: the original image, a cropped image, and a noisy cropped image. Five crops of the input image are taken, and the hand in each crop is detected using a Convolutional Neural Network (CNN). Three types of detected hand image are then generated for each modality and input to RBMs. The outputs of the two modality-specific RBMs are fused in a further RBM that recognizes the sign label of the input image. The proposed multi-modal model is trained on all or part of the American alphabet and digits of four publicly available datasets. We also evaluate the robustness of the proposal against noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusion methodology, achieves state-of-the-art results on the Massey University Gesture Dataset 2012, the American Sign Language (ASL) Fingerspelling Dataset from the University of Surrey's Centre for Vision, Speech and Signal Processing, the NYU dataset, and the ASL Fingerspelling A dataset.
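    As a concrete illustration of the fusion stage described above, here is a minimal sketch of a binary RBM trained with one step of contrastive divergence (CD-1), with two modality-specific RBMs fused by a third RBM over their concatenated hidden units. All names, layer sizes, and the CD-1 trainer are illustrative assumptions, not the authors' implementation; the CNN hand-detection and cropping steps are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal binary RBM trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible bias
        self.c = np.zeros(n_hidden)   # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        ph0 = self.hidden_probs(v0)                            # positive phase
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hiddens
        pv1 = self.visible_probs(h0)                           # one Gibbs step back
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += self.lr * (v0 - pv1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)

# One RBM per modality, then a fusion RBM over the concatenated hidden
# representations (all sizes are placeholders).
rgb_rbm, depth_rbm = RBM(32 * 32, 256), RBM(32 * 32, 256, seed=1)
fusion_rbm = RBM(512, 128, seed=2)

def fuse(v_rgb, v_depth):
    h = np.concatenate([rgb_rbm.hidden_probs(v_rgb),
                        depth_rbm.hidden_probs(v_depth)], axis=1)
    fusion_rbm.cd1_step(h)             # unsupervised fusion update
    return fusion_rbm.hidden_probs(h)  # fused features for a sign classifier

v_rgb = (np.random.default_rng(3).random((4, 32 * 32)) < 0.5).astype(float)
v_depth = (np.random.default_rng(4).random((4, 32 * 32)) < 0.5).astype(float)
features = fuse(v_rgb, v_depth)        # shape (4, 128)
```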

    Transfer Learning for Speech and Language Processing

    Full text link
    Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example, in speech recognition an acoustic model trained for one language can be used to recognize speech in another language, with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and has traditionally been studied under the name of `model adaptation'. Recent advances in deep learning show that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and that the `transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research in this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field.
    Comment: 13 pages, APSIPA 2015
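    The cross-lingual acoustic-model example above is the classic transfer recipe: reuse the high-level features of a model trained on a resource-rich source language and adapt only a thin output layer to the target language. The PyTorch sketch below illustrates that recipe under assumed (hypothetical) architecture, dimensions, and data; it is not code from the paper.

```python
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    """Hypothetical frame-level acoustic model: shared encoder + softmax head."""
    def __init__(self, n_feats=40, n_hidden=256, n_targets=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_feats, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        self.head = nn.Linear(n_hidden, n_targets)

    def forward(self, x):
        return self.head(self.encoder(x))

source_model = AcousticModel(n_targets=100)  # assume: trained on the source language

# Transfer: reuse the encoder's high-level features, swap in a new head for
# the target language's phone set, and freeze the encoder so the scarce
# target-language data only trains the new layer.
target_model = AcousticModel(n_targets=40)
target_model.encoder.load_state_dict(source_model.encoder.state_dict())
for p in target_model.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(target_model.head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative adaptation step on dummy target-language frames.
feats = torch.randn(8, 40)            # 8 frames of 40-dim acoustic features
labels = torch.randint(0, 40, (8,))   # target-language phone labels
optimizer.zero_grad()
loss = loss_fn(target_model(feats), labels)
loss.backward()
optimizer.step()
```

    Freezing the encoder corresponds to the "little or no re-training data" regime; with more target-language data one would typically unfreeze and fine-tune the whole network at a lower learning rate.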

    Deep Learning Approach For Sign Language Recognition

    Get PDF
    Sign language is a method of communication based on hand movements used by people with hearing loss. Problems arise in communication between hearing people and people with hearing disorders, because not everyone understands sign language, so a model for sign language recognition is needed. This study aims to build a hand sign language recognition model using a deep learning approach. The model used is a Convolutional Neural Network (CNN). It is tested on the ASL Alphabet dataset, consisting of 29 categories with 3,000 images each, for a total of 87,000 hand-sign images of 200 x 200 pixels. The input images are first resized to 32 x 32 pixels, and the dataset is then split into 75% for training and 25% for validation. The test results indicate that the proposed model performs well, reaching 99% accuracy. Experimental results also show that preprocessing the images with background correction improves model performance.
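    A minimal sketch of the described pipeline, assuming an unspecified small CNN: 32 x 32 inputs, 29 output classes, and a 75/25 train/validation split. The paper does not give its exact architecture, so the PyTorch layers and stand-in data below are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, random_split

# Small CNN for 32 x 32 RGB hand-sign images with 29 output classes.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                  # -> 32 x 16 x 16
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                  # -> 64 x 8 x 8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
    nn.Linear(128, 29),               # one logit per sign category
)

# 75% / 25% train/validation split, here on a dummy stand-in dataset.
images = torch.randn(100, 3, 32, 32)  # already resized to 32 x 32
labels = torch.randint(0, 29, (100,))
train_set, val_set = random_split(TensorDataset(images, labels), [75, 25])

logits = model(images[:2])            # sanity check: shape (2, 29)
```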

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    Full text link
    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws on many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models.
    Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing

    Analysis of Sign Language Facial Expressions and Deaf Students' Retention Using Machine Learning and Agent-based Modeling

    Get PDF
    There are currently about 466 million people worldwide who have a hearing disability, and that number is expected to increase to 900 million by 2050. About 15% of adult Americans have hearing disabilities, and about three in every 1,000 U.S. children are born with hearing loss in one or both ears. The World Health Organization (WHO) estimates that unaddressed hearing loss poses an annual global cost of $980 billion, including the cost of educational support, loss of productivity, and societal costs. All of this is evidence that people with hearing loss experience difficulties of many kinds and at many levels. In this dissertation, we address two main challenges faced by hearing-impaired people: sign language recognition and post-secondary education. Both sign language recognition and reliable education systems that properly support the deaf community are essential global needs, and these are exactly the problems this dissertation attacks. For the first part, we introduce a novel dataset and a machine learning methodology; for the second, we propose a novel agent-based modeling framework.

    Facial expressions are important parts of both gesture and sign language recognition systems. Despite recent advances in both fields, annotated facial expression datasets in the context of sign language are still scarce resources. In this dissertation, we introduce FePh, an annotated sequenced facial expression dataset in the context of sign language, comprising over 3,000 facial images extracted from the daily news and weather forecasts of the public TV station PHOENIX. Unlike the majority of existing facial expression datasets, FePh provides sequenced, semi-blurry facial images with different head poses, orientations, and movements. In addition, in the majority of the images the identities are mouthing the words, which makes the data more challenging. To annotate this dataset we consider primary, secondary, and tertiary dyads of the seven basic emotions: sad, surprise, fear, angry, neutral, disgust, and happy. We also include a None class for images whose facial expression cannot be described by any of these emotions. Although we provide FePh as a facial expression dataset of signers in sign language, it has wider applications in gesture recognition and Human Computer Interaction (HCI) systems.

    In addition, post-secondary education persistence is the likelihood of a student remaining in post-secondary education. Although statistics show that post-secondary persistence of deaf students has increased recently, many obstacles still keep students from completing their post-secondary degree goals. Increasing the persistence rate is therefore crucial to raising educational and work goals for deaf students. In this work, we present an agent-based model, built in the NetLogo software, of the persistence phenomenon for deaf students. We consider four non-cognitive factors that influence a deaf student's departure decision: having clear goals, social integration, social skills, and academic experience. The progress and results of this work suggest that agent-based modeling approaches promise a better understanding of what will increase persistence.
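    The dissertation's model is built in NetLogo; the sketch below is a rough Python rendering of the stated design, with one agent per student carrying the four non-cognitive factors and a hypothetical weighted rule driving the per-semester departure decision. The weights, decision rule, and population size are illustrative, not taken from the dissertation.

```python
import random

FACTORS = ["clear_goals", "social_integration", "social_skills", "academic_experience"]

class Student:
    """One agent; trait values in [0, 1] stand in for the four factors."""
    def __init__(self):
        self.traits = {f: random.random() for f in FACTORS}
        self.enrolled = True

    def step(self):
        # Hypothetical rule: weaker combined factors -> higher departure risk.
        score = sum(self.traits.values()) / len(FACTORS)
        if self.enrolled and random.random() < 0.3 * (1.0 - score):
            self.enrolled = False  # departure decision

random.seed(0)
students = [Student() for _ in range(1000)]
for semester in range(8):          # eight semesters of a degree program
    for s in students:
        s.step()

persistence = sum(s.enrolled for s in students) / len(students)
print(f"Persistence after 8 semesters: {persistence:.1%}")
```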

    Unsupervised Generative Modeling Using Matrix Product States

    Full text link
    Generative modeling, which learns a joint probability distribution from data and generates samples according to it, is an important task in machine learning and artificial intelligence. Inspired by the probabilistic interpretation of quantum physics, we propose a generative model based on matrix product states, a tensor network originally proposed for describing (particularly one-dimensional) entangled quantum states. Our model enjoys efficient learning analogous to the density matrix renormalization group method, which allows dynamically adjusting the dimensions of the tensors and offers an efficient direct sampling approach for generative tasks. We apply our method to generative modeling of several standard datasets, including Bars and Stripes, random binary patterns, and the MNIST handwritten digits, to illustrate the abilities, features and drawbacks of our model relative to popular generative models such as the Hopfield model, Boltzmann machines, and generative adversarial networks. Our work sheds light on many interesting directions for future exploration in the development of quantum-inspired algorithms for unsupervised machine learning, which may plausibly be realized on quantum devices.
    Comment: 11 pages, 12 figures (not including the TNs). GitHub Page: https://congzlwag.github.io/UnsupGenModbyMPS
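    To make the matrix product state parameterization concrete, the toy NumPy sketch below stores one rank-3 tensor per binary variable, evaluates the unnormalized probability of a configuration as the squared amplitude of a chain of matrix products, and normalizes by brute force on a tiny chain. The DMRG-style training sweeps and the exact direct-sampling procedure from the paper are omitted; all dimensions are illustrative.

```python
from itertools import product
import numpy as np

# Toy MPS over n binary variables: one rank-3 tensor A[i] of shape
# (D_left, 2, D_right) per site, with boundary bond dimensions of 1.
rng = np.random.default_rng(0)
n, D = 4, 3
dims = [1] + [D] * (n - 1) + [1]
mps = [rng.standard_normal((dims[i], 2, dims[i + 1])) for i in range(n)]

def amplitude(bits):
    """psi(x): contract the site matrices selected by each bit."""
    m = np.eye(1)
    for tensor, b in zip(mps, bits):
        m = m @ tensor[:, b, :]
    return m[0, 0]

# The model assigns p(x) = |psi(x)|^2 / Z. Here Z is computed by brute
# force over all 2^n configurations (tiny n only); the paper obtains Z
# exactly via tensor contractions and samples directly site by site.
Z = sum(amplitude(x) ** 2 for x in product([0, 1], repeat=n))
probs = {x: amplitude(x) ** 2 / Z for x in product([0, 1], repeat=n)}
print(max(probs, key=probs.get), max(probs.values()))
```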

    Gesture and sign language recognition with deep learning

    Get PDF