
    Deep Learning Application On American Sign Language Database For Video-Based Gesture Recognition

    Individuals who communicate in American Sign Language (ASL) often must bring a companion to act as a translator [1]. This creates barriers for those who wish to take part in activities alone. Online translators exist; however, they are limited to individual characters rather than the gestures that group characters in a meaningful way, and internet connectivity is not always available. This research therefore tackles the limitations of existing technologies and presents a model, implemented in MATLAB R2020b, for predicting and classifying American Sign Language gestures and characters. The proposed method examines current neural networks and how they can be applied to our transformed World's Largest American Sign Language data set. Drawing on state-of-the-art detection and segmentation algorithms, this paper analyzes the efficiency of pre-trained networks combined with these algorithms, and tests current machine learning strategies, such as transfer learning, and their impact on training a recognition model. Our research goals are to (1) manufacture and augment our data set, (2) apply transfer learning to our data sets to create various models, (3) compare the accuracy of each model, and finally (4) present a novel pattern for gesture recognition.
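
    The abstract outlines the pipeline only at a high level. As a rough illustration of the transfer-learning step (research goal 2), the sketch below fine-tunes an ImageNet-pretrained CNN on a folder-per-class image data set with light augmentation. It is written in PyTorch rather than the authors' MATLAB implementation, and the data path, class count, and hyperparameters are placeholders, not values from the paper.

```python
# Illustrative transfer-learning sketch (PyTorch), not the authors' MATLAB code.
# Assumptions: images stored one folder per class under DATA_DIR, 26 letter classes.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

DATA_DIR = "asl_dataset"   # placeholder path
NUM_CLASSES = 26           # assumed: one class per ASL letter

# Light augmentation, echoing the "manufacture and augment the data set" goal.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder(DATA_DIR, transform=train_tf)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Start from an ImageNet-pretrained backbone and replace the classification head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):                      # short fine-tuning run
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.3f}")
```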

    THE APPLICATION OF COMPUTER VISION, MACHINE AND DEEP LEARNING ALGORITHMS UTILIZING MATLAB

    MATLAB is a multi-paradigm proprietary programming language and numerical computing environment developed by MathWorks. Within the MATLAB integrated development environment (IDE), users can perform computer-aided design (CAD), matrix manipulations, plotting of functions and data, implementation of algorithms, and creation of user interfaces, and can interface with programs written in other languages [1]. Since its launch in 1984, MATLAB has not been particularly associated with the field of data science. That changed in 2013 with the launch of new data-science-focused toolboxes, including Deep Learning, Image Processing, and Computer Vision, followed a year later by Statistics and Machine Learning. The main objective of my thesis was to research and explore the field of data science, specifically the development of an object recognition application that could be built entirely within the MATLAB IDE and have a positive social impact on the deaf community, and, in doing so, to answer the question: could MATLAB be used to develop this type of application? To answer this question while addressing my main objectives, I constructed two different object recognition protocols using MATLAB R2019 with the add-on data science toolboxes. I named the protocols ASLtranslate (I) and (II). This allowed me to experiment with all of MATLAB's data science toolboxes while learning the differences, benefits, and disadvantages of using multiple approaches to the same problem. The methods and approaches for the design of both versions were very similar. ASLtranslate takes a 2D image of an American Sign Language (ASL) hand gesture as input, classifies the image, and then outputs the corresponding alphabet character. ASLtranslate (I) was an implementation of image category classification using machine learning methods. ASLtranslate (II) was implemented using a deep learning method called transfer learning, fine-tuning a pre-trained convolutional neural network (CNN), AlexNet, to perform classification on a new collection of images.
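
    As a loose Python analogue of the two ASLtranslate designs (not the author's MATLAB code), the sketch below follows the ASLtranslate (I) style: activations from a pre-trained AlexNet are used as fixed features and a conventional classifier is trained on top of them, in contrast to the end-to-end fine-tuning of ASLtranslate (II). The data path and layout are assumptions.

```python
# ASLtranslate(I)-style sketch: fixed deep features + a classical classifier.
# Python stand-in for the MATLAB workflow; DATA_DIR and sizes are placeholders.
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from sklearn.svm import LinearSVC

DATA_DIR = "asl_alphabet"   # placeholder: one sub-folder per letter

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
dataset = datasets.ImageFolder(DATA_DIR, transform=tf)
loader = DataLoader(dataset, batch_size=64, shuffle=False)

# AlexNet (the ASLtranslate (II) backbone) used here purely as a feature extractor.
backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
backbone.classifier = nn.Identity()   # drop the ImageNet head, keep 9216-d features
backbone.eval()

feats, labels = [], []
with torch.no_grad():
    for images, ys in loader:
        feats.append(backbone(images).numpy())
        labels.append(ys.numpy())
X, y = np.concatenate(feats), np.concatenate(labels)

# Conventional machine-learning classifier trained on the extracted features.
clf = LinearSVC().fit(X, y)
print("training accuracy:", clf.score(X, y))
```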

    OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts

    Generating captions for images is a task that has recently received considerable attention. In this work we focus on caption generation for abstract scenes, or object layouts, where the only information provided is a set of objects and their locations. We propose OBJ2TEXT, a sequence-to-sequence model that encodes a set of objects and their locations as an input sequence using an LSTM network, and decodes this representation using an LSTM language model. We show that our model, despite encoding object layouts as a sequence, can represent spatial relationships between objects and generate descriptions that are globally coherent and semantically relevant. We test our approach on a task of object-layout captioning using only object annotations as inputs. We additionally show that our model, combined with a state-of-the-art object detector, improves an image captioning model from 0.863 to 0.950 (CIDEr score) on the test benchmark of the standard MS-COCO Captioning task. (Accepted at EMNLP 2017.)
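
    A minimal sketch of the encoder-decoder pattern the abstract describes: object categories plus bounding-box locations encoded by an LSTM, decoded by an LSTM language model. This is an illustration of the idea, not the released OBJ2TEXT code; all sizes and names are invented for the example.

```python
# Sketch of an OBJ2TEXT-style model: the paper's idea, not its released code.
# Each input token = object-category embedding + an encoding of its box location.
import torch
import torch.nn as nn

class ObjLayoutCaptioner(nn.Module):
    def __init__(self, n_objects, vocab_size, emb=128, hidden=256):
        super().__init__()
        self.obj_emb = nn.Embedding(n_objects, emb)
        self.loc_fc = nn.Linear(4, emb)            # (x, y, w, h) box -> vector
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.word_emb = nn.Embedding(vocab_size, emb)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, obj_ids, boxes, captions):
        # Encode the object layout as a sequence of category+location vectors.
        enc_in = self.obj_emb(obj_ids) + self.loc_fc(boxes)
        _, state = self.encoder(enc_in)
        # Condition the LSTM language model on the final encoder state.
        dec_out, _ = self.decoder(self.word_emb(captions), state)
        return self.out(dec_out)                   # next-word logits

# Toy usage with invented sizes: 3 objects in the layout, caption length 5.
model = ObjLayoutCaptioner(n_objects=80, vocab_size=1000)
obj_ids = torch.randint(0, 80, (1, 3))
boxes = torch.rand(1, 3, 4)
caption = torch.randint(0, 1000, (1, 5))
print(model(obj_ids, boxes, caption).shape)        # torch.Size([1, 5, 1000])
```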

    End-to-End Multiview Gesture Recognition for Autonomous Car Parking System

    The use of hand gestures can be the most intuitive human-machine interaction medium. Early approaches to hand gesture recognition used device-based methods, relying on mechanical or optical sensors attached to a glove or markers, which hinders natural human-machine communication. Vision-based methods, on the other hand, are not restrictive and allow for more spontaneous communication without the need for an intermediary between human and machine. Vision-based gesture recognition has therefore been a popular area of research for the past thirty years. Hand gesture recognition finds application in many areas, particularly the automotive industry, where advanced automotive human-machine interface (HMI) designers are using gesture recognition to improve driver and vehicle safety. However, technology advances go beyond active/passive safety and into convenience and comfort. In this context, one of America's big three automakers has partnered with the Centre for Pattern Analysis and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding their product segment through machine learning, providing increased driver convenience and comfort with the particular application of hand gesture recognition for autonomous car parking. In this thesis, we leverage state-of-the-art deep learning and optimization techniques to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system. We propose a 3D CNN gesture model architecture that we train on a publicly available hand gesture database. We apply transfer learning methods to fine-tune the pre-trained gesture model on custom-made data, which significantly improves the proposed system's performance in real-world environments. We adapt the architecture of the end-to-end solution to expand the state-of-the-art video classifier from a single view (fed by a monocular camera) to a multiview 360-degree feed provided by a six-camera module. Finally, we optimize the proposed solution to work on a resource-limited embedded platform (NVIDIA Jetson TX2), used by automakers for vehicle-based features, without sacrificing the accuracy, robustness, and real-time functionality of the system.
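
    The thesis itself is not reproduced here, so the sketch below only illustrates one plausible reading of the multiview design: a shared, pre-trained 3D CNN backbone encodes the clip from each of the six cameras, and the per-view features are fused by averaging before classification. The class count, clip shape, and fusion rule are assumptions for the example.

```python
# Multiview gesture-recognition sketch: shared 3D CNN per view + average fusion.
# Illustrative reading of the design, not the thesis model; sizes are assumptions.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

class MultiViewGestureNet(nn.Module):
    def __init__(self, num_gestures=10):
        super().__init__()
        backbone = r3d_18(weights=R3D_18_Weights.DEFAULT)  # pre-trained 3D CNN
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                        # keep features only
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, num_gestures)

    def forward(self, clips):
        # clips: (batch, views, channels, frames, height, width)
        b, v = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))         # run each view's clip
        fused = feats.view(b, v, -1).mean(dim=1)           # average over the views
        return self.head(fused)

# Toy batch: 2 samples, 6 camera views, 8 RGB frames of 112x112 pixels each.
model = MultiViewGestureNet()
x = torch.rand(2, 6, 3, 8, 112, 112)
print(model(x).shape)   # torch.Size([2, 10])
```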

    Improving Syntactic Relationships Between Language and Objects

    This paper presents the integration of natural language processing and computer vision to improve the syntax of the language generated when describing objects in images. The goal was not only to understand the objects in an image, but also the interactions and activities occurring between them. We implemented a multi-modal neural network combining convolutional and recurrent neural network architectures to create a model that maximizes the likelihood of word combinations given a training image. The outcome was an image captioning model that leveraged transfer learning techniques for its architecture components. Our novelty was to quantify the effectiveness of transfer learning schemes for encoders and decoders in order to determine which were best for improving syntactic relationships. Our work found the combination of ResNet feature extraction and fine-tuned BERT word embeddings to be the best-performing architecture across two datasets, a valuable finding for those continuing this work, given the compute cost of these complex models.
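
    A compact sketch of the encoder-decoder captioning pattern analyzed above: a frozen ResNet encoder provides image features (transfer learning) and an LSTM decoder generates the caption from a word-embedding table that could be initialized from pre-trained vectors. The paper's best configuration fine-tunes BERT embeddings; the plain embedding table here is a simplification, and all sizes are placeholders.

```python
# Encoder-decoder captioning sketch: frozen ResNet features + LSTM word decoder.
# A simplification of the architecture discussed above; sizes are placeholders and
# the embedding table stands in for the paper's fine-tuned BERT embeddings.
import torch
import torch.nn as nn
from torchvision import models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, emb=256, hidden=512):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        resnet.fc = nn.Identity()                      # expose 2048-d image features
        for p in resnet.parameters():
            p.requires_grad = False                    # transfer learning: freeze encoder
        self.encoder = resnet
        self.img_proj = nn.Linear(2048, hidden)
        self.word_emb = nn.Embedding(vocab_size, emb)  # could load pre-trained vectors
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, captions):
        h0 = self.img_proj(self.encoder(images)).unsqueeze(0)  # image -> initial state
        c0 = torch.zeros_like(h0)
        dec_out, _ = self.decoder(self.word_emb(captions), (h0, c0))
        return self.out(dec_out)                       # logits over next words

# Toy usage with invented sizes.
model = CaptionModel(vocab_size=5000)
images = torch.rand(2, 3, 224, 224)
captions = torch.randint(0, 5000, (2, 12))
print(model(images, captions).shape)                   # torch.Size([2, 12, 5000])
```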

    Fine-Tuning Sign Language Translation Systems Through Deep Reinforcement Learning

    Sign language is an important communication tool for a vast majority of deaf and hard-of-hearing (DHH) people. Data collected by the World Health Organization indicate that 466 million people currently live with hearing loss, a number that could rise to 630 million by 2030 and over 930 million by 2050 [DHH1]. There are currently millions of sign language speakers around the world who use this skill on a daily basis. Bridging the gap between those who communicate solely in a spoken language and the DHH community is an ever-growing and omnipresent need. Unfortunately, in the field of natural language processing, sign language recognition and translation lag far behind their spoken language counterparts. The following research seeks to leverage the field of Deep Reinforcement Learning (DRL) to make a significant improvement in the task of Sign Language Translation (SLT), translating German Sign Language videos into German text sentences. To do this, three major experiments are conducted. The first examines the effects of Self-critical Sequence Training (SCST) when fine-tuning a simple Recurrent Neural Network (RNN) Long Short-Term Memory (LSTM) based sequence-to-sequence model. The second applies the same SCST algorithm to a more powerful transformer-based model. The final experiment utilizes the Proximal Policy Optimization (PPO) algorithm alongside a novel fine-tuning process on the same transformer model. By using this approach of reward-signal estimation and normalization while optimizing for the model's test-time greedy inference procedure, we aim to establish a new or comparable state-of-the-art (SOTA) result on the RWTH-PHOENIX-Weather 2014T German sign language dataset.
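
    As a rough illustration of the SCST objective used in the first two experiments (not the thesis code), the snippet below computes the self-critical loss: the reward of a sampled translation is baselined by the reward of the model's own greedy output, and the advantage scales the log-probability of the sampled tokens. The reward values are stand-ins for a sentence-level metric such as BLEU.

```python
# Self-critical sequence training (SCST) loss sketch; not the thesis implementation.
# Rewards are placeholder numbers standing in for a sentence-level metric (e.g. BLEU).
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """sample_logprobs: (batch, seq_len) log-probs of the sampled tokens.
    sample_reward / greedy_reward: (batch,) metric scores of sampled vs. greedy output."""
    advantage = (sample_reward - greedy_reward).unsqueeze(1)  # self-critical baseline
    # REINFORCE with baseline: reinforce samples that beat the model's greedy decode.
    return -(advantage * sample_logprobs).mean()

# Toy usage: a batch of 2 sampled sequences of length 4.
logprobs = torch.log(torch.rand(2, 4, requires_grad=True))  # stand-in for decoder output
r_sample = torch.tensor([0.62, 0.31])                       # e.g. BLEU of sampled translations
r_greedy = torch.tensor([0.55, 0.40])                       # BLEU of the greedy baseline
loss = scst_loss(logprobs, r_sample, r_greedy)
loss.backward()                                             # gradients flow into the log-probs
print(float(loss))
```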