Deep Learning Application On American Sign Language Database For Video-Based Gesture Recognition
ASL-speaking individuals always bring a companion as a translator [1]. This creates barriers for those who wish to take part in activities alone. Online translators exist; however, they are limited to individual characters rather than the gestures that group characters in a meaningful way, and connectivity is not always available. This research therefore tackles the limitations of existing technologies and presents a model, implemented in MATLAB 2020b, for predicting and classifying American Sign Language gestures and characters. The proposed method examines current neural networks and how they can be applied to our transformed World Largest - American Sign Language data set. Drawing on state-of-the-art detection and segmentation algorithms, this paper analyzes the efficiency of pre-trained networks against these algorithms. It also tests current machine learning strategies such as transfer learning and their impact on training a recognition model. Our research goals are: (1) building and augmenting our data set; (2) applying transfer learning to our data sets to create various models; (3) comparing the accuracies of each model; and finally, (4) presenting a novel pattern for gesture recognition.
THE APPLICATION OF COMPUTER VISION, MACHINE AND DEEP LEARNING ALGORITHMS UTILIZING MATLAB
MATLAB is a multi-paradigm proprietary programming language and numerical computing environment developed by MathWorks. Within the MATLAB Integrated Development Environment (IDE) you can perform computer-aided design (CAD), matrix manipulations, plotting of functions and data, implementation of algorithms, and creation of user interfaces, and you can interface with programs written in other languages. Since its launch in 1984, MATLAB had not been particularly associated with the field of data science. That changed in 2013 with the launch of new toolboxes focused on data science, covering Deep Learning, Image Processing, and Computer Vision, followed a year later by Statistics and Machine Learning.
The main objective of my thesis was to research and explore the field of data science, specifically the development of an object recognition application that could be built entirely in the MATLAB IDE and have a positive social impact on the deaf community. In doing so, I sought to answer the question: can MATLAB be used to develop this type of application? To answer this question while addressing my main objectives, I constructed two different object recognition protocols using MATLAB_R2019 with the add-on data science tool packages. I named the protocols ASLtranslate (I) and (II). This allowed me to experiment with all of MATLAB's data science toolboxes while learning the differences, benefits, and disadvantages of taking multiple approaches to the same problem.
The methods and approaches for the design of both versions were very similar. ASLtranslate takes a 2D image of an American Sign Language (ASL) hand gesture as input, classifies the image, and outputs the corresponding alphabet character. ASLtranslate (I) was an implementation of image category classification using machine learning methods. ASLtranslate (II) was implemented using a deep learning method called transfer learning, in which a pre-trained convolutional neural network (CNN), AlexNet, was fine-tuned to perform classification on a new collection of images.
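The core idea of transfer learning mentioned in this abstract, reusing a pre-trained network and training only a new classifier for the new image collection, can be illustrated with a toy sketch. This is not the thesis's MATLAB code and does not use AlexNet: the fixed `BACKBONE_W` weights, the four-pixel "images", and every function name below are hypothetical stand-ins chosen to keep the example self-contained, and it is written in Python rather than MATLAB. It also shows the simplest variant (a fully frozen backbone), whereas fine-tuning would update some backbone weights too.

```python
import math

# Hypothetical frozen "backbone": fixed weights standing in for a pre-trained CNN.
BACKBONE_W = [[1.0, 1.0, 0.0, 0.0],   # responds to the left half of the image
              [0.0, 0.0, 1.0, 1.0]]   # responds to the right half of the image

def extract_features(pixels):
    """Apply the frozen backbone (linear layer + ReLU); nothing here is trained."""
    return [max(0.0, sum(w * p for w, p in zip(row, pixels))) for row in BACKBONE_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(images, labels, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixels, label in zip(images, labels):
            feats = extract_features(pixels)
            error = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias) - label
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias

def predict(pixels, weights, bias):
    """Return class 1 if the head's score is positive, else class 0."""
    feats = extract_features(pixels)
    return int(sum(w * f for w, f in zip(weights, feats)) + bias > 0.0)

# Toy data set: class 0 is bright on the left, class 1 is bright on the right.
images = [[0.9, 0.8, 0.1, 0.0], [1.0, 0.7, 0.2, 0.1],
          [0.1, 0.0, 0.9, 0.8], [0.2, 0.1, 1.0, 0.7]]
labels = [0, 0, 1, 1]

weights, bias = train_head(images, labels)
predictions = [predict(p, weights, bias) for p in images]
```

Only `weights` and `bias` change during training; `BACKBONE_W` is left untouched, which is what makes this cheap compared with training a whole network from scratch.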
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
Generating captions for images is a task that has recently received
considerable attention. In this work we focus on caption generation for
abstract scenes, or object layouts where the only information provided is a set
of objects and their locations. We propose OBJ2TEXT, a sequence-to-sequence
model that encodes a set of objects and their locations as an input sequence
using an LSTM network, and decodes this representation using an LSTM language
model. We show that our model, despite encoding object layouts as a sequence,
can represent spatial relationships between objects, and generate descriptions
that are globally coherent and semantically relevant. We test our approach in a
task of object-layout captioning by using only object annotations as inputs. We
additionally show that our model, combined with a state-of-the-art object
detector, improves an image captioning model from 0.863 to 0.950 (CIDEr score)
in the test benchmark of the standard MS-COCO Captioning task. Comment: Accepted at EMNLP 201
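To make the input format concrete, here is a minimal sketch of how a set of objects and their locations might be flattened into an input sequence before being fed to a sequence encoder. It is an illustration only, not the OBJ2TEXT implementation: the function name `encode_layout`, the `(x, y, width, height)` box convention, the normalization by image size, and the top-to-bottom, left-to-right ordering are all assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x, y, width, height) in pixels

def encode_layout(objects: List[Tuple[str, Box]],
                  image_w: float, image_h: float) -> List[Tuple[str, Box]]:
    """Turn (category, box) pairs into an ordered sequence with normalized boxes."""
    # Assumed ordering: top-to-bottom, then left-to-right.
    ordered = sorted(objects, key=lambda item: (item[1][1], item[1][0]))
    sequence = []
    for category, (x, y, w, h) in ordered:
        # Normalize coordinates so the sequence does not depend on image size.
        sequence.append((category, (x / image_w, y / image_h, w / image_w, h / image_h)))
    return sequence

layout = [("dog", (320.0, 240.0, 160.0, 120.0)),
          ("person", (0.0, 0.0, 320.0, 480.0))]
sequence = encode_layout(layout, image_w=640.0, image_h=480.0)
```

Each element of `sequence` would then be embedded (category and location) and consumed one step at a time by the encoder; the embedding step is omitted here.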
End-to-End Multiview Gesture Recognition for Autonomous Car Parking System
The use of hand gestures can be the most intuitive human-machine interaction medium.
The early approaches for hand gesture recognition used device-based methods. These
methods use mechanical or optical sensors attached to a glove or markers, which hinders
the natural human-machine communication. On the other hand, vision-based methods are
not restrictive and allow for a more spontaneous communication without the need of an
intermediary between human and machine. Therefore, vision-based gesture recognition has been
a popular area of research for the past thirty years.
Hand gesture recognition finds its application in many areas, particularly the automotive
industry where advanced automotive human-machine interface (HMI) designers are
using gesture recognition to improve driver and vehicle safety. However, technology advances
go beyond active/passive safety and into convenience and comfort. In this context,
one of America’s big three automakers has partnered with the Centre of Pattern Analysis
and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding
their product segment through machine learning to provide an increased driver convenience
and comfort with the particular application of hand gesture recognition for autonomous
car parking.
In this thesis, we leverage state-of-the-art deep learning and optimization techniques
to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system.
We propose a 3DCNN gesture model architecture that we train on a publicly available
hand gesture database. We apply transfer learning methods to fine-tune the pre-trained
gesture model on custom-made data, which significantly improved the proposed system's
performance in a real-world environment. We adapt the architecture of the end-to-end solution
to expand the state-of-the-art video classifier from a single image as input (fed by a
monocular camera) to a multiview 360 feed, offered by a six-camera module. Finally, we
optimize the proposed solution to work on a resource-limited embedded platform (Nvidia
Jetson TX2) that is used by automakers for vehicle-based features, without sacrificing the
accuracy, robustness, and real-time functionality of the system.
Improving Syntactic Relationships Between Language and Objects
This paper presents the integration of natural language processing and computer vision to improve the syntax of the language generated when describing objects in images. The goal was to understand not only the objects in an image but also the interactions and activities occurring between them. We implemented a multi-modal neural network combining convolutional and recurrent neural network architectures to create a model that maximizes the likelihood of word combinations given a training image. The outcome was an image captioning model that leveraged transfer learning techniques for its architecture components. Our novelty was to quantify the effectiveness of transfer learning schemes for encoders and decoders in order to determine which were best for improving syntactic relationships. Our work found the combination of ResNet feature extraction and fine-tuned BERT word embeddings to be the best-performing architecture across two datasets - a valuable discovery for those continuing this work, considering the cost of compute for these complex models.
Fine-Tuning Sign Language Translation Systems Through Deep Reinforcement Learning
Sign language is an important communication tool for a vast majority of deaf and hard-of-hearing (DHH) people. Data collected by the World Health Organization states that there are currently 466 million people with hearing loss and that this number could rise to 630 million by 2030 and over 930 million by 2050 \cite{DHH1}. Millions of sign language speakers around the world use this skill on a daily basis. Bridging the gap between those who communicate solely with a spoken language and the DHH community is an ever-growing and omnipresent need. Unfortunately, in the field of natural language processing, sign language recognition and translation lag far behind their spoken language counterparts. This research seeks to leverage Deep Reinforcement Learning (DRL) to make a significant improvement in the task of Sign Language Translation (SLT) from German Sign Language videos to German text sentences. To do this, three major experiments are conducted. The first experiment examines the effects of Self-critical Sequence Training (SCST) when fine-tuning a simple Recurrent Neural Network (RNN) Long Short-Term Memory (LSTM) based sequence-to-sequence model. The second experiment takes the same SCST algorithm and applies it to a more powerful transformer-based model. The final experiment uses the Proximal Policy Optimization (PPO) algorithm alongside a novel fine-tuning process on the same transformer model. By estimating the reward signal and applying normalization while optimizing for the model's test-time greedy inference procedure, we aim to establish a new or comparable SOTA result on the RWTH-PHOENIX-Weather-2014 T German sign language dataset.
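The idea behind Self-critical Sequence Training can be sketched in a few lines: the reward of a sampled translation is compared with the reward of the model's own greedy translation, and the difference scales the log-probability of the sample. The snippet below is a simplified illustration, not the thesis's code: `overlap_reward` is a hypothetical stand-in for a real sentence-level metric, and the token lists and log-probabilities are made-up values.

```python
from typing import List

def overlap_reward(hypothesis: List[str], reference: List[str]) -> float:
    """Toy reward (assumption): fraction of distinct reference tokens in the hypothesis."""
    ref_tokens = set(reference)
    return len(ref_tokens & set(hypothesis)) / len(ref_tokens)

def scst_loss(sample_log_probs: List[float],
              sample_reward: float, greedy_reward: float) -> float:
    """Self-critical loss: -(r_sample - r_greedy) * sum of sampled-token log-probs."""
    advantage = sample_reward - greedy_reward  # greedy output acts as the baseline
    return -advantage * sum(sample_log_probs)

reference = ["morgen", "regen", "im", "norden"]
sampled = ["morgen", "regen", "norden"]   # drawn by sampling from the model
greedy = ["regen", "im", "sueden"]        # produced by greedy decoding

r_sample = overlap_reward(sampled, reference)
r_greedy = overlap_reward(greedy, reference)
loss = scst_loss([-0.5, -1.0, -0.5], r_sample, r_greedy)
```

When the sample scores higher than the greedy output, the advantage is positive, so minimizing the loss raises the probability of the sampled tokens; when it scores lower, the sign flips and their probability is pushed down. In an actual training loop the log-probabilities would come from the model and carry gradients.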