The Impact of Quantity of Training Data on Recognition of Eating Gestures
This paper considers the problem of recognizing eating gestures by tracking
wrist motion. Eating gestures can have large variability in motion depending on
the subject, utensil, and type of food or beverage being consumed. Previous
works have shown viable proofs-of-concept of recognizing eating gestures in
laboratory settings with small numbers of subjects and food types, but it is
unclear how well these methods would work if tested on a larger population in
natural settings. As more subjects, locations and foods are tested, a larger
amount of motion variability could cause a decrease in recognition accuracy. To
explore this issue, this paper describes the collection and annotation of
51,614 eating gestures taken by 269 subjects eating a meal in a cafeteria.
Experiments are described that explore the complexity of hidden Markov models
(HMMs) and the amount of training data needed to adequately capture the motion
variability across this large data set. Results found that HMMs needed a
complexity of 13 states and 5 Gaussians to reach a plateau in accuracy,
signifying that a minimum of 65 samples per gesture type are needed. Results
also found that 500 training samples per gesture type were needed to identify
the point of diminishing returns in recognition accuracy. Overall, the findings
provide evidence that the size of a data set typically used to demonstrate a
laboratory proof-of-concept may not be large enough to capture all the motion
variability that could be expected in transitioning to deployment with a larger
population. Our data set, which is 1-2 orders of magnitude larger than all data
sets tested in previous works, is being made publicly available.
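As a hedged illustration of the modeling setup above, the sketch below fits one gesture type's HMM with 13 states and 5 Gaussians per state; the hmmlearn library and the wrist-motion feature layout are assumptions, not details stated by the paper.

    import numpy as np
    from hmmlearn.hmm import GMMHMM  # pip install hmmlearn

    def fit_gesture_hmm(sequences, n_states=13, n_mix=5):
        """Fit one HMM per gesture type.

        sequences: list of (time_steps, feature_dims) arrays of wrist motion.
        """
        X = np.concatenate(sequences)          # stack all observations
        lengths = [len(s) for s in sequences]  # sequence boundaries for hmmlearn
        model = GMMHMM(n_components=n_states, n_mix=n_mix,
                       covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        return model

Recognition would then score an unseen gesture under each type's model and pick the highest log-likelihood, which is where the per-type training-sample counts discussed above come into play.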
Visual Rendering of Shapes on 2D Display Devices Guided by Hand Gestures
The design of touchless user interfaces is gaining popularity in various
contexts. Using such interfaces, users can interact with electronic devices
even when their hands are dirty or non-conductive. Users with partial physical
disabilities can also interact with electronic devices using such systems.
Research in this direction has received a major boost because of the emergence of
low-cost sensors such as Leap Motion, Kinect or RealSense devices. In this
paper, we propose a Leap Motion controller-based methodology to facilitate
rendering of 2D and 3D shapes on display devices. The proposed method tracks
finger movements while users perform natural gestures within the field of view
of the sensor. In the next phase, trajectories are analyzed to extract extended
Npen++ features in 3D. These features represent finger movements during the
gestures, and they are fed to a unidirectional left-to-right Hidden Markov Model
(HMM) for training. A one-to-one mapping between gestures and shapes is
proposed. Finally, shapes corresponding to these gestures are rendered over the
display using MuPad interface. We have created a dataset of 5400 samples
recorded by 10 volunteers. Our dataset contains 18 geometric and 18
non-geometric shapes such as "circle", "rectangle", "flower", "cone", "sphere",
etc. The proposed methodology achieves an accuracy of 92.87% when evaluated
using 5-fold cross-validation. Our experiments reveal that the extended
3D features perform better than existing 3D features in the context of shape
representation and classification. The method can be used for developing useful
HCI applications for smart display devices.
Comment: Submitted to Elsevier Displays Journal, 32 pages, 18 figures, 7 tables
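As a minimal sketch of the unidirectional left-to-right topology named above, the snippet below builds such a model with hmmlearn (an assumed implementation; the paper does not name a library). State i may only self-loop or advance to state i+1, which suits the temporal progression of a drawn gesture.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    def left_to_right_hmm(n_states=8):
        # Only means/covariances are initialized by fit(); the topology
        # below is preserved because zero transitions stay zero under EM.
        model = GaussianHMM(n_components=n_states, covariance_type="diag",
                            init_params="mc", n_iter=30)
        model.startprob_ = np.zeros(n_states)
        model.startprob_[0] = 1.0                 # always start in state 0
        trans = np.zeros((n_states, n_states))
        for i in range(n_states - 1):
            trans[i, i] = trans[i, i + 1] = 0.5   # stay or move right
        trans[-1, -1] = 1.0                       # absorbing final state
        model.transmat_ = trans
        return model

One such model per shape class, scored against an extracted feature sequence, would realize the one-to-one gesture-to-shape mapping the abstract describes.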
A Probabilistic Modeling Approach to One-Shot Gesture Recognition
Gesture recognition enables a natural extension of the way we currently
interact with devices. Commercially available gesture recognition systems are
usually pre-trained and offer no option for customization by the user. In order
to improve the user experience, it is desirable to allow end users to define
their own gestures. This scenario requires learning from just a few training
examples if we want to impose only a light training load on the user. To this
end, we propose a gesture classifier based on a hierarchical probabilistic
modeling approach. In this framework, high-level features that are shared among
different gestures can be extracted from a large labeled data set, yielding a
prior distribution for gestures. When learning new types of gestures, the
learned shared prior reduces the number of required training examples for
individual gestures. We implemented the proposed gesture classifier for a Myo
sensor bracelet and show favorable results for the tested system on a database
of 17 different gesture types. Furthermore, we propose and implement two
methods to incorporate the gesture classifier in a real-time gesture
recognition system.
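The benefit of the learned shared prior can be illustrated with a simplified stand-in for the paper's hierarchical model: a conjugate-Gaussian MAP estimate that pulls a new gesture's feature mean toward the population mean when only a few examples are available. The estimator below is a sketch, not the authors' exact formulation.

    import numpy as np

    def map_gesture_mean(few_shots, prior_mean, prior_var, noise_var):
        """MAP estimate of a new gesture's feature mean.

        prior_mean/prior_var: learned from many gestures in the large set.
        few_shots: (n, d) array of examples for the new gesture type.
        """
        n = len(few_shots)
        sample_mean = few_shots.mean(axis=0)
        # Precision-weighted blend: with few examples, w is small and the
        # estimate leans on the shared prior, reducing the training load.
        w = (n / noise_var) / (n / noise_var + 1.0 / prior_var)
        return w * sample_mean + (1.0 - w) * prior_mean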
Driver distraction detection and recognition using RGB-D sensor
Driver inattention assessment has become a very active field in intelligent
transportation systems. Using the Kinect active sensor and computer vision
tools, we have built an efficient module for detecting driver distraction and
recognizing the type of distraction. Based on color and depth map data from the
Kinect, our system is composed of four sub-modules. We call them eye behavior
(detecting gaze and blinking), arm position (is the right arm up, down, right,
or forward?), head orientation, and facial expressions. Each module produces
relevant information for assessing driver inattention. They are merged together
later on using two different classification strategies: AdaBoost classifier and
Hidden Markov Model. Evaluation is done using a driving simulator and 8 drivers
of different gender, age and nationality for a total of more than 8 hours of
recording. Qualitative and quantitative results show strong and accurate
detection and recognition capacity (85% accuracy for the type of distraction
and 90% for distraction detection). Moreover, each module is obtained
independently and could be used for other types of inference, such as fatigue
detection, and could be implemented in real car systems.
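The late-fusion stage can be sketched as follows, with scikit-learn standing in for whatever AdaBoost implementation the authors used; the per-frame feature layout and the toy labeling rule are illustrative assumptions.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Stand-in for fused sub-module outputs, one row per frame:
    # [gaze_x, gaze_y, blink, arm_position, head_yaw, head_pitch, expression]
    X = rng.normal(size=(2000, 7))
    # Toy rule: arm raised AND head turned away -> distracted (label 1).
    y = ((X[:, 3] > 0) & (X[:, 4] > 0)).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    clf = AdaBoostClassifier(n_estimators=100).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))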
Real-time on-device nod and shake recognition
We discuss methods for teaching systems to identify gestures such as head nods
and shakes. We use the iPhone X depth camera to gather data and later use similar
data as input for a working app. These methods have proved robust for training
with limited datasets, and thus we argue that similar methods could be adapted
to learn other human-to-human non-verbal gestures. We showcase how
to augment Euler angle gesture sequences to train models with a relatively
large number of parameters such as LSTM and GRU and gain better performance
than reported for smaller models such as HMMs. In the examples here, we
demonstrate how to train such models with Keras and run the resulting models
in real time on-device with CoreML.
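A minimal sketch of the setup described above, using Keras as the abstract does; the sequence length, the three-class label set, and the noise-jitter augmentation are plausible assumptions rather than the app's exact configuration.

    import numpy as np
    import tensorflow as tf

    SEQ_LEN, N_CLASSES = 30, 3   # assumed: nod / shake / none

    def augment(seq, rng, sigma=0.02):
        """Jitter a (SEQ_LEN, 3) Euler-angle sequence (pitch, yaw, roll)
        with Gaussian noise -- one way to stretch a limited dataset."""
        return seq + rng.normal(scale=sigma, size=seq.shape)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SEQ_LEN, 3)),
        tf.keras.layers.LSTM(64),                 # or tf.keras.layers.GRU(64)
        tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

A model in this form can then be converted with coremltools for the on-device inference path the abstract mentions.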
Understanding Human Motion and Gestures for Underwater Human-Robot Collaboration
In this paper, we present a number of robust methodologies for an underwater
robot to visually detect, follow, and interact with a diver for collaborative
task execution. We design and develop two autonomous diver-following
algorithms, the first of which utilizes both spatial- and frequency-domain
features pertaining to human swimming patterns in order to visually track a
diver. The second algorithm uses a convolutional neural network-based model for
robust tracking-by-detection. In addition, we propose a hand gesture-based
human-robot communication framework that is syntactically simpler and
computationally more efficient than the existing grammar-based frameworks. In
the proposed interaction framework, deep visual detectors are used to provide
accurate hand gesture recognition; subsequently, a finite-state machine
performs robust and efficient gesture-to-instruction mapping. The
distinguishing feature of this framework is that it can be easily adopted by
divers for communicating with underwater robots without using artificial
markers or requiring memorization of complex language rules. Furthermore, we
validate the performance and effectiveness of the proposed methodologies
through extensive field experiments in closed- and open-water environments.
Finally, we perform a user interaction study to demonstrate the usability
benefits of our proposed interaction framework compared to existing methods.
Comment: arXiv admin note: text overlap with arXiv:1709.0877
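The gesture-to-instruction stage can be pictured as a small finite-state machine, as in the toy sketch below; the gesture tokens and instruction names are invented for illustration and are not the paper's actual vocabulary.

    # (state, gesture) -> (next_state, instruction or None)
    TRANSITIONS = {
        ("idle", "attention"): ("armed", None),
        ("armed", "go_left"):  ("idle", "TURN_LEFT"),
        ("armed", "go_right"): ("idle", "TURN_RIGHT"),
        ("armed", "stop"):     ("idle", "HOVER"),
    }

    def step(state, gesture):
        # Unknown gestures leave the state unchanged, which keeps the
        # mapping robust to spurious detections from the visual detector.
        return TRANSITIONS.get((state, gesture), (state, None))

    state = "idle"
    for g in ["attention", "go_left", "wave", "attention", "stop"]:
        state, cmd = step(state, g)
        if cmd:
            print("execute:", cmd)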
MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language
Sign language recognition is a challenging and often underestimated problem
comprising multi-modal articulators (handshape, orientation, movement, upper
body and face) that integrate asynchronously on multiple streams. Learning
powerful statistical models in such a scenario requires much data, particularly
to apply recent advances of the field. However, labeled data is a scarce
resource for sign language due to the enormous cost of transcribing these
unwritten languages.
We propose the first real-life large-scale sign language data set comprising
over 25,000 annotated videos, which we thoroughly evaluate with
state-of-the-art methods from sign and related action recognition. Unlike the
current state-of-the-art, the data set allows investigation of generalization
to unseen individuals (signer-independent test) in a realistic setting with
over 200 signers. Previous work mostly deals with limited vocabulary tasks,
while here, we cover a large class count of 1000 signs in challenging and
unconstrained real-life recording conditions. We further propose I3D, known
from video classification, as a powerful and suitable architecture for sign
language recognition, outperforming the current state-of-the-art by a large
margin. The data set is publicly available to the community.
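The signer-independent protocol the data set enables reduces to a group-disjoint split: no signer may appear in both train and test. A sketch using scikit-learn's GroupShuffleSplit (a tooling assumption) follows.

    from sklearn.model_selection import GroupShuffleSplit

    def signer_independent_split(samples, signer_ids, test_size=0.2, seed=0):
        """Split so that no signer appears in both train and test."""
        gss = GroupShuffleSplit(n_splits=1, test_size=test_size,
                                random_state=seed)
        train_idx, test_idx = next(gss.split(samples, groups=signer_ids))
        return train_idx, test_idx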
SignsWorld; Deeping Into the Silence World and Hearing Its Signs (State of the Art)
Automatic speech processing systems are employed more and more often in real
environments. Although the underlying speech technology is mostly language
independent, differences between languages with respect to their structure and
grammar have a substantial effect on the performance of recognition systems. In
this paper, we present a review of the latest developments in sign language
recognition research in general and in Arabic sign language (ArSL) recognition
in particular. This paper also presents SignsWorld, a general framework for
improving the deaf community's communication with hearing people. The
overall goal of the SignsWorld project is to develop a vision-based technology
for recognizing and translating continuous Arabic sign language (ArSL).
Comment: 20 pages; a state-of-the-art paper, so it contains many references
Implicit segmentation of Kannada characters in offline handwriting recognition using hidden Markov models
We describe a method for classification of handwritten Kannada characters
using Hidden Markov Models (HMMs). Kannada script is agglutinative, where
simple shapes are concatenated horizontally to form a character. This results
in a large number of characters making the task of classification difficult.
Character segmentation plays a significant role in reducing the number of
classes. Explicit segmentation techniques suffer when overlapping shapes are
present, which is common in the case of handwritten text. We use HMMs to take
advantage of the agglutinative nature of Kannada script, which allows us to
perform implicit segmentation of characters along with recognition. All the
experiments are performed on the Chars74k dataset that consists of 657
handwritten characters collected across multiple users. Gradient-based features
are extracted from individual characters and are used to train character HMMs.
The use of the implicit segmentation technique at the character level resulted in
an improvement of around 10%. This system also outperformed an existing system
tested on the same dataset by around 16%. Analysis based on learning curves
showed that increasing the training data could result in better accuracy.
Accordingly, we collected additional data and obtained an improvement of 4%
with 6 additional samples.
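One plausible way to produce the frame sequences such character HMMs consume is sketched below: a narrow window slides left to right across the character image, and a magnitude-weighted gradient-orientation histogram is computed per window. The exact feature definition is an assumption, not necessarily the paper's.

    import numpy as np
    import cv2

    def gradient_frames(img, win=4, n_bins=8):
        """img: grayscale character image (H x W). Returns (frames, n_bins)."""
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
        mag, ang = cv2.cartToPolar(gx, gy)      # per-pixel magnitude, angle
        frames = []
        for x in range(0, img.shape[1] - win + 1, win):
            m = mag[:, x:x + win].ravel()
            a = ang[:, x:x + win].ravel()
            hist, _ = np.histogram(a, bins=n_bins, range=(0, 2 * np.pi),
                                   weights=m)
            frames.append(hist / (hist.sum() + 1e-8))  # normalize per frame
        return np.asarray(frames)

Feeding frame sequences of this kind to per-character left-to-right HMMs is what allows segmentation to happen implicitly during decoding, as the abstract describes.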
The speaker-independent lipreading play-off; a survey of lipreading machines
Lipreading is a difficult gesture classification task. One problem in
computer lipreading is speaker independence: achieving the same accuracy on
test speakers not included in the training set as on speakers within it.
Current literature on speaker-independent lipreading is limited; the few
independent test-speaker accuracy scores are usually aggregated with dependent
test-speaker accuracies into an averaged performance, which obscures the
independent results. Here we
undertake a systematic survey of experiments with the TCD-TIMIT dataset using
both conventional approaches and deep learning methods to provide a series of
wholly speaker-independent benchmarks and show that the best
speaker-independent machine scores 69.58% accuracy with CNN features and an SVM
classifier. This is below the accuracy of state-of-the-art speaker-dependent
lipreading machines, but above previously reported speaker-independence results.
Comment: To appear at the third IEEE International Conference on Image
Processing, Applications and Systems 201
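The best-scoring pipeline above, CNN features feeding an SVM, can be sketched as follows; the ResNet50 backbone, input size, and lip-region preprocessing are assumptions, since the survey's exact CNN is not specified here.

    import numpy as np
    import tensorflow as tf
    from sklearn.svm import SVC

    backbone = tf.keras.applications.ResNet50(include_top=False,
                                              pooling="avg",
                                              weights="imagenet")

    def cnn_features(mouth_rois):
        """mouth_rois: (n, 224, 224, 3) float array of lip-region crops."""
        x = tf.keras.applications.resnet50.preprocess_input(mouth_rois.copy())
        return backbone.predict(x, verbose=0)   # (n, 2048) feature vectors

    # svm = SVC(kernel="linear").fit(cnn_features(train_rois), train_labels)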