
    Gesture recognition with application to human-robot interaction

    Gestures are a natural form of communication, often transcending language barriers. Recently, much research has been focused on achieving natural human-machine interaction using gestures. This dissertation presents the design of a gestural interface that can be used to control a robot. The system consists of two modes: far-mode and near-mode. In far-mode interaction, upper-body gestures are used to control the motion of a robot. Near-mode interaction uses static hand poses to control a graphical user interface. For upper-body gesture recognition, features are extracted from skeletal data. The features consist of joint angles and relative joint positions and are computed for each frame of the gesture sequence. A novel key-frame selection algorithm is used to align the gesture sequences temporally. A neural network and hidden Markov model are then used to classify the gestures. The framework was tested on three different datasets: the CMU Military dataset of 3 users, 15 gestures and 10 repetitions per gesture; the VisApp2013 dataset of 28 users, 8 gestures and 1 repetition per gesture; and a recorded dataset of 15 users, 10 gestures and 3 repetitions per gesture. The system is shown to achieve a recognition rate of 100% across the three different datasets, using the key-frame selection and a neural network for gesture identification. Static hand-gesture recognition is achieved by first retrieving the 24-DOF hand model. The hand is segmented from the image using both depth and colour information. A novel calibration method is then used to automatically obtain the anthropometric measurements of the user’s hand. The k-curvature algorithm, depth-based and parallel border-based methods are used to detect fingertips in the image. An average detection accuracy of 88% is achieved. A neural network and k-means classifier are then used to classify the static hand gestures. The framework was tested on a dataset of 15 users, 12 gestures and 3 repetitions per gesture. A correct classification rate of 75% is achieved using the neural network. It is shown that the proposed system is robust to changes in skin colour and user hand size.
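
    As a rough illustration of the far-mode pipeline described above, the sketch below extracts a joint-angle and relative-position feature from each skeleton frame, subsamples a fixed number of frames (a crude stand-in for the key-frame selection algorithm, whose details are not given here), and classifies the flattened sequence with a small neural network. The joint names, key-frame count, and toy data are assumptions, not the dissertation's configuration.

```python
# Minimal sketch, not the dissertation's implementation.
import numpy as np
from sklearn.neural_network import MLPClassifier

def joint_angle(a, b, c):
    """Angle at joint b formed by the 3-D points a-b-c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def frame_features(joints):
    """joints: dict mapping an (assumed) joint name to its 3-D position."""
    elbow = joint_angle(joints["shoulder"], joints["elbow"], joints["wrist"])
    rel_wrist = joints["wrist"] - joints["shoulder"]   # relative joint position
    return np.concatenate([[elbow], rel_wrist])

def sequence_features(frames, n_key=10):
    """Uniform subsampling as a placeholder for the key-frame selection step."""
    idx = np.linspace(0, len(frames) - 1, n_key).astype(int)
    return np.concatenate([frame_features(frames[i]) for i in idx])

# Toy data: two gesture classes made of random skeleton frames, purely illustrative.
rng = np.random.default_rng(0)
def random_sequence():
    return [{"shoulder": rng.normal(size=3), "elbow": rng.normal(size=3),
             "wrist": rng.normal(size=3)} for _ in range(30)]

X = np.stack([sequence_features(random_sequence()) for _ in range(40)])
y = np.array([0, 1] * 20)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
print(clf.predict(X[:4]))
```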

    Intelligent ultrasound hand gesture recognition system

    With the booming development of technology, hand gesture recognition has become a research hotspot in Human-Computer Interaction (HCI) systems. Ultrasound hand gesture recognition is an innovative method that has attracted ample interest due to its strong real-time performance, low cost, large field of view, and illumination independence. Well-investigated HCI applications include external digital pens, game controllers on smart mobile devices, and web browser control on laptops. This thesis probes gesture recognition systems on multiple platforms to study the behavior of system performance with various gesture features. Focused on this topic, the contributions of this thesis can be summarized from the perspectives of smartphone acoustic field and hand model simulation, real-time gesture recognition on smart devices with a speed categorization algorithm, fast-reaction gesture recognition based on temporal neural networks, and an angle-of-arrival-based gesture recognition system. Firstly, a novel pressure-acoustic simulation model is developed to examine its potential for use in acoustic gesture recognition. The simulation model establishes a new system for acoustic verification, which uses simulations mimicking real-world sound elements to replicate a sound pressure environment as authentically as possible. This system is fine-tuned through sensitivity tests within the simulation and validated with real-world measurements. Following this, the study constructs novel simulations for acoustic applications, informed by the verified acoustic field distribution, to assess their effectiveness in specific devices. Furthermore, a simulation focused on understanding the effects of sound-device placement and hand-reflected sound waves is designed. Moreover, a feasibility test on phase control modification is conducted, revealing the practical applications and boundaries of this model. Mobility and system accuracy are two significant factors that determine gesture recognition performance. As smartphones have high-quality acoustic hardware, novel algorithms were developed to distinguish gestures using the built-in speakers and microphones, yielding a portable gesture recognition system with high accuracy. The proposed system adopts the Short-Time Fourier Transform (STFT) and machine learning to capture hand movement and determine gestures with a pretrained neural network. To differentiate gesture speeds, a specific neural network was designed and set as part of the classification algorithm. The final system achieves an accuracy of 96% across nine gestures and three speed levels. The proposed algorithms were evaluated against existing approaches and outperformed state-of-the-art systems in accuracy. Furthermore, a fast-reaction gesture recognition system based on temporal neural networks was designed. Traditional ultrasound gesture recognition adopts convolutional neural networks that have flaws in terms of response time and discontinuous operation. In addition, overlap intervals in network processing cause cross-frame failures that greatly reduce system performance. To mitigate these problems, a novel fast-reaction gesture recognition system that slices signals into short time intervals was designed. The proposed system adopted a novel convolutional recurrent neural network (CRNN) that calculates gesture features over short intervals and combines them over time.
The results showed that the reaction time was significantly reduced from 1 s to 0.2 s, and accuracy improved to 100% for six gestures. Lastly, an acoustic sensor array was built to investigate the angle information of performed gestures. The direction of a gesture is a significant feature for gesture classification, which enables the same gesture in different directions to represent different actions. Previous studies mainly focused on gesture types and analysis approaches (e.g., the Doppler effect and channel impulse response), while the direction of gestures was not extensively studied. An acoustic gesture recognition system based on both speed information and gesture direction was developed. The system achieved 94.9% accuracy among ten different gestures from two directions. The proposed system was evaluated comparatively across multiple neural network structures, and the results confirmed that incorporating additional angle information improved the system's performance. In summary, the work presented in this thesis validates the feasibility of recognizing hand gestures using remote ultrasonic sensing across multiple platforms. The acoustic simulation explores the smartphone acoustic field distribution and response results in the context of hand gesture recognition applications. The smartphone gesture recognition system demonstrates the accuracy of recognition through ultrasound signals and conducts an analysis of classification speed. The fast-reaction system proposes a more optimized solution to address the cross-frame issue using temporal neural networks, reducing the response latency to 0.2 s. The speed- and angle-based system provides an additional feature for gesture recognition. The established work will accelerate the development of intelligent hand gesture recognition, enrich the available gesture features, and contribute to further research in various gestures and application scenarios.
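
    To make the smartphone sensing step above concrete, the sketch below shows one plausible way to extract STFT features from a microphone recording: keep only the magnitude spectrogram in a narrow band around an assumed inaudible carrier emitted by the speaker, where Doppler shifts caused by hand motion would appear. The sample rate, carrier frequency, band width, and synthetic signal are assumptions rather than the thesis parameters.

```python
# Minimal sketch of STFT-based feature extraction for ultrasound gesture sensing.
import numpy as np
from scipy.signal import stft

FS = 48_000          # assumed microphone sample rate (Hz)
CARRIER = 20_000     # assumed near-ultrasonic tone played by the speaker (Hz)

def spectrogram_features(mic_signal, band=500):
    """Magnitude STFT restricted to a narrow band around the carrier."""
    f, t, Z = stft(mic_signal, fs=FS, nperseg=2048, noverlap=1536)
    mask = (f > CARRIER - band) & (f < CARRIER + band)
    return np.abs(Z[mask])            # (frequency bins in band, time frames)

# Toy input: a 0.5 s pure carrier tone; a real recording would contain
# Doppler shifts reflected off the moving hand.
t = np.arange(int(0.5 * FS)) / FS
recording = np.sin(2 * np.pi * CARRIER * t)
features = spectrogram_features(recording)
print(features.shape)   # these frames would be fed to the pretrained classifier
```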

    FLGR: Fixed Length Gists Representation Learning for RNN-HMM Hybrid-Based Neuromorphic Continuous Gesture Recognition

    A neuromorphic vision sensor is a novel passive, frameless sensing modality with several advantages over conventional cameras. Frame-based cameras have an average frame rate of 30 fps, causing motion blur when capturing fast motion, e.g., hand gestures. Rather than wastefully sending entire images at a fixed frame rate, neuromorphic vision sensors only transmit the local pixel-level changes induced by movement in a scene when they occur. This leads to advantageous characteristics, including low energy consumption, high dynamic range, a sparse event stream and low response latency. In this study, a novel representation learning method was proposed: Fixed Length Gists Representation (FLGR) learning for event-based gesture recognition. Previous methods accumulate events into video frames over a time duration (e.g., 30 ms) to form an accumulated image-level representation. However, the accumulated-frame-based representation forfeits the event-driven paradigm of the neuromorphic vision sensor. New representations are urgently needed to fill the gap in non-accumulated-frame-based representation and to exploit the further capabilities of neuromorphic vision. The proposed FLGR is a sequence learned with a mixture density autoencoder and better preserves the nature of event-based data. FLGR has a fixed-length data format and is easy to feed to a sequence classifier. Moreover, an RNN-HMM hybrid was proposed to address the continuous gesture recognition problem. A recurrent neural network (RNN) is applied for FLGR sequence classification, while a hidden Markov model (HMM) is employed to localize candidate gestures and improve the result in a continuous sequence. A neuromorphic continuous hand gesture dataset (Neuro ConGD Dataset) with 17 hand gesture classes was developed for the neuromorphic research community. Hopefully, FLGR can inspire the study of highly efficient, high-speed, and high-dynamic-range event-based sequence classification tasks.
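
    The following sketch illustrates the general shape of a non-accumulated, sequence-based pipeline in the spirit of the paragraph above: raw (x, y, t, polarity) events are packed into fixed-length chunks and the chunk sequence is classified with a recurrent network. It is not the FLGR mixture density autoencoder or the RNN-HMM hybrid itself; the chunk length, layer sizes, and toy event stream are assumptions.

```python
# Minimal sketch of fixed-length event chunks fed to a recurrent classifier (PyTorch).
import torch
import torch.nn as nn

EVENTS_PER_CHUNK = 128   # assumed fixed chunk length

def events_to_chunks(events):
    """events: (N, 4) tensor of (x, y, t, polarity) -> (num_chunks, 128 * 4)."""
    n = events.shape[0] // EVENTS_PER_CHUNK * EVENTS_PER_CHUNK
    return events[:n].reshape(-1, EVENTS_PER_CHUNK * 4)

class ChunkGRU(nn.Module):
    def __init__(self, num_classes=17):      # 17 classes as in Neuro ConGD
        super().__init__()
        self.rnn = nn.GRU(EVENTS_PER_CHUNK * 4, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, chunks):               # chunks: (batch, seq, 128 * 4)
        _, h = self.rnn(chunks)
        return self.head(h[-1])

events = torch.rand(4096, 4)                 # toy event stream
chunks = events_to_chunks(events).unsqueeze(0)
print(ChunkGRU()(chunks).shape)              # (1, 17) class logits
```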

    HGR-Net: A Fusion Network for Hand Gesture Segmentation and Recognition

    We propose a two-stage convolutional neural network (CNN) architecture for robust recognition of hand gestures, called HGR-Net, where the first stage performs accurate semantic segmentation to determine hand regions, and the second stage identifies the gesture. The segmentation stage architecture is based on the combination of a fully convolutional residual network and atrous spatial pyramid pooling. Although the segmentation sub-network is trained without depth information, it is particularly robust against challenges such as illumination variations and complex backgrounds. The recognition stage deploys a two-stream CNN, which fuses the information from the red-green-blue and segmented images by combining their deep representations in a fully connected layer before classification. Extensive experiments on public datasets show that our architecture achieves performance close to the state of the art in segmentation and recognition of static hand gestures, at a fraction of the training time, run time, and model size. Our method can operate at an average of 23 ms per frame.
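
    A minimal sketch of the two-stream fusion idea follows: one small CNN stream for the RGB image, one for the segmentation map, with their pooled representations concatenated and passed through a fully connected layer before classification. The layer sizes, input resolution, and class count are assumptions and do not reproduce the published HGR-Net configuration.

```python
# Minimal sketch of two-stream RGB + segmentation fusion (PyTorch).
import torch
import torch.nn as nn

def small_stream(in_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())            # -> 32-dim vector

class TwoStreamFusion(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.rgb_stream = small_stream(3)
        self.seg_stream = small_stream(1)
        self.fc = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                                nn.Linear(64, num_classes))

    def forward(self, rgb, seg_mask):
        fused = torch.cat([self.rgb_stream(rgb), self.seg_stream(seg_mask)], dim=1)
        return self.fc(fused)                             # class logits

model = TwoStreamFusion()
out = model(torch.rand(2, 3, 128, 128), torch.rand(2, 1, 128, 128))
print(out.shape)   # (2, 10)
```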

    Hand gesture recognition in uncontrolled environments

    Human Computer Interaction has long relied on mechanical devices to feed information into computers with low efficiency. With the recent developments in image processing and machine learning methods, the computer vision community is ready to develop the next generation of Human Computer Interaction methods, including Hand Gesture Recognition methods. A comprehensive Hand Gesture Recognition based semantic level Human Computer Interaction framework for uncontrolled environments is proposed in this thesis. The framework contains novel methods for Hand Posture Recognition, Hand Gesture Recognition and Hand Gesture Spotting. The Hand Posture Recognition method in the proposed framework is capable of recognising predefined still hand postures from cluttered backgrounds. Texture features are used in conjunction with Adaptive Boosting to form a novel feature selection scheme, which can effectively detect and select discriminative texture features from the training samples of the posture classes. A novel Hand Tracking method called Adaptive SURF Tracking is proposed in this thesis. Texture key points are used to track multiple hand candidates in the scene. This tracking method matches texture key points of hand candidates within adjacent frames to calculate the movement directions of hand candidates. With the gesture trajectories provided by the Adaptive SURF Tracking method, a novel classifier called Partition Matrix is introduced to perform gesture classification for uncontrolled environments with multiple hand candidates. The trajectories of all hand candidates extracted from the original video under different frame rates are used to analyse the movements of hand candidates. An alternative gesture classifier based on a Convolutional Neural Network is also proposed. The input images of the Neural Network are approximate trajectory images reconstructed from the tracking results of the Adaptive SURF Tracking method. For Hand Gesture Spotting, a forward spotting scheme is introduced to detect the starting and ending points of the predefined gestures in the continuously signed gesture videos. A Non-Sign Model is also proposed to simulate meaningless hand movements between the meaningful gestures. The proposed framework can perform well with unconstrained scene settings, including frontal occlusions, background distractions and changing lighting conditions. Moreover, it is invariant to changing scales, speeds and locations of the gesture trajectories.
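
    The sketch below illustrates the key-point matching step described for Adaptive SURF Tracking: detect texture key points in two adjacent frames, match their descriptors, and average the displacement vectors to estimate a movement direction. ORB is used here as a freely available stand-in for SURF (which requires opencv-contrib), the frame file names are assumptions, and the sketch handles a single candidate region rather than multiple hand candidates.

```python
# Minimal sketch of frame-to-frame key-point matching for motion direction (OpenCV).
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def motion_direction(prev_gray, curr_gray):
    """Average displacement (dx, dy) of matched key points between two frames."""
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return None
    matches = matcher.match(des1, des2)
    if not matches:
        return None
    shifts = [np.subtract(kp2[m.trainIdx].pt, kp1[m.queryIdx].pt) for m in matches]
    return np.mean(shifts, axis=0)

prev_gray = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # assumed file names
curr_gray = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
if prev_gray is not None and curr_gray is not None:
    print(motion_direction(prev_gray, curr_gray))
```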

    Deep Learning Methods for Hand Gesture Recognition via High-Density Surface Electromyogram (HD-sEMG) Signals

    Hand Gesture Recognition (HGR) using surface Electromyogram (sEMG) signals can be considered one of the most important technologies in making efficient Human Machine Interface (HMI) systems. In particular, sEMG-based hand gesture recognition has been a topic of growing interest for the development of assistive systems to improve the quality of life of individuals with amputated limbs. Generally speaking, myoelectric prosthetic devices work by classifying existing patterns of the collected sEMG signals and synthesizing intended gestures. While conventional myoelectric control systems, e.g., on/off control or direct-proportional control, have potential advantages, challenges such as limited Degrees of Freedom (DoF) due to crosstalk have resulted in the emergence of data-driven solutions. More specifically, to improve the efficiency, intuitiveness, and control performance of hand prosthetic systems, several Artificial Intelligence (AI) algorithms ranging from conventional Machine Learning (ML) models to highly complicated Deep Neural Network (DNN) architectures have been designed for sEMG-based hand gesture recognition in myoelectric prosthetic devices. In this thesis, we first perform a literature review on hand gesture recognition methods and elaborate on the recently proposed Deep Learning/Machine Learning (DL/ML) models in the literature. Then, our utilized High-Density sEMG (HD-sEMG) dataset is introduced and the rationales behind our main focus on this particular type of sEMG dataset are explained. We then develop a Vision Transformer (ViT)-based model for gesture recognition with HD-sEMG signals and evaluate its performance under different conditions such as variable window sizes, numbers of electrode channels, and model complexity. We compare its performance with that of two conventional ML algorithms and one DL algorithm that are typically adopted in this domain. Furthermore, we introduce another capability of our proposed framework for instantaneous training, namely its ability to classify hand gestures based on a single frame of the HD-sEMG dataset. Following that, we introduce the idea of integrating the macroscopic and microscopic neural drive information obtained from HD-sEMG data into a hybrid ViT-based framework for gesture recognition, which outperforms a standalone ViT architecture in terms of classification accuracy. Here, microscopic neural drive information (also called Motor Unit Spike Trains) refers to the neural commands sent by the brain and spinal cord to individual muscle fibers and is extracted from HD-sEMG signals using Blind Source Separation (BSS) algorithms. Finally, we design an alternative and novel hand gesture recognition model based on the less-explored topic of Spiking Neural Networks (SNN), which performs spatio-temporal gesture recognition in an event-based fashion. As opposed to classical DNN architectures, SNNs have the capacity to imitate the human brain's cognitive function by using biologically inspired models of neurons and synapses. Therefore, they are more biologically explainable and computationally efficient.
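
    As a rough illustration of applying a transformer to HD-sEMG windows, the sketch below treats each electrode channel of a short window as a token, encodes the tokens with a small Transformer encoder, and classifies the pooled representation. The channel count, window length, token layout, and model size are assumptions; the thesis's ViT-based and hybrid architectures are not reproduced here.

```python
# Minimal sketch of a transformer-style classifier over HD-sEMG windows (PyTorch).
import torch
import torch.nn as nn

CHANNELS, WINDOW, CLASSES = 64, 200, 10   # assumed electrode count, samples, gestures

class SEMGTransformer(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(WINDOW, d_model)           # one token per channel
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, CLASSES)

    def forward(self, x):                  # x: (batch, CHANNELS, WINDOW)
        tokens = self.embed(x)             # (batch, CHANNELS, d_model)
        encoded = self.encoder(tokens)
        return self.head(encoded.mean(dim=1))             # pooled class logits

window_batch = torch.rand(8, CHANNELS, WINDOW)            # toy sEMG windows
print(SEMGTransformer()(window_batch).shape)              # (8, 10)
```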

    Smart Glove and Hand Gesture-Based Control Interface for Multi-Rotor Aerial Vehicles

    This thesis introduces two types of adaptable human–computer interaction methods in single-subject and multi-subject unsupervised environments. In a single-subject environment, a single shot multi-box detector (SSD) is used to detect the hand's regions of interest (RoI) in the input frame. The RoI detected by the SSD model is fed to a convolutional neural network (CNN) to classify the right-hand gesture. In a crowded environment, a region-based convolutional neural network (R-CNN) detects subjects in the frame and the RoI of their faces. These regions of interest are then fed to a facial recognition module to find the main subject in the frame. After the main subject is found, the R-CNN model feeds the right-hand RoI of the main subject to the CNN to classify the right-hand gesture. In both single-subject and multi-subject environments, a fixed set of right-hand gestures is used to issue specific commands to the vehicle (take off, land, hover, etc.). A smart glove on the left hand is used for more precise vehicular control. A motion-processing unit (MPU) and a set of four flex sensors are used in the smart glove to produce discrete and continuous signals based on the bending value of each finger and the roll angle of the left hand. The generated discrete signals with the flex sensors are fed to a support vector machine (SVM) to classify the left-hand gesture, and the generated continuous signals with the flex sensors and the MPU module are used to determine characteristics of the gesture command such as the throttle and angle of the vehicle. Three simultaneous validation layers have been implemented, including (1) Human-Based Validation, where the main subject can reject the robot’s behavior by sending a command through the smart glove, (2) Classification Validation, where the main subject can retrain the classifier classes at will, and (3) System Validation, which is responsible for finding the error's source when the main subject performs a specific command through the smart glove. The proposed algorithm is applied to groups consisting of one, two, three, and four subjects including the main subject to validate and investigate the behavior of the algorithm in various desirable and undesirable situations. The proposed algorithm is implemented on an Nvidia Jetson AGX Xavier GPU.
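
    The sketch below gives a minimal, toy version of the glove's discrete-signal path: four flex-sensor bend values plus a normalized roll angle are fed to an SVM that outputs a left-hand gesture label. The feature layout, sensor values, and gesture names are invented for illustration and are not the thesis configuration.

```python
# Minimal sketch of SVM classification of glove readings (scikit-learn).
import numpy as np
from sklearn.svm import SVC

# Each sample: [flex_index, flex_middle, flex_ring, flex_pinky, roll / 90 deg]
X_train = np.array([
    [0.9, 0.9, 0.9, 0.9, 0.0],   # "fist"
    [0.1, 0.1, 0.1, 0.1, 0.0],   # "open"
    [0.1, 0.9, 0.9, 0.9, 0.0],   # "point"
    [0.9, 0.9, 0.9, 0.9, 1.0],   # "fist_rolled"
])
y_train = ["fist", "open", "point", "fist_rolled"]

clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
# A near-fist reading with little roll should come out as "fist".
print(clf.predict([[0.85, 0.95, 0.9, 0.88, 0.05]]))
```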

    Continuous Interaction With a Smart Speaker via Low-Dimensional Embeddings of Dynamic Hand Pose

    This paper presents a new continuous interaction strategy with visual feedback of hand pose and mid-air gesture recognition and control for a smart music speaker, which uses only two video frames to recognize gestures. Frame-based hand pose features from MediaPipe Hands, containing 21 landmarks, are embedded into a two-dimensional pose space by an autoencoder. The corresponding space for interaction with the music content is created by embedding high-dimensional music track profiles into a compatible two-dimensional embedding. A PointNet-based model is then applied to classify gestures, which are used to control device interaction or explore music spaces. By jointly optimising the autoencoder with the classifier, we manage to learn a more useful embedding space for discriminating gestures. We demonstrate the functionality of the system with experienced users selecting different musical moods by varying their hand pose.
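
    A minimal sketch of the joint optimisation idea follows: a small autoencoder maps a 63-dimensional hand-pose vector (21 MediaPipe landmarks × 3 coordinates) to a two-dimensional embedding, and a classifier head on that embedding is trained together with the reconstruction objective. The layer sizes, loss weighting, and toy data are assumptions, and the PointNet-based classifier from the paper is not reproduced here.

```python
# Minimal sketch of jointly training a 2-D pose autoencoder with a gesture classifier.
import torch
import torch.nn as nn

class PoseAutoencoderClassifier(nn.Module):
    def __init__(self, num_gestures=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(63, 32), nn.ReLU(), nn.Linear(32, 2))
        self.decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 63))
        self.classifier = nn.Linear(2, num_gestures)

    def forward(self, landmarks):              # landmarks: (batch, 63)
        z = self.encoder(landmarks)            # two-dimensional pose embedding
        return self.decoder(z), self.classifier(z)

model = PoseAutoencoderClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(16, 63)                         # toy landmark batch
y = torch.randint(0, 5, (16,))                 # toy gesture labels

opt.zero_grad()
recon, logits = model(x)
# Reconstruction keeps the embedding faithful to the pose; classification makes it
# discriminative for gestures. The 0.5 weight is an assumption.
loss = nn.functional.mse_loss(recon, x) + 0.5 * nn.functional.cross_entropy(logits, y)
loss.backward()
opt.step()
print(float(loss))
```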