
    Heterogeneous hand gesture recognition using 3D dynamic skeletal data

    Hand gestures are the most natural and intuitive non-verbal communication medium for interacting with a computer, and research interest in them has grown recently. Additionally, the identifiable features of the hand pose provided by current inexpensive commercial depth cameras can be exploited in various gesture-recognition-based systems, especially for Human-Computer Interaction. In this paper, we focus our attention on 3D dynamic gesture recognition systems using the hand pose information. Specifically, we use the natural structure of the hand topology (later called hand skeletal data) to extract effective hand kinematic descriptors from the gesture sequence. Descriptors are then encoded in a statistical and a temporal representation using, respectively, a Fisher kernel and a multi-level temporal pyramid. A linear SVM classifier can be applied directly to the feature vector computed over the whole presegmented gesture to perform the recognition. Furthermore, for early recognition from a continuous stream, we introduce a prior gesture detection phase, achieved using a binary classifier, before the final gesture recognition. The proposed approach is evaluated on three hand gesture datasets containing respectively 10, 14 and 25 gestures with specific challenging tasks. Also, we conduct an experiment to assess the influence of depth-based hand pose estimation on our approach. Experimental results demonstrate the potential of the proposed solution in terms of hand gesture recognition and also for low-latency gesture recognition. Comparative results with state-of-the-art methods are reported.
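The multi-level temporal pyramid mentioned in this abstract can be illustrated with a minimal sketch. It assumes per-frame descriptors are already available as lists of floats, and it uses plain mean pooling as a stand-in for the Fisher kernel encoding, which is not reproduced here. The function name `temporal_pyramid` and the default of three levels are illustrative assumptions, not taken from the paper.

```python
def temporal_pyramid(frames, levels=3):
    """Concatenate pooled descriptors over 1, 2, 4, ... temporal segments.

    frames: list of per-frame descriptors (equal-length lists of floats).
    Mean pooling is an assumption standing in for Fisher vector encoding.
    """
    n, dim = len(frames), len(frames[0])
    feature = []
    for level in range(levels):
        parts = 2 ** level  # level 0: whole sequence, level 1: halves, ...
        for k in range(parts):
            start, end = k * n // parts, (k + 1) * n // parts
            segment = frames[start:end]
            if segment:
                pooled = [sum(f[d] for f in segment) / len(segment) for d in range(dim)]
            else:
                pooled = [0.0] * dim  # assumption: empty segments contribute zeros
            feature.extend(pooled)
    return feature


# Hypothetical 1-D descriptors for a four-frame gesture.
pyramid = temporal_pyramid([[0.0], [2.0], [4.0], [6.0]], levels=2)
```

The resulting vector has a fixed length of (2^levels - 1) times the descriptor dimension regardless of sequence length, which is what allows a linear classifier to be applied to gestures of different durations.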

    Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition

    This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatiotemporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, therefore opening the door to the use of deep learning techniques in order to further explore multimodal time series data.
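For readers unfamiliar with the metric quoted above, the Jaccard index is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to sets of frame indices for a single gesture instance. The exact per-sequence averaging used by the challenge is not described in the abstract and is not reproduced here, and the function name and frame ranges are illustrative assumptions.

```python
def jaccard_index(predicted_frames, truth_frames):
    """Jaccard index |A ∩ B| / |A ∪ B| between two sets of frame indices."""
    predicted, truth = set(predicted_frames), set(truth_frames)
    union = predicted | truth
    if not union:
        return 0.0  # assumption: an empty prediction/annotation pair scores 0
    return len(predicted & truth) / len(union)


# Hypothetical example: gesture annotated on frames 10..19, predicted on 12..21.
score = jaccard_index(range(12, 22), range(10, 20))
```

Here the two intervals share 8 frames out of 12 covered in total, so the score is 8/12.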

    Gesture passwords: concepts, methods and challenges

    Biometrics are a convenient alternative to traditional forms of access control such as passwords and pass-cards since they rely solely on user-specific traits. Unlike alphanumeric passwords, biometrics cannot be given or told to another person, and unlike pass-cards, are always “on-hand.” Perhaps the most well-known biometrics with these properties are face, speech, iris, and gait. This dissertation proposes a new biometric modality: gestures. A gesture is a short body motion that contains static anatomical information and changing behavioral (dynamic) information. This work considers both full-body gestures, such as a large wave of the arms, and hand gestures, such as a subtle curl of the fingers and palm. For access control, a specific gesture can be selected as a “password” and used for identification and authentication of a user. If this particular motion were somehow compromised, a user could readily select a new motion as a “password,” effectively changing and renewing the behavioral aspect of the biometric. This thesis describes a novel framework for acquiring, representing, and evaluating gesture passwords for the purpose of general access control. The framework uses depth sensors, such as the Kinect, to record gesture information from which depth maps or pose features are estimated. First, various distance measures, such as the log-Euclidean distance between feature covariance matrices and distances based on feature sequence alignment via dynamic time warping, are used to compare two gestures and to train a classifier to either authenticate or identify a user. In authentication, this framework yields an equal error rate on the order of 1-2% for body and hand gestures in non-adversarial scenarios. Next, through a novel decomposition of gestures into posture, build, and dynamic components, the relative importance of each component is studied.
The dynamic portion of a gesture is shown to have the largest impact on biometric performance, with its removal causing a significant increase in error. In addition, the effects of two types of threats are investigated: one due to self-induced degradations (personal effects and the passage of time) and the other due to spoof attacks. For body gestures, both spoof attacks (with only the dynamic component) and self-induced degradations increase the equal error rate, as expected. Further, the benefits of adding sensor viewpoints to this modality are empirically evaluated. Finally, a novel framework that leverages deep convolutional neural networks for learning a user-specific “style” representation from a set of known gestures is proposed and compared to a similar representation for gesture recognition. This deep convolutional neural network yields significantly improved performance over prior methods. A byproduct of this work is the creation and release of multiple publicly available, user-centric (as opposed to gesture-centric) datasets based on both body and hand gestures.
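One of the distance measures named in this abstract, sequence alignment via dynamic time warping (DTW), can be sketched as follows. This is the textbook DTW recurrence with a Euclidean local cost and no windowing or normalization; the dissertation's actual features, local cost and constraints are not given in the abstract, so the function name and the toy sequences are assumptions.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of seq_a[:i] against seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean local cost
            cost[i][j] = local + min(cost[i - 1][j],      # seq_b frame repeated
                                     cost[i][j - 1],      # seq_a frame repeated
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical 2-D pose features: the same motion performed at two speeds.
fast = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
slow = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
same_motion_cost = dtw_distance(fast, slow)
```

Because the warping path may repeat frames, the two recordings above align at zero cost even though their lengths differ, which is the property that makes DTW useful for comparing gestures performed at different speeds.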

    Human-computer interaction based on hand gestures using RGB-D sensors

    In this paper we present a new method for hand gesture recognition based on an RGB-D sensor. The proposed approach takes advantage of depth information to cope with the most common problems of traditional video-based hand segmentation methods: cluttered backgrounds and occlusions. The algorithm also uses colour and semantic information to accurately identify any number of hands present in the image. Ten different static hand gestures are recognised, including all different combinations of spread fingers. Additionally, movements of an open hand are followed and 6 dynamic gestures are identified. The main advantage of our approach is that the user’s hands are free to be anywhere in the image, with no need to wear any specific clothing or additional devices. Besides, the whole method can be executed without any initial training or calibration. Experiments carried out with different users and in different environments prove the accuracy and robustness of the method which, additionally, can be run in real time.

    Gesture Recognition for Human-Robot Interaction for Service Robots

    Robots are quickly becoming an intrinsic part of our daily lives, and it is becoming important to provide users with a simple and intuitive way to interact with them. In this thesis, we present a multimodal Human-Robot interface for an existing service robot, mostly aimed at people with reduced mobility during the shopping process in dynamic and crowded environments (e.g. supermarkets). This interface was created in order to recognize the "Start", "Stop" and "Pause" commands. The proposed Human-Robot Interface includes two types of interaction: verbal and non-verbal. Regarding verbal interaction, four state-of-the-art implementations (Google Speech Recognition, Houndify, Microsoft Bing Voice Recognition, CMUsphinx) were tested and compared. Houndify proved to be the most suitable for our project. Regarding non-verbal interaction, a novel method for hand gesture recognition based on depth information was implemented and tested. The software was developed to be used by a robot equipped with an RGB-D camera. This camera captures images in real time, from which the position of the robot's user is obtained. Taking as input the information already processed by the robot, the arm/hand is obtained by a depth-based segmentation approach. A principal component analysis is then computed for each object, where its center of mass and eigenvectors are calculated in order to extract the hand's tip and orientation. A Kalman filter is then applied to track the hand and obtain its position through time. Given this information, and based on finite state machines that were implemented to describe the gestures (start, stop, pause), we perform gesture recognition.
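The principal component analysis step described above can be sketched for the simplest case of 2D points from a segmented arm/hand region. For a 2x2 covariance matrix the principal axis has the closed form theta = 0.5 * atan2(2*sxy, sxx - syy), so no linear algebra library is needed. Working in 2D, taking the tip as the point farthest from the centroid along that axis, and the name `hand_axis` are all assumptions; the thesis does not state these details in its abstract.

```python
import math


def hand_axis(points):
    """Return (centroid, orientation in radians, tip) for a list of 2D points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix of the point cloud.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Direction of the eigenvector with the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Assumption: the tip is the point with the largest projection magnitude.
    tip = max(points, key=lambda p: abs((p[0] - cx) * ux + (p[1] - cy) * uy))
    return (cx, cy), theta, tip


# Hypothetical segmented points lying along a diagonal arm.
centroid, angle, tip = hand_axis([(0, 0), (1, 1), (2, 2), (6, 6)])
```

The eigenvector sign is ambiguous, so a real system would need an extra cue (for example, the side on which the arm enters the region) to decide which end of the axis is the hand rather than the elbow.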

    Real-time hand gesture recognition exploiting multiple 2D and 3D cues

    The recent introduction of several 3D applications and stereoscopic display technologies has created the need for novel human-machine interfaces. Traditional input devices, such as the keyboard and mouse, are not able to fully exploit the potential of these interfaces and do not offer a natural interaction. Hand gestures provide, instead, a more natural and sometimes safer way of interacting with computers and other machines without touching them. The use cases for gesture-based interfaces range from gaming to automatic sign language interpretation, health care, robotics, and vehicle automation. Automatic gesture recognition is a challenging problem that has attracted growing research interest for several years due to its applications in natural interfaces. The first approaches, based on recognition from 2D color pictures or video only, suffered from the typical problems of such data. Inter-occlusions, different skin colors among users even of the same ethnic group, and unstable illumination conditions, in fact, often made this problem intractable. Other approaches, instead, solved these problems by making the user wear sensorized gloves or hold tools designed to help localize the hand in the scene. The recent introduction in the mass market of novel low-cost range cameras, like the Microsoft Kinect, Asus XTION, Creative Senz3D, and the Leap Motion, has opened the way to innovative gesture recognition approaches exploiting the geometry of the framed scene. Most methods share a common gesture recognition pipeline: first identifying the hand in the framed scene, then extracting relevant features from the hand samples, and finally exploiting suitable machine learning techniques to recognize the performed gesture from a predefined “gesture dictionary”.
This thesis, based on the previous rationale, proposes a novel gesture recognition framework exploiting both color and geometric cues from low-cost color and range cameras. The dissertation starts by introducing the automatic hand gesture recognition problem, giving an overview of the state-of-the-art algorithms and the recognition pipeline employed in this work. Then, it briefly describes the major low-cost range cameras and setups used in the literature for color and depth data acquisition for hand gesture recognition purposes, highlighting their capabilities and limitations. The methods employed for detecting the hand in the framed scene and for segmenting it into its relevant parts are then analyzed in greater detail. The algorithm first exploits skin color information and geometrical considerations to discard the background samples, then it reliably detects the palm and the finger regions, and removes the forearm. For the palm detection, the method fits the largest circle inscribed in the palm region or, in a more advanced version, an ellipse. A set of robust color and geometric features that can be extracted from the previously segmented finger and palm regions is then described in detail. Geometric features describe properties of the hand contour from its curvature variations and from the distances, in 3D space or in the image plane, of its points from the hand center or from the palm, or extract relevant information from the palm morphology and from the empty space in the hand convex hull. Color features exploit, instead, the histogram of oriented gradients (HOG), local phase quantization (LPQ) and local ternary patterns (LTP) algorithms to provide further helpful cues from the hand texture and from the depth map treated as a grayscale image. Additional features extracted from the Leap Motion data complete the gesture characterization for a more reliable recognition.
Moreover, the thesis also reports a novel approach jointly exploiting the geometric data provided by the Leap Motion and the depth data from a range camera for extracting the same depth features with a significantly lower computational effort. This work then addresses the delicate problem of constructing a robust gesture recognition model from the features previously described, using multi-class Support Vector Machines, Random Forests or more powerful ensembles of classifiers. Feature selection techniques, designed to detect the smallest subset of features that allows a leaner classification model to be trained without a significant accuracy loss, are also considered. The proposed recognition method, tested on subsets of the American Sign Language and experimentally validated, reported very high accuracies. The results also showed that higher accuracies are obtainable by combining proper sets of complementary features and using ensembles of classifiers. Moreover, it is worth noting that the proposed approach is not sensor dependent, that is, the recognition algorithm is not bound to a specific sensor or technology adopted for the depth data acquisition. Finally, the gesture recognition algorithm is able to run in real time even in the absence of a thorough optimization, and may be easily extended in the near future with novel descriptors and support for dynamic gestures.
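The palm detection step described in this abstract, fitting the largest circle inscribed in the palm region, can be illustrated with a brute-force sketch on a small binary mask. For every foreground pixel it computes the distance to the nearest background pixel and keeps the maximum, which is what a distance transform does more efficiently. The mask layout, the function name, and treating everything outside the image as background are assumptions; the thesis implementation is not reproduced here.

```python
import math


def largest_inscribed_circle(mask):
    """Return ((row, col), radius) of the largest circle inside a binary mask."""
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if not mask[r][c]]
    # Assumption: pixels just outside the image border count as background.
    background += [(r, c) for r in (-1, rows) for c in range(-1, cols + 1)]
    background += [(r, c) for r in range(rows) for c in (-1, cols)]
    best_center, best_radius = None, 0.0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # Distance from this pixel to the closest background pixel.
                radius = min(math.dist((r, c), b) for b in background)
                if radius > best_radius:
                    best_center, best_radius = (r, c), radius
    return best_center, best_radius


# Hypothetical 5x5 mask with a 3x3 "palm" blob in the middle.
palm_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
center, radius = largest_inscribed_circle(palm_mask)
```

This brute-force search is quadratic in the number of pixels and is only meant to show the idea; a real-time system would rely on a proper distance transform instead.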