
    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, allowing it to produce meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio. (14 pages, 7 figures)
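    As a rough illustration of the ModDrop idea, the sketch below randomly zeroes whole modality channels during fusion training so the classifier learns to predict from whatever modalities remain. It is a minimal PyTorch sketch under assumed names (ModDropFusion, p_drop, the encoder sizes); the paper's actual architecture, initialization schedule, and hyperparameters differ.

```python
import torch
import torch.nn as nn

class ModDropFusion(nn.Module):
    """Fusion head that randomly drops entire modality channels at
    training time (ModDrop-style); all names here are illustrative."""

    def __init__(self, modality_dims, hidden_dim, num_classes, p_drop=0.2):
        super().__init__()
        self.p_drop = p_drop  # probability of dropping each modality
        # One small encoder per modality; the paper carefully
        # pre-initializes each modality before fusion training.
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden_dim), nn.ReLU())
             for d in modality_dims]
        )
        self.classifier = nn.Linear(hidden_dim * len(modality_dims), num_classes)

    def forward(self, inputs):  # inputs: one (N, d_i) tensor per modality
        feats = []
        for enc, x in zip(self.encoders, inputs):
            h = enc(x)
            if self.training and torch.rand(()) < self.p_drop:
                h = torch.zeros_like(h)  # silence this channel entirely
            feats.append(h)
        return self.classifier(torch.cat(feats, dim=1))
```

    Because dropped channels are zeroed rather than removed, the same network can be evaluated with any subset of modalities by zeroing the missing inputs at test time.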

    Heterogeneous hand gesture recognition using 3D dynamic skeletal data

    Hand gestures are the most natural and intuitive non-verbal communication medium for interacting with a computer, and research interest in this area has grown considerably in recent years. Moreover, the hand pose features provided by current inexpensive commercial depth cameras can be exploited in a variety of gesture recognition systems, especially for Human-Computer Interaction. In this paper, we focus on 3D dynamic gesture recognition systems that use hand pose information. Specifically, we use the natural structure of the hand topology (referred to hereafter as hand skeletal data) to extract effective hand kinematic descriptors from the gesture sequence. Descriptors are then encoded in a statistical and temporal representation using, respectively, a Fisher kernel and a multi-level temporal pyramid. A linear SVM classifier can be applied directly to the feature vector computed over the whole pre-segmented gesture to perform recognition. Furthermore, for early recognition from a continuous stream, we introduce a prior gesture detection phase, achieved using a binary classifier, before the final gesture recognition. The proposed approach is evaluated on three hand gesture datasets containing 10, 14 and 25 gestures respectively, each with specific challenging tasks. We also conduct an experiment to assess the influence of depth-based hand pose estimation on our approach. Experimental results demonstrate the potential of the proposed solution for hand gesture recognition, including low-latency recognition, and comparative results with state-of-the-art methods are reported.
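    The following is a minimal sketch of the kind of encoding pipeline described above: per-frame kinematic descriptors are aggregated into a Fisher vector over a two-level temporal pyramid and classified with a linear SVM. The function names, the GMM size, and the mean-gradient-only simplification of the Fisher vector are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def fisher_vector(seq, gmm):
    """Mean-gradient Fisher vector of a (T, D) descriptor sequence,
    using a diagonal-covariance GMM (a simplified FV variant)."""
    q = gmm.predict_proba(seq)                     # (T, K) soft assignments
    diff = seq[:, None, :] - gmm.means_            # (T, K, D)
    fv = (q[:, :, None] * diff / np.sqrt(gmm.covariances_)).sum(axis=0)
    fv /= len(seq) * np.sqrt(gmm.weights_)[:, None]
    fv = fv.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))         # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)       # L2 normalisation

def pyramid_encode(seq, gmm, levels=2):
    """Concatenate Fisher vectors over a multi-level temporal pyramid;
    assumes the sequence has at least 2**(levels-1) frames."""
    parts = [fisher_vector(chunk, gmm)
             for level in range(levels)
             for chunk in np.array_split(seq, 2 ** level)]
    return np.concatenate(parts)

# train_seqs: list of (T_i, D) kinematic-descriptor arrays, y: labels
# gmm = GaussianMixture(16, covariance_type="diag").fit(np.vstack(train_seqs))
# X = np.stack([pyramid_encode(s, gmm) for s in train_seqs])
# clf = LinearSVC().fit(X, y)
```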

    A 3D descriptor to detect task-oriented grasping points in clothing

    © 2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license: http://creativecommons.org/licenses/by-nc-nd/4.0/
    Manipulating textile objects with a robot is a challenging task, especially because garment perception is difficult due to the endless configurations a garment can adopt, coupled with a large variety of colors and designs. Most current approaches follow a multiple re-grasp strategy, in which clothes are sequentially grasped from different points until one of them yields a recognizable configuration. In this work we propose a method that combines 3D and appearance information to directly select a suitable grasping point for the task at hand, which in our case consists of hanging a shirt or a polo shirt from a hook. Our method follows a coarse-to-fine approach in which, first, the collar of the garment is detected and, next, a grasping point on the lapel is chosen using a novel 3D descriptor. In contrast to current 3D descriptors, ours can run in real time, even when it needs to be densely computed over the input image. Our central idea is to take advantage of the structured nature of the range images that most depth sensors provide and, by exploiting integral imaging, achieve speed-ups of two orders of magnitude with respect to competing approaches while maintaining performance. This makes it especially adequate for robotic applications, as we thoroughly demonstrate in the experimental section.
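    The speed-up the authors attribute to integral imaging can be illustrated with a summed-area table over a range image: any rectangular window sum, and hence any windowed mean of depth values, costs four lookups regardless of window size. The sketch below is a generic illustration of that trick, not the paper's actual descriptor.

```python
import numpy as np

def integral_image(depth):
    """Summed-area table: ii[y, x] = sum of depth[:y, :x]."""
    ii = np.zeros((depth.shape[0] + 1, depth.shape[1] + 1))
    ii[1:, 1:] = depth.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of depth[y0:y1, x0:x1] in O(1), independent of window size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

depth = np.random.rand(480, 640)        # a structured range image
ii = integral_image(depth)
assert np.isclose(box_sum(ii, 10, 20, 50, 90), depth[10:50, 20:90].sum())
```

    Computed once per frame, the table lets a descriptor be evaluated densely at every pixel without re-summing overlapping windows, which is where speed-ups of the reported magnitude typically come from.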

    Performance Improvement of Data Fusion Based Real-Time Hand Gesture Recognition by Using 3-D Convolution Neural Networks With Kinect V2

    Hand gesture recognition is one of the most active areas of research in computer vision. It provides an easy way to interact with a machine without using any extra devices. Hand gestures are a natural and intuitive way for human beings to communicate with their environment. In this paper, we propose data fusion based real-time hand gesture recognition using 3-D convolutional neural networks and Kinect V2: the Kinect V2 sensor is used to achieve accurate hand segmentation and tracking, and a 3-D convolutional neural network improves the validity and robustness of the system. Experimental results show that the proposed model is accurate and robust, and performs well with very low processor utilization. We also demonstrate the performance of the proposed system in a real-life application: controlling various devices using Kinect V2.
    Keywords: hand gesture recognition, Kinect V2, data fusion, convolutional neural networks. DOI: 10.7176/IKM/9-1-02
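    As a rough sketch of the spatio-temporal model class involved, the snippet below defines a small 3-D CNN over depth clips in PyTorch. The class name, layer sizes, and input shape are illustrative assumptions; the paper's actual network and fusion details are not reproduced here.

```python
import torch
import torch.nn as nn

class Gesture3DCNN(nn.Module):
    """Tiny 3-D CNN classifying (N, C, T, H, W) gesture clips."""

    def __init__(self, num_classes, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),            # halves the temporal and spatial dims
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),    # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):
        return self.classifier(self.features(clip).flatten(1))

# e.g. a batch of 8 one-channel depth clips, 16 frames of 64x64 pixels:
logits = Gesture3DCNN(num_classes=10)(torch.randn(8, 1, 16, 64, 64))
```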

    Vision-Based 2D and 3D Human Activity Recognition
