
    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, producing meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.
    Comment: 14 pages, 7 figures
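
    The mechanism behind ModDrop is easy to sketch: during fusion training, each modality's input is zeroed independently with some probability, so the fused network learns cross-modality correlations without over-relying on any single channel. Below is a minimal PyTorch illustration of that idea; the fusion architecture, layer sizes, and drop probability are assumptions made for the sketch, not the paper's actual multi-scale network.

```python
# Minimal sketch of ModDrop-style modality dropping (illustrative only;
# the real model fuses multi-scale convolutional pathways).
import torch
import torch.nn as nn

class ModDropFusion(nn.Module):
    """Fuses per-modality feature vectors, randomly zeroing whole
    modalities during training so the fusion layers cannot depend
    on any single channel being present."""

    def __init__(self, feature_dims, hidden_dim, num_classes, p_drop=0.2):
        super().__init__()
        self.p_drop = p_drop  # assumed value; per-modality drop probability
        self.fusion = nn.Sequential(
            nn.Linear(sum(feature_dims), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, features):  # features: list of (batch, dim) tensors
        if self.training:
            # Independently keep each modality with probability 1 - p_drop.
            features = [
                f * (torch.rand(f.size(0), 1, device=f.device) > self.p_drop).float()
                for f in features
            ]
        return self.fusion(torch.cat(features, dim=1))
```

    At inference time (model.eval()) nothing is dropped, and a genuinely missing modality can be passed in as zeros, which mirrors the robustness-to-missing-channels behaviour the abstract describes.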

    An evaluation of depth camera-based hand pose recognition for virtual reality systems.

    Master's degree, University of KwaZulu-Natal, Durban.
    Camera-based hand gesture recognition for interaction in virtual reality systems promises to provide a more immersive and less distracting means of input than the usual hand-held controllers. It is unknown whether a camera can effectively distinguish hand poses made in a virtual reality environment, due to a lack of research in this area. This research explores and measures the effectiveness of static hand pose input with a depth camera, specifically the Leap Motion controller, for user interaction in virtual reality applications. A pose set was derived by analyzing existing gesture taxonomies and Leap Motion controller-based virtual reality applications, and a dataset of these poses was constructed using data captured by twenty-five participants. Experiments on the dataset utilizing three popular machine learning classifiers were unable to classify the poses with sufficiently high accuracy, primarily due to occlusion issues affecting the input data. Therefore, a significantly smaller subset was empirically derived using a novel algorithm, which utilized a confusion matrix from the machine learning experiments as well as a table of Hamming distances between poses. This improved the recognition accuracy to above 99%, making this set more suitable for real-world use. It is concluded that while camera-based pose recognition can be reliable on a small set of poses, finger occlusion hinders the use of larger sets. Thus, alternative approaches, such as multiple input cameras, should be explored as a potential solution to the occlusion problem.
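
    The abstract does not spell out the subset-selection algorithm, but its stated ingredients (a classifier confusion matrix and pairwise Hamming distances between pose definitions) suggest a greedy pruning scheme along the following lines. This is a hypothetical reconstruction for illustration only; the removal criterion and thresholds are assumptions, not the thesis's actual algorithm.

```python
# Illustrative greedy pose-subset selection from a confusion matrix and
# a Hamming-distance table (hypothetical criterion and thresholds).
import numpy as np

def select_pose_subset(confusion, hamming, min_hamming=2, max_confusion=0.01):
    """Greedily drop the most-confused pose until every remaining pair
    is both rarely confused and sufficiently distinct.

    confusion: (n, n) row-normalized confusion matrix from a classifier.
    hamming:   (n, n) Hamming distances between pose definitions.
    """
    keep = list(range(confusion.shape[0]))
    while len(keep) > 1:
        sub = confusion[np.ix_(keep, keep)]
        off = sub - np.diag(np.diag(sub))          # off-diagonal confusion only
        ham = hamming[np.ix_(keep, keep)].astype(float)
        np.fill_diagonal(ham, np.inf)              # ignore self-distances
        if off.max() <= max_confusion and ham.min() >= min_hamming:
            break                                  # subset is distinct enough
        # Remove the pose contributing most to residual confusion.
        worst = int(np.argmax(off.sum(axis=0) + off.sum(axis=1)))
        keep.pop(worst)
    return keep
```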

    Gesture recognition through angle space

    As the notion of ubiquitous computing becomes a reality, the keyboard-and-mouse paradigm becomes less satisfactory as an input modality. The ability to interpret gestures can open another dimension in user interface technology. In this paper, we present a novel approach for dynamic hand gesture modeling using neural networks. The results show high accuracy in detecting single and multiple gestures, which makes this a promising approach for gesture recognition from continuous input with undetermined boundaries. This method is independent of the input device and can be applied as a general back-end processor for gesture recognition systems.
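
    The abstract does not detail the angle-space representation, but a common device-independent encoding resamples the gesture trajectory and keeps only the direction angle of each segment, discarding position and scale. The sketch below shows such an encoding as a front end for a neural network classifier; the resampling scheme and feature length are assumptions, not the paper's exact pipeline.

```python
# Hypothetical angle-space encoding of a 2-D gesture trajectory
# (illustrative; resampling is uniform in point index, not arc length).
import numpy as np

def trajectory_to_angles(points, num_samples=16):
    """Resample a 2-D trajectory to num_samples segments and return the
    direction angle of each segment: a fixed-length, translation- and
    scale-invariant feature vector for a neural network."""
    points = np.asarray(points, dtype=float)
    t = np.linspace(0.0, 1.0, len(points))
    t_new = np.linspace(0.0, 1.0, num_samples + 1)
    resampled = np.column_stack(
        [np.interp(t_new, t, points[:, d]) for d in range(2)]
    )
    deltas = np.diff(resampled, axis=0)
    return np.arctan2(deltas[:, 1], deltas[:, 0])  # one angle per segment
```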

    Human gesture classification by brute-force machine learning for exergaming in physiotherapy

    In this paper, a novel approach for human gesture classification on skeletal data is proposed for the application of exergaming in physiotherapy. Unlike existing methods, we propose to use a general classifier such as Random Forests to recognize dynamic gestures. The temporal dimension is handled afterwards by majority voting in a sliding window over the consecutive predictions of the classifier. Gestures may share partially similar postures, in which case the classifier decides on the dissimilar postures. This brute-force classification strategy works because dynamic human gestures exhibit sufficiently many dissimilar postures. Online continuous human gesture recognition can classify dynamic gestures at an early stage, which is a crucial advantage when controlling a game by automatic gesture recognition. Also, ground truth can be easily obtained, since all postures in a gesture get the same label, without any discretization into consecutive postures. This way, new gestures can easily be added, which is advantageous in adaptive game development. We evaluate our strategy by a leave-one-subject-out cross-validation on a self-captured stealth game gesture dataset and the publicly available Microsoft Research Cambridge-12 Kinect (MSRC-12) dataset. On the first dataset we achieve an excellent accuracy rate of 96.72%. Furthermore, we show that Random Forests perform better than Support Vector Machines. On the second dataset we achieve an accuracy rate of 98.37%, which is on average 3.57% better than existing methods.
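
    The two-stage strategy is straightforward to sketch: a frame-level Random Forest trained with one gesture label per posture, followed by majority voting over a sliding window of consecutive predictions. The following minimal scikit-learn illustration assumes per-frame skeleton feature vectors; the window length and forest size are placeholder values, not the paper's settings.

```python
# Minimal sketch of brute-force gesture classification with temporal
# smoothing by sliding-window majority voting (assumed parameters).
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def train_posture_classifier(frames, labels):
    """Frame-level training: every posture in a gesture carries the
    gesture's label, so no temporal segmentation is needed."""
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(frames, labels)
    return clf

def recognize_stream(clf, frames, window=15):
    """Smooth per-frame predictions by majority voting over a sliding
    window, handling the temporal dimension after classification."""
    preds = clf.predict(frames)
    smoothed = []
    for i in range(len(preds)):
        votes = preds[max(0, i - window + 1) : i + 1]
        smoothed.append(Counter(votes).most_common(1)[0][0])
    return smoothed
```

    Because each new frame only extends the voting window, a prediction is available from the first frames onward, which is the early-classification property the abstract highlights.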