ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Furthermore, the proposed ModDrop training technique ensures robustness of the
classifier to missing signals in one or several channels to produce meaningful
predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.
Comment: 14 pages, 7 figures
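The abstract does not spell out the mechanics of ModDrop, but the core idea it describes, randomly dropping whole modality channels during fusion training so the network learns cross-modality correlations without over-relying on any single input, can be sketched roughly as follows. All names (`moddrop_mask`, `fuse`) and the toy feature vectors are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def moddrop_mask(n_modalities, p_drop, rng):
    """Sample a binary keep/drop mask over modalities for one training step.

    Guarantees at least one modality survives, so the fusion layer always
    receives some signal (an assumption; the paper's exact policy may differ).
    """
    mask = (rng.random(n_modalities) >= p_drop).astype(float)
    if mask.sum() == 0:
        mask[rng.integers(n_modalities)] = 1.0
    return mask

def fuse(features, mask):
    """Zero out dropped modalities before concatenating for the fusion layer."""
    return np.concatenate([f * m for f, m in zip(features, mask)])

# Three hypothetical per-modality feature vectors (e.g. depth, skeleton, audio).
feats = [np.ones(4), 2 * np.ones(4), 3 * np.ones(4)]
fused = fuse(feats, moddrop_mask(len(feats), p_drop=0.5, rng=rng))
```

At test time no channels are dropped, which matches the abstract's claim that a ModDrop-trained classifier still produces meaningful predictions from any subset of available modalities.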
An evaluation of depth camera-based hand pose recognition for virtual reality systems.
Master's degree. University of KwaZulu-Natal, Durban.
Camera-based hand gesture recognition for interaction in virtual reality systems
promises to provide a more immersive and less distracting means of input than the
usual hand-held controllers. It is unknown if a camera would effectively distinguish
hand poses made in a virtual reality environment, due to lack of research in this area.
This research explores and measures the effectiveness of static hand pose input with a
depth camera, specifically the Leap Motion controller, for user interaction in virtual
reality applications. A pose set was derived by analyzing existing gesture taxonomies
and Leap Motion controller-based virtual reality applications, and a dataset of these
poses was constructed using data captured by twenty-five participants. Experiments
on the dataset utilizing three popular machine learning classifiers were not able to
classify the poses with a high enough accuracy, primarily due to occlusion issues affecting
the input data. Therefore, a significantly smaller subset was empirically derived
using a novel algorithm, which utilized a confusion matrix from the machine learning
experiments as well as a table of Hamming Distances between poses. This improved
the recognition accuracy to above 99%, making this set more suitable for real-world
use. It is concluded that while camera-based pose recognition can be reliable on a
small set of poses, finger occlusion hinders the use of larger sets. Thus, alternative
approaches, such as multiple input cameras, should be explored as a potential solution
to the occlusion problem.
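The thesis's subset-selection algorithm is described only at a high level (a confusion matrix plus a table of Hamming distances between poses). A minimal sketch of the Hamming-distance side of that idea, under the assumption that each pose can be summarised as a binary finger-extension descriptor, might greedily keep only poses that are sufficiently far apart; the descriptors and the `greedy_subset` helper below are illustrative, not taken from the thesis:

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary pose descriptors."""
    return sum(x != y for x, y in zip(a, b))

def greedy_subset(poses, min_dist):
    """Greedily keep poses whose descriptors differ in at least min_dist bits.

    Poses too close to an already-kept pose are discarded, on the intuition
    that near-identical finger configurations are the ones a classifier
    confuses (the thesis additionally weighs in a confusion matrix).
    """
    kept = []
    for name, desc in poses.items():
        if all(hamming(desc, poses[k]) >= min_dist for k in kept):
            kept.append(name)
    return kept

# Hypothetical 5-bit descriptors: one bit per extended finger (thumb..pinky).
poses = {
    "fist":      [0, 0, 0, 0, 0],
    "point":     [0, 1, 0, 0, 0],
    "open_hand": [1, 1, 1, 1, 1],
    "peace":     [0, 1, 1, 0, 0],
}
selected = greedy_subset(poses, min_dist=2)
```

Here "point" is dropped because it differs from "fist" by a single finger, mirroring the thesis's finding that only a well-separated subset of poses is reliably recognisable under occlusion.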
Gesture recognition through angle space
As the notion of ubiquitous computing becomes a reality, the keyboard and mouse paradigm becomes less satisfactory as an input modality. The ability to interpret gestures can open another dimension in user interface technology. In this paper, we present a novel approach for dynamic hand gesture modeling using neural networks. The results show high accuracy in detecting single and multiple gestures, which makes this a promising approach for gesture recognition from continuous input with undetermined boundaries. This method is independent of the input device and can be applied as a general back-end processor for gesture recognition systems.
Human gesture classification by brute-force machine learning for exergaming in physiotherapy
In this paper, a novel approach for human gesture classification on skeletal data is proposed for the application of exergaming in physiotherapy. Unlike existing methods, we propose to use a general classifier like Random Forests to recognize dynamic gestures. The temporal dimension is handled afterwards by majority voting in a sliding window over the consecutive predictions of the classifier. The gestures can have partially similar postures, such that the classifier will decide on the dissimilar postures. This brute-force classification strategy is permitted, because dynamic human gestures show sufficiently dissimilar postures. Online continuous human gesture recognition can classify dynamic gestures at an early stage, which is a crucial advantage when controlling a game by automatic gesture recognition. Also, ground truth can be easily obtained, since all postures in a gesture get the same label, without any discretization into consecutive postures. This way, new gestures can be easily added, which is advantageous in adaptive game development. We evaluate our strategy by a leave-one-subject-out cross-validation on a self-captured stealth game gesture dataset and the publicly available Microsoft Research Cambridge-12 Kinect (MSRC-12) dataset. On the first dataset we achieve an excellent accuracy rate of 96.72%. Furthermore, we show that Random Forests perform better than Support Vector Machines. On the second dataset we achieve an accuracy rate of 98.37%, which is on average 3.57% better than existing methods.
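The temporal-smoothing step this abstract describes, majority voting over a sliding window of per-frame classifier predictions, can be sketched in a few lines. The window size and the example label stream are assumptions for illustration; the paper's classifier (a Random Forest) is replaced here by a pre-computed list of its per-frame outputs:

```python
from collections import Counter, deque

def sliding_majority(predictions, window=5):
    """Smooth per-frame class predictions by majority vote in a sliding window.

    Each output label is the most common label among the last `window`
    predictions seen so far, which suppresses isolated misclassifications.
    """
    buf = deque(maxlen=window)
    smoothed = []
    for p in predictions:
        buf.append(p)
        smoothed.append(Counter(buf).most_common(1)[0][0])
    return smoothed

# Hypothetical per-frame outputs of a posture classifier for a short clip.
frames = ["wave", "wave", "idle", "wave", "wave", "idle", "idle", "idle"]
smoothed = sliding_majority(frames, window=3)
```

Note how the lone "idle" at frame 3 is voted away while the sustained "idle" run at the end survives, which is why this scheme supports early online classification of dynamic gestures.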