Hand gesture recognition for multimedia applications

Abstract

Hand gesture is potentially a very natural and useful modality for human-machine interaction. Recognising hand gestures is considered one of the most complicated and interesting challenges in computer vision because of the hand's articulated structure and variations in the environment. Solving these challenges requires robust hand detection, feature description, and viewpoint-invariant classification. This thesis introduces several steps to tackle them and applies the resulting techniques in a hand-gesture-based application (a game) to demonstrate the proposed approach. New methods for feature description, hand gesture detection, and viewpoint invariance are explored and evaluated. A normal webcam is used as the input device throughout.

Hands are segmented using pre-trained skin colour models and tracked with the CAMShift tracker. Moment invariants are used as a shape descriptor. A new approach using Zernike Velocity Moments (ZVMs), first introduced by Shutler and Nixon [1,2], is examined on hand gestures. Results obtained with ZVMs as a spatio-temporal descriptor are compared to those of an HMM with Zernike moments (ZMs). Manually isolated hand gestures are fed to the ZVM descriptor, which generates feature vectors that are classified with a regression classifier. The performance of the ZVMs is evaluated on isolated, user-independent, and user-dependent data.

Isolating (segmenting) a gesture manually from a video stream is viable only as a research proposition; real-life scenarios require automatic hand gesture detection. Two detection methods are examined. In the first, a sliding window segments sequences of frames, which are then evaluated against pre-trained HMMs. In the second, the set of class-specific HMMs is combined into a single HMM and the Viterbi algorithm is used to find the optimal sequence of gestures.

Finally, the thesis proposes a flexible application that allows the user to perform gestures from different viewpoints. A usable hand gesture recognition system should cope with such viewpoint variations. To address this, a new approach is introduced that uses 3D models of hand gestures (not postures) to generate projections. A virtual arm with 3D models of real hands is created; virtual hand movements are then simulated in animation software and projected from different viewpoints. Using a multi-Gaussian HMM, the system is trained on the projected sequences: each set of hand gesture projections is labelled with its class and used to train a single multi-class HMM on gestures across different viewpoints.
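
The sketches below illustrate how the stages described above could be realised in Python with OpenCV, NumPy, and hmmlearn; they are illustrations under stated assumptions, not the thesis's implementation. First, skin-colour segmentation and CAMShift tracking: this sketch assumes the pre-trained skin model is an HSV hue-saturation histogram used for back-projection (the abstract does not specify the model's form), loaded from a hypothetical file skin_hs_hist.npy, with a hand-initialised search window.

```python
import cv2
import numpy as np

# Hypothetical pre-trained skin model: a float32 hue-saturation histogram
# built offline with cv2.calcHist from labelled skin pixels. The thesis's
# "pre-trained skin colour models" may take a different form.
skin_hist = np.load("skin_hs_hist.npy")   # assumed file, shape (180, 256)

cap = cv2.VideoCapture(0)                 # the "normal webcam" input
ok, frame = cap.read()
track_window = (200, 150, 100, 100)       # assumed initial hand ROI (x, y, w, h)
# Stop CAMShift after 10 iterations or a sub-pixel window shift.
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while ok:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Back-projection turns each pixel into a skin likelihood.
    prob = cv2.calcBackProject([hsv], [0, 1], skin_hist, [0, 180, 0, 256], 1)
    # CAMShift adapts the window's position and size to the skin blob.
    rot_box, track_window = cv2.CamShift(prob, track_window, criteria)
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("hand", frame)
    if cv2.waitKey(30) & 0xFF == 27:      # Esc quits
        break
    ok, frame = cap.read()

cap.release()
cv2.destroyAllWindows()
```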
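Zernike Velocity Moments combine per-frame Zernike shape moments with a weighting by the silhouette centroid's inter-frame displacement. The sketch below follows the general form of Shutler and Nixon's descriptor [1,2] but is not their exact formulation: the mapping of the silhouette into the unit disk and the final normalisation are assumptions.

```python
import numpy as np
from math import factorial

def zernike_moment(silhouette, n, m):
    """Complex Zernike moment Z_nm of a binary hand silhouette,
    with pixels mapped into the unit disk (an assumed mapping)."""
    h, w = silhouette.shape
    ys, xs = np.nonzero(silhouette)
    x = (2 * xs - w + 1) / (w - 1)
    y = (2 * ys - h + 1) / (h - 1)
    rho = np.hypot(x, y)
    keep = rho <= 1.0                      # Zernike basis lives on the disk
    rho, theta = rho[keep], np.arctan2(y[keep], x[keep])
    # Radial polynomial R_nm(rho), standard closed form.
    R = np.zeros_like(rho)
    for s in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s)
                * factorial((n + abs(m)) // 2 - s)
                * factorial((n - abs(m)) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return (n + 1) / np.pi * np.sum(R * np.exp(-1j * m * theta))

def zvm(frames, n, m, mu, alpha):
    """Zernike Velocity Moment of a gesture: per-frame Zernike moments
    weighted by powers of the centroid's inter-frame displacement."""
    cents = [np.argwhere(f).mean(axis=0) for f in frames]   # (y, x) centroids
    total = 0j
    for i in range(1, len(frames)):
        dy, dx = cents[i] - cents[i - 1]
        total += (dx ** mu) * (dy ** alpha) * zernike_moment(frames[i], n, m)
    return abs(total) / len(frames)        # assumed normalisation by length
```

Sweeping (n, m, mu, alpha) over a small grid would produce the feature vector that the regression classifier consumes.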
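For the first detection method, a fixed-length window can slide over the live feature stream and be scored against each class's HMM. The sketch assumes hmmlearn's GaussianHMM and a per-frame log-likelihood rejection threshold; the window length, step, and threshold are placeholders, not the thesis's settings.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_class_hmms(train_data, n_states=5):
    """train_data: {label: list of (T_i, D) feature sequences} of, e.g.,
    Zernike-moment vectors extracted from isolated gesture examples."""
    models = {}
    for label, seqs in train_data.items():
        X, lengths = np.vstack(seqs), [len(s) for s in seqs]
        models[label] = GaussianHMM(n_components=n_states).fit(X, lengths)
    return models

def detect_sliding_window(stream, models, win=20, step=5, thresh=-50.0):
    """Score every window of the stream against every class HMM and keep
    windows whose best per-frame log-likelihood clears the threshold."""
    detections = []
    for start in range(0, len(stream) - win + 1, step):
        seg = stream[start:start + win]
        scores = {lab: hmm.score(seg) / win for lab, hmm in models.items()}
        best = max(scores, key=scores.get)
        if scores[best] > thresh:
            detections.append((start, start + win, best))
    return detections
```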
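For the second method, the class HMMs can be stitched into one large model whose Viterbi path labels every frame. The abstract does not state how the sub-models are linked, so the block-diagonal transition matrix and the small inter-model switching probability below are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def combine_hmms(models, p_switch=0.01):
    """Stitch per-class GaussianHMMs into one model: class blocks on the
    transition matrix's diagonal, plus a small assumed probability of
    switching to another class's entry states at a gesture boundary."""
    N = sum(m.n_components for m in models.values())
    trans, start = np.zeros((N, N)), np.zeros(N)
    means, covars, offsets, off = [], [], [], 0
    for label, m in models.items():
        k = m.n_components
        offsets.append((label, off, off + k))
        trans[off:off + k, off:off + k] = m.transmat_ * (1 - p_switch)
        start[off:off + k] = m.startprob_ / len(models)
        means.append(m.means_)
        covars.append(m.covars_)           # full matrices from hmmlearn
        off += k
    trans += p_switch * np.tile(start, (N, 1))   # boundary transitions
    trans /= trans.sum(axis=1, keepdims=True)
    big = GaussianHMM(n_components=N, covariance_type="full", init_params="")
    big.startprob_, big.transmat_ = start, trans
    big.means_, big.covars_ = np.vstack(means), np.concatenate(covars)
    return big, offsets

def decode_gestures(big, offsets, stream):
    """Viterbi-decode the stream, then map each state back to its class;
    collapsing consecutive duplicates yields the gesture sequence."""
    _, states = big.decode(stream)
    return [next(lab for lab, a, b in offsets if a <= s < b) for s in states]
```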
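Finally, for viewpoint invariance, the synthetic projections of each gesture class can be pooled across viewpoints before training. The sketch assumes hmmlearn's GMMHMM plays the role of the "multi-Gaussian HMM" and that pooling is done per class; the single multi-class model could then be assembled from the per-class models in the same spirit as combine_hmms above.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_viewpoint_invariant(projections, n_states=5, n_mix=3):
    """projections: {label: {viewpoint: list of (T, D) feature sequences}}
    rendered from the animated 3D hand models. Pooling all viewpoints lets
    the mixture emissions absorb viewpoint variation (assumed scheme)."""
    models = {}
    for label, views in projections.items():
        seqs = [s for view_seqs in views.values() for s in view_seqs]
        X, lengths = np.vstack(seqs), [len(s) for s in seqs]
        models[label] = GMMHMM(n_components=n_states, n_mix=n_mix).fit(X, lengths)
    return models
```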

This thesis was published in White Rose E-theses Online.
