30 research outputs found

    3D hand tracking.

    The hand is often considered one of the most natural and intuitive modalities for human-to-human interaction. In human-computer interaction (HCI), proper 3D hand tracking is the first step in developing a more intuitive HCI system which can be used in applications such as gesture recognition, virtual object manipulation and gaming. However, accurate 3D hand tracking remains a challenging problem due to the hand’s deformation, appearance similarity, high inter-finger occlusion and complex articulated motion. Further, 3D hand tracking is also interesting from a theoretical point of view as it deals with three major areas of computer vision: segmentation (of the hand), detection (of hand parts), and tracking (of the hand). This thesis proposes a region-based skin color detection technique, a model-based 3D hand tracking technique and an appearance-based 3D hand tracking technique to bring human-computer interaction applications one step closer. All techniques are briefly described below. Skin color provides a powerful cue for complex computer vision applications. Although skin color detection has been an active research area for decades, the mainstream technology is based on individual pixels. This thesis presents a new region-based technique for skin color detection which outperforms the current state-of-the-art pixel-based skin color detection technique on the popular Compaq dataset (Jones & Rehg 2002). The proposed technique achieves 91.17% true positive rate with 13.12% false positive rate on the Compaq dataset tested over approximately 14,000 web images. Hand tracking is not a trivial task as it requires tracking the 27 degrees of freedom of the hand. Hand deformation, self occlusion, appearance similarity and irregular motion are major problems that make 3D hand tracking a very challenging task. This thesis proposes a model-based 3D hand tracking technique, which is improved by using the proposed depth-foreground-background feature, palm deformation module and context cue. 
However, the major problem of model-based techniques is that they are computationally expensive. This can be overcome by discriminative techniques as described below. Discriminative techniques (for example random forest) are good for hand part detection; however, they fail due to sensor noise and high inter-finger occlusion. Additionally, these techniques have difficulties in modelling kinematic or temporal constraints. Although model-based descriptive (for example Markov Random Field) or generative (for example Hidden Markov Model) techniques utilize kinematic and temporal constraints well, they are computationally expensive and struggle to recover from tracking failure. This thesis presents a unified framework for 3D hand tracking, using the best of both methodologies, which outperforms the current state-of-the-art 3D hand tracking techniques. The proposed 3D hand tracking techniques in this thesis can be used to extract accurate hand movement features and enable complex human-machine interaction such as gaming and virtual object manipulation
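
The abstract above reports skin detection quality as per-pixel rates. As a reminder of how such rates relate to each other, here is a minimal sketch that computes them from binary labels. The function name `detection_rates` and the toy labels are hypothetical and are not taken from the thesis or from the Compaq dataset.

```python
def detection_rates(predicted, truth):
    """Return (TPR, FPR, FNR) for two equal-length sequences of 0/1 labels.

    1 means "skin", 0 means "non-skin". Hypothetical helper for illustration.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)          # skin found as skin
    fn = sum(1 for p, t in pairs if not p and t)      # skin missed
    fp = sum(1 for p, t in pairs if p and not t)      # non-skin flagged as skin
    tn = sum(1 for p, t in pairs if not p and not t)  # non-skin left alone

    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative label")

    tpr = tp / (tp + fn)  # true positive rate
    fpr = fp / (fp + tn)  # false positive rate
    fnr = fn / (tp + fn)  # false negative rate, always 1 - TPR
    return tpr, fpr, fnr


# Toy example: 4 skin pixels and 6 non-skin pixels.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr, fnr = detection_rates(predicted, truth)
```

The true positive rate and false negative rate share a denominator and sum to one, while the false positive rate is measured over the non-skin pixels only.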

    Model-Based High-Dimensional Pose Estimation with Application to Hand Tracking

    This thesis presents novel techniques for computer vision based full-DOF human hand motion estimation. Our main contributions are: A robust skin color estimation approach; A novel resolution-independent and memory efficient representation of hand pose silhouettes, which allows us to compute area-based similarity measures in near-constant time; A set of new segmentation-based similarity measures; A new class of similarity measures that work for nearly arbitrary input modalities; A novel edge-based similarity measure that avoids any problematic thresholding or discretizations and can be computed very efficiently in Fourier space; A template hierarchy to minimize the number of similarity computations needed for finding the most likely hand pose observed; And finally, a novel image space search method, which we naturally combine with our hierarchy. Consequently, matching can efficiently be formulated as a simultaneous template tree traversal and function maximization

    Hand shape estimation for South African sign language

    Magister Scientiae - MSc. Hand shape recognition is a pivotal part of any system that attempts to implement Sign Language recognition. This thesis presents a novel system which recognises hand shapes from a single camera view in 2D. By mapping the recognised hand shape from 2D to 3D, it is possible to obtain 3D co-ordinates for each of the joints within the hand using the kinematics embedded in a 3D hand avatar and smooth the transformation in 3D space between any given hand shapes. The novelty in this system is that it does not require a hand pose to be recognised at every frame, but rather that hand shapes be detected at a given step size. This architecture allows for a more efficient system with better accuracy than other related systems. Moreover, a real-time hand tracking strategy was developed that works efficiently for any skin tone and a complex background

    Independent hand-tracking from a single two-dimensional view and its application to South African sign language recognition

    Philosophiae Doctor - PhD. Hand motion provides a natural way of interaction that allows humans to interact not only with the environment, but also with each other. The effectiveness and accuracy of hand-tracking is fundamental to the recognition of sign language. Any inconsistencies in hand-tracking result in a breakdown in sign language communication. Hands are articulated objects, which complicates tracking them. In sign language communication the tracking of hands is often challenged by occlusion from the other hand, other body parts and the environment in which they are being tracked. The thesis investigates whether a single framework can be developed to track the hands independently of an individual from a single 2D camera in constrained and unconstrained environments without the need for any special device. The framework consists of a three-phase strategy, namely, detection, tracking and learning phases. The detection phase validates whether the object being tracked is a hand, using extended local binary patterns and random forests. The tracking phase tracks the hands independently by extending a novel data-association technique. The learning phase exploits contextual features, using the scale-invariant feature transform (SIFT) algorithm and the fast library for approximate nearest neighbours (FLANN) algorithm to assist tracking and the recovery of hands from any form of tracking failure. The framework was evaluated on South African sign language phrases that use a single hand, both hands without occlusion, and both hands with occlusion. These phrases were performed by 20 individuals in constrained and unconstrained environments. The experiments revealed that integrating all three phases to form a single framework is suitable for tracking hands in both constrained and unconstrained environments, where high average accuracies of 82.08% and 79.83% were achieved respectively
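
The detection phase described above relies on local binary patterns. The sketch below shows only the basic 8-neighbour local binary pattern on a small grey-level grid, to illustrate the kind of texture code involved. It is not the extended variant used in the thesis, and the names `lbp_code` and `lbp_histogram` are hypothetical.

```python
def lbp_code(image, row, col):
    """Basic 8-neighbour LBP code for an interior pixel.

    `image` is a 2D list of grey-level intensities. Each neighbour that is
    at least as bright as the centre pixel sets one bit of the code.
    """
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Toy 3x3 patch with a single interior pixel (value 50).
patch = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
```

A histogram of such codes over a candidate window is one common way to build a feature vector that a classifier such as a random forest can then label as hand or non-hand.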

    Visual tracking of highly articulated objects using massively parallel processors

    Hand gesture recognition has the potential to simplify human-computer interaction. However, the human hand is a highly articulated object, capable of taking on many different appearances. In this work, we consider an analysis-by-synthesis approach to this difficult tracking problem. We attempt to overcome the vast amount of computation required by implementing the algorithm on commodity GPUs. We also collect a lengthy sequence of hand motions from five cameras in order to train and test our algorithm. We show that to achieve good tracking performance, it is important to understand the way that the hand moves. It is of secondary importance to have a good estimate of the hand shape and to be able to process the frames as quickly as possible. Under heavily controlled circumstances, we are able to achieve full tracking accuracy

    Articulation estimation and real-time tracking of human hand motions

    Schröder M. Articulation estimation and real-time tracking of human hand motions. Bielefeld: Universität Bielefeld; 2015. This thesis deals with the problem of estimating and tracking the full articulation of human hands. Algorithmically recovering hand articulations is a challenging problem due to the hand’s high number of degrees of freedom and the complexity of its motions. Besides the accuracy and efficiency of the hand posture estimation, hand tracking methods are faced with issues such as invasiveness, ease of deployment and sensor artifacts. In this thesis several different hand tracking approaches are examined, including marker-based optical motion capture, data-driven discriminative visual tracking and generative tracking based on articulated registration, and various contributions to these areas are presented. The problem of optimally placing reduced marker sets on a performer’s hand for optical hand motion capture is explored. A method is proposed that automatically generates functional reduced marker layouts by optimizing for their numerical stability and geometric feasibility. A data-driven discriminative tracking approach based on matching the hand’s appearance in the sensor data with an image database is investigated. In addition to an efficient nearest neighbor search for images, a combination of discriminative initialization and generative refinement is employed. The method’s applicability is demonstrated in interactive robot teleoperation. Various real human hand motions are captured and statistically analyzed to derive low-dimensional representations of hand articulations. An adaptive hand posture subspace concept is developed and integrated into a generative real-time hand tracking approach that aligns a virtual hand model with sensor point clouds based on constrained inverse kinematics. 
Generative hand tracking is formulated as a regularized articulated registration process, in which geometrical model fitting is combined with statistical, kinematic and temporal regularization priors. A registration concept that combines 2D and 3D alignment and explicitly accounts for occlusions and visibility constraints is devised. High-quality, non-invasive, real-time hand tracking is achieved based on this regularized articulated registration formulation

    Egocentric Perception of Hands and Its Applications


    Algorithms and evaluation for object detection and tracking in computer vision

    Vision-based object detection and tracking, especially for video surveillance applications, is studied from algorithms to performance evaluation. This dissertation is composed of four topics: (1) Background Modeling and Detection, (2) Performance Evaluation of Sensitive Target Detection, (3) Multi-view Multi-target Multi-Hypothesis Segmentation and Tracking of People, and (4) A Fine-Structure Image/Video Quality Measure. First, we present a real-time algorithm for foreground-background segmentation. It allows us to capture structural background variation due to periodic-like motion over a long period of time under limited memory. Our codebook-based representation is efficient in memory and speed compared with other background modeling techniques. Our method can handle scenes containing moving backgrounds or illumination variations, and it achieves robust detection for different types of videos. In addition to the basic algorithm, three features improving the algorithm are presented - Automatic Parameter Estimation, Layered Modeling/Detection and Adaptive Codebook Updating. Second, we introduce a performance evaluation methodology called Perturbation Detection Rate (PDR) analysis for measuring performance of foreground-background segmentation. It does not require foreground targets or knowledge of foreground distributions. It measures the sensitivity of a background subtraction algorithm in detecting possible low contrast targets against the background as a function of contrast. We compare four background subtraction algorithms using the methodology. Third, a multi-view multi-hypothesis approach to segmenting and tracking multiple persons on a ground plane is proposed. The tracking state space is the set of ground points of the people being tracked. During tracking, several iterations of segmentation are performed using information from human appearance models and ground plane homography. 
Two innovations are made in this chapter - (1) To more precisely locate the ground location of a person, all center vertical axes of the person across views are mapped to the top-view plane to find the intersection point. (2) To tackle the explosive state space due to multiple targets and views, iterative segmentation-searching is incorporated into a particle filtering framework. By searching for people's ground point locations from segmentations, a set of a few good particles can be identified, resulting in low computational cost. In addition, even if all the particles are away from the true ground point, some of them move towards the true one through the iterated process as long as they are located nearby. Finally, an objective no-reference measure is presented to assess fine-structure image/video quality. The proposed measure using local statistics reflects image degradation well in terms of noise and blur
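
The ground-point localisation described in the abstract above maps each view's vertical body axis to the top-view plane and looks for the point where the mapped axes intersect. With noisy axes the lines rarely meet in a single point, so one common choice is the point that minimises the summed squared perpendicular distance to all lines. The sketch below assumes that least-squares formulation, which is not stated in the abstract, and assumes the homography mapping has already been applied. The name `intersect_lines` and the sample lines are hypothetical.

```python
import math


def intersect_lines(lines):
    """Least-squares intersection of 2D lines on the top-view plane.

    `lines` is a list of ((ax, ay), (dx, dy)) pairs: a point on the line and
    its direction. Returns the point (x, y) minimising the summed squared
    perpendicular distance to all lines.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (ax, ay), (dx, dy) in lines:
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            raise ValueError("line direction must be non-zero")
        dx, dy = dx / norm, dy / norm
        # Projector onto the line's normal: M = I - d d^T.
        m11 = 1.0 - dx * dx
        m12 = -dx * dy
        m22 = 1.0 - dy * dy
        # Accumulate the normal equations (sum M) p = sum (M a).
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * ax + m12 * ay
        b2 += m12 * ax + m22 * ay

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; no unique intersection")
    # Solve the symmetric 2x2 system with Cramer's rule.
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


# Three mapped axes that all pass through the ground point (2, 2).
lines = [((0, 0), (1, 1)), ((4, 0), (-1, 1)), ((2, -3), (0, 1))]
x, y = intersect_lines(lines)
```

When the axes are consistent the result is their exact common point; when they are noisy it degrades gracefully to a compromise location rather than failing.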

    Autonomous model building using vision and manipulation

    Robotic systems often require models in order to control themselves successfully and to interact with the world. Models take many forms and include kinematic models to plan motions, dynamics models to understand the interaction of forces, and models of 3D geometry to check for collisions, to name but a few. Traditionally, models are provided to the robotic system by the designers who build the system. However, for long-term autonomy it becomes important for the robot to be able to build and maintain models of itself, and of objects it might encounter. In this thesis, the argument for enabling robotic systems to autonomously build models is advanced and explored. The main contribution of this research is to show how a layered approach can be taken to building models. Thus a robot, starting with a limited amount of information, can autonomously build a number of models, including a kinematic model, which describes the robot’s body, and allows it to plan and perform future movements. Key to the incremental, autonomous approach is the use of exploratory actions. These are actions that the robot can perform in order to gain some more information, either about itself, or about an object with which it is interacting. A method is then presented whereby a robot, after being powered on, can home its joints using just vision, i.e. traditional methods such as absolute encoders or limit switches are not required. The ability to interact with objects in order to extract information is one of the main advantages that a robotic system has over a purely passive system, when attempting to learn about or build models of objects. In light of this, the next contribution of this research is to look beyond the robot’s body and to present methods with which a robot can autonomously build models of objects in the world around it. 
The first class of objects examined are flat pack cardboard boxes, a class of articulated objects with a number of interesting properties. It is shown how exploratory actions can be used to build a model of a flat pack cardboard box and to locate any hinges the box may have. Specifically, it is shown how when interacting with an object, a robot can combine haptic feedback from force sensors, with visual feedback from a camera to get more information from an object than would be possible using just a single sensor modality. The final contribution of this research is to present a series of exploratory actions for a robotic text reading system that allow text to be found and read from an object. The text reading system highlights how models of objects can take many forms, from a representation of their physical extents, to the text that is written on them