204,831 research outputs found

    Tracking by Prediction: A Deep Generative Model for Mutli-Person localisation and Tracking

    Full text link
    Current multi-person localisation and tracking systems have an over reliance on the use of appearance models for target re-identification and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a light weight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections, typically found in a multi person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction approaches and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks including both static and dynamic cameras and is capable of generating outstanding performance, especially among other recently proposed deep neural network based approaches.Comment: To appear in IEEE Winter Conference on Applications of Computer Vision (WACV), 201

    3D hand tracking.

    Get PDF
    The hand is often considered as one of the most natural and intuitive interaction modalities for human-to-human interaction. In human-computer interaction (HCI), proper 3D hand tracking is the first step in developing a more intuitive HCI system which can be used in applications such as gesture recognition, virtual object manipulation and gaming. However, accurate 3D hand tracking, remains a challenging problem due to the hand’s deformation, appearance similarity, high inter-finger occlusion and complex articulated motion. Further, 3D hand tracking is also interesting from a theoretical point of view as it deals with three major areas of computer vision- segmentation (of hand), detection (of hand parts), and tracking (of hand). This thesis proposes a region-based skin color detection technique, a model-based and an appearance-based 3D hand tracking techniques to bring the human-computer interaction applications one step closer. All techniques are briefly described below. Skin color provides a powerful cue for complex computer vision applications. Although skin color detection has been an active research area for decades, the mainstream technology is based on individual pixels. This thesis presents a new region-based technique for skin color detection which outperforms the current state-of-the-art pixel-based skin color detection technique on the popular Compaq dataset (Jones & Rehg 2002). The proposed technique achieves 91.17% true positive rate with 13.12% false negative rate on the Compaq dataset tested over approximately 14,000 web images. Hand tracking is not a trivial task as it requires tracking of 27 degreesof- freedom of hand. Hand deformation, self occlusion, appearance similarity and irregular motion are major problems that make 3D hand tracking a very challenging task. This thesis proposes a model-based 3D hand tracking technique, which is improved by using proposed depth-foreground-background ii feature, palm deformation module and context cue. However, the major problem of model-based techniques is, they are computationally expensive. This can be overcome by discriminative techniques as described below. Discriminative techniques (for example random forest) are good for hand part detection, however they fail due to sensor noise and high interfinger occlusion. Additionally, these techniques have difficulties in modelling kinematic or temporal constraints. Although model-based descriptive (for example Markov Random Field) or generative (for example Hidden Markov Model) techniques utilize kinematic and temporal constraints well, they are computationally expensive and hardly recover from tracking failure. This thesis presents a unified framework for 3D hand tracking, using the best of both methodologies, which out performs the current state-of-the-art 3D hand tracking techniques. The proposed 3D hand tracking techniques in this thesis can be used to extract accurate hand movement features and enable complex human machine interaction such as gaming and virtual object manipulation

    Efficient illumination independent appearance-based face tracking

    Get PDF
    One of the major challenges that visual tracking algorithms face nowadays is being able to cope with changes in the appearance of the target during tracking. Linear subspace models have been extensively studied and are possibly the most popular way of modelling target appearance. We introduce a linear subspace representation in which the appearance of a face is represented by the addition of two approxi- mately independent linear subspaces modelling facial expressions and illumination respectively. This model is more compact than previous bilinear or multilinear ap- proaches. The independence assumption notably simplifies system training. We only require two image sequences. One facial expression is subject to all possible illumina- tions in one sequence and the face adopts all facial expressions under one particular illumination in the other. This simple model enables us to train the system with no manual intervention. We also revisit the problem of efficiently fitting a linear subspace-based model to a target image and introduce an additive procedure for solving this problem. We prove that Matthews and Baker’s Inverse Compositional Approach makes a smoothness assumption on the subspace basis that is equiva- lent to Hager and Belhumeur’s, which worsens convergence. Our approach differs from Hager and Belhumeur’s additive and Matthews and Baker’s compositional ap- proaches in that we make no smoothness assumptions on the subspace basis. In the experiments conducted we show that the model introduced accurately represents the appearance variations caused by illumination changes and facial expressions. We also verify experimentally that our fitting procedure is more accurate and has better convergence rate than the other related approaches, albeit at the expense of a slight increase in computational cost. Our approach can be used for tracking a human face at standard video frame rates on an average personal computer

    Object Tracking and Mensuration in Surveillance Videos

    Get PDF
    This thesis focuses on tracking and mensuration in surveillance videos. The first part of the thesis discusses several object tracking approaches based on the different properties of tracking targets. For airborne videos, where the targets are usually small and with low resolutions, an approach of building motion models for foreground/background proposed in which the foreground target is simplified as a rigid object. For relatively high resolution targets, the non-rigid models are applied. An active contour-based algorithm has been introduced. The algorithm is based on decomposing the tracking into three parts: estimate the affine transform parameters between successive frames using particle filters; detect the contour deformation using a probabilistic deformation map, and regulate the deformation by projecting the updated model onto a trained shape subspace. The active appearance Markov chain (AAMC). It integrates a statistical model of shape, appearance and motion. In the AAMC model, a Markov chain represents the switching of motion phases (poses), and several pairwise active appearance model (P-AAM) components characterize the shape, appearance and motion information for different motion phases. The second part of the thesis covers video mensuration, in which we have proposed a heightmeasuring algorithm with less human supervision, more flexibility and improved robustness. From videos acquired by an uncalibrated stationary camera, we first recover the vanishing line and the vertical point of the scene. We then apply a single view mensuration algorithm to each of the frames to obtain height measurements. Finally, using the LMedS as the cost function and the Robbins-Monro stochastic approximation (RMSA) technique to obtain the optimal estimate

    Tracking people in crowds by a part matching approach

    Full text link
    The major difficulty in human tracking is the problem raised by challenging occlusions where the target person is repeatedly and extensively occluded by either the background or another moving object. These types of occlusions may cause significant changes in the person's shape, appearance or motion, thus making the data association problem extremely difficult to solve. Unlike most of the existing methods for human tracking that handle occlusions by data association of the complete human body, in this paper we propose a method that tracks people under challenging spatial occlusions based on body part tracking. The human model we propose consists of five body parts with six degrees of freedom and each part is represented by a rich set of features. The tracking is solved using a layered data association approach, direct comparison between features (feature layer) and subsequently matching between parts of the same bodies (part layer) lead to a final decision for the global match (global layer). Experimental results have confirmed the effectiveness of the proposed method. © 2008 IEEE

    Hierarchical eyelid and face tracking

    Get PDF
    Most applications on Human Computer Interaction (HCI) require to extract the movements of user faces, while avoiding high memory and time expenses. Moreover, HCI systems usually use low-cost cameras, while current face tracking techniques strongly depend on the image resolution. In this paper, we tackle the problem of eyelid tracking by using Appearance-Based Models, thus achieving accurate estimations of the movements of the eyelids, while avoiding cues, which require high-resolution faces, such as edge detectors or colour information. Consequently, we can track the fast and spontaneous movements of the eyelids, a very hard task due to the small resolution of the eye regions. Subsequently, we combine the results of eyelid tracking with the estimations of other facial features, such as the eyebrows and the lips. As a result, a hierarchical tracking framework is obtained: we demonstrate that combining two appearance-based trackers allows to get accurate estimates for the eyelid, eyebrows, lips and also the 3D head pose by using low-cost video cameras and in real-time. Therefore, our approach is shown suitable to be used for further facial-expression analysis.Peer Reviewe
    • …
    corecore