8 research outputs found

    3D hand tracking.

    The hand is often considered one of the most natural and intuitive modalities for human-to-human interaction. In human-computer interaction (HCI), accurate 3D hand tracking is the first step towards a more intuitive HCI system, with applications such as gesture recognition, virtual object manipulation and gaming. However, accurate 3D hand tracking remains a challenging problem due to the hand's deformation, appearance similarity, high inter-finger occlusion and complex articulated motion. 3D hand tracking is also interesting from a theoretical point of view, as it involves three major areas of computer vision: segmentation (of the hand), detection (of hand parts) and tracking (of the hand). This thesis proposes a region-based skin color detection technique, a model-based 3D hand tracking technique and an appearance-based 3D hand tracking technique to bring human-computer interaction applications one step closer. All three techniques are briefly described below.

    Skin color provides a powerful cue for complex computer vision applications. Although skin color detection has been an active research area for decades, the mainstream technology is based on individual pixels. This thesis presents a new region-based technique for skin color detection which outperforms the current state-of-the-art pixel-based technique on the popular Compaq dataset (Jones & Rehg 2002). The proposed technique achieves a 91.17% true positive rate with a 13.12% false negative rate on the Compaq dataset, tested over approximately 14,000 web images.

    Hand tracking is not a trivial task, as it requires tracking the 27 degrees of freedom of the hand. Hand deformation, self-occlusion, appearance similarity and irregular motion are the major problems that make 3D hand tracking very challenging. This thesis proposes a model-based 3D hand tracking technique, improved by the proposed depth-foreground-background feature, a palm deformation module and a context cue. The major drawback of model-based techniques, however, is that they are computationally expensive. This can be overcome by discriminative techniques, as described below.

    Discriminative techniques (for example, random forests) are good for hand part detection, but they fail under sensor noise and high inter-finger occlusion, and they have difficulty modelling kinematic or temporal constraints. Descriptive model-based (for example, Markov Random Field) or generative (for example, Hidden Markov Model) techniques utilize kinematic and temporal constraints well, but they are computationally expensive and hardly recover from tracking failure. This thesis presents a unified framework for 3D hand tracking that combines the best of both methodologies and outperforms the current state-of-the-art 3D hand tracking techniques.

    The proposed 3D hand tracking techniques can be used to extract accurate hand movement features and enable complex human-machine interaction such as gaming and virtual object manipulation.
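    To make the contrast with the proposed region-based approach concrete, here is a minimal sketch of the kind of pixel-based skin detector the thesis improves on, using common YCrCb thresholds; the threshold values are illustrative choices, not the thesis's parameters:

```python
import cv2
import numpy as np

def pixel_skin_mask(bgr_image: np.ndarray) -> np.ndarray:
    """Classic per-pixel skin detection via fixed YCrCb thresholds.

    This is the style of pixel-based baseline the thesis outperforms;
    the Cr/Cb bounds below are common illustrative values.
    """
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 133, 77], dtype=np.uint8)     # Y, Cr, Cb lower bounds
    upper = np.array([255, 173, 127], dtype=np.uint8)  # Y, Cr, Cb upper bounds
    return cv2.inRange(ycrcb, lower, upper)            # 255 = skin, 0 = non-skin
```

    A region-based technique would then reason over connected components of such a mask (or over superpixels) rather than classifying each pixel independently.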

    Recognize multi-touch gestures by graph modeling and matching

    Extracting features from a multi-touch gesture is difficult due to the complex temporal and motion relations between its multiple trajectories. In this paper we present a new generic graph model that quantifies the shape, temporal and motion information of a multi-touch gesture. To compare graphs, we also propose a specific graph matching method based on graph edit distance. Results show that our graph model can be fruitfully used for multi-touch gesture recognition with a graph-embedding and SVM classifier.
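    As an illustration of the matching step, the sketch below compares two small attributed graphs with a graph edit distance via networkx; the node and edge attributes (x, y, dt) and the cost functions are stand-ins for the paper's shape, temporal and motion features:

```python
import networkx as nx

def gesture_distance(g1: nx.Graph, g2: nx.Graph) -> float:
    # Graph edit distance with substitution costs defined on node/edge
    # attributes; deletions and insertions default to unit cost.
    return nx.graph_edit_distance(
        g1, g2,
        node_subst_cost=lambda a, b: abs(a["x"] - b["x"]) + abs(a["y"] - b["y"]),
        edge_subst_cost=lambda a, b: abs(a["dt"] - b["dt"]),
    )

# Two toy gesture graphs: nodes are trajectory key points; the edge
# attribute dt is the time offset between them.
g1, g2 = nx.Graph(), nx.Graph()
g1.add_node(0, x=0.1, y=0.2); g1.add_node(1, x=0.5, y=0.6)
g1.add_edge(0, 1, dt=0.05)
g2.add_node(0, x=0.1, y=0.25); g2.add_node(1, x=0.55, y=0.6)
g2.add_edge(0, 1, dt=0.07)

print(gesture_distance(g1, g2))
```

    Exact graph edit distance is exponential in graph size, so it stays tractable only for small gesture graphs like these; the paper's graph-embedding step is what makes classification with an SVM practical.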

    Single and multiple target tracking via hybrid mean shift/particle filter algorithms

    This thesis is concerned with single and multiple target visual tracking algorithms and their application in the real world. While they are both powerful and general, one of the main challenges of tracking using particle filter-based algorithms is to manage the particle spread. Too wide a spread leads to dispersal of particles onto clutter, but limited spread may lead to difficulty when fast-moving objects and/or high-speed camera motion throw trackers away from their target(s). This thesis addresses the particle spread management problem. Three novel tracking algorithms are presented, each of which combines particle filtering and Kernel Mean Shift methods to produce more robust and accurate tracking.

    The first single target tracking algorithm, the Structured Octal Kernel Filter (SOK), combines Mean Shift (Comaniciu et al. 2003) and Condensation (Isard and Blake 1998a). The spread of the particle set is handled by structurally placing the particles around the object, using eight particles arranged to cover the maximum area. Mean Shift is then applied to each particle to seek the global maximum. In effect, SOK uses intelligent switching between Mean Shift and particle filtering based on a confidence level. Though effective, it requires a threshold to be set and performs a somewhat inflexible search.

    The second single target tracking algorithm, the Kernel Annealed Mean Shift tracker (KAMS), uses an annealed particle filter (Deutscher et al. 2000), but introduces a Mean Shift step to control particle spread. As a result, higher accuracy and robustness are achieved using fewer particles and annealing levels. Finally, KAMS is extended to create a multi-object tracking algorithm (MKAMS) by introducing an interaction filter to handle object collisions and occlusions.

    All three algorithms are compared experimentally with existing single/multiple object tracking algorithms. The evaluation procedure compares competing algorithms' robustness, accuracy and computational cost using both numerical measures and a novel application of McNemar's statistic. Results are presented on a wide variety of artificial and real image sequences.
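    The following is a minimal sketch of the shared idea behind these hybrids: propagate particle hypotheses with the filter's motion model, then let a Mean Shift step pull each one toward a local mode of the appearance likelihood. It uses OpenCV's meanShift on a colour-histogram back-projection; the function name, window size and iteration count are illustrative, not the thesis's exact algorithms:

```python
import cv2
import numpy as np

def refine_particles(back_proj, particles, win_size=(32, 32), n_iters=5):
    """Mean Shift refinement of particle hypotheses.

    back_proj: histogram back-projection of the frame (e.g. from
               cv2.calcBackProject), acting as the appearance likelihood.
    particles: (N, 2) array of predicted window centres from the
               particle filter's motion model.
    Returns the refined centres, each pulled toward a nearby mode.
    """
    crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, n_iters, 1.0)
    w, h = win_size
    refined = []
    for cx, cy in particles:
        # clamp so the search window starts inside the image
        window = (max(0, int(cx - w / 2)), max(0, int(cy - h / 2)), w, h)
        _, (x, y, ww, hh) = cv2.meanShift(back_proj, window, crit)
        refined.append((x + ww / 2, y + hh / 2))
    return np.asarray(refined)
```

    Because each particle settles on a mode instead of drifting in flat likelihood regions, fewer particles are needed, which is consistent with the accuracy/cost trade-off the thesis reports for KAMS.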

    Real-time synthetic primate vision


    Improved robustness and efficiency for automatic visual site monitoring

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.

    Knowing who people are, where they are, what they are doing, and how they interact with other people and things is valuable from commercial, security, and space-utilization perspectives. Video sensors backed by computer vision algorithms are a natural way to gather this data. Unfortunately, key technical issues persist in extracting features and models that are simultaneously efficient to compute and robust to issues such as adverse lighting conditions, distracting background motions, appearance changes over time, and occlusions. In this thesis, we present a set of techniques and model enhancements to better handle these problems, focusing on contributions in four areas.

    First, we improve background subtraction so it can better handle temporally irregular dynamic textures. This allows us to achieve a 5.5% drop in false positive rate on the Wallflower waving-trees video. Second, we adapt the Dalal and Triggs Histogram of Oriented Gradients pedestrian detector to work on large-scale scenes with dense crowds and harsh lighting conditions: challenges which prevent us from easily using a background subtraction solution. These scenes contain hundreds of simultaneously visible people. To make the algorithm computationally feasible, we have produced a novel implementation that runs on commodity graphics hardware and is up to 76× faster than our CPU-only implementation. We demonstrate the utility of this detector by modeling scene-level activities with a Hierarchical Dirichlet Process.

    Third, we show how one can improve the quality of pedestrian silhouettes for recognizing individual people. We combine general appearance information from a large population of pedestrians with semi-periodic shape information from individual silhouette sequences. Finally, we show how one can combine a variety of detection and tracking techniques to robustly handle a variety of event detection scenarios such as theft and left-luggage detection. We present the only complete set of results on a standardized collection of very challenging videos.
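    For reference, the stock CPU form of the Dalal-Triggs detector that the thesis adapts and ports to graphics hardware is available in OpenCV. A minimal sketch follows; the input file name and scan parameters are illustrative:

```python
import cv2

# OpenCV's built-in HOG pedestrian detector with the default
# Dalal-Triggs-style people model.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("crowd.jpg")  # illustrative input image
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```

    Sliding this detector over every scale and position of a dense, crowded scene is exactly the workload that motivates the GPU implementation described in the abstract.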

    Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions

    This dissertation addresses the problem of learning video representations, which is defined here as transforming the video so that its essential structure is made more visible or accessible for action recognition and quantification. In the literature, a video can be represented by a set of images, by modeling motion or temporal dynamics, and by a 3D graph with pixels as nodes. This dissertation contributes a set of models to localize, track, segment, recognize and assess actions: (1) image-set models via aggregating subset features given by regularizing normalized CNNs; (2) image-set models via inter-frame principal recovery and sparse coding of residual actions; (3) temporally local models with spatially global motion estimated by robust feature matching and local motion estimated by action detection with an added motion model; (4) spatiotemporal models, namely 3D graphs and 3D CNNs, that model time as a space dimension; and (5) supervised hashing that jointly learns embedding and quantization. State-of-the-art performance is achieved on tasks such as quantifying facial pain and assessing human diving.

    The primary conclusions of this dissertation are as follows: (i) image sets can capture facial actions as a collective representation; (ii) sparse and low-rank representations can untangle expression, identity and pose cues, and can be learned via an image-set model as well as a linear model; (iii) norm is related to recognizability, and similarity metrics and loss functions matter; (iv) combining the MIL-based boosting tracker with the Particle Filter motion model induces a good trade-off between appearance similarity and motion consistency; (v) segmenting an object locally makes it amenable to shape priors, and it is feasible to learn such priors online from Web data with weak supervision; (vi) representing videos as 3D graphs works locally in both space and time, and 3D CNNs work effectively when fed temporally meaningful clips; (vii) richly labeled images or videos help to learn better hash functions, after learning binary embedding codes, than random projections do. In addition, the models proposed for videos can be adapted to other sequential images, such as volumetric medical images, which are not covered in this dissertation.
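    As an illustration of point (4), treating time as a third spatial dimension, here is a minimal 3D-CNN sketch in PyTorch; the layer sizes, clip shape and class count are illustrative, not the dissertation's architecture:

```python
import torch
import torch.nn as nn

class TinyC3D(nn.Module):
    """Minimal 3D CNN: convolutions and pooling span (T, H, W),
    so temporal structure is learned jointly with spatial structure."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),   # pool space first, preserve time
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),   # global space-time pooling
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, frames, height, width)
        return self.classifier(self.features(clip).flatten(1))

logits = TinyC3D()(torch.randn(2, 3, 16, 112, 112))  # 16-frame toy clips
```

    The abstract's observation that 3D CNNs "work effectively when inputted with temporally meaningful clips" corresponds to choosing the frame span (16 here) so that each clip covers a coherent unit of motion.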

    Robust real-time tracking in smart camera networks


Simulating seismic wave propagation in two-dimensional media using discontinuous spectral element methods

    We introduce a discontinuous spectral element method for simulating seismic waves in two-dimensional elastic media. The method combines the flexibility of a discontinuous finite element method with the accuracy of a spectral method. The elastodynamic equations are discretized using high-degree Lagrange interpolants, and integration over an element is accomplished with the Gauss-Lobatto-Legendre integration rule. This combination of discretization and integration results in a diagonal mass matrix, and the use of a discontinuous finite element method allows the calculation to be done locally in each element, so the algorithm is simplified drastically. We validated the results on a one-dimensional problem by comparing them with a finite-difference time-domain method and the exact solution. The comparisons show excellent agreement.
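    The diagonal mass matrix follows from collocating the Lagrange interpolation nodes with the Gauss-Lobatto-Legendre (GLL) quadrature points: since l_i(x_j) = δ_ij at those nodes, M_ij ≈ Σ_k w_k ρ l_i(x_k) l_j(x_k) = ρ w_i δ_ij. A minimal sketch of the construction; the degree, density and Jacobian values are illustrative:

```python
import numpy as np
from numpy.polynomial import legendre

def gll_nodes_weights(n: int):
    """GLL nodes/weights on [-1, 1] for polynomial degree n.

    Nodes are the endpoints plus the roots of P_n'(x); the weights are
    w_i = 2 / (n (n + 1) P_n(x_i)^2).
    """
    cn = np.zeros(n + 1)
    cn[n] = 1.0                                   # P_n in the Legendre basis
    interior = legendre.legroots(legendre.legder(cn))
    x = np.concatenate(([-1.0], interior, [1.0]))
    w = 2.0 / (n * (n + 1) * legendre.legval(x, cn) ** 2)
    return x, w

x, w = gll_nodes_weights(4)
# 1-D element mass matrix with density rho and Jacobian J: diagonal,
# because interpolation nodes and quadrature points coincide.
rho, J = 1.0, 0.5
M_diag = rho * J * w
```

    A diagonal mass matrix makes explicit time stepping trivial to invert, which, together with the element-local discontinuous formulation, is what simplifies the algorithm so drastically.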