200 research outputs found

    GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB

    Full text link
    We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our tracking method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training our CNN we propose a novel approach for the synthetic generation of training data that is based on a geometrically consistent image-to-image translation network. To be more specific, we use a neural network that translates synthetic images to "real" images, such that the so-generated images follow the same statistical distribution as real-world hand images. For training this translation network we combine an adversarial loss and a cycle-consistency loss with a geometric consistency loss in order to preserve geometric properties (such as hand pose) during translation. We demonstrate that our hand tracking system outperforms the current state-of-the-art on challenging RGB-only footage

    Posing 3D Models from Drawing

    Get PDF
    Inferring the 3D pose of a character from a drawing is a complex and under-constrained problem. Solving it may help automate various parts of an animation production pipeline such as pre-visualisation. In this paper, a novel way of inferring the 3D pose from a monocular 2D sketch is proposed. The proposed method does not make any external assumptions about the model, allowing it to be used on different types of characters. The inference of the 3D pose is formulated as an optimisation problem and a parallel variation of the Particle Swarm Optimisation algorithm called PARAC-LOAPSO is utilised for searching the minimum. Testing in isolation as well as part of a larger scene, the presented method is evaluated by posing a lamp, a horse and a human character. The results show that this method is robust, highly scalable and is able to be extended to various types of models

    Hybrid One-Shot 3D Hand Pose Estimation by Exploiting Uncertainties

    Full text link
    Model-based approaches to 3D hand tracking have been shown to perform well in a wide range of scenarios. However, they require initialisation and cannot recover easily from tracking failures that occur due to fast hand motions. Data-driven approaches, on the other hand, can quickly deliver a solution, but the results often suffer from lower accuracy or missing anatomical validity compared to those obtained from model-based approaches. In this work we propose a hybrid approach for hand pose estimation from a single depth image. First, a learned regressor is employed to deliver multiple initial hypotheses for the 3D position of each hand joint. Subsequently, the kinematic parameters of a 3D hand model are found by deliberately exploiting the inherent uncertainty of the inferred joint proposals. This way, the method provides anatomically valid and accurate solutions without requiring manual initialisation or suffering from track losses. Quantitative results on several standard datasets demonstrate that the proposed method outperforms state-of-the-art representatives of the model-based, data-driven and hybrid paradigms.Comment: BMVC 2015 (oral); see also http://lrs.icg.tugraz.at/research/hybridhape

    Metaheuristic Optimization Techniques for Articulated Human Tracking

    Get PDF
    Four adaptive metaheuristic optimization algorithms are proposed and demonstrated: Adaptive Parameter Particle Swarm Optimization (AP-PSO), Modified Artificial Bat (MAB), Differential Mutated Artificial Immune System (DM-AIS) and hybrid Particle Swarm Accelerated Artificial Immune System (PSO-AIS). The algorithms adapt their search parameters on the basis of the fitness of obtained solutions such that a good fitness value favors local search, while a poor fitness value favors global search. This efficient feedback of the solution quality, imparts excellent global and local search characteristic to the proposed algorithms. The algorithms are tested on the challenging Articulated Human Tracking (AHT) problem whose objective is to infer human pose, expressed in terms of joint angles, from a continuous video stream. The Particle Filter (PF) algorithms, widely applied in generative model based AHT, suffer from the 'curse of dimensionality' and 'degeneracy' challenges. The four proposed algorithms show stable performance throughout the course of numerical experiments. DM-AIS performs best among the proposed algorithms followed in order by PSO-AIS, AP-PSO, and MBA in terms of Most Appropriate Pose (MAP) tracking error. The MAP tracking error of the proposed algorithms is compared with four heuristic approaches: generic PF, Annealed Particle Filter (APF), Partitioned Sampled Annealed Particle Filter (PSAPF) and Hierarchical Particle Swarm Optimization (HPSO). They are found to outperform generic PF with a confidence level of 95%, PSAPF and HPSO with a confidence level of 85%. While DM-AIS and PSO-AIS outperform APF with a confidence level of 80%. Further, it is noted that the proposed algorithms outperform PSAPF and HPSO using a significantly lower number of function evaluations, 2500 versus 7200. The proposed algorithms demonstrate reduced particle requirements, hence improving computational efficiency and helping to alleviate the 'curse of dimensionality'. The adaptive nature of the algorithms is found to guide the whole swarm towards the optimal solution by sharing information and exploring a wider solution space which resolves the 'degeneracy' challenge. Furthermore, the decentralized structure of the algorithms renders them insensitive to accumulation of error and allows them to recover from catastrophic failures due to loss of image data, sudden change in motion pattern or discrete instances of algorithmic failure. The performance enhancements demonstrated by the proposed algorithms, attributed to their balanced local and global search capabilities, makes real-time AHT applications feasible. Finally, the utility of the proposed algorithms in low-dimensional system identification problems as well as high-dimensional AHT problems demonstrates their applicability in various problem domains

    Event Detection in Eye-Tracking Data for Use in Applications with Dynamic Stimuli

    Get PDF
    This doctoral thesis has signal processing of eye-tracking data as its main theme. An eye-tracker is a tool used for estimation of the point where one is looking. Automatic algorithms for classification of different types of eye movements, so called events, form the basis for relating the eye-tracking data to cognitive processes during, e.g., reading a text or watching a movie. The problems with the algorithms available today are that there are few algorithms that can handle detection of events during dynamic stimuli and that there is no standardized procedure for how to evaluate the algorithms. This thesis comprises an introduction and four papers describing methods for detection of the most common types of eye movements in eye-tracking data and strategies for evaluation of such methods. The most common types of eye movements are fixations, saccades, and smooth pursuit movements. In addition to these eye movements, the event post-saccadic oscillations, (PSO), is considered. The eye-tracking data in this thesis are recorded using both high- and low-speed eye-trackers. The first paper presents a method for detection of saccades and PSO. The saccades are detected using the acceleration signal and three specialized criteria based on directional information. In order to detect PSO, the interval after each saccade is modeled and the parameters of the model are used to determine whether PSO are present or not. The algorithm was evaluated by comparing the detection results to manual annotations and to the detection results of the most recent PSO detection algorithm. The results show that the algorithm is in good agreement with annotations, and has better performance than the compared algorithm. In the second paper, a method for separation of fixations and smooth pursuit movements is proposed. In the intervals between the detected saccades/PSO, the algorithm uses different spatial scales of the position signal in order to separate between the two types of eye movements. The algorithm is evaluated by computing five different performance measures, showing both general and detailed aspects of the discrimination performance. The performance of the algorithm is compared to the performance of a velocity and dispersion based algorithm, (I-VDT), to the performance of an algorithm based on principle component analysis, (I-PCA), and to manual annotations by two experts. The results show that the proposed algorithm performs considerably better than the compared algorithms. In the third paper, a method based on eye-tracking signals from both eyes is proposed for improved separation of fixations and smooth pursuit movements. The method utilizes directional clustering of the eye-tracking signals in combination with binary filters taking both temporal and spatial aspects of the eye-tracking signal into account. The performance of the method is evaluated using a novel evaluation strategy based on automatically detected moving objects in the video stimuli. The results show that the use of binocular information for separation of fixations and smooth pursuit movements is advantageous in static stimuli, without impairing the algorithm's ability to detect smooth pursuit movements in video and moving dot stimuli. The three first papers in this thesis are based on eye-tracking signals recorded using a stationary eye-tracker, while the fourth paper uses eye-tracking signals recorded using a mobile eye-tracker. In mobile eye-tracking, the user is allowed to move the head and the body, which affects the recorded data. In the fourth paper, a method for compensation of head movements using an inertial measurement unit, (IMU), combined with an event detector for lower sampling rate data is proposed. The event detection is performed by combining information from the eye-tracking signals with information about objects extracted from the scene video of the mobile eye-tracker. The results show that by introducing head movement compensation and information about detected objects in the scene video in the event detector, improved classification can be achieved. In summary, this thesis proposes an entire methodological framework for robust event detection which performs better than previous methods when analyzing eye-tracking signals recorded during dynamic stimuli, and also provides a methodology for performance evaluation of event detection algorithms

    Human Motion Analysis: From Gait Modeling to Shape Representation and Pose Estimation

    Get PDF
    This dissertation presents a series of fundamental approaches to the human motion analysis from three perspectives, i.e., manifold learning-based gait motion modeling, articulated shape representation and efficient pose estimation. Firstly, a new joint gait-pose manifold (JGPM) learning algorithm is proposed to jointly optimize the gait and pose variables simultaneously. To enhance the representability and flexibility for complex motion modeling, we also propose a multi-layer JGPM that is capable of dealing with a variety of walking styles and various strides. We resort to a topologically-constrained Gaussian process latent variable model (GPLVM) to learn the multi-layer JGPM where two new techniques are introduced to facilitate model learning. First is training data diversification that creates a set of simulated motion data with different strides under limited data. Second is the topology-aware local learning that is to speed up model learning by taking advantage of the local topological structure. We demonstrate the effectiveness of our approach by synthesizing the high-quality motions from the multi-layer model. The experimental results show that the multi-layer JGPM outperforms several existing GPLVM-based models in terms of the overall performance of motion modeling.On the other hand, to achieve efficient human pose estimation from a single depth sensor, we develop a generalized Gaussian kernel correlation (GKC)-based framework which supports not only body shape modeling, but also articulated pose tracking. We first generalize GKC from the univariate Gaussian to the multivariate one and derive a unified GKC function that provides a continuous and differentiable similarity measure between a template and an observation, both of which are represented by a collection of univariate and/or multivariate Gaussian kernels. Then, to facilitate the data matching and accommodate articulated body deformation, we embed a quaternion-based articulated skeleton into a collection of multivariate Gaussians-based template model and develop an articulated GKC (AGKC) which supports subject-specific shape modeling and articulated pose tracking for both the full-body and hand. Our tracking algorithm is simple yet effective and computationally efficient. We evaluate our algorithm on two benchmark depth datasets. The experimental results are promising and competitive when compared with state-of-the-art algorithms.Electrical Engineerin
    corecore