
    Skeleton-aided Articulated Motion Generation

    This work makes the first attempt to generate an articulated human motion sequence from a single image. On the one hand, we use paired inputs, human skeleton information as a motion embedding and a single human image as an appearance reference, to generate novel motion frames based on a conditional GAN framework. On the other hand, a triplet loss is employed to enforce appearance smoothness between consecutive frames. As the proposed framework jointly exploits the image appearance space and the articulated/kinematic motion space, it generates realistic articulated motion sequences, in contrast to most previous video generation methods, which yield blurred motion effects. We test our model on two human action datasets, KTH and Human3.6M, and the proposed framework generates very promising results on both. Comment: ACM MM 2017
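
    As a rough illustration of the appearance-smoothness term, the following PyTorch sketch computes a triplet loss over per-frame appearance features; the feature dimensions, margin, and frame-sampling scheme are illustrative assumptions, not the paper's exact formulation.

        import torch
        import torch.nn.functional as F

        def appearance_triplet_loss(anchor, positive, negative, margin=1.0):
            # anchor/positive: features of consecutive frames (pulled together);
            # negative: features of a temporally distant frame (pushed apart)
            d_pos = F.pairwise_distance(anchor, positive)
            d_neg = F.pairwise_distance(anchor, negative)
            return torch.clamp(d_pos - d_neg + margin, min=0).mean()

        # toy usage with random 128-D appearance features for a batch of 4 frames
        loss = appearance_triplet_loss(torch.randn(4, 128),
                                       torch.randn(4, 128),
                                       torch.randn(4, 128))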

    Evaluating Example-based Pose Estimation: Experiments on the HumanEva Sets

    We present an example-based approach to pose recovery, using histograms of oriented gradients (HOG) as image descriptors. Tests on the HumanEva-I and HumanEva-II data sets give us insight into the strengths and limitations of an example-based approach. We report mean relative 3D errors of approximately 65 mm per joint on HumanEva-I and 175 mm on HumanEva-II. We discuss our results using single and multiple views. We also perform experiments to assess the algorithm’s generalization to unseen subjects, actions, and viewpoints. We plan to incorporate the temporal aspect of human motion analysis to reduce orientation ambiguities and increase pose recovery accuracy.
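
    To make the example-based pipeline concrete, here is a minimal Python sketch that describes a query image with a HOG descriptor (via scikit-image) and returns the 3D pose of the nearest training exemplar; the database layout, pose format, and HOG parameters are assumptions for illustration, not the paper's exact configuration.

        import numpy as np
        from skimage.feature import hog

        def hog_descriptor(image):
            # image: 2-D grayscale array; parameters are common HOG defaults
            return hog(image, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2))

        def nearest_pose(query_image, exemplar_descriptors, exemplar_poses):
            # exemplar_descriptors: (N, D) HOG vectors of the training images
            # exemplar_poses: (N, J, 3) corresponding 3-D joint positions
            query = hog_descriptor(query_image)
            distances = np.linalg.norm(exemplar_descriptors - query, axis=1)
            return exemplar_poses[np.argmin(distances)]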

    Tracking object poses in the context of robust body pose estimates

    This work focuses on tracking objects being used by humans. These objects are often small, fast moving, and heavily occluded by the user, so recovering their 3D position and orientation over time is a challenging research problem. To make progress, we appeal to the fact that these objects are often used in a consistent way: the body poses of different people using the same object tend to have similarities, and, when considered relative to those body poses, so do the respective object poses. Our intuition is that, in the context of recent advances in body-pose tracking from RGB-D data, robust object-pose tracking during human-object interactions should also be possible. We propose a combined generative and discriminative tracking framework able to follow gradual changes in object pose over time but also able to re-initialise object pose upon recognising distinctive body poses. The framework predicts object pose relative to a set of independent coordinate systems, each centred on a different part of the body. We conduct a quantitative investigation into which body parts serve as the best predictors of object pose over the course of different interactions. We find that while object translation should be predicted from nearby body parts, object rotation can be predicted more robustly by using a much wider range of body parts. Our main contribution is the first object-tracking system able to estimate 3D translation and orientation from RGB-D observations of human-object interactions. By tracking precise changes in object pose, our method opens up the possibility of more detailed computational reasoning about human-object interactions and their outcomes: for example, assistive living systems that go beyond recognising the actions and objects involved in everyday tasks such as sweeping or drinking, to reasoning that a person has missed sweeping under the chair or has not drunk enough water today.
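
    The part-relative representation can be sketched with 4x4 homogeneous transforms, as below; the function names are hypothetical, and the paper's actual predictors are learned from interaction data rather than computed this directly.

        import numpy as np

        def object_pose_in_part_frame(T_world_part, T_world_object):
            # both arguments are 4x4 homogeneous poses in the world frame;
            # the result expresses the object pose in the body part's frame
            return np.linalg.inv(T_world_part) @ T_world_object

        def reattach_object_pose(T_world_part, T_part_object):
            # combine the current body-part pose with a stored part-relative
            # object pose to predict the object's world-frame pose
            return T_world_part @ T_part_object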

    Vision-Based Observation Models for Lower Limb 3D Tracking with a Moving Platform

    Tracking and understanding human gait is an important step towards improving elderly mobility and safety. This thesis presents a vision-based tracking system that estimates the 3D pose of a wheeled-walker user's lower limbs with cameras mounted on the moving walker. The tracker estimates 3D poses from images of the lower limbs in the coronal plane in a dynamic, uncontrolled environment. It employs a probabilistic approach based on particle filtering with three different camera setups: a monocular RGB camera, binocular RGB cameras, and a depth camera. For the RGB cameras, observation likelihoods are designed to compare the colors and gradients of each frame with manually extracted initial templates. Two strategies are also investigated for handling appearance changes of the tracking target: increasing the number of templates and using different color representations. For the depth camera, two observation likelihoods are developed: the first works directly in 3D space, while the second works in the projected image space. Experiments are conducted to evaluate the performance of the tracking system with different users for all three camera setups. The trackers with the RGB cameras produce results with higher error than the depth camera, and the strategies for handling appearance change generally improve tracking accuracy. The tracker with the depth sensor, on the other hand, successfully tracks the 3D poses of users over entire video sequences and is robust against unfavorable conditions such as partial occlusion, missing observations, and a deformable tracking target.
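
    A generic particle-filter update of the kind the thesis builds on can be sketched as follows; the random-walk motion model, noise scale, and stand-in observation likelihood are illustrative assumptions, not the thesis's colour/gradient or depth likelihoods.

        import numpy as np

        def particle_filter_step(particles, weights, likelihood, motion_std=0.02,
                                 rng=np.random.default_rng()):
            # particles: (N, D) pose hypotheses; likelihood(pose) -> float
            # 1. predict: diffuse each hypothesis with a random-walk motion model
            particles = particles + rng.normal(0.0, motion_std, particles.shape)
            # 2. update: re-weight each hypothesis by the observation likelihood
            weights = weights * np.array([likelihood(p) for p in particles])
            weights = weights / weights.sum()
            # 3. resample: draw hypotheses in proportion to their weights
            index = rng.choice(len(particles), size=len(particles), p=weights)
            return particles[index], np.full(len(particles), 1.0 / len(particles))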

    Acquiring 3D Full-body Motion from Noisy and Ambiguous Input

    Natural human motion is in high demand and widely used in applications such as video games and virtual reality. However, acquiring full-body motion remains challenging because a system must accurately capture a wide variety of human actions without requiring considerable time and skill to assemble. For instance, commercial optical motion capture systems such as Vicon capture human motion with high accuracy and resolution but often require post-processing by experts, which is time-consuming and costly. Microsoft Kinect, despite its popularity and wide applications, does not accurately reconstruct complex movements when significant occlusions occur. This dissertation explores two approaches that accurately reconstruct full-body human motion from the noisy and ambiguous input data captured by commercial motion capture devices. The first approach automatically generates high-quality human motion from noisy data obtained from commercial optical motion capture systems, eliminating the need for post-processing. The second approach accurately captures a wide variety of human motion, even under significant occlusions, using color/depth data captured by a single Kinect camera. The common theme underlying both approaches is the use of prior knowledge embedded in a pre-recorded motion capture database to reduce the reconstruction ambiguity caused by noisy and ambiguous input and to constrain the solution to lie in the space of natural motion. More specifically, the first approach constructs a series of spatial-temporal filter bases from pre-captured human motion data and employs them, along with robust statistics techniques, to filter motion data corrupted by noise and outliers. The second approach formulates the problem in a Maximum a Posteriori (MAP) framework and generates the most likely pose that both explains the observations and is consistent with the patterns embedded in the pre-recorded motion capture database. We demonstrate the effectiveness of our approaches through extensive numerical evaluations on synthetic data and comparisons against results created by commercial motion capture systems. The first approach can effectively denoise a wide variety of noisy motion data, including walking, running, jumping, and swimming, while the second approach is shown to reconstruct a wider range of motions accurately compared with Microsoft Kinect.
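
    The MAP formulation amounts to minimising the sum of a negative log-likelihood and a negative log-prior; in the sketch below, both quadratic terms are simple Gaussian stand-ins for the dissertation's learned observation model and motion prior, and the weights are illustrative.

        import numpy as np
        from scipy.optimize import minimize

        def neg_log_posterior(pose, observation, prior_mean,
                              obs_weight=1.0, prior_weight=0.1):
            data_term = obs_weight * np.sum((pose - observation) ** 2)
            prior_term = prior_weight * np.sum((pose - prior_mean) ** 2)
            return data_term + prior_term

        observation = np.random.randn(30)   # noisy pose (e.g. 10 joints x 3)
        prior_mean = np.zeros(30)           # prediction from the motion database
        result = minimize(neg_log_posterior, x0=observation,
                          args=(observation, prior_mean))
        map_pose = result.x                 # most likely pose under the model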

    Geometric Invariance In The Analysis Of Human Motion In Video Data

    Human motion analysis is one of the major problems in computer vision research. It deals with the study of the motion of the human body in video data from different aspects, ranging from tracking body parts and reconstructing 3D human body configurations to higher-level interpretation of human actions and activities in image sequences. When human motion is observed through a video camera, it is perspectively distorted and may appear totally different from different viewpoints. It is therefore highly challenging to establish correct relationships between human motions across video sequences with different camera settings. In this work, we investigate geometric invariance in the motion of the human body, which is critical to accurately understanding human motion in video data regardless of variations in camera parameters and viewpoints. In human action analysis, the representation of human action is a very important issue; it usually determines the nature of the solutions, including their limits in resolving the problem. Unlike existing research that studies human motion as a whole 2D/3D object or a sequence of postures, we study human motion as a sequence of body pose transitions. We further decompose a human body pose into a number of body point triplets, and break down a pose transition into the transitions of a set of body point triplets. In this way the study of the complex non-rigid motion of the human body is reduced to that of the motion of rigid body point triplets, i.e. a collection of planes in motion. As a result, projective geometry and linear algebra can be applied to explore the geometric invariance in human motion. Based on this formulation, we have discovered the fundamental ratio invariant and the eigenvalue equality invariant in human motion. We also propose solutions based on these geometric invariants to the problems of view-invariant recognition of human postures and actions, as well as the analysis of human motion styles. These invariants and their applicability have been validated by experimental results supporting their effectiveness in understanding human motion under various camera parameters and viewpoints.
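
    As a simplified illustration of the triplet decomposition only (not of the fundamental-ratio or eigenvalue-equality derivations themselves, which are projective), the Python sketch below enumerates body point triplets and recovers, for each, the 2D affine transform its image projection undergoes between two frames; three non-collinear correspondences determine that transform exactly.

        import itertools
        import numpy as np

        def triplet_transforms(joints_t0, joints_t1):
            # joints_t0/joints_t1: (J, 2) image positions of J tracked body points
            transforms = {}
            for triplet in itertools.combinations(range(len(joints_t0)), 3):
                idx = list(triplet)
                src = np.hstack([joints_t0[idx], np.ones((3, 1))])  # (3, 3)
                dst = joints_t1[idx]                                # (3, 2)
                # three correspondences determine the 2x3 affine map exactly
                # (assumes the triplet is not collinear, i.e. src is invertible)
                transforms[triplet] = np.linalg.solve(src, dst).T
            return transforms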

    Advances in Monocular Exemplar-based Human Body Pose Analysis: Modeling, Detection and Tracking

    This thesis contributes to the analysis of human body pose from image sequences acquired with a single camera, a topic with a wide range of potential applications in video surveillance, video games, and biomedical applications. Exemplar-based techniques have been successful; however, their accuracy depends on the similarity of the camera viewpoint and scene properties between the training and test images. Given a training dataset captured with a small number of fixed cameras parallel to the ground, three scenarios with increasing levels of difficulty have been identified and analyzed: 1) a static camera parallel to the ground, 2) a fixed surveillance camera with a considerably different viewing angle, and 3) a video sequence captured with a moving camera, or simply a single static image.