
    Single View Human Pose Tracking

    Recovery of human pose from video has become a highly active research area over the last decade because of many attractive potential applications, such as surveillance, non-intrusive motion analysis, and natural human-machine interaction. Video-based full-body pose estimation is a very challenging task because of the high degree of articulation of the human body, the large variety of possible human motions, and the diversity of human appearances. Methods for tackling this problem can be roughly categorized as either discriminative or generative. Discriminative methods can work on single images and recover human poses efficiently, but their accuracy and generality depend largely on the training data. Generative approaches usually formulate the problem as a tracking problem and adopt an explicit human model. Although arbitrary motions can be tracked, such systems usually have difficulty adapting to different subjects and dealing with tracking failures.

    In this thesis, an accurate, efficient, and robust human pose tracking system for a single-view camera is developed, mainly following a generative approach. A novel discriminative feature is also proposed and integrated into the tracking framework to improve tracking performance.

    The tracking system is built within a particle filtering framework. A reconfigurable skeleton model is constructed based on the Acclaim Skeleton File convention. A basic particle filter is first implemented for upper-body tracking; it fuses time-efficient cues from monocular sequences and achieves real-time tracking for constrained motions. Next, a 3D surface model is added to the skeleton model, and a full-body tracking system is developed for more general and complex motions, assuming a stereo camera input. Partitioned sampling is adopted to deal with the high dimensionality of the pose space, and the system is capable of running in near real-time. Multiple visual cues are investigated and compared, including a newly developed explicit depth cue.

    Based on the comparative analysis of cues, which reveals the importance of depth and good bottom-up features, a novel algorithm for detecting and identifying endpoint body parts from depth images is proposed. Inspired by the shape context concept, this thesis proposes a novel Local Shape Context (LSC) descriptor specifically for describing the shape features of body parts in depth images. The descriptor captures the local shape of different body parts with respect to a given reference point on a human silhouette and is shown to be effective at detecting and classifying endpoint body parts. A new type of interest point is defined based on the LSC descriptor, and a hierarchical interest point selection algorithm is designed to further conserve computational resources. The detected endpoint body parts are then classified according to learned models based on the LSC feature. The algorithm is tested on a public dataset and achieves good accuracy at a 100 Hz processing speed on a standard PC.

    Finally, the LSC descriptor is generalized so that both the endpoint body parts and the limbs are detected simultaneously. The generalized algorithm is integrated into the tracking framework, where it provides a very strong cue and enables recovery from tracking failures. The skeleton model is also simplified to further increase system efficiency.
    To evaluate the system quantitatively on arbitrary motions, a new dataset is designed and collected using a synchronized Kinect sensor and a marker-based motion capture system, comprising 22 different motions performed by 5 human subjects. The system tracks full-body motions accurately using a simple skeleton-only model, in near real-time on a laptop PC even before optimization.
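    The abstract gives no implementation details, but the generative loop it describes follows the standard particle-filter pattern of predict, weight, resample. The sketch below is a minimal illustration in Python/NumPy, not the thesis code; `propagate` (the motion model) and `likelihood` (the score from fused image cues such as edges, color, and depth) are hypothetical placeholders.

```python
import numpy as np

def particle_filter_step(particles, observation, propagate, likelihood, rng):
    """One predict-weight-resample cycle over a set of pose hypotheses.

    particles:  (N, D) array, each row a pose (e.g. joint angles)
    propagate:  motion model; diffuses particles with process noise
    likelihood: scores one pose hypothesis against the image cues
    """
    # Predict: push every hypothesis through the motion model.
    particles = propagate(particles, rng)

    # Weight: evaluate each hypothesis against the observation.
    w = np.array([likelihood(p, observation) for p in particles])
    w /= w.sum()

    # Resample: duplicate strong hypotheses, drop weak ones.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```

    Partitioned sampling, which the abstract names as the remedy for high dimensionality, applies this same cycle to subsets of the state vector in turn (for example the torso before the limbs), so the particle count need not grow exponentially with the number of joints.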
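    The LSC descriptor is likewise described only at a high level. The following sketch shows a generic local log-polar histogram in the spirit of the original shape context, computed around a reference point on a silhouette; the bin counts and support radius are illustrative assumptions, not the thesis's published parameters.

```python
import numpy as np

def local_shape_context(contour, ref, n_r=4, n_theta=12, r_max=60.0):
    """Log-polar histogram of silhouette points around a reference point.

    contour: (M, 2) array of silhouette pixel coordinates
    ref:     (2,)   reference point taken on the same silhouette
    Returns a flattened, L1-normalized (n_r * n_theta,) histogram.
    """
    d = contour - np.asarray(ref)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])          # in [-pi, pi]

    # Keep only points inside the local support radius (hence "local"),
    # and drop the reference point itself (r == 0).
    mask = (r > 0) & (r < r_max)
    r, theta = r[mask], theta[mask]

    # Log-radial bins emphasize nearby structure, as in shape context.
    edges = np.linspace(0.0, np.log1p(r_max), n_r + 1)
    r_bin = np.clip(np.digitize(np.log1p(r), edges) - 1, 0, n_r - 1)
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta

    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin, t_bin), 1.0)
    return (hist / max(hist.sum(), 1.0)).ravel()
```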

    Cloud point labelling in optical motion capture systems

    This thesis deals with the task of point labelling within the overall workflow of optical motion capture systems. Human motion capture by optical sensors produces, at each frame, a snapshot of the motion as a cloud of points that must be labelled before any subsequent motion analysis can be carried out. Labelling is tackled as a classification problem, using machine learning techniques such as AdaBoost and genetic search to train a set of weak classifiers, which are in turn gathered into an ensemble of partial solvers. The result feeds an online algorithm that provides marker labelling at a target detection accuracy and a reduced computational cost. In contrast to other approaches, potentially misleading temporal correlations are discarded, strengthening the process against failures caused by occasional labelling errors. The effectiveness of the approach is demonstrated on a real dataset obtained from gait measurements of several persons, for which the ground-truth labelling was verified manually. In addition, the thesis gives the reader a broad view of the field of motion capture and its optical branch: description, composition, state of the art, and related work. This serves as a framework to highlight the importance of the point labelling task and to ease its understanding.
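    As a rough illustration of this classification formulation, the sketch below trains an off-the-shelf AdaBoost ensemble to label markers frame by frame, deliberately without temporal information. The feature files and the scikit-learn stand-in are assumptions for illustration; the thesis builds its own ensemble of partial solvers trained with AdaBoost or genetic search.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical training data: one feature vector per marker observation
# (e.g. position relative to the cloud centroid) and its true label id.
X_train = np.load("marker_features_train.npy")   # (n_samples, n_features)
y_train = np.load("marker_labels_train.npy")     # (n_samples,)

# AdaBoost over shallow decision trees (scikit-learn's default weak
# learner) stands in for the thesis's ensemble of partial solvers.
clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X_train, y_train)

def label_frame(frame_features):
    """Label every marker in one frame independently; no temporal
    correlation with neighboring frames is used."""
    return clf.predict(frame_features)
```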