3 research outputs found

    Graph-based human pose estimation using neural networks

    This thesis investigates the problem of human pose estimation (HPE) from unconstrained single two-dimensional (2D) images using Convolutional Neural Networks (CNNs). Recent approaches propose to solve the HPE problem using various forms of CNN models. Some of these methods focus on training deeper and more computationally expensive CNN structures to classify images of people without any prior knowledge of their poses. Other approaches incorporate existing prior knowledge of human anatomy and train the CNNs to construct graph representations of the human pose. These approaches are generally characterised as having lower computational and data requirements. This thesis investigates HPE methods based on the latter approach. In the search for the most accurate and computationally efficient HPE, it explores and compares three types of graph-based pose representations: tree-based, non-tree-based, and a hybrid approach combining both representations. The contributions of the thesis are threefold. Firstly, the effect of different CNN structures on HPE was analysed. New, more efficient network configurations were proposed and tested against the benchmark methods. The proposed configurations offered computational simplicity while maintaining relatively high performance. Secondly, new data-driven tree-based models were proposed as a modified form of the Chow-Liu Recursive Grouping (CLRG) algorithm. These models were applied within the CNN-based HPE framework, showing higher performance than the traditional anatomy-based tree models. Experiments with different numbers and configurations of tree nodes allowed the determination of a very efficient tree-based configuration consisting of 50 nodes. This configuration achieved higher HPE accuracy than the previously proposed 26-node tree. Apart from tree-based models of human pose, efficient non-tree-based models with iterative (looping) connections between nodes were also investigated.
The third contribution of this thesis is a novel hybrid HPE framework that combines both tree-based and non-tree-based human pose representations. Experimental results have shown that the hybrid approach leads to higher accuracy than either tree-based or non-tree-based structures individually.

    Human parsing with a cascade of hierarchical poselet based pruners

    No full text

    Structure prediction for human parsing

    This thesis shows that structure prediction is well-suited for detecting and parsing people in images (and videos) due to the advantage of learning local part appearance models jointly with relationships between body parts. In detecting people, this method can deal with hard cases, for example, a person mounting a bicycle, that are uncommon in the training data and can cause current person detectors to fail. This thesis demonstrates a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset, then presents features based on that configuration to an SVM classifier. This thesis shows, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder. This thesis shows quantitative evidence that a full relational model of the body performs better at upper body parsing than the standard tree model, despite the need to adopt approximate inference and learning procedures. The method uses an approximate search for inference and an approximate structure learning method for learning. This thesis compares this method to state-of-the-art methods on a dataset prepared at UIUC (which depicts a wide range of poses), on the standard Buffy dataset, and on the recently published reduced PASCAL dataset. Results suggest that the Buffy dataset overemphasizes poses where the arms hang down, which leads to generalization problems. Despite the superior performance of a full relational model over a tree-structured model, its practical use is still limited because of the high complexity of inference. This thesis shows a method to boost a parser with poselet pruners. The method first develops a cascade of hierarchical poselet pruners to prune the search space to a small set of part states and then builds a hierarchical poselet parser to find part locations on the pruned set.
Experiments on the UIUC Sport dataset show that the poselet pruners can effectively prune away more than 99.6% of unlikely part states, leaving about 500 states per part. This small set of part states allows the use of advanced appearance models for better parsers. The method achieves performance comparable to state-of-the-art methods while finding part locations several times faster.