
    Latent Dependency Mining for Solving Regression Problems in Computer Vision

    PhD
    Regression-based frameworks, which learn a direct mapping from low-level image features to continuous vector- or scalar-valued labels, have been widely exploited in computer vision, e.g. in crowd counting, age estimation and human pose estimation. Over the last decade, researchers in computer vision have devoted considerable effort to achieving better regression fitting. Nevertheless, solving these problems with regression frameworks remains a formidable challenge due to 1) feature variation and 2) imbalanced and sparse data. On the one hand, large feature variation can be caused by changes in extrinsic conditions (e.g. images taken under different lighting conditions and from different viewing angles) as well as intrinsic conditions (e.g. the different aging processes of different people in age estimation, and inter-object occlusion in crowd density estimation). On the other hand, imbalanced and sparse data distributions can also strongly affect regression performance. These two challenges are related, in the sense that the feature inconsistency problem is compounded by sparse and imbalanced training data and vice versa; they therefore need to be tackled jointly in modelling and explicitly in representation. First, this thesis mines an intermediary feature representation that concatenates spatially localised features, so that information is shared among neighbouring localised cells in each frame. Second, it introduces the cumulative attribute concept for learning a regression model that exploits the latent cumulative dependency of the label space, applied to facial age estimation and crowd density estimation. Third, it demonstrates the effectiveness of a discriminative structured-output regression framework that learns the inherent latent correlations between the elements of the output variables, applied to 2D human upper-body pose estimation. The effectiveness of the proposed regression frameworks for crowd counting, age estimation and human pose estimation is validated on public benchmarks.
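    The cumulative attribute idea can be sketched as follows. This is a minimal illustration assuming integer labels y in {0, ..., K} (e.g. age in years or a people count); the encoding used here, where attribute j is 1 when j < y, and the helper names `cumulative_attribute`, `decode_cumulative` and `hamming` are hypothetical choices for illustration, not the thesis's exact formulation. What it shows is the cumulative dependency between labels: neighbouring labels share most of their attributes, so the Hamming distance between two encodings equals the difference between the labels.

```python
from typing import List, Sequence


def cumulative_attribute(label: int, num_levels: int) -> List[int]:
    """Hypothetical encoding: integer label y in [0, num_levels] becomes a
    binary vector a of length num_levels with a[j] = 1 if j < y, else 0."""
    if not 0 <= label <= num_levels:
        raise ValueError("label must lie in [0, num_levels]")
    return [1 if j < label else 0 for j in range(num_levels)]


def decode_cumulative(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Stand-in decoder: count attributes whose predicted score exceeds
    the threshold. (A learned regressor would normally do this step.)"""
    return sum(1 for s in scores if s > threshold)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two attribute vectors differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(cumulative_attribute(3, 5))  # [1, 1, 1, 0, 0]
    # Adjacent labels differ in exactly one attribute; distant labels in many.
    print(hamming(cumulative_attribute(3, 5), cumulative_attribute(4, 5)))  # 1
    print(hamming(cumulative_attribute(0, 5), cumulative_attribute(5, 5)))  # 5
    print(decode_cumulative([0.9, 0.8, 0.6, 0.2, 0.1]))  # 3
```

    In a full system one would first regress from image features to the attribute vector and then map the attributes to the final label; the thresholded count above is only a simple placeholder for that second stage.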

    Efficient Spatial Reasoning for Human Pose Estimation

    Human pose estimation from single images has made significant progress, but in many cases it still faces fundamental challenges from occluded and overlapping joints. This is partly due to a limitation of the traditional paradigm for this problem, which locates human body joints individually and as a result can fail to resolve the spatial connections among joints that are critical for identifying the whole pose. To overcome this shortcoming, we propose to explicitly incorporate spatial reasoning into pose estimation by formulating it as a structured graph learning problem, in which each image pixel is a candidate graph node and every two nodes are connected by an edge that captures their affinity. The advantage of this representation is that it allows us to learn feature embeddings for both nodes and edges, thereby providing sufficient capacity to delineate correct human body joints and their connecting bones. To facilitate efficient learning and inference, we exploit self-attention transformer architectures that fuse the node and edge learning pathways, which reduces the number of parameters and permits fast computation. Experiments on the popular MS-COCO human pose estimation benchmark show that our method outperforms representative methods.
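    The pixel-graph view can be illustrated with generic scaled dot-product self-attention, which yields a dense affinity weight for every pair of pixel nodes and uses those weights to update each node's embedding. This is a minimal sketch under stated assumptions: the function name `pixel_graph_attention`, the 4x4 feature map, the feature size and the random (untrained) projection matrices are all hypothetical, and the code does not reproduce the authors' fused node-edge architecture or its efficiency measures.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pixel_graph_attention(nodes, w_q, w_k, w_v):
    """Hypothetical sketch: nodes has shape (N, d), one row per pixel node.

    Returns (updated node embeddings of shape (N, d_v),
             dense edge affinity matrix of shape (N, N) whose rows sum to 1).
    """
    q = nodes @ w_q
    k = nodes @ w_k
    v = nodes @ w_v
    # Affinity for every pair of nodes, i.e. one weight per graph edge.
    affinity = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Each node aggregates the values of all nodes, weighted by edge affinity.
    return affinity @ v, affinity


rng = np.random.default_rng(0)
h, w, d = 4, 4, 8
feature_map = rng.standard_normal((h, w, d))
nodes = feature_map.reshape(h * w, d)  # flatten: one graph node per pixel
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
updated, affinity = pixel_graph_attention(nodes, w_q, w_k, w_v)
print(updated.shape, affinity.shape)  # (16, 8) (16, 16)
```

    Because every pixel is connected to every other pixel, the affinity matrix grows quadratically with the number of pixels, which is why the efficiency of the learning and inference architecture matters for this formulation.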