Cumulative Attribute Space for Age and Crowd Density Estimation
A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as regression problems by learning a mapping function between a high-dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult by sparse and imbalanced training data and by large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has a clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains a notable advantage in accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.
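The abstract does not spell out how a scalar label becomes a point in the cumulative attribute space, so the following is only a minimal sketch under stated assumptions. It assumes integer labels in a known range and a binary encoding in which dimension i answers "is the label greater than i?". The function names `encode_cumulative` and `decode_cumulative` are hypothetical, and the threshold-and-count decoding is a simplification for illustration, not necessarily the mapping the authors use.

```python
def encode_cumulative(label, max_label):
    """Encode an integer label as a cumulative binary attribute vector.

    Assumed encoding: dimension i (0-based) is 1 when the label is greater
    than i, so the first `label` entries are 1 and the rest are 0.
    """
    if not 0 <= label <= max_label:
        raise ValueError("label must lie in [0, max_label]")
    return [1 if i < label else 0 for i in range(max_label)]


def decode_cumulative(attributes, threshold=0.5):
    """Recover a scalar estimate by counting attributes above a threshold.

    This simple decoding is an illustrative assumption; predicted attributes
    may be real-valued, hence the threshold.
    """
    return sum(1 for a in attributes if a > threshold)
```

For example, `encode_cumulative(3, 5)` returns `[1, 1, 1, 0, 0]`. Under this encoding, neighbouring labels differ in a single attribute, so training samples at nearby ages or counts contribute to most of the same attribute dimensions, which is one plausible reading of why such a space could help with sparse and imbalanced labels.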
Latent Dependency Mining for Solving Regression Problems in Computer Vision
PhD
Regression-based frameworks, learning the direct mapping between low-level imagery features
and vector/scalar-formed continuous labels, have been widely exploited in computer vision, e.g.
in crowd counting, age estimation and human pose estimation. Over the last decade, computer vision
researchers have devoted considerable effort to improving regression fitting. Nevertheless,
solving these computer vision problems with regression frameworks remains a formidable
challenge due to 1) feature variation and 2) imbalanced and sparse data. On the one hand, large feature
variation can be caused by changes in extrinsic conditions (i.e. images are taken under
different lighting conditions and viewing angles) and also in intrinsic conditions (e.g. the differing aging
processes of different people in age estimation and inter-object occlusion in crowd density
estimation). On the other hand, imbalanced and sparse data distributions can also have an important
effect on regression performance. These two challenges in regression
learning are evidently related, in that the feature inconsistency problem is compounded by sparse
and imbalanced training data and vice versa, so they need to be tackled jointly in modelling and
explicitly in representation. First, this thesis mines an intermediate feature representation that
concatenates spatially localised features in order to share information between neighbouring
localised cells in the frames. Second, it introduces the cumulative attribute concept
for learning a regression model by exploiting the latent cumulative dependency
of the label space in regression, applied to facial age and crowd density estimation.
Third, it demonstrates the effectiveness of a discriminative structured-output regression
framework that learns the inherent latent correlations between the elements of the output variables,
applied to 2D human upper body pose estimation. The effectiveness of the proposed regression
frameworks for crowd counting, age estimation, and human pose estimation is validated
on public benchmarks.
- …