3,529 research outputs found
Context-aware CNNs for person head detection
Person detection is a key problem for many computer vision tasks. While face
detection has reached maturity, detecting people under a full variation of
camera view-points, human poses, lighting conditions and occlusions is still a
difficult challenge. In this work we focus on detecting human heads in natural
scenes. Starting from the recent local R-CNN object detector, we extend it with
two types of contextual cues. First, we leverage person-scene relations and
propose a Global CNN model trained to predict positions and scales of heads
directly from the full image. Second, we explicitly model pairwise relations
among objects and train a Pairwise CNN model using a structured-output
surrogate loss. The Local, Global and Pairwise models are combined into a joint
CNN framework. To train and test our full model, we introduce a large dataset
composed of 369,846 human heads annotated in 224,740 movie frames. We evaluate
our method and demonstrate improvements of person head detection against
several recent baselines in three datasets. We also show improvements of the
detection speed provided by our model.Comment: To appear in International Conference on Computer Vision (ICCV), 201
DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation
In real-world crowd counting applications, the crowd densities vary greatly
in spatial and temporal domains. A detection based counting method will
estimate crowds accurately in low density scenes, while its reliability in
congested areas is downgraded. A regression based approach, on the other hand,
captures the general density information in crowded regions. Without knowing
the location of each person, it tends to overestimate the count in low density
areas. Thus, exclusively using either one of them is not sufficient to handle
all kinds of scenes with varying densities. To address this issue, a novel
end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density
Estimation Network) is proposed. It can adaptively decide the appropriate
counting mode for different locations on the image based on its real density
conditions. DecideNet starts with estimating the crowd density by generating
detection and regression based density maps separately. To capture inevitable
variation in densities, it incorporates an attention module, meant to
adaptively assess the reliability of the two types of estimations. The final
crowd counts are obtained with the guidance of the attention module to adopt
suitable estimations from the two kinds of density maps. Experimental results
show that our method achieves state-of-the-art performance on three challenging
crowd counting datasets.Comment: CVPR 201
- …