Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions
Visual crowd counting has been recently studied as a way to enable people
counting in crowd scenes from images. Albeit successful, vision-based crowd
counting approaches could fail to capture informative features in extreme
conditions, e.g., imaging at night and occlusion. In this work, we introduce a
novel task of audiovisual crowd counting, in which visual and auditory
information are integrated for counting purposes. We collect a large-scale
benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of
1,935 images and the corresponding audio clips, and 170,270 annotated
instances. In order to fuse the two modalities, we make use of a linear
feature-wise fusion module that carries out an affine transformation on visual
and auditory features. Finally, we conduct extensive experiments using the
proposed dataset and approach. Experimental results show that introducing
auditory information can benefit crowd counting under different illumination,
noise, and occlusion conditions. The dataset and code will be released. Code
and data have been made available.
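The fusion step described in the abstract above can be illustrated with a minimal sketch. The abstract only states that a linear feature-wise module applies an affine transformation to visual and auditory features; the sketch below assumes one common reading, in which a per-channel scale (gamma) and shift (beta) are predicted from the audio embedding by linear layers and applied to the visual feature channels. All names (`linear`, `feature_wise_fusion`, the weight matrices) are hypothetical and are not taken from the released code.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Sequence[float]) -> Vector:
    """Small dense layer: returns weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def feature_wise_fusion(visual: Matrix, audio: Vector,
                        w_gamma: Matrix, b_gamma: Vector,
                        w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Scale and shift each visual channel with audio-derived parameters.

    Assumption (not confirmed by the abstract): the audio embedding
    predicts the affine parameters, and the visual features are modulated.
    visual holds C channels, each a flat list of spatial activations.
    """
    gamma = linear(w_gamma, b_gamma, audio)  # one scale per visual channel
    beta = linear(w_beta, b_beta, audio)     # one shift per visual channel
    return [[g * v + b for v in channel]
            for channel, g, b in zip(visual, gamma, beta)]


# Toy inputs: 2 visual channels with 2 activations each, 2-d audio embedding.
visual = [[1.0, 2.0], [3.0, 4.0]]
audio = [1.0, 0.5]
fused = feature_wise_fusion(
    visual, audio,
    w_gamma=[[1.0, 0.0], [0.0, 2.0]], b_gamma=[0.0, 0.0],
    w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.5, 0.0],
)
print(fused)
```

Because the modulation is applied per channel, the audio signal can re-weight visual features without changing their spatial layout, which is one plausible reason for choosing an affine, feature-wise design.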
DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation
In real-world crowd counting applications, the crowd densities vary greatly
in spatial and temporal domains. A detection based counting method will
estimate crowds accurately in low density scenes, while its reliability in
congested areas is downgraded. A regression based approach, on the other hand,
captures the general density information in crowded regions. Without knowing
the location of each person, it tends to overestimate the count in low density
areas. Thus, exclusively using either one of them is not sufficient to handle
all kinds of scenes with varying densities. To address this issue, a novel
end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density
Estimation Network) is proposed. It can adaptively decide the appropriate
counting mode for different locations on the image based on its real density
conditions. DecideNet starts with estimating the crowd density by generating
detection and regression based density maps separately. To capture inevitable
variation in densities, it incorporates an attention module, meant to
adaptively assess the reliability of the two types of estimations. The final
crowd counts are obtained with the guidance of the attention module to adopt
suitable estimations from the two kinds of density maps. Experimental results
show that our method achieves state-of-the-art performance on three challenging
crowd counting datasets.
Comment: CVPR 201
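The attention-guided combination of the two density maps can be sketched as follows. The abstract does not give the exact fusion rule; this sketch assumes a per-pixel convex combination, where an attention value near 1 trusts the detection-based map and a value near 0 trusts the regression-based map, and the final count is the sum of the fused map. The function names and toy values are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def fuse_density_maps(det_map: Map2D, reg_map: Map2D, attention: Map2D) -> Map2D:
    """Blend two density maps per pixel using attention weights in [0, 1].

    Assumed rule (illustrative only): a * detection + (1 - a) * regression.
    """
    return [
        [a * d + (1.0 - a) * r for d, r, a in zip(det_row, reg_row, att_row)]
        for det_row, reg_row, att_row in zip(det_map, reg_map, attention)
    ]


def crowd_count(density_map: Map2D) -> float:
    """A density map integrates to the estimated number of people."""
    return sum(sum(row) for row in density_map)


# Toy 2x2 example: the top row is sparse (attention favors detection),
# the bottom row is congested (attention favors regression).
det_map = [[1.0, 0.0], [0.0, 2.0]]
reg_map = [[0.5, 0.5], [3.0, 4.0]]
attention = [[1.0, 1.0], [0.0, 0.5]]

fused_map = fuse_density_maps(det_map, reg_map, attention)
total = crowd_count(fused_map)
print(fused_map, total)
```

In this toy case the sparse region keeps the detection estimate unchanged, while the congested region leans on the regression estimate, mirroring the behaviour the abstract describes for low and high density areas.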