Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions
Visual crowd counting has recently been studied as a way to count people in crowd scenes from images. Although successful, vision-based crowd counting approaches can fail to capture informative features in extreme conditions, e.g., imaging at night or under occlusion. In this work, we introduce a
novel task of audiovisual crowd counting, in which visual and auditory
information are integrated for counting purposes. We collect a large-scale
benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of
1,935 images and the corresponding audio clips, and 170,270 annotated
instances. In order to fuse the two modalities, we make use of a linear
feature-wise fusion module that carries out an affine transformation on visual
and auditory features. Finally, we conduct extensive experiments using the
proposed dataset and approach. Experimental results show that introducing
auditory information can benefit crowd counting under different illumination,
noise, and occlusion conditions. The dataset and code have been made available.
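The linear feature-wise fusion described above is in the spirit of FiLM-style conditioning. Below is a minimal PyTorch sketch of that idea, assuming the audio embedding predicts a per-channel scale (gamma) and shift (beta) applied to the visual feature map; the module name, tensor shapes, and single linear layer are illustrative assumptions, not the DISCO reference implementation.

    import torch
    import torch.nn as nn

    class FeatureWiseAffineFusion(nn.Module):
        """Hypothetical FiLM-style fusion: audio conditions visual features."""
        def __init__(self, audio_dim: int, visual_channels: int):
            super().__init__()
            # One linear layer predicts both gamma and beta from the audio embedding.
            self.to_gamma_beta = nn.Linear(audio_dim, 2 * visual_channels)

        def forward(self, visual: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
            # visual: (B, C, H, W) feature map; audio: (B, audio_dim) clip embedding.
            gamma, beta = self.to_gamma_beta(audio).chunk(2, dim=1)
            gamma = gamma[:, :, None, None]  # broadcast over spatial dimensions
            beta = beta[:, :, None, None]
            return gamma * visual + beta     # feature-wise affine transformation

    # Usage with toy shapes:
    fusion = FeatureWiseAffineFusion(audio_dim=128, visual_channels=256)
    fused = fusion(torch.randn(2, 256, 32, 32), torch.randn(2, 128))  # (2, 256, 32, 32)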
Focus for Free in Density-Based Counting
This work considers supervised learning to count from images and their
corresponding point annotations. Where density-based counting methods typically
use the point annotations only to create Gaussian-density maps, which act as
the supervision signal, the starting point of this work is that point
annotations have counting potential beyond density map generation. We introduce
two methods that repurpose the available point annotations to enhance counting
performance. The first is a counting-specific augmentation that leverages point
annotations to simulate occluded objects in both input and density images to
enhance the network's robustness to occlusions. The second method, foreground
distillation, generates foreground masks from the point annotations, from which
we train an auxiliary network on images with blacked-out backgrounds. By doing
so, it learns to extract foreground counting knowledge without interference
from the background. These methods can be seamlessly integrated with existing
counting advances and are adaptable to different loss functions. We demonstrate
complementary effects of the approaches, allowing us to achieve robust counting
results even in challenging scenarios such as background clutter, occlusion,
and varying crowd densities. Our proposed approach achieves strong counting
results on multiple datasets, including ShanghaiTech Part_A and Part_B, UCF_QNRF, JHU-Crowd++, and NWPU-Crowd.
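For context on the density-map supervision that both methods build on, here is a minimal sketch of turning point annotations into a Gaussian density map whose integral matches the object count; the fixed kernel width sigma is an illustrative assumption (many pipelines instead adapt it per point, e.g., from nearest-neighbor distances).

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def points_to_density_map(points, height, width, sigma=4.0):
        # points: iterable of (x, y) pixel coordinates of annotated objects.
        density = np.zeros((height, width), dtype=np.float32)
        for x, y in points:
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < height and 0 <= xi < width:
                density[yi, xi] += 1.0
        # Gaussian smoothing preserves total mass: density.sum() ~ len(points).
        return gaussian_filter(density, sigma=sigma)

    dmap = points_to_density_map([(10.3, 20.7), (50.0, 60.0)], 128, 128)
    print(dmap.sum())  # ~2.0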
Hybrid Graph Neural Networks for Crowd Counting
Crowd counting is an important yet challenging task due to the large scale
and density variation. Recent investigations have shown that distilling rich
relations among multi-scale features and exploiting useful information from the
auxiliary task, i.e., localization, are vital for this task. Nevertheless, how
to comprehensively leverage these relations within a unified network
architecture is still a challenging problem. In this paper, we present a novel
network structure called Hybrid Graph Neural Network (HyGnn), which addresses the problem by interweaving the multi-scale features of crowd density and of its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn builds a hybrid graph to
jointly represent the task-specific feature maps of different scales as nodes,
and two types of relations as edges: (i) multi-scale relations that capture feature dependencies across scales and (ii) mutually beneficial relations that bridge the cooperation between counting and localization. Thus,
through message passing, HyGnn can distill rich relations between the nodes to
obtain more powerful representations, leading to robust and accurate results.
Our HyGnn performs strongly on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, and UCF_QNRF, outperforming state-of-the-art approaches by a large margin.
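To make the graph reasoning concrete, here is a minimal sketch of one round of message passing over a small graph whose nodes are task-specific feature maps, in the spirit of (but not identical to) HyGnn; the shared 1x1-convolution message/update functions, the shapes, and the edge list are illustrative assumptions, and all node maps are assumed to share one spatial size (real cross-scale edges would resample).

    import torch
    import torch.nn as nn

    class MessagePassingRound(nn.Module):
        """Hypothetical single round of message passing over feature-map nodes."""
        def __init__(self, channels: int):
            super().__init__()
            # Message: combine sender and receiver features; update: fold messages back in.
            self.message = nn.Conv2d(2 * channels, channels, kernel_size=1)
            self.update = nn.Conv2d(2 * channels, channels, kernel_size=1)

        def forward(self, nodes, edges):
            # nodes: list of (B, C, H, W) maps; edges: list of (src, dst) index pairs.
            incoming = [torch.zeros_like(n) for n in nodes]
            for src, dst in edges:
                incoming[dst] = incoming[dst] + self.message(
                    torch.cat([nodes[src], nodes[dst]], dim=1))
            return [self.update(torch.cat([n, m], dim=1))
                    for n, m in zip(nodes, incoming)]

    mp = MessagePassingRound(channels=64)
    nodes = [torch.randn(1, 64, 16, 16) for _ in range(4)]  # e.g., 2 counting + 2 localization nodes
    edges = [(0, 1), (1, 0), (0, 2), (2, 0)]                # cross-scale and cross-task links
    nodes = mp(nodes, edges)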