Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting
In this paper, we propose two modified neural networks based on dual path
multi-scale fusion networks (SFANet) and SegNet for accurate and efficient
crowd counting. Inspired by SFANet, the first model, named M-SFANet, is
equipped with atrous spatial pyramid pooling (ASPP) and a context-aware module
(CAN). The encoder of M-SFANet is enhanced with ASPP, which contains parallel
atrous convolutional layers with different sampling rates and is hence able to
extract multi-scale features of the target object and incorporate larger context. To
further deal with scale variation throughout an input image, we leverage the
CAN module which adaptively encodes the scales of the contextual information.
The combination yields an effective model for counting in both dense and sparse
crowd scenes. Based on the SFANet decoder structure, M-SFANet's decoder has
dual paths, for density map and attention map generation. The second model is
called M-SegNet, which is produced by replacing the bilinear upsampling in
SFANet with the max unpooling used in SegNet. This change yields a faster
model while retaining competitive counting performance. Designed for high-speed
surveillance applications, M-SegNet has no additional multi-scale-aware module,
so as not to increase its complexity. Both models are encoder-decoder based
architectures and are end-to-end trainable. We conduct extensive experiments on
five crowd counting datasets and one vehicle counting dataset to show that
these modifications yield algorithms that could improve on state-of-the-art
crowd counting methods. Code is available at
https://github.com/Pongpisit-Thanasutives/Variations-of-SFANet-for-Crowd-Counting.
Comment: Accepted at ICPR 202
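
The max unpooling that M-SegNet borrows from SegNet can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain Python on a single 4x4 feature map, and the function names `max_pool_2x2` and `max_unpool_2x2` are hypothetical, chosen only for this illustration. It assumes a 2x2 window with stride 2 and even input dimensions. The point it shows is that unpooling only copies each pooled value back to the position recorded during pooling, with no interpolation arithmetic.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (assumes even height and width).

    Returns the pooled map and, for each pooled value, the (row, col)
    position it came from in the input.
    """
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        pooled_row, index_row = [], []
        for j in range(0, w, 2):
            # Collect the four (value, position) pairs of the current window.
            window = [
                (x[i + di][j + dj], (i + di, j + dj))
                for di in (0, 1)
                for dj in (0, 1)
            ]
            value, position = max(window, key=lambda pair: pair[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, out_h, out_w):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


# Illustrative input, not data from the paper.
feature_map = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [6.0, 2.0, 3.0, 1.0],
]
pooled, indices = max_pool_2x2(feature_map)
restored = max_unpool_2x2(pooled, indices, 4, 4)
```

In an encoder-decoder network the indices would be saved by the encoder's pooling layers and reused by the matching decoder stages, so the upsampled map is sparse and is typically followed by convolutions that fill in the zeros.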