7 research outputs found
Inverse Attention Guided Deep Crowd Counting Network
In this paper, we address the challenging problem of crowd counting in
congested scenes. Specifically, we present Inverse Attention Guided Deep Crowd
Counting Network (IA-DCCN) that efficiently infuses segmentation information
through an inverse attention mechanism into the counting network, resulting in
significant improvements. The proposed method, which is based on VGG-16, is a
single-step training framework and is simple to implement. The use of
segmentation information results in minimal computational overhead and does not
require any additional annotations. We demonstrate the significance of
segmentation guided inverse attention through a detailed analysis and ablation
study. Furthermore, the proposed method is evaluated on three challenging crowd
counting datasets and is shown to achieve significant improvements over several
recent methods.
Comment: Accepted at 16th IEEE International Conference on Advanced Video and
Signal-based Surveillance (AVSS) 201
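The abstract above does not spell out how the inverse attention map is formed, so the following is only a minimal sketch under stated assumptions: the inverse map is taken to be one minus the sigmoid of a segmentation logit, and the background-weighted part of each feature is subtracted. The function names `sigmoid` and `inverse_attention` and the toy inputs are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def inverse_attention(features, seg_logits):
    """Suppress background activations with an inverse attention map.

    features and seg_logits are 2D lists of equal shape. The formulation
    (inverse map = 1 - sigmoid(logit)) is an assumption for illustration.
    """
    refined = []
    for f_row, s_row in zip(features, seg_logits):
        row = []
        for f, s in zip(f_row, s_row):
            inv = 1.0 - sigmoid(s)   # close to 1 where the pixel looks like background
            row.append(f - f * inv)  # subtract the background-weighted part of the feature
        refined.append(row)
    return refined


# Toy 2x2 feature map and segmentation logits (made-up values).
features = [[1.0, 2.0], [3.0, 4.0]]
seg_logits = [[10.0, -10.0], [0.0, 10.0]]
refined = inverse_attention(features, seg_logits)
```

In this toy input, features at confident foreground pixels pass through almost unchanged, while the feature at the confident background pixel is driven close to zero.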
Enhanced Information Fusion Network for Crowd Counting
In recent years, crowd counting, a technique for predicting the number of
people in an image, has become a challenging task in computer vision. In this
paper, we propose a cross-column feature fusion network to solve the problem of
information redundancy in columns. We introduce the Information Fusion Module
(IFM) which provides a channel for information flow to help different columns
to obtain significant information from another column. Through this channel,
different columns exchange information with each other and extract useful
features from the other column to enhance key information. Hence, there is no
need for columns to pay attention to all areas in the image. Each column can be
responsible for different regions, thereby reducing the burden of each column.
In experiments, our model generalizes more robustly, and its results when
transferring between different datasets are comparable to those of
state-of-the-art models.
Comment: 10 pages, 5 figures
Pushing the Frontiers of Unconstrained Crowd Counting: New Dataset and Benchmark Method
In this work, we propose a novel crowd counting network that progressively
generates crowd density maps via residual error estimation. The proposed method
uses VGG16 as the backbone network and employs the density map generated by the
final layer as a coarse prediction to refine and generate finer density maps in
a progressive fashion using residual learning. Additionally, the residual
learning is guided by an uncertainty-based confidence weighting mechanism that
permits the flow of only high-confidence residuals in the refinement path. The
proposed Confidence Guided Deep Residual Counting Network (CG-DRCN) is
evaluated on recent complex datasets, and it achieves significant improvements
in errors.
Furthermore, we introduce a new large scale unconstrained crowd counting
dataset (JHU-CROWD) that is ~2.8 times larger than the most recent crowd counting
datasets in terms of the number of images. It contains 4,250 images with 1.11
million annotations. In comparison to existing datasets, the proposed dataset
is collected under a variety of diverse scenarios and environmental conditions.
Specifically, the dataset includes several images with weather-based
degradations and illumination variations in addition to many distractor images,
making it a very challenging dataset. Additionally, the dataset consists of
rich annotations at both image-level and head-level. Several recent methods are
evaluated and compared on this dataset.
Comment: Accepted at ICCV 201
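The confidence-guided refinement described in this abstract can be illustrated with a small sketch. It assumes a soft gate in which each residual is scaled by a confidence value in [0, 1] before being added to the coarse density map; the abstract does not give the exact weighting, and the function name `refine_density` and the toy values are hypothetical.

```python
def refine_density(coarse, residual, confidence):
    """Add confidence-weighted residuals to a coarse density map.

    All arguments are 2D lists of equal shape; confidence values are
    expected in [0, 1]. Soft multiplicative gating is an assumption here.
    """
    refined = []
    for c_row, r_row, k_row in zip(coarse, residual, confidence):
        refined.append([c + k * r for c, r, k in zip(c_row, r_row, k_row)])
    return refined


# Toy example: the first residual is fully trusted, the second is blocked.
coarse = [[1.0, 2.0]]
residual = [[0.5, -1.0]]
confidence = [[1.0, 0.0]]
refined = refine_density(coarse, residual, confidence)
```

With a confidence of 0, the coarse prediction is left untouched, which is the sense in which low-confidence residuals are kept out of the refinement path.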
Learning to Count in the Crowd from Limited Labeled Data
Recent crowd counting approaches have achieved excellent performance.
However, they are essentially based on a fully supervised paradigm and require
a large number of annotated samples. Obtaining annotations is an expensive and
labour-intensive process. In this work, we focus on reducing the annotation
effort by learning to count in the crowd from a limited number of labeled
samples while leveraging a large pool of unlabeled data. Specifically, we
propose a Gaussian Process-based iterative learning mechanism that involves
estimation of pseudo-ground truth for the unlabeled data, which is then used as
supervision for training the network. The proposed method is shown to be
effective under the reduced data (semi-supervised) settings for several
datasets like ShanghaiTech, UCF-QNRF, WorldExpo, UCSD, etc. Furthermore, we
demonstrate that the proposed method can be leveraged to enable the network to
learn to count from a synthetic dataset while generalizing better
to real-world datasets (synthetic-to-real transfer).
Comment: Accepted at ECCV 202
Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting
Crowd counting presents enormous challenges in the form of large variation in
scales within images and across the dataset. These issues are further
exacerbated in highly congested scenes. Approaches based on straightforward
fusion of multi-scale features from a deep network seem to be obvious solutions
to this problem. However, these fusion approaches do not yield significant
improvements in the case of crowd counting in congested scenes. This is usually
due to their limited abilities in effectively combining the multi-scale
features for problems like crowd counting. To overcome this, we focus on how to
efficiently leverage information present in different layers of the network.
Specifically, we present a network that involves: (i) a multi-level bottom-top
and top-bottom fusion (MBTTBF) method to combine information from shallower to
deeper layers and vice versa at multiple levels, (ii) scale complementary
feature extraction blocks (SCFB) involving cross-scale residual functions to
explicitly enable flow of complementary features from adjacent conv layers
along the fusion paths. Furthermore, in order to increase the effectiveness of
the multi-scale fusion, we employ a principled way of generating scale-aware
ground-truth density maps for training. Experiments conducted on three datasets
that contain highly congested scenes (ShanghaiTech, UCF_CC_50, and UCF-QNRF)
demonstrate that the proposed method is able to outperform several recent
methods in all the datasets.
Comment: Accepted at ICCV 201
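For readers unfamiliar with ground-truth density maps, the sketch below shows the common fixed-width Gaussian construction, in which each annotated head contributes a normalized kernel so that the map sums to the crowd count. This is not the scale-aware generation procedure the abstract refers to, whose details are not given here; the function name `density_map`, the `sigma` value, and the head coordinates are assumptions for illustration.

```python
import math


def density_map(points, height, width, sigma=1.0):
    """Build a density map from (row, col) head annotations.

    Each head adds a Gaussian kernel normalized over the image, so every
    head contributes exactly 1 to the total, including near the borders.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        weights = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))  # normalizer for this head's kernel
        for y in range(height):
            for x in range(width):
                dmap[y][x] += weights[y][x] / total
    return dmap


# Three made-up head positions in an 8x10 image.
heads = [(2, 2), (5, 6), (5, 7)]
dmap = density_map(heads, height=8, width=10)
count = sum(map(sum, dmap))  # integrates to the number of heads
```

A scale-aware variant would vary the kernel width per head instead of using one fixed `sigma`.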
Crowd Counting via Segmentation Guided Attention Networks and Curriculum Loss
Automatic crowd behaviour analysis is an important task for intelligent
transportation systems to enable effective flow control and dynamic route
planning for varying road participants. Crowd counting is one of the keys to
automatic crowd behaviour analysis. Crowd counting using deep convolutional
neural networks (CNN) has achieved encouraging progress in recent years.
Researchers have devoted much effort to the design of variant CNN architectures
and most of them are based on the pre-trained VGG16 model. Due to its
insufficient expressive capacity, the VGG16 backbone is usually
followed by another cumbersome network specially designed for good counting
performance. Although VGG models have been outperformed by Inception models in
image classification tasks, the existing crowd counting networks built with
Inception modules still only have a small number of layers with basic types of
Inception modules. To fill this gap, in this paper, we first benchmark the
baseline Inception-v3 model on commonly used crowd counting datasets and
achieve surprisingly good performance comparable with or better than most
existing crowd counting models. Subsequently, we push the boundary of this
disruptive work further by proposing a Segmentation Guided Attention Network
(SGANet) with Inception-v3 as the backbone and a novel curriculum loss for
crowd counting. We conduct thorough experiments to compare the performance of
our SGANet with prior art, and the proposed model achieves state-of-the-art
performance with MAE of 57.6, 6.3 and 87.6 on ShanghaiTechA, ShanghaiTechB and
UCF_QNRF, respectively.
Comment: Technical Report, Durham University
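The MAE figures quoted above are mean absolute errors between predicted and ground-truth counts, averaged over test images. A minimal sketch of that metric follows; the function name `mean_absolute_error` is hypothetical, and the per-image counts are made up for illustration and do not come from any of the datasets mentioned.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up per-image counts, for illustration only.
predicted_counts = [102.0, 48.5, 310.0]
true_counts = [100, 50, 300]
mae = mean_absolute_error(predicted_counts, true_counts)
```

Lower values are better, and the scale depends on crowd density, which is why MAE on a sparse dataset is typically much smaller than on a dense one.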
JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method
Due to its variety of applications in the real world, the task of single
image-based crowd counting has received a lot of interest in recent years.
Recently, several approaches have been proposed to address various problems
encountered in crowd counting. These approaches are essentially based on
convolutional neural networks that require large amounts of data to train the
network parameters. Considering this, we introduce a new large scale
unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images
with 1.51 million annotations. In comparison to existing datasets, the
proposed dataset is collected under a variety of diverse scenarios and
environmental conditions. Specifically, the dataset includes several images
with weather-based degradations and illumination variations, making it a very
challenging dataset. Additionally, the dataset consists of a rich set of
annotations at both image-level and head-level. Several recent methods are
evaluated and compared on this dataset. The dataset can be downloaded from
http://www.crowd-counting.com .
Furthermore, we propose a novel crowd counting network that progressively
generates crowd density maps via residual error estimation. The proposed method
uses VGG16 as the backbone network and employs the density map generated by the
final layer as a coarse prediction to refine and generate finer density maps in
a progressive fashion using residual learning. Additionally, the residual
learning is guided by an uncertainty-based confidence weighting mechanism that
permits the flow of only high-confidence residuals in the refinement path. The
proposed Confidence Guided Deep Residual Counting Network (CG-DRCN) is
evaluated on recent complex datasets, and it achieves significant improvements
in errors.
Comment: Accepted at T-PAMI 2020. The dataset can be downloaded from
http://www.crowd-counting.com. arXiv admin note: substantial text overlap
with arXiv:1910.1238