Convolutional Neural Network for Accurate Crowd Counting and Density Estimation
University of Technology Sydney. Faculty of Engineering and Information Technology.
Crowd and object counting has become an important task for a variety of applications, such as traffic control, public safety, urban planning, and video surveillance. It is also a crucial component of high-level monitoring systems for video surveillance and crowd analysis. In these settings, dynamic crowd monitoring and analysis are essential for crowd control, management, and public safety.
Like other computer vision problems, crowd counting and density estimation face various challenges, such as heavy clutter, occlusion, non-uniform distributions of objects or people, and intra-scene and inter-scene variations in appearance. Researchers and industrial partners have designed and developed many sophisticated models to address these challenges. In recent years especially, the volume of crowd counting research has grown rapidly, driven by the dominance of deep learning and Convolutional Neural Network (CNN) based models across computer vision tasks. In this thesis, we revisit crowd counting and propose several novel solutions to this problem.
First, we propose an Adaptive Counting Convolutional Neural Network (A-CCNN) that adaptively accounts for the scale variation of objects in a frame to improve counting accuracy. Our method takes advantage of contextual information to produce more accurate and adaptive density maps and crowd counts for a scene. Next, we focus on CNN pruning to make crowd counting models suitable for real-time applications and to improve the performance of the CCNN model. To this end, we propose a new pruning strategy that considers the contribution of each filter to the final result. The filters in the original CCNN model are grouped into positive, negative, and irrelevant types. We prune the irrelevant filters, whose feature maps contain little information, and the negative filters, which are identified by a mask learned from the training dataset. Our solution improves the results of the counting model without fine-tuning or retraining the pruned model. Finally, we propose a novel Pyramid Density-Aware Attention-based network (PDANet), which combines attention, pyramid scale feature, and two-branch decoder modules for density-aware crowd counting. PDANet uses these modules to extract features at different scales, focus on relevant information, and suppress misleading information. Extensive evaluations on challenging benchmark datasets demonstrate that the proposed models outperform well-known state-of-the-art approaches in both counting accuracy and the quality of the generated density maps.
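The filter-grouping step of the pruning strategy can be illustrated with a small sketch. The abstract does not give the exact criteria, so everything below is an assumption: a filter counts as "irrelevant" when the mean absolute activation of its feature map falls below a threshold (a stand-in for "contains little information"), and the learned mask is represented as a hypothetical per-filter score whose negative sign marks a "negative" filter. Only positive filters are kept, and no fine-tuning or retraining happens afterwards.

```python
from statistics import fmean


def group_filters(feature_maps, mask_scores, info_threshold=1e-3):
    """Label each filter 'positive', 'negative', or 'irrelevant'.

    Illustrative assumptions, not the thesis's exact criteria:
    - feature_maps[i] is a flat list of activations produced by filter i;
    - 'irrelevant' means mean |activation| < info_threshold;
    - mask_scores[i] is a hypothetical learned score; < 0 means 'negative'.
    """
    labels = []
    for fmap, score in zip(feature_maps, mask_scores):
        if fmean(abs(v) for v in fmap) < info_threshold:
            labels.append("irrelevant")
        elif score < 0:
            labels.append("negative")
        else:
            labels.append("positive")
    return labels


def prune(filters, labels):
    """Keep only positive filters; the pruned model is not retrained."""
    return [f for f, lab in zip(filters, labels) if lab == "positive"]


# Toy example with three filters.
fmaps = [[0.5, 0.7], [0.0, 0.0001], [0.4, 0.2]]
scores = [0.8, 0.3, -0.5]
labels = group_filters(fmaps, scores)  # ['positive', 'irrelevant', 'negative']
kept = prune(["f0", "f1", "f2"], labels)  # ['f0']
```

Checking the information content before the mask means a near-silent filter is labelled irrelevant regardless of its mask score; the real method may order or combine these tests differently.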
Crowd Counting with Decomposed Uncertainty
Research in neural networks in the field of computer vision has achieved
remarkable accuracy for point estimation. However, the uncertainty in the
estimation is rarely addressed. Uncertainty quantification accompanied by point
estimation can lead to a more informed decision, and even improve the
prediction quality. In this work, we focus on uncertainty estimation in the
domain of crowd counting. As heavily crowded events such as political rallies,
protests, and concerts become more frequent, automated crowd analysis
is becoming an increasingly crucial task. The stakes can be very high in many
of these real-world applications. We propose a scalable neural network
framework with quantification of decomposed uncertainty using a bootstrap
ensemble. We demonstrate that the proposed uncertainty quantification method
provides additional insight into the crowd counting problem and is simple to
implement. We also show that our proposed method achieves state-of-the-art
performance on many benchmark crowd counting datasets.
Comment: Accepted in AAAI 2020 (Main Technical Track)
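A minimal sketch of bootstrap-ensemble uncertainty decomposition follows. It assumes (the abstract does not say so) that each ensemble member is trained on its own bootstrap resample and outputs a predicted count together with a predicted variance. The total predictive variance is then split by the law of total variance into an aleatoric part (the average predicted variance) and an epistemic part (the spread of the member means). This is a common decomposition, not necessarily the authors' exact formulation.

```python
import random
from statistics import fmean, pvariance


def bootstrap_indices(n_samples, n_members, seed=0):
    """Draw one bootstrap resample (sampling with replacement) per member."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_members)]


def decompose_uncertainty(member_outputs):
    """Combine (mean, variance) predictions from K ensemble members.

    Returns (count, aleatoric, epistemic, total).
    """
    means = [m for m, _ in member_outputs]
    variances = [v for _, v in member_outputs]
    count = fmean(means)
    aleatoric = fmean(variances)  # average predicted data noise
    epistemic = pvariance(means)  # disagreement between ensemble members
    return count, aleatoric, epistemic, aleatoric + epistemic


# Three hypothetical members predicting the count of one image.
outputs = [(100.0, 4.0), (104.0, 6.0), (102.0, 5.0)]
count, alea, epi, total = decompose_uncertainty(outputs)
# count == 102.0, alea == 5.0, epi is about 2.667, total is about 7.667
```

A high epistemic term flags images on which the members disagree (for example, scenes unlike the training data), while a high aleatoric term reflects noise the members agree is inherent to the input.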
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
In this paper, we develop deep spatio-temporal neural networks to
sequentially count vehicles from low-quality videos captured by city cameras
(citycams). Citycam videos have low resolution, low frame rate, high occlusion
and large perspective, which renders most existing methods ineffective. To
overcome limitations of existing methods and incorporate the temporal
information of traffic video, we design a novel FCN-rLSTM network to jointly
estimate vehicle density and vehicle count by connecting fully convolutional
neural networks (FCN) with long short-term memory networks (LSTM) in a residual
learning fashion. Such design leverages the strengths of FCN for pixel-level
prediction and the strengths of LSTM for learning complex temporal dynamics.
The residual learning connection reformulates the vehicle count regression as
learning residual functions with reference to the sum of densities in each
frame, which significantly accelerates the training of networks. To preserve
feature map resolution, we propose a Hyper-Atrous combination to integrate
atrous convolution in FCN and combine feature maps of different convolution
layers. FCN-rLSTM enables refined feature representation and a novel end-to-end
trainable mapping from pixels to vehicle count. We extensively evaluated the
proposed method on different counting tasks with three datasets, and the
experimental results demonstrate its effectiveness and robustness. In
particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21
on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training
process is accelerated by a factor of 5 on average.
Comment: Accepted by International Conference on Computer Vision (ICCV), 201
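The residual formulation of the count can be sketched in a few lines. In this illustration the FCN's per-frame density map is a plain 2-D list, and the LSTM's contribution is reduced to a single hypothetical number `residual`; the network itself is not modelled. The count is the sum of densities plus the learned residual, and MAE is the metric quoted above.

```python
def density_sum(density_map):
    """Integrate (sum) a 2-D density map; this is the density-based count."""
    return sum(sum(row) for row in density_map)


def residual_count(density_map, residual):
    """Final count = sum of densities + learned residual correction.

    `residual` is a stand-in for the LSTM output; here it is just a number.
    """
    return density_sum(density_map) + residual


def mean_absolute_error(predicted, actual):
    """MAE over frames: average of |predicted - actual| counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


frame_density = [[0.25, 0.5], [1.0, 0.25]]
print(density_sum(frame_density))          # 2.0
print(residual_count(frame_density, 0.5))  # 2.5
print(mean_absolute_error([2.5, 3.0], [3.0, 3.0]))  # 0.25
```

Because the residual only has to correct the density sum rather than predict the whole count, it starts near zero, which is consistent with the abstract's claim that this connection speeds up training.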
DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation
In real-world crowd counting applications, crowd densities vary greatly
in the spatial and temporal domains. A detection-based counting method
estimates crowds accurately in low-density scenes, but its reliability in
congested areas degrades. A regression-based approach, on the other hand,
captures the general density information in crowded regions. Without knowing
the location of each person, it tends to overestimate the count in low-density
areas. Thus, using either one exclusively is not sufficient to handle
all kinds of scenes with varying densities. To address this issue, a novel
end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density
Estimation Network) is proposed. It can adaptively decide the appropriate
counting mode for different locations in the image based on their actual density
conditions. DecideNet starts by estimating the crowd density, generating
detection-based and regression-based density maps separately. To capture inevitable
variation in densities, it incorporates an attention module, meant to
adaptively assess the reliability of the two types of estimations. The final
crowd counts are obtained with the guidance of the attention module to adopt
suitable estimations from the two kinds of density maps. Experimental results
show that our method achieves state-of-the-art performance on three challenging
crowd counting datasets.
Comment: CVPR 201
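The attention-guided combination can be sketched as a per-pixel blend. The sketch assumes that the attention module outputs a weight in [0, 1] for each pixel, applied to the detection-based map, with the regression-based map receiving the complementary weight. It is an illustration of the idea under that assumption, not DecideNet's published code. The final count is the sum of the fused map.

```python
def fuse_density_maps(det_map, reg_map, attention):
    """Blend detection- and regression-based density maps pixel by pixel.

    attention[i][j] in [0, 1] is the assumed reliability weight of the
    detection branch at pixel (i, j); the regression branch gets 1 - weight.
    """
    fused = []
    for det_row, reg_row, att_row in zip(det_map, reg_map, attention):
        row = []
        for d, r, a in zip(det_row, reg_row, att_row):
            if not 0.0 <= a <= 1.0:
                raise ValueError("attention weights must lie in [0, 1]")
            row.append(a * d + (1.0 - a) * r)
        fused.append(row)
    return fused


def total_count(density_map):
    """Crowd count = sum over all pixels of the density map."""
    return sum(sum(row) for row in density_map)


det = [[1.0, 0.0], [0.0, 1.0]]   # sparse, detection-style map
reg = [[0.5, 0.5], [0.5, 0.5]]   # smooth, regression-style map
att = [[1.0, 0.0], [0.5, 0.5]]   # trust detection top-left, regression top-right
fused = fuse_density_maps(det, reg, att)  # [[1.0, 0.5], [0.25, 0.75]]
print(total_count(fused))                 # 2.5
```

In low-density regions the attention weight would ideally approach 1 (trust detections), and in congested regions it would approach 0 (trust regression), which is the adaptive behaviour the abstract describes.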
- …