Crowd Counting with Decomposed Uncertainty
Research in neural networks in the field of computer vision has achieved
remarkable accuracy for point estimation. However, the uncertainty in the
estimation is rarely addressed. Uncertainty quantification accompanied by point
estimation can lead to a more informed decision, and even improve the
prediction quality. In this work, we focus on uncertainty estimation in the
domain of crowd counting. With increasing occurrences of heavily crowded events
such as political rallies, protests, concerts, etc., automated crowd analysis
is becoming an increasingly crucial task. The stakes can be very high in many
of these real-world applications. We propose a scalable neural network
framework with quantification of decomposed uncertainty using a bootstrap
ensemble. We demonstrate that the proposed uncertainty quantification method
provides additional insight to the crowd counting problem and is simple to
implement. We also show that our proposed method achieves state-of-the-art
performance on many benchmark crowd counting datasets. Comment: Accepted in AAAI 2020 (Main Technical Track)
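As a rough illustration of what "decomposed uncertainty" from a bootstrap ensemble can look like, the sketch below uses the standard law-of-total-variance split: the spread of the members' predicted counts is treated as epistemic (model) uncertainty, and the average of the members' predicted noise variances as aleatoric (data) uncertainty. This is an assumption for illustration; the abstract does not give the paper's exact formulation, and the function names and numbers are hypothetical.

```python
import random
from statistics import mean, pvariance


def bootstrap_indices(n, rng):
    """Draw one bootstrap resample: n training indices sampled with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def decompose_uncertainty(member_means, member_variances):
    """Combine the outputs of K ensemble members for a single image.

    member_means[k]     : crowd count predicted by member k
    member_variances[k] : noise variance predicted by member k (assumed output)
    """
    prediction = mean(member_means)
    epistemic = pvariance(member_means)  # disagreement between members
    aleatoric = mean(member_variances)   # average predicted data noise
    return prediction, epistemic, aleatoric


# Each member would be trained on its own resample of the training set.
rng = random.Random(0)
resample = bootstrap_indices(5, rng)

# Hypothetical outputs of three members for one image.
pred, epi, ale = decompose_uncertainty([98.0, 102.0, 100.0], [4.0, 6.0, 5.0])
print(pred, epi, ale)
```

A large epistemic term relative to the aleatoric term suggests the members disagree, which more training data could reduce; the reverse suggests noise inherent to the image.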
Online real-time crowd behavior detection in video sequences
Automatically detecting events in crowded scenes is a challenging task in computer vision. A number of offline approaches have been proposed for crowd behavior detection; however, the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online, real-time method for detecting events in crowded video sequences. The proposed approach combines visual feature extraction with image segmentation and works without a training phase. A quantitative experimental evaluation has been carried out on multiple publicly available video sequences, containing data from various crowd scenarios and different types of events, to demonstrate the effectiveness of the approach.
Density-aware person detection and tracking in crowds
We address the problem of person detection and tracking in crowded video scenes. While the detection of individual objects has improved significantly in recent years, crowd scenes remain particularly challenging for detection and tracking due to heavy occlusions, high person densities and significant variation in people's appearance. To address these challenges, we propose to leverage information on the global structure of the scene and to resolve all detections jointly. In particular, we explore constraints imposed by the crowd density and formulate person detection as the optimization of a joint energy function combining crowd density estimation and the localization of individual people. We demonstrate how the optimization of such an energy function significantly improves person detection and tracking in crowds. We validate our approach on a challenging video dataset of crowded scenes.
Mask Focal Loss: A unifying framework for dense crowd counting with canonical object detection networks
As a fundamental computer vision task, crowd counting plays an important role
in public safety. Currently, deep learning based head detection is a promising
method for crowd counting. However, mainstream object detection
networks cannot be well applied to this problem for three reasons: (1) Existing
loss functions fail to address sample imbalance in highly dense and complex
scenes; (2) Canonical object detectors lack spatial coherence in loss
calculation, disregarding the relationship between object location and
background region; (3) Most of the head detection datasets are only annotated
with the center points, i.e. without bounding boxes. To overcome these issues,
we propose a novel Mask Focal Loss (MFL) based on heatmap via the Gaussian
kernel. MFL provides a unifying framework for the loss functions based on both
heatmap and binary feature map ground truths. Additionally, we introduce
GTA_Head, a synthetic dataset with comprehensive annotations, for evaluation
and comparison. Extensive experimental results demonstrate the superior
performance of our MFL across various detectors and datasets, and it can reduce
MAE and RMSE by up to 47.03% and 61.99%, respectively. Therefore, our work
presents a strong foundation for advancing crowd counting methods based on
density estimation. Comment: The manuscript is accepted by Multimedia Tools and Applications
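To make the "heatmap via the Gaussian kernel" idea concrete, the sketch below renders a heatmap ground truth from center-point annotations alone, which is the only label the abstract says most head datasets provide. It is a minimal illustration under assumptions: the per-pixel maximum over unnormalized Gaussians and the fixed `sigma` are common conventions in heatmap-based detection, not details taken from the paper, and the Mask Focal Loss itself is not reproduced here.

```python
import math


def gaussian_heatmap(height, width, centers, sigma=2.0):
    """Render a heatmap from (row, col) head-center annotations.

    Each pixel holds the maximum over unnormalized Gaussians centred on the
    annotated points, so every annotated center has value exactly 1.0.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                d2 = (y - cy) ** 2 + (x - cx) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


# Two hypothetical head annotations on an 8x8 grid.
hm = gaussian_heatmap(8, 8, [(2, 2), (5, 6)])
for row in hm:
    print(" ".join(f"{v:.2f}" for v in row))
```

Pixels near a center get values close to 1 and the rest fall off smoothly, which is what lets a heatmap-based loss weight positions by their distance to the nearest head instead of treating them as a hard binary mask.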
Statistical t+2D subband modelling for crowd counting
Counting people automatically in a crowded scenario is important to assess safety and to determine behaviour in surveillance operations. In this paper we propose a new algorithm using the statistics of the spatio-temporal wavelet subbands. A t+2D lifting based wavelet transform is exploited to generate a motion saliency map which is then used to extract novel parametric statistical texture features. We compare our approach to existing crowd counting approaches and show improvement on standard benchmark sequences, demonstrating the robustness of the extracted features.
Congestion- and Scale-Aware Design of Network Structure and Training Method for Crowd Density Estimation
Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2022.2. Jin Young Choi.
This dissertation presents novel deep learning-based crowd density estimation methods that take into account crowd congestion and the scale of people. Crowd density estimation is an important task for intelligent surveillance systems. With crowd density estimation, regions of interest for public security and safety can be easily identified. It can also support computationally expensive computer vision algorithms such as pedestrian detection and tracking.
After the introduction of deep learning to crowd density estimation, most studies follow the conventional scheme of training a convolutional neural network to estimate a crowd density map from training images. Deep learning-based crowd density estimation research can be viewed from two perspectives: network structure and training strategy. In general, studies from the network structure perspective propose novel network structures that extract features representing crowds well. Those from the training strategy perspective, on the other hand, propose a novel training methodology or loss function to improve counting performance.
In this dissertation, I propose several works from both perspectives in deep learning-based crowd density estimation. In particular, I design network models that have rich crowd representation characteristics according to crowd congestion and the scale of people. I propose two novel network structures: a selective ensemble network and a cascade residual dilated network. I also propose one novel loss function for crowd density estimation: a congestion-aware Bayesian loss.
First, I propose a selective ensemble deep network architecture for crowd density estimation. In contrast to existing deep network-based methods, the proposed method incorporates two sub-networks for local density estimation: one to learn sparse density regions and one to learn dense density regions. Locally estimated density maps from the two sub-networks are selectively combined in an ensemble fashion using a gating network to estimate an initial crowd density map. The initial density map is refined into a high-resolution map using another sub-network that draws on contextual information in the image. In training, a novel adaptive loss scheme is applied to resolve ambiguity in crowded regions. The proposed scheme improves both density map accuracy and counting accuracy by adjusting the weighting between density loss and counting loss according to the degree of crowdedness and the training epoch.
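The selective combination and the adaptive loss described above can be sketched as follows. This is only an illustration under assumptions: the abstract does not specify how the gate blends the two branches or how the loss weight depends on crowdedness and epoch, so the per-pixel convex blend and the linear weight schedule below are hypothetical choices, as are all function names.

```python
def combine_density(sparse_map, dense_map, gate_map):
    """Blend two local density maps per pixel.

    gate_map values lie in [0, 1]; a value near 1 trusts the dense branch,
    a value near 0 trusts the sparse branch (assumed convention).
    """
    return [
        [g * d + (1.0 - g) * s for s, d, g in zip(s_row, d_row, g_row)]
        for s_row, d_row, g_row in zip(sparse_map, dense_map, gate_map)
    ]


def count(density_map):
    """The estimated crowd count is the sum of the density map."""
    return sum(sum(row) for row in density_map)


def adaptive_loss(density_loss, counting_loss, epoch, total_epochs, crowdedness):
    """Hypothetical schedule: the counting loss gains weight as training
    progresses and as the image gets more crowded (crowdedness in [0, 1])."""
    weight = min(1.0, (epoch / total_epochs) * crowdedness)
    return (1.0 - weight) * density_loss + weight * counting_loss


# Toy 1x2 maps: the gate picks the sparse branch on the left pixel and the
# dense branch on the right pixel.
initial_map = combine_density([[1.0, 2.0]], [[3.0, 4.0]], [[0.0, 1.0]])
total = count(initial_map)
loss = adaptive_loss(2.0, 4.0, epoch=5, total_epochs=10, crowdedness=1.0)
print(initial_map, total, loss)
```

In a real system the gate would itself be a network output, and the blended map would then be passed to the refinement sub-network.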
Second, I propose a novel crowd density estimation architecture composed of multiple dilated convolutional neural network blocks with different scales. The proposed architecture is motivated by an empirical analysis showing that small-scale dilated convolution estimates the density of the center area of each person well, whereas large-scale dilated convolution estimates the density of the periphery area of a person well. To estimate the crowd density map gradually from the center to the periphery of each person in a crowd, the multiple dilated CNN blocks are trained in a cascade from the small dilated CNN block to the large one.
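The link between dilation scale and how far a block "sees" can be checked with a small calculation. The sketch below uses the standard formula for the effective kernel size of a dilated convolution, k + (k - 1)(d - 1), and accumulates the receptive field of stride-1 layers in a cascade. The dilation rates 1, 2 and 3 are illustrative assumptions, not values taken from the dissertation.

```python
def effective_kernel(kernel_size, dilation):
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers is a list of (kernel_size, dilation) pairs, applied in order.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


# A cascade of 3x3 convolutions going from small to large dilation.
cascade = [(3, 1), (3, 2), (3, 3)]
for depth in range(1, len(cascade) + 1):
    print(depth, receptive_field(cascade[:depth]))
```

Each added block widens the region that contributes to an output pixel, which is consistent with the idea of estimating density first at the center of a person and then toward the periphery.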
Third, I propose a novel congestion-aware Bayesian loss method that considers the person-scale and crowd-sparsity. Deep learning-based crowd density estimation can greatly improve the accuracy of crowd counting. Although a Bayesian loss method resolves the two problems of the need for a hand-crafted ground truth (GT) density and noisy annotations, counting accurately in highly congested scenes remains a challenging issue. In a crowd scene, people's appearances change according to the scale of each individual (i.e., the person-scale). Also, the lower the sparsity of a local region (i.e., the crowd-sparsity), the more difficult it is to estimate the crowd density. I estimate the person-scale based on scene geometry, and I then estimate the crowd-sparsity using the estimated person-scale. The estimated person-scale and crowd-sparsity are utilized in the novel congestion-aware Bayesian loss method to improve the supervising representation of the point annotations.
The effectiveness of the proposed density estimators is validated through comparative experiments with state-of-the-art methods on widely used crowd counting benchmark datasets. The proposed methods achieve superior performance to the state-of-the-art density estimators in diverse surveillance environments. In addition, for all proposed crowd density estimation methods, the efficiency of each component is verified through several ablation experiments.
Abstract
Contents
List of Tables
List of Figures
1 Introduction
2 Related Works
2.1 Detection-based Approaches
2.2 Regression-based Approaches
2.3 Deep learning-based Approaches
2.3.1 Network Structure Perspective
2.3.2 Training Strategy Perspective
3 Selective Ensemble Network for Accurate Crowd Density Estimation
3.1 Overview
3.2 Combining Patch-based and Image-based Approaches
3.2.1 Local-Global Cascade Network
3.2.2 Experiments
3.2.3 Summary
3.3 Selective Ensemble Network with Adjustable Counting Loss (SEN-ACL)
3.3.1 Overall Scheme
3.3.2 Data Description
3.3.3 Gating Network
3.3.4 Sparse / Dense Network
3.3.5 Refinement Network
3.4 Experiments
3.4.1 Implementation Details
3.4.2 Dataset and Evaluation Metrics
3.4.3 Self-evaluation on WorldExpo'10 dataset
3.4.4 Comparative Evaluation with State of the Art Methods
3.4.5 Analysis on the Proposed Components
3.5 Summary
4 Sequential Crowd Density Estimation from Center to Periphery of Crowd
4.1 Overview
4.2 Cascade Residual Dilated Network (CRDN)
4.2.1 Effects of Dilated Convolution in Crowd Counting
4.2.2 The Proposed Network
4.3 Experiments
4.3.1 Datasets and Experimental Settings
4.3.2 Implementation Details
4.3.3 Comparison with Other Methods
4.3.4 Ablation Study
4.3.5 Analysis on the Proposed Components
4.4 Conclusion
5 Congestion-aware Bayesian Loss for Crowd Counting
5.1 Overview
5.2 Congestion-aware Bayesian Loss
5.2.1 Person-Scale Estimation
5.2.2 Crowd-Sparsity Estimation
5.2.3 Design of The Proposed Loss
5.3 Experiments
5.3.1 Datasets
5.3.2 Implementation Details
5.3.3 Evaluation Metrics
5.3.4 Ablation Study
5.3.5 Comparisons with State of the Art
5.3.6 Differences from Existing Person-scale Inference
5.3.7 Analysis on the Proposed Components
5.4 Summary
6 Conclusion
Abstract (In Korean)
Application-Driven AI Paradigm for Person Counting in Various Scenarios
Person counting is considered a fundamental task in video surveillance.
However, the scenario diversity in practical applications makes it difficult to
exploit a single person counting model for general use. Consequently, engineers
must preview the video stream and manually specify an appropriate person
counting model based on the scenario of camera shot, which is time-consuming,
especially for large-scale deployments. In this paper, we propose a person
counting paradigm that utilizes a scenario classifier to automatically select a
suitable person counting model for each captured frame. First, the input image
is passed through the scenario classifier to obtain a scenario label, which is
then used to allocate the frame to one of five fine-tuned models for person
counting. Additionally, we present five augmentation datasets collected from
different scenarios, including side-view, long-shot, top-view, customized and
crowd, which are also integrated to form a scenario classification dataset
containing 26323 samples. In our comparative experiments, the proposed paradigm
achieves better balance than any single model on the integrated dataset,
demonstrating its generalization across various scenarios.
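The classify-then-route flow described in this abstract can be sketched as a small dispatcher. The five scenario labels come from the abstract; everything else (the function names, the dictionary-based frame, and the toy classifier and models) is a hypothetical stand-in so that the example runs, not the authors' implementation.

```python
from typing import Callable, Dict, Tuple

# Scenario labels listed in the abstract.
SCENARIOS = ("side-view", "long-shot", "top-view", "customized", "crowd")

Frame = dict  # stand-in for an image type


def make_router(
    classifier: Callable[[Frame], str],
    models: Dict[str, Callable[[Frame], int]],
) -> Callable[[Frame], Tuple[str, int]]:
    """Build a function that classifies a frame, then counts with the
    model registered for the predicted scenario."""
    missing = set(SCENARIOS) - set(models)
    if missing:
        raise ValueError(f"no counting model for: {sorted(missing)}")

    def count_people(frame: Frame) -> Tuple[str, int]:
        label = classifier(frame)
        return label, models[label](frame)

    return count_people


# Toy stand-ins: the classifier reads a precomputed tag, and each "model"
# returns a fixed count equal to its position in SCENARIOS.
toy_classifier = lambda frame: frame["tag"]
toy_models = {name: (lambda frame, n=i: n) for i, name in enumerate(SCENARIOS)}

route = make_router(toy_classifier, toy_models)
result = route({"tag": "top-view"})
print(result)
```

Checking at construction time that every scenario has a model avoids discovering a missing model only when a rare scenario first appears in the video stream.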
Deep learning with self-supervision and uncertainty regularization to count fish in underwater images
Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive but has created a need to process and analyse this data efficiently. Counting animals from such data is challenging, particularly when they are densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored for counting animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, deployed to record wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework by testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate that our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals, thereby contributing effective methods to assess natural populations from the ever-increasing visual data.
Crowd Counting in Low-Resolution Crowded Scenes Using Region-Based Deep Convolutional Neural Networks
© 2013 IEEE. Crowd counting and density estimation is an important and challenging problem in the visual analysis of crowds. Most of the existing approaches use regression on density maps to obtain the crowd count from a single image. However, these methods cannot localize individual pedestrians and therefore cannot estimate the actual distribution of pedestrians in the environment. On the other hand, detection-based methods detect and localize pedestrians in the scene, but the performance of these methods degrades when applied in high-density situations. To overcome the limitations of pedestrian detectors, we propose a motion-guided filter (MGF) that exploits spatial and temporal information between consecutive frames of the video to recover missed detections. Our framework is based on the deep convolutional neural network (DCNN) for crowd counting in low-to-medium density videos. We employ various state-of-the-art network architectures, namely, Visual Geometry Group (VGG16), Zeiler and Fergus (ZF), and VGGM, in the framework of a region-based DCNN for detecting pedestrians. After pedestrian detection, the proposed motion-guided filter is applied. We evaluate the performance of our approach on three publicly available datasets. The experimental results demonstrate the effectiveness of our approach, which significantly improves the performance of the state-of-the-art detectors.