
3,181 research outputs found

Crowd Counting with Decomposed Uncertainty

Author: Oh Min-hwan
Olsen Peder A.
Ramamurthy Karthikeyan Natesan
Publication date: 03/04/2020

Research on neural networks in computer vision has achieved remarkable accuracy in point estimation. However, the uncertainty in the estimation is rarely addressed. Uncertainty quantification accompanying point estimation can lead to more informed decisions and can even improve prediction quality. In this work, we focus on uncertainty estimation in the domain of crowd counting. With the increasing occurrence of heavily crowded events such as political rallies, protests, and concerts, automated crowd analysis is becoming increasingly crucial. The stakes can be very high in many of these real-world applications. We propose a scalable neural network framework that quantifies decomposed uncertainty using a bootstrap ensemble. We demonstrate that the proposed uncertainty quantification method provides additional insight into the crowd counting problem and is simple to implement. We also show that our proposed method achieves state-of-the-art performance on many benchmark crowd counting datasets.

Comment: Accepted in AAAI 2020 (Main Technical Track)
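As a rough illustration of what "decomposed uncertainty using a bootstrap ensemble" usually means, the sketch below assumes each ensemble member is trained on its own bootstrap resample and predicts both a count and a variance for an image. It uses the common decomposition in which aleatoric uncertainty is the average predicted variance and epistemic uncertainty is the spread of the members' means. This is not the authors' code; `bootstrap_indices` and `decompose_uncertainty` are hypothetical names.

```python
import random
import statistics


def bootstrap_indices(num_examples, num_members, seed=0):
    """Draw one bootstrap resample (sampling with replacement) per ensemble member."""
    rng = random.Random(seed)
    return [
        [rng.randrange(num_examples) for _ in range(num_examples)]
        for _ in range(num_members)
    ]


def decompose_uncertainty(member_means, member_vars):
    """Combine per-member predictions for one image.

    member_means: predicted count from each ensemble member (assumed output)
    member_vars:  predicted data-noise variance from each member (assumed output)
    Returns (mean, aleatoric, epistemic, total).
    """
    mean = statistics.fmean(member_means)
    # Aleatoric part: average of the noise variances the members predict.
    aleatoric = statistics.fmean(member_vars)
    # Epistemic part: disagreement between members (population variance of means).
    epistemic = statistics.pvariance(member_means, mu=mean)
    return mean, aleatoric, epistemic, aleatoric + epistemic
```

For three members predicting counts of 100, 110 and 90 with variances 4, 6 and 5, the mean count is 100, the aleatoric part is 5 and the epistemic part is about 66.7. In this toy case, member disagreement dominates the total uncertainty.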

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Online real-time crowd behavior detection in video sequences

Author: Bloisi Domenico Daniele
Iocchi Luca
Pennisi Andrea
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Automatically detecting events in crowded scenes is a challenging task in computer vision. A number of offline approaches have been proposed for crowd behavior detection; however, the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online, real-time method for detecting events in crowded video sequences. The proposed approach combines visual feature extraction with image segmentation and works without a training phase. A quantitative experimental evaluation has been carried out on multiple publicly available video sequences, containing data from various crowd scenarios and different types of events, to demonstrate the effectiveness of the approach.
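To make the "online, no training phase" property concrete, here is a generic stand-in rather than the paper's method. It consumes one scalar feature per frame, for example an assumed per-frame motion magnitude, and keeps running statistics with Welford's algorithm. It flags a frame as an event when the feature deviates from the history by more than `threshold` standard deviations. The class name, the feature, and both parameters are hypothetical.

```python
import math


class OnlineEventDetector:
    """Training-free, single-pass detector over a scalar per-frame feature (hypothetical)."""

    def __init__(self, threshold=3.0, warmup=5):
        if warmup < 2:
            raise ValueError("warmup must be at least 2 to estimate a standard deviation")
        self.threshold = threshold
        self.warmup = warmup
        self.n = 0        # frames seen so far
        self.mean = 0.0   # running mean of the feature
        self.m2 = 0.0     # running sum of squared deviations (Welford)

    def update(self, x):
        """Score frame feature x against past frames, then fold x into the statistics."""
        is_event = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_event = True
        # Welford's update: numerically stable, O(1) memory per stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_event
```

Each frame is processed once in constant time and memory, which is what makes this kind of detector usable in a live surveillance stream. Whether flagged frames should also update the statistics, as they do here, is a design choice.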

Crossref

Archivio della Ricerca - Università della Basilicata

Archivio della ricerca- Università di Roma La Sapienza

Density-aware person detection and tracking in crowds

Author: Audibert Jean-Yves
Laptev Ivan
Rodriguez Mikel
Sivic Josef
Publication venue: HAL CCSD
Publication date: 06/11/2011

We address the problem of person detection and tracking in crowded video scenes. While the detection of individual objects has improved significantly in recent years, crowd scenes remain particularly challenging for detection and tracking due to heavy occlusions, high person densities, and significant variation in people's appearance. To address these challenges, we propose to leverage information on the global structure of the scene and to resolve all detections jointly. In particular, we explore constraints imposed by the crowd density and formulate person detection as the optimization of a joint energy function combining crowd density estimation and the localization of individual people. We demonstrate how the optimization of such an energy function significantly improves person detection and tracking in crowds. We validate our approach on a challenging video dataset of crowded scenes.
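A joint energy of the kind described can be sketched as a binary selection over candidate detections. The form below is an assumption chosen for illustration: a reward for confident detections, a squared penalty when the selected detections do not explain the estimated density, and a pairwise penalty for overlapping picks. The paper's exact terms and optimizer may differ. Exhaustive search stands in for a real solver and only works for a handful of candidates. All names and weights (`alpha`, `beta`) are hypothetical.

```python
import itertools


def joint_energy(x, scores, density, footprints, overlap, alpha=1.0, beta=1.0):
    """Energy of a binary selection x over candidate detections (lower is better).

    scores[i]:        detector confidence of candidate i
    density[c]:       estimated crowd density in cell c
    footprints[i][c]: density that candidate i would contribute to cell c
    overlap[i][j]:    penalty for selecting both i and j (e.g. box overlap)
    """
    n = len(x)
    # Detection term: reward keeping confident detections.
    energy = -sum(s * xi for s, xi in zip(scores, x))
    # Density term: selected detections should explain the estimated density.
    for c, d in enumerate(density):
        implied = sum(footprints[i][c] * x[i] for i in range(n))
        energy += alpha * (d - implied) ** 2
    # Pairwise term: discourage selecting mutually overlapping candidates.
    for i in range(n):
        for j in range(i + 1, n):
            energy += beta * overlap[i][j] * x[i] * x[j]
    return energy


def best_selection(scores, density, footprints, overlap, alpha=1.0, beta=1.0):
    """Exhaustively minimise the energy over all 2**n selections."""
    return min(
        itertools.product((0, 1), repeat=len(scores)),
        key=lambda x: joint_energy(x, scores, density, footprints, overlap, alpha, beta),
    )
```

In a toy scene, two confident candidates each explain one density cell, and a weak third candidate overlaps both. Here the density and overlap terms together reject the third candidate even though it has a positive score. This is the intuition behind resolving detections jointly rather than one by one.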

Crossref

INRIA a CCSD electronic archive server

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Mask Focal Loss: A unifying framework for dense crowd counting with canonical object detection networks

Author: Deng Yuanlong
Liu Weixiang
Wang Guankun
Wu Zongze
Zhong Xiaopin
Publication date: 13/01/2024

As a fundamental computer vision task, crowd counting plays an important role in public safety. Currently, deep learning-based head detection is a promising method for crowd counting. However, widely studied object detection networks cannot be applied well to this problem for three reasons: (1) existing loss functions fail to address sample imbalance in highly dense and complex scenes; (2) canonical object detectors lack spatial coherence in loss calculation, disregarding the relationship between object location and background region; (3) most head detection datasets are annotated only with center points, i.e., without bounding boxes. To overcome these issues, we propose a novel Mask Focal Loss (MFL) based on heatmaps generated via a Gaussian kernel. MFL provides a unifying framework for loss functions based on both heatmap and binary feature map ground truths. Additionally, we introduce GTA_Head, a synthetic dataset with comprehensive annotations, for evaluation and comparison. Extensive experimental results demonstrate the superior performance of MFL across various detectors and datasets: it reduces MAE and RMSE by up to 47.03% and 61.99%, respectively. Therefore, our work presents a strong foundation for advancing crowd counting methods based on density estimation.

Comment: The manuscript has been accepted by Multimedia Tools and Applications
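The abstract names two ingredients: a ground-truth heatmap rendered from center-point annotations with a Gaussian kernel, and a focal-style loss on that heatmap. The sketch below shows both under an explicit assumption. It uses the penalty-reduced focal loss popularized by CornerNet/CenterNet-style detectors (exponents `alpha=2`, `beta=4`) as a stand-in, not MFL itself, whose exact definition is in the paper. Function names and `sigma` are hypothetical.

```python
import math


def gaussian_heatmap(points, height, width, sigma):
    """Render center points (x, y) as a heatmap, keeping the max where Gaussians overlap."""
    heat = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                g = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                heat[y][x] = max(heat[y][x], g)
    return heat


def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss (stand-in for MFL), normalised by the number of peaks."""
    pos_loss = neg_loss = 0.0
    num_pos = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if g == 1.0:
                # Annotated head centre: focus on under-confident predictions.
                pos_loss -= (1.0 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                # Background: down-weight pixels close to a centre by (1 - g) ** beta.
                neg_loss -= (1.0 - g) ** beta * p ** alpha * math.log(1.0 - p)
    return (pos_loss + neg_loss) / max(num_pos, 1)
```

The `(1 - g) ** beta` factor is what gives the loss spatial coherence with respect to object location. Background pixels just beside a head centre are penalized far less than distant ones, which also softens the sample imbalance in dense scenes.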

arXiv.org e-Print Archive

Statistical t+2D subband modelling for crowd counting

Author: Bhowmik Deepayan
Wallace Andrew
Publication date: 01/04/2018

Counting people automatically in crowded scenarios is important for assessing safety and determining behaviour in surveillance operations. In this paper, we propose a new algorithm using the statistics of spatio-temporal wavelet subbands. A t+2D lifting-based wavelet transform is exploited to generate a motion saliency map, which is then used to extract novel parametric statistical texture features. We compare our approach with existing crowd counting approaches and show improvements on standard benchmark sequences, demonstrating the robustness of the extracted features.
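In a t+2D scheme, the temporal transform comes first, so moving regions show up as energy in the temporal high-pass subband. The sketch below applies the simplest lifting wavelet (Haar: a predict step followed by an update step) to a pair of consecutive frames. It treats the magnitude of the temporal detail as a motion saliency map. The choice of Haar filters and the saliency definition are assumptions. The paper's filters and decomposition levels, its subsequent 2D spatial transform, and its parametric texture statistics are not reproduced.

```python
def haar_lift_temporal(frame_even, frame_odd):
    """One temporal Haar lifting step on two frames given as 2D lists."""
    # Predict: the odd frame is predicted by the even one; the residual is the detail.
    detail = [[o - e for e, o in zip(e_row, o_row)]
              for e_row, o_row in zip(frame_even, frame_odd)]
    # Update: the approximation keeps the temporal average.
    approx = [[e + d / 2 for e, d in zip(e_row, d_row)]
              for e_row, d_row in zip(frame_even, detail)]
    return approx, detail


def haar_unlift_temporal(approx, detail):
    """Invert the lifting steps in reverse order (perfect reconstruction)."""
    even = [[a - d / 2 for a, d in zip(a_row, d_row)]
            for a_row, d_row in zip(approx, detail)]
    odd = [[d + e for d, e in zip(d_row, e_row)]
           for d_row, e_row in zip(detail, even)]
    return even, odd


def motion_saliency(detail):
    """Assumed saliency: magnitude of the temporal high-pass coefficients."""
    return [[abs(d) for d in row] for row in detail]
```

Lifting is attractive here because each step is trivially invertible, so the transform is lossless by construction. Only pixels that change between frames produce non-zero detail coefficients.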

Heriot Watt Pure

Crossref

Stirling Online Research Repository (RIOXX)

Sheffield Hallam University Research Archive

Stirling Online Research Repository

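Congestion- and Scale-Aware Design of Network Structures and Training Methods for Crowd Density Estimation (군중 밀도 예측을 위한 네트워크 구조와 훈련방법의 혼잡도 및 크기 인식 설계)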

Author: 정지엽
Publication venue: Graduate School, Seoul National University
Publication date: 01/02/2022

Ph.D. dissertation, Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2022. 최진영.

This dissertation presents novel deep learning-based crowd density estimation methods that consider crowd congestion and the scale of people. Crowd density estimation is one of the important tasks for intelligent surveillance systems. Using crowd density estimation, regions of interest for public security and safety can be easily indicated. It can also help computationally expensive computer vision algorithms, such as pedestrian detection and tracking, to be applied effectively in intelligent surveillance systems. Since deep learning was introduced to crowd density estimation, most studies have followed the conventional scheme of training a convolutional neural network on training images to estimate a crowd density map. Deep learning-based crowd density estimation research can be viewed from two perspectives: network structure and training strategy. In general, work from the network structure perspective proposes a novel network structure that extracts features representing the crowd well, whereas work from the training strategy perspective proposes a novel training methodology or loss function to improve counting performance. In this dissertation, I propose several methods from both perspectives. In particular, I design the network models to have rich crowd representations that account for crowd congestion and the scale of people. I propose two novel network structures, the selective ensemble network and the cascade residual dilated network, and one novel loss function for crowd density estimation, the congestion-aware Bayesian loss. First, I propose a selective ensemble deep network architecture for crowd density estimation. In contrast to existing deep network-based methods, the proposed method incorporates two sub-networks for local density estimation: one to learn sparse density regions and one to learn dense density regions.
Locally estimated density maps from the two sub-networks are selectively combined in an ensemble fashion using a gating network to estimate an initial crowd density map. The initial density map is then refined into a high-resolution map by another sub-network that draws on contextual information in the image. During training, a novel adaptive loss scheme is applied to resolve ambiguity in crowded regions. The proposed scheme improves both density map accuracy and counting accuracy by adjusting the weight between the density loss and the counting loss according to the degree of crowdedness and the training epoch. Second, I propose a novel crowd density estimation architecture composed of multiple dilated convolutional neural network blocks with different scales. The architecture is motivated by an empirical analysis showing that small-scale dilated convolution accurately estimates the density at the center of each person, whereas large-scale dilated convolution better estimates the density at a person's periphery. To estimate the crowd density map gradually from the center to the periphery of each person in a crowd, the dilated CNN blocks are trained in cascade, from the small dilated CNN block to the large one. Third, I propose a novel congestion-aware Bayesian loss that considers person-scale and crowd-sparsity. Deep learning-based crowd density estimation can greatly improve the accuracy of crowd counting. Although Bayesian loss resolves two problems, the need for a hand-crafted ground-truth (GT) density map and noisy annotations, accurate counting in highly congested scenes remains challenging. In a crowd scene, people's appearance changes with the scale of each individual (i.e., the person-scale). Also, the lower the sparsity of a local region (i.e., the crowd-sparsity), the more difficult it is to estimate the crowd density.
I estimate the person-scale from scene geometry and then estimate the crowd-sparsity using the estimated person-scale. The estimated person-scale and crowd-sparsity are used in the congestion-aware Bayesian loss to improve the supervisory representation of the point annotations. The effectiveness of the proposed density estimators is validated through comparative experiments with state-of-the-art methods on widely used crowd counting benchmark datasets, where the proposed methods outperform state-of-the-art density estimators in diverse surveillance environments. In addition, for every proposed crowd density estimation method, the effectiveness of each component is verified through ablation experiments.

Contents:
Abstract
List of Tables
List of Figures
1 Introduction
2 Related Works
  2.1 Detection-based Approaches
  2.2 Regression-based Approaches
  2.3 Deep learning-based Approaches
    2.3.1 Network Structure Perspective
    2.3.2 Training Strategy Perspective
3 Selective Ensemble Network for Accurate Crowd Density Estimation
  3.1 Overview
  3.2 Combining Patch-based and Image-based Approaches
    3.2.1 Local-Global Cascade Network
    3.2.2 Experiments
    3.2.3 Summary
  3.3 Selective Ensemble Network with Adjustable Counting Loss (SEN-ACL)
    3.3.1 Overall Scheme
    3.3.2 Data Description
    3.3.3 Gating Network
    3.3.4 Sparse / Dense Network
    3.3.5 Refinement Network
  3.4 Experiments
    3.4.1 Implementation Details
    3.4.2 Dataset and Evaluation Metrics
    3.4.3 Self-evaluation on WorldExpo'10 dataset
    3.4.4 Comparative Evaluation with State of the Art Methods
    3.4.5 Analysis on the Proposed Components
  3.5 Summary
4 Sequential Crowd Density Estimation from Center to Periphery of Crowd
  4.1 Overview
  4.2 Cascade Residual Dilated Network (CRDN)
    4.2.1 Effects of Dilated Convolution in Crowd Counting
    4.2.2 The Proposed Network
  4.3 Experiments
    4.3.1 Datasets and Experimental Settings
    4.3.2 Implementation Details
    4.3.3 Comparison with Other Methods
    4.3.4 Ablation Study
    4.3.5 Analysis on the Proposed Components
  4.4 Conclusion
5 Congestion-aware Bayesian Loss for Crowd Counting
  5.1 Overview
  5.2 Congestion-aware Bayesian Loss
    5.2.1 Person-Scale Estimation
    5.2.2 Crowd-Sparsity Estimation
    5.2.3 Design of The Proposed Loss
  5.3 Experiments
    5.3.1 Datasets
    5.3.2 Implementation Details
    5.3.3 Evaluation Metrics
    5.3.4 Ablation Study
    5.3.5 Comparisons with State of the Art
    5.3.6 Differences from Existing Person-scale Inference
    5.3.7 Analysis on the Proposed Components
  5.4 Summary
6 Conclusion
Abstract (In Korean)
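The selective-ensemble step in this dissertation's abstract (two local density maps blended by a gating network, trained with a weighted mix of density loss and counting loss) can be sketched as follows. The sketch assumes the gate outputs a per-pixel weight in [0, 1], with 1 favouring the dense-region sub-network, and that the count is the sum of the density map. The abstract says the loss weight depends on crowdedness and training epoch, but it does not give the schedule. Here the weight is therefore a plain parameter. All function names are hypothetical, and this is not the author's implementation.

```python
def gated_combine(sparse_map, dense_map, gate_map):
    """Blend two density maps pixel-wise; gate 1.0 trusts the dense sub-network."""
    return [[g * d + (1.0 - g) * s for s, d, g in zip(s_row, d_row, g_row)]
            for s_row, d_row, g_row in zip(sparse_map, dense_map, gate_map)]


def count(density_map):
    """Crowd count as the integral (sum) of a density map."""
    return sum(sum(row) for row in density_map)


def density_loss(pred, gt):
    """Pixel-wise mean squared error between predicted and ground-truth density."""
    n = sum(len(row) for row in gt)
    return sum((p - g) ** 2 for p_row, g_row in zip(pred, gt)
               for p, g in zip(p_row, g_row)) / n


def counting_loss(pred, gt_count):
    """Absolute error of the total count."""
    return abs(count(pred) - gt_count)


def combined_loss(pred, gt, gt_count, weight):
    """weight in [0, 1] shifts emphasis from density accuracy to count accuracy.

    How weight should vary with crowdedness and epoch is left to the caller
    (schedule not specified in the abstract).
    """
    return (1.0 - weight) * density_loss(pred, gt) + weight * counting_loss(pred, gt_count)
```

Raising `weight` for very crowded images pushes the model towards getting the total right even when individual heads are ambiguous. Keeping it low preserves pixel-level density detail.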

SNU Open Repository and Archive

Application-Driven AI Paradigm for Person Counting in Various Scenarios

Author: Hua Minjie
Lian Shiguo
Nan Yibing
Publication date: 23/03/2023

Person counting is considered a fundamental task in video surveillance. However, the diversity of scenarios in practical applications makes it difficult to use a single person counting model for general purposes. Consequently, engineers must preview the video stream and manually specify an appropriate person counting model based on the camera-shot scenario, which is time-consuming, especially for large-scale deployments. In this paper, we propose a person counting paradigm that uses a scenario classifier to automatically select a suitable person counting model for each captured frame. First, the input image is passed through the scenario classifier to obtain a scenario label, which is then used to allocate the frame to one of five fine-tuned person counting models. Additionally, we present five augmentation datasets collected from different scenarios (side-view, long-shot, top-view, customized, and crowd), which are also integrated to form a scenario classification dataset containing 26,323 samples. In our comparative experiments, the proposed paradigm achieves a better balance than any single model on the integrated dataset, demonstrating its generalization across various scenarios.
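The classify-then-dispatch paradigm amounts to a lookup from scenario label to a fine-tuned counting model. In the sketch below, the classifier and counters are arbitrary callables, so real models could be plugged in. The five scenario names come from the abstract, but their string spellings and the function name are assumptions.

```python
from typing import Callable, Dict, Tuple

# Scenario labels from the abstract; the exact string spellings are assumptions.
SCENARIOS = ("side-view", "long-shot", "top-view", "customized", "crowd")


def route_and_count(frame, classify: Callable, counters: Dict[str, Callable]) -> Tuple[str, float]:
    """Classify the frame's scenario, then count with the model fine-tuned for it."""
    label = classify(frame)
    if label not in counters:
        # Fail loudly rather than silently falling back to a mismatched model.
        raise KeyError(f"no counting model registered for scenario {label!r}")
    return label, counters[label](frame)
```

With stub models such as `counters = {name: model_for(name) for name in SCENARIOS}`, where `model_for` is a hypothetical loader, each frame is counted by exactly one specialised model. This replaces the manual per-camera model selection the abstract describes.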

arXiv.org e-Print Archive

Deep learning with self-supervision and uncertainty regularization to count fish in underwater images

Author: Cantor Mauricio
Clapés i Sintes Albert
Escalera Guerrero Sergio
Tarling Penny
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2022

Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wider-reaching, and less intrusive, but has created a need to process and analyse these data efficiently. Counting animals from such data is challenging, particularly when the animals are densely packed in noisy images. Counting manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored for counting animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, recorded to monitor wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We use abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework by testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). Experiments on both contrasting datasets show that our network outperforms the few other deep learning models implemented for this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for counting crowds of aquatic animals, thereby contributing effective methods for assessing natural populations from ever-increasing visual data.
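In density-based regression, the network predicts a density map whose sum is the count. The abstract does not state how uncertainty enters training. One common option, assumed here purely for illustration, is a heteroscedastic Gaussian negative log-likelihood. In that setup the network predicts a per-pixel mean and log-variance, so noisy pixels can be down-weighted at the cost of a log-variance penalty. The function names are hypothetical.

```python
import math


def count_from_density(density_map):
    """Density-based regression: the count is the sum over the density map."""
    return sum(sum(row) for row in density_map)


def gaussian_nll(target, mean, log_var):
    """Heteroscedastic Gaussian NLL for one pixel (constant term dropped)."""
    # exp(-log_var) scales down the squared error where predicted variance is high;
    # the + log_var term stops the model from claiming infinite variance everywhere.
    return 0.5 * (math.exp(-log_var) * (target - mean) ** 2 + log_var)


def density_nll_loss(target_map, mean_map, log_var_map):
    """Average per-pixel NLL over a density map (assumed loss, not the paper's)."""
    total, n = 0.0, 0
    for t_row, m_row, v_row in zip(target_map, mean_map, log_var_map):
        for t, m, v in zip(t_row, m_row, v_row):
            total += gaussian_nll(t, m, v)
            n += 1
    return total / n
```

Predicting log-variance rather than variance keeps the variance positive without constraints. At inference, the predicted variances can be reported alongside the count as a measure of prediction uncertainty.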

PubMed Central

Diposit Digital de la Universitat de Barcelona

MPG.PuRe

Crowd Counting in Low-Resolution Crowded Scenes Using Region-Based Deep Convolutional Neural Networks

Author: Blumenstein M
Khan SD
Saqib M
Sharma N
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Crowd counting and density estimation are important and challenging problems in the visual analysis of crowds. Most existing approaches regress density maps to obtain the crowd count from a single image. However, these methods cannot localize individual pedestrians and therefore cannot estimate the actual distribution of pedestrians in the environment. On the other hand, detection-based methods detect and localize pedestrians in the scene, but their performance degrades in high-density situations. To overcome the limitations of pedestrian detectors, we propose a motion-guided filter (MGF) that exploits spatial and temporal information between consecutive frames of the video to recover missed detections. Our framework is based on a deep convolutional neural network (DCNN) for crowd counting in low-to-medium density videos. We employ various state-of-the-art network architectures, namely Visual Geometry Group (VGG16), Zeiler and Fergus (ZF), and VGGM, within a region-based DCNN framework for detecting pedestrians. After pedestrian detection, the proposed motion-guided filter is applied. We evaluate the performance of our approach on three publicly available datasets. The experimental results demonstrate the effectiveness of our approach, which significantly improves the performance of state-of-the-art detectors.
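The abstract does not detail the motion-guided filter, so the sketch below is a simplified stand-in for the idea of recovering missed detections from the previous frame. It assumes a single global motion vector `(dx, dy)` between frames, which could for instance come from optical flow. Each previous-frame box is shifted by that motion. The shifted box is kept as a recovered detection when it overlaps no current detection by at least `iou_thresh`. The function names and threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recover_missed(prev_boxes, curr_boxes, motion, iou_thresh=0.3):
    """Add motion-compensated previous boxes that no current detection explains."""
    dx, dy = motion
    recovered = list(curr_boxes)
    for x1, y1, x2, y2 in prev_boxes:
        moved = (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
        # all() over an empty list is True, so with no current detections
        # every motion-compensated previous box is carried forward.
        if all(iou(moved, c) < iou_thresh for c in curr_boxes):
            recovered.append(moved)
    return recovered
```

The count for the frame is then the length of the recovered list. A real filter would also need per-object motion and a limit on how many frames a box can be carried forward without a fresh detection.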

OPUS - University of Technology Sydney