580 research outputs found
Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions
Visual crowd counting has been recently studied as a way to enable people
counting in crowd scenes from images. Albeit successful, vision-based crowd
counting approaches could fail to capture informative features in extreme
conditions, e.g., imaging at night and occlusion. In this work, we introduce a
novel task of audiovisual crowd counting, in which visual and auditory
information are integrated for counting purposes. We collect a large-scale
benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of
1,935 images and the corresponding audio clips, and 170,270 annotated
instances. In order to fuse the two modalities, we make use of a linear
feature-wise fusion module that carries out an affine transformation on visual
and auditory features. Finally, we conduct extensive experiments using the
proposed dataset and approach. Experimental results show that introducing
auditory information can benefit crowd counting under different illumination,
noise, and occlusion conditions. The dataset and code will be released. Code
and data have been made availabl
A Survey on Computer Vision based Human Analysis in the COVID-19 Era
The emergence of COVID-19 has had a global and profound impact, not only on
society as a whole, but also on the lives of individuals. Various prevention
measures were introduced around the world to limit the transmission of the
disease, including face masks, mandates for social distancing and regular
disinfection in public spaces, and the use of screening applications. These
developments also triggered the need for novel and improved computer vision
techniques capable of (i) providing support to the prevention measures through
an automated analysis of visual data, on the one hand, and (ii) facilitating
normal operation of existing vision-based services, such as biometric
authentication schemes, on the other. Especially important here, are computer
vision techniques that focus on the analysis of people and faces in visual data
and have been affected the most by the partial occlusions introduced by the
mandates for facial masks. Such computer vision based human analysis techniques
include face and face-mask detection approaches, face recognition techniques,
crowd counting solutions, age and expression estimation procedures, models for
detecting face-hand interactions and many others, and have seen considerable
attention over recent years. The goal of this survey is to provide an
introduction to the problems induced by COVID-19 into such research and to
present a comprehensive review of the work done in the computer vision based
human analysis field. Particular attention is paid to the impact of facial
masks on the performance of various methods and recent solutions to mitigate
this problem. Additionally, a detailed review of existing datasets useful for
the development and evaluation of methods for COVID-19 related applications is
also provided. Finally, to help advance the field further, a discussion on the
main open challenges and future research direction is given.Comment: Submitted to Image and Vision Computing, 44 pages, 7 figure
Backdoor Attacks on Crowd Counting
Crowd counting is a regression task that estimates the number of people in a
scene image, which plays a vital role in a range of safety-critical
applications, such as video surveillance, traffic monitoring and flow control.
In this paper, we investigate the vulnerability of deep learning based crowd
counting models to backdoor attacks, a major security threat to deep learning.
A backdoor attack implants a backdoor trigger into a target model via data
poisoning so as to control the model's predictions at test time. Different from
image classification models on which most of existing backdoor attacks have
been developed and tested, crowd counting models are regression models that
output multi-dimensional density maps, thus requiring different techniques to
manipulate.
In this paper, we propose two novel Density Manipulation Backdoor Attacks
(DMBA and DMBA) to attack the model to produce arbitrarily large or
small density estimations. Experimental results demonstrate the effectiveness
of our DMBA attacks on five classic crowd counting models and four types of
datasets. We also provide an in-depth analysis of the unique challenges of
backdooring crowd counting models and reveal two key elements of effective
attacks: 1) full and dense triggers and 2) manipulation of the ground truth
counts or density maps. Our work could help evaluate the vulnerability of crowd
counting models to potential backdoor attacks.Comment: To appear in ACMMM 2022. 10pages, 6 figures and 2 table
Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics
This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography to propose regions of interest where to find objects, and recursive Bayesian filtering to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of ∼ 7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, as well as it achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).</p
- …