12,754 research outputs found
The Visual Social Distancing Problem
One of the main and most effective measures to contain the recent viral
outbreak is the maintenance of the so-called Social Distancing (SD). To comply
with this constraint, workplaces, public institutions, transports and schools
will likely adopt restrictions over the minimum inter-personal distance between
people. Given this actual scenario, it is crucial to massively measure the
compliance to such physical constraint in our life, in order to figure out the
reasons of the possible breaks of such distance limitations, and understand if
this implies a possible threat given the scene context. All of this, complying
with privacy policies and making the measurement acceptable. To this end, we
introduce the Visual Social Distancing (VSD) problem, defined as the automatic
estimation of the inter-personal distance from an image, and the
characterization of the related people aggregations. VSD is pivotal for a
non-invasive analysis to whether people comply with the SD restriction, and to
provide statistics about the level of safety of specific areas whenever this
constraint is violated. We then discuss how VSD relates with previous
literature in Social Signal Processing and indicate which existing Computer
Vision methods can be used to manage such problem. We conclude with future
challenges related to the effectiveness of VSD systems, ethical implications
and future application scenarios.Comment: 9 pages, 5 figures. All the authors equally contributed to this
manuscript and they are listed by alphabetical order. Under submissio
Crowd Counting with Decomposed Uncertainty
Research in neural networks in the field of computer vision has achieved
remarkable accuracy for point estimation. However, the uncertainty in the
estimation is rarely addressed. Uncertainty quantification accompanied by point
estimation can lead to a more informed decision, and even improve the
prediction quality. In this work, we focus on uncertainty estimation in the
domain of crowd counting. With increasing occurrences of heavily crowded events
such as political rallies, protests, concerts, etc., automated crowd analysis
is becoming an increasingly crucial task. The stakes can be very high in many
of these real-world applications. We propose a scalable neural network
framework with quantification of decomposed uncertainty using a bootstrap
ensemble. We demonstrate that the proposed uncertainty quantification method
provides additional insight to the crowd counting problem and is simple to
implement. We also show that our proposed method exhibits the state of the art
performances in many benchmark crowd counting datasets.Comment: Accepted in AAAI 2020 (Main Technical Track
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., cocktail party ) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging due to the difficulty in
extracting behavioral cues such as target locations, their speaking activity
and head/body pose due to crowdedness and presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under the poster presentation and cocktail party contexts presenting
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) To
alleviate these problems we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, comprising the microphone, accelerometer, bluetooth
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head, body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.Comment: 14 pages, 11 figure
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
In this paper, we develop deep spatio-temporal neural networks to
sequentially count vehicles from low quality videos captured by city cameras
(citycams). Citycam videos have low resolution, low frame rate, high occlusion
and large perspective, making most existing methods lose their efficacy. To
overcome limitations of existing methods and incorporate the temporal
information of traffic video, we design a novel FCN-rLSTM network to jointly
estimate vehicle density and vehicle count by connecting fully convolutional
neural networks (FCN) with long short term memory networks (LSTM) in a residual
learning fashion. Such design leverages the strengths of FCN for pixel-level
prediction and the strengths of LSTM for learning complex temporal dynamics.
The residual learning connection reformulates the vehicle count regression as
learning residual functions with reference to the sum of densities in each
frame, which significantly accelerates the training of networks. To preserve
feature map resolution, we propose a Hyper-Atrous combination to integrate
atrous convolution in FCN and combine feature maps of different convolution
layers. FCN-rLSTM enables refined feature representation and a novel end-to-end
trainable mapping from pixels to vehicle count. We extensively evaluated the
proposed method on different counting tasks with three datasets, with
experimental results demonstrating their effectiveness and robustness. In
particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21
on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. Training process
is accelerated by 5 times on average.Comment: Accepted by International Conference on Computer Vision (ICCV), 201
The Design and Implementation of a Wireless Video Surveillance System.
Internet-enabled cameras pervade daily life, generating a huge amount of data, but most of the video they generate is transmitted over wires and analyzed offline with a human in the loop. The ubiquity of cameras limits the amount of video that can be sent to the cloud, especially on wireless networks where capacity is at a premium. In this paper, we present Vigil, a real-time distributed wireless surveillance system that leverages edge computing to support real-time tracking and surveillance in enterprise campuses, retail stores, and across smart cities. Vigil intelligently partitions video processing between edge computing nodes co-located with cameras and the cloud to save wireless capacity, which can then be dedicated to Wi-Fi hotspots, offsetting their cost. Novel video frame prioritization and traffic scheduling algorithms further optimize Vigil's bandwidth utilization. We have deployed Vigil across three sites in both whitespace and Wi-Fi networks. Depending on the level of activity in the scene, experimental results show that Vigil allows a video surveillance system to support a geographical area of coverage between five and 200 times greater than an approach that simply streams video over the wireless network. For a fixed region of coverage and bandwidth, Vigil outperforms the default equal throughput allocation strategy of Wi-Fi by delivering up to 25% more objects relevant to a user's query
A mask-based approach for the geometric calibration of thermal-infrared cameras
Accurate and efficient thermal-infrared (IR) camera calibration is important for advancing computer vision research within the thermal modality. This paper presents an approach for geometrically calibrating individual and multiple cameras in both the thermal and visible modalities. The proposed technique can be used to correct for lens distortion and to simultaneously reference both visible and thermal-IR cameras to a single coordinate frame. The most popular existing approach for the geometric calibration of thermal cameras uses a printed chessboard heated by a flood lamp and is comparatively inaccurate and difficult to execute. Additionally, software toolkits provided for calibration either are unsuitable for this task or require substantial manual intervention. A new geometric mask with high thermal contrast and not requiring a flood lamp is presented as an alternative calibration pattern. Calibration points on the pattern are then accurately located using a clustering-based algorithm which utilizes the maximally stable extremal region detector. This algorithm is integrated into an automatic end-to-end system for calibrating single or multiple cameras. The evaluation shows that using the proposed mask achieves a mean reprojection error up to 78% lower than that using a heated chessboard. The effectiveness of the approach is further demonstrated by using it to calibrate two multiple-camera multiple-modality setups. Source code and binaries for the developed software are provided on the project Web site
- …