3,694 research outputs found
Online real-time crowd behavior detection in video sequences
Automatically detecting events in crowded scenes is a challenging task in Computer Vision. A number of offline approaches have been proposed for solving the problem of crowd behavior detection, however the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online and real-time method for detecting events in crowded video sequences. The proposed approach is based on the combination of visual feature extraction and image segmentation and it works without the need of a training phase. A quantitative experimental evaluation has been carried out on multiple publicly available video sequences, containing data from various crowd scenarios and different types of events, to demonstrate the effectiveness of the approach
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., cocktail party ) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging due to the difficulty in
extracting behavioral cues such as target locations, their speaking activity
and head/body pose due to crowdedness and presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under the poster presentation and cocktail party contexts presenting
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) To
alleviate these problems we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, comprising the microphone, accelerometer, bluetooth
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head, body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.Comment: 14 pages, 11 figure
DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation
In real-world crowd counting applications, the crowd densities vary greatly
in spatial and temporal domains. A detection based counting method will
estimate crowds accurately in low density scenes, while its reliability in
congested areas is downgraded. A regression based approach, on the other hand,
captures the general density information in crowded regions. Without knowing
the location of each person, it tends to overestimate the count in low density
areas. Thus, exclusively using either one of them is not sufficient to handle
all kinds of scenes with varying densities. To address this issue, a novel
end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density
Estimation Network) is proposed. It can adaptively decide the appropriate
counting mode for different locations on the image based on its real density
conditions. DecideNet starts with estimating the crowd density by generating
detection and regression based density maps separately. To capture inevitable
variation in densities, it incorporates an attention module, meant to
adaptively assess the reliability of the two types of estimations. The final
crowd counts are obtained with the guidance of the attention module to adopt
suitable estimations from the two kinds of density maps. Experimental results
show that our method achieves state-of-the-art performance on three challenging
crowd counting datasets.Comment: CVPR 201
The Visual Social Distancing Problem
One of the main and most effective measures to contain the recent viral
outbreak is the maintenance of the so-called Social Distancing (SD). To comply
with this constraint, workplaces, public institutions, transports and schools
will likely adopt restrictions over the minimum inter-personal distance between
people. Given this actual scenario, it is crucial to massively measure the
compliance to such physical constraint in our life, in order to figure out the
reasons of the possible breaks of such distance limitations, and understand if
this implies a possible threat given the scene context. All of this, complying
with privacy policies and making the measurement acceptable. To this end, we
introduce the Visual Social Distancing (VSD) problem, defined as the automatic
estimation of the inter-personal distance from an image, and the
characterization of the related people aggregations. VSD is pivotal for a
non-invasive analysis to whether people comply with the SD restriction, and to
provide statistics about the level of safety of specific areas whenever this
constraint is violated. We then discuss how VSD relates with previous
literature in Social Signal Processing and indicate which existing Computer
Vision methods can be used to manage such problem. We conclude with future
challenges related to the effectiveness of VSD systems, ethical implications
and future application scenarios.Comment: 9 pages, 5 figures. All the authors equally contributed to this
manuscript and they are listed by alphabetical order. Under submissio
Learning Behavioural Context
The original publication is available at www.springerlink.co
- …