Detecting F-formations as dominant sets
The first step towards analysing social interactive behaviour in crowded environments is to identify who is interacting with whom. This paper presents a new method for detecting focused encounters, or F-formations, in a crowded, real-life social environment. An F-formation is a specific instance of a group of people who are congregated together with the intent of conversing and exchanging information with each other. We propose a new method of estimating F-formations using a graph clustering algorithm, formulating the problem in terms of identifying dominant sets. A dominant set is a form of maximal clique defined on edge-weighted graphs. In addition to the proximity between people, body orientation information is used; we propose a socially motivated estimate of focus orientation (SMEFO), which is calculated from location information only. Our experiments show significant improvements in performance over the existing modularity-cut algorithm and indicate the effectiveness of using a local social context for detecting F-formations.
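The abstract does not give implementation details, but the dominant-set machinery itself is standard: build an edge-weighted affinity graph over people, then extract one dominant set at a time with replicator dynamics. Below is a minimal Python sketch under that reading; the position-only Gaussian affinity is a stand-in illustration, not the paper's proximity + SMEFO weighting.

```python
import numpy as np

def pairwise_affinity(positions, sigma=1.0):
    """Illustrative affinity from 2-D positions only (a stand-in for the
    paper's proximity + SMEFO cue): closer people get larger edge weights."""
    diffs = positions[:, None, :] - positions[None, :, :]
    d2 = np.sum(diffs ** 2, axis=-1)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # dominant sets are defined on zero-diagonal affinities
    return A

def dominant_set(A, n_iter=1000, tol=1e-8):
    """Extract one dominant set with replicator dynamics; members are the
    entries of the returned vector with non-negligible support."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        Ax = A @ x
        x_new = x * Ax / (x @ Ax)
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

positions = np.array([[0.0, 0.0], [0.6, 0.2], [0.3, 0.7],   # one conversing trio
                      [5.0, 5.0], [5.4, 4.8]])              # a separate pair
x = dominant_set(pairwise_affinity(positions))
print(np.nonzero(x > 1e-3)[0])   # indices of one detected group
```

Groups of arbitrary cardinality are recovered by removing the members of the extracted set and re-running the dynamics on the remaining people.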
F2SD: A dataset for end-to-end group detection algorithms
The lack of large-scale datasets has been impeding the advance of deep learning approaches to the problem of F-formation detection. Moreover, most research on this problem relies on sensor signals of object location and orientation rather than image signals. To address this, we develop a new, large-scale dataset of simulated images for F-formation detection, called the F-formation Simulation Dataset (F2SD). F2SD contains nearly 60,000 images simulated from GTA-5, with bounding boxes and orientation information annotated on the images, making it useful for a wide variety of modelling approaches. It is also closer to practical scenarios, where three-dimensional location and orientation information are costly to record. Constructing such a large-scale simulated dataset while keeping it realistic is challenging. Furthermore, existing research relies on conventional methods to detect groups and does not detect them directly from the image. In this work, we propose (1) a large-scale simulation dataset, F2SD, together with a pipeline for F-formation simulation, and (2) the first end-to-end baseline model for the task, along with experiments on our simulation dataset.
Comment: Accepted at ICMV 202
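The abstract describes the annotations (per-person bounding boxes, orientation, and group structure) but not the release format. A purely hypothetical record shape, just to illustrate what an image-level F-formation annotation might look like when consumed by an end-to-end model:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PersonAnnotation:
    # Hypothetical fields; the actual F2SD file format is not specified in the abstract.
    bbox_xywh: tuple        # (x, y, width, height) in image pixels
    orientation_deg: float  # body orientation angle
    group_id: int           # F-formation id the person belongs to (-1 if singleton)

@dataclass
class FrameAnnotation:
    image_path: str
    people: List[PersonAnnotation]
```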
The Visual Social Distancing Problem
One of the main and most effective measures to contain the recent viral outbreak is the maintenance of so-called Social Distancing (SD). To comply with this constraint, workplaces, public institutions, transport systems and schools will likely adopt restrictions on the minimum inter-personal distance between people. Given this scenario, it is crucial to measure compliance with this physical constraint at scale, in order to understand the reasons behind possible breaches of the distance limit and whether they pose a threat given the scene context, all while complying with privacy policies and keeping the measurement acceptable. To this end, we introduce the Visual Social Distancing (VSD) problem, defined as the automatic estimation of the inter-personal distance from an image, and the characterization of the related people aggregations. VSD is pivotal for a non-invasive analysis of whether people comply with the SD restriction, and for providing statistics about the level of safety of specific areas whenever this constraint is violated. We then discuss how VSD relates to previous literature in Social Signal Processing and indicate which existing Computer Vision methods can be used to address the problem. We conclude with future challenges related to the effectiveness of VSD systems, ethical implications and future application scenarios.
Comment: 9 pages, 5 figures. All the authors contributed equally to this manuscript and are listed in alphabetical order. Under submission.
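The abstract defines VSD as estimating inter-personal distance from a single image. One common way to realize that in practice (not necessarily the recipe the paper itself advocates) is to detect people, map their foot points onto the ground plane through a calibrated homography, and threshold the resulting metric distances. A minimal sketch, assuming a pre-computed 3x3 image-to-ground homography `H` in metres and an illustrative 1 m threshold:

```python
import numpy as np
from itertools import combinations

def to_ground_plane(foot_pixels, H):
    """Map image foot points (u, v) to metric ground-plane coordinates via a
    pre-calibrated homography H (assumed available from scene calibration)."""
    pts = np.hstack([foot_pixels, np.ones((len(foot_pixels), 1))])
    world = (H @ pts.T).T
    return world[:, :2] / world[:, 2:3]

def violating_pairs(foot_pixels, H, min_dist=1.0):
    """Return index pairs of people closer than `min_dist` metres on the ground plane."""
    ground = to_ground_plane(np.asarray(foot_pixels, dtype=float), H)
    pairs = []
    for i, j in combinations(range(len(ground)), 2):
        if np.linalg.norm(ground[i] - ground[j]) < min_dist:
            pairs.append((i, j))
    return pairs
```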
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty of extracting behavioral cues such as target locations, speaking activity and head/body pose under crowdedness and extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster presentation and cocktail party contexts that present difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising a microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations of individuals' personality as well as their position, head and body orientation, and F-formation membership over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.
Comment: 14 pages, 11 figures
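Before the camera and badge streams described above can be analysed jointly, they have to be placed on a common timeline. The snippet below shows one generic way to align badge samples to video frame timestamps by nearest-neighbour lookup; field names and sampling rates are illustrative, not the SALSA release format.

```python
import numpy as np

def align_to_frames(frame_times, sample_times, samples):
    """Nearest-timestamp alignment of a badge signal to video frames.

    frame_times:  (F,) frame timestamps in seconds
    sample_times: (S,) badge sample timestamps in seconds (monotonically increasing)
    samples:      (S, D) badge readings (e.g. 3-axis acceleration)
    Returns an (F, D) array holding, per frame, the closest badge sample.
    """
    idx = np.searchsorted(sample_times, frame_times)
    idx = np.clip(idx, 1, len(sample_times) - 1)
    left, right = sample_times[idx - 1], sample_times[idx]
    # pick whichever neighbouring sample is closer in time
    idx = np.where((frame_times - left) < (right - frame_times), idx - 1, idx)
    return samples[idx]
```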
Who is where? Matching people in video to wearable acceleration during crowded mingling events
We address the challenging problem of associating acceleration data from a wearable sensor with the corresponding spatio-temporal region of a person in video during crowded mingling scenarios. This is an important first step for multi-sensor behavior analysis using these two modalities. As the number of people in a scene increases, there is a growing need to robustly and automatically associate a region of the video with each person's device. We propose a hierarchical association approach which exploits the spatial context of the scene, significantly outperforming state-of-the-art approaches. Moreover, we present experiments on matching from 3 to more than 130 acceleration and video streams which, to our knowledge, is significantly larger than prior works where only up to 5 device streams are associated.
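The abstract does not spell out the hierarchical association itself, but the core signal it exploits, agreement between a wearer's acceleration and the motion of their image region, can be illustrated with a flat baseline: correlate every device stream with every video track and solve the resulting assignment problem. The function name and the Hungarian-assignment choice below are illustrative, not the paper's method.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_devices_to_tracks(accel_streams, track_motion):
    """Flat correlation baseline for device-to-person association.

    accel_streams: (n_devices, T) acceleration magnitude per wearable device
    track_motion:  (n_tracks,  T) frame-to-frame motion magnitude per video track
    Both are assumed pre-synchronized and resampled to a common rate.
    Returns a list of (device_idx, track_idx) pairs.
    """
    n_dev, n_trk = len(accel_streams), len(track_motion)
    cost = np.zeros((n_dev, n_trk))
    for i in range(n_dev):
        for j in range(n_trk):
            corr = np.corrcoef(accel_streams[i], track_motion[j])[0, 1]
            cost[i, j] = -corr          # negate: the solver minimizes cost
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```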
F-formation Detection: Individuating Free-standing Conversational Groups in Images
Detection of groups of interacting people is a very interesting and useful task in many modern technologies, with application fields spanning from video surveillance to social robotics. In this paper we first furnish a rigorous definition of group grounded in the social sciences: this allows us to specify many kinds of groups, so far neglected in the Computer Vision literature. On top of this taxonomy, we present a detailed state of the art on group detection algorithms. Then, as our main contribution, we present a brand new method for the automatic detection of groups in still images, based on a graph-cuts framework for clustering individuals; in particular, we are able to codify in a computational sense the sociological definition of an F-formation, which is very useful for encoding a group given only proxemic information: the position and orientation of people. We call the proposed method Graph-Cuts for F-formation (GCFF). We show how GCFF definitely outperforms all state-of-the-art methods in terms of different accuracy measures (some of them brand new), also demonstrating strong robustness to noise and versatility in recognizing groups of various cardinality.
Comment: 32 pages, submitted to PLOS One
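GCFF encodes an F-formation from proxemic cues alone: each person's position and orientation imply a candidate O-space centre roughly in front of them, and people sharing an O-space form a group. The sketch below captures only that geometric encoding, with a naive connected-components grouping in place of the paper's graph-cuts assignment of people to candidate centres; the 0.7 m stride and 0.6 m radius are illustrative values, not the paper's parameters.

```python
import numpy as np

def o_space_candidates(positions, orientations, stride=0.7):
    """Each person 'votes' for an O-space centre `stride` metres along their
    body orientation (orientations in radians, positions in metres)."""
    offsets = np.stack([np.cos(orientations), np.sin(orientations)], axis=1)
    return positions + stride * offsets

def group_by_shared_o_space(positions, orientations, radius=0.6):
    """Toy grouping: people whose candidate O-space centres fall within
    `radius` of each other are linked, and connected components of that
    link graph are reported as groups."""
    centres = o_space_candidates(positions, orientations)
    n = len(centres)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(centres[i] - centres[j]) < radius:
                parent[find(i)] = find(j)   # union the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```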