SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., a cocktail party) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
innate behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging: extracting behavioral cues
such as target locations, speaking activity and head/body pose is difficult
owing to crowdedness and the presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
in poster presentation and cocktail party contexts that present
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) to
alleviate these problems, we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, each comprising a microphone, accelerometer, Bluetooth
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head and body
orientation, and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.
Comment: 14 pages, 11 figures
3D Audio-Visual Speaker Tracking with an Adaptive Particle Filter
We propose an audio-visual fusion algorithm for 3D speaker tracking from a localised multi-modal sensor platform composed of a camera and a small microphone array. After extracting audio-visual cues from the individual modalities, we fuse them adaptively in a particle filter framework according to their reliability. The reliability of the audio signal is measured from the maximum Global Coherence Field (GCF) peak value at each frame; the visual reliability is based on colour-histogram matching of detection results against a reference image in the RGB space. Experiments on the AV16.3 dataset show that the proposed adaptive audio-visual tracker outperforms both the individual modalities and a classical approach with fixed parameters in terms of tracking accuracy.
Qian, Xinyuan; Brutti, Alessio; Omologo, Maurizio; Cavallaro, Andrea
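The adaptive fusion described above can be sketched as a reliability-weighted particle reweighting step. This is only an illustration: the reliability floors, the log-linear fusion rule and all names (`fuse_likelihoods`, `gcf_floor`, `hist_floor`) are assumptions made for the sketch, not the authors' implementation.

```python
import numpy as np

def fuse_likelihoods(audio_lik, visual_lik, gcf_peak, hist_sim,
                     gcf_floor=0.1, hist_floor=0.1):
    """Weight per-particle audio/visual likelihoods by modality reliability.

    gcf_peak:  max Global Coherence Field value this frame (audio reliability)
    hist_sim:  colour-histogram similarity to the reference image (visual
               reliability)
    The floors and the normalisation scheme are illustrative assumptions.
    """
    # Reliability scores, clipped so a modality never vanishes entirely.
    r_a = max(gcf_peak, gcf_floor)
    r_v = max(hist_sim, hist_floor)
    alpha = r_a / (r_a + r_v)          # adaptive audio weight in [0, 1]
    # Log-linear fusion of the two likelihoods, then renormalisation.
    w = (audio_lik ** alpha) * (visual_lik ** (1.0 - alpha))
    return w / w.sum()

# Toy example: 4 particles, on a visually reliable frame (hist_sim >> gcf_peak)
audio = np.array([0.1, 0.2, 0.3, 0.4])
visual = np.array([0.7, 0.1, 0.1, 0.1])
weights = fuse_likelihoods(audio, visual, gcf_peak=0.2, hist_sim=0.9)
```

When `hist_sim` dominates, `alpha` is small and the fused weights follow the visual likelihood; a strong GCF peak would shift the balance toward audio.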
An Immersive Telepresence System using RGB-D Sensors and Head Mounted Display
We present a tele-immersive system that enables people to interact with each
other in a virtual world using body gestures in addition to verbal
communication. Beyond the obvious applications, including general online
conversations and gaming, we hypothesize that our proposed system would be
particularly beneficial to education by offering rich visual content and
interactivity. One distinct feature is the integration of egocentric pose
recognition that allows participants to use their gestures to demonstrate and
manipulate virtual objects simultaneously. This functionality enables the
instructor to effectively and efficiently explain and illustrate complex
concepts or sophisticated problems in an intuitive manner. The highly
interactive and flexible environment can capture and sustain more student
attention than the traditional classroom setting and, thus, delivers a
compelling experience to the students. Our main focus here is to investigate
possible solutions for the system design and implementation and devise
strategies for fast, efficient computation suitable for visual data processing
and network transmission. We describe the technique and experiments in detail
and provide quantitative performance results, demonstrating that our system
runs comfortably and reliably in different application scenarios. Our
preliminary results are promising and demonstrate the potential for more
compelling directions in cyberlearning.
Comment: IEEE International Symposium on Multimedia 201
Audio-Visual Speaker Tracking
Target motion tracking has found applications in interdisciplinary fields including, but not limited to, surveillance and security, forensic science, intelligent transportation systems, driving assistance, monitoring of prohibited areas, medical science, robotics, action and expression recognition, individual speaker discrimination in multi-speaker environments, and video conferencing in the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to widespread advances in devices and technologies and the need for seamless solutions for real-time tracking and localization of speakers. However, speaker tracking is a challenging task in real-life scenarios, as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcoming these issues is to use multi-modal information, as it conveys complementary information about the state of the speakers compared to single-modal tracking. To exploit multi-modal information, several approaches have been proposed, which can be classified into two categories: deterministic and stochastic. This chapter aims to provide multimedia researchers with a state-of-the-art overview of tracking methods that combine multiple modalities to accomplish various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field.
F-formation Detection: Individuating Free-standing Conversational Groups in Images
Detection of groups of interacting people is a task of great interest for many
modern technologies, with application fields spanning from video surveillance
to social robotics. In this paper we first furnish a rigorous definition of
group grounded in the social sciences: this allows us to specify many kinds of
group so far neglected in the Computer Vision literature. On top of this
taxonomy, we present a detailed review of the state of the art in group
detection algorithms. Then, as our main contribution, we present a new method
for the automatic detection of groups in still images, based on a graph-cuts
framework for clustering individuals; in particular, we codify in computational
terms the sociological definition of F-formation, which allows a group to be
encoded using only proxemic information: the positions and orientations of
people. We call the proposed method Graph-Cuts for F-formation (GCFF). We show
that GCFF clearly outperforms all state-of-the-art methods on several accuracy
measures (some newly introduced), also demonstrating strong robustness to noise
and versatility in recognizing groups of various cardinality.
Comment: 32 pages, submitted to PLOS ONE
Machine Understanding of Human Behavior
A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next-generation computing, which we will call human computing, should be about anticipatory user interfaces that are human-centered, built for humans on the basis of human models. They should transcend the traditional keyboard and mouse to include natural, human-like interactive functions, including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, from enabling computers to understand human behavior.
Analysis and enhancement of interpersonal coordination using inertial measurement unit solutions
Today's mobile communication technologies have increased verbal and text-based communication with other humans, social robots and intelligent virtual assistants. On the other hand, these technologies reduce face-to-face communication. This social issue is critical because decreasing direct interaction may cause difficulty in reading social and environmental cues, thereby impeding the development of overall social skills. Recently, scientists have studied the importance of nonverbal interpersonal activities to social skills by measuring human behavioral and neurophysiological patterns. These interdisciplinary approaches are in line with the European Union research project "Socializing sensorimotor contingencies" (socSMCs), which aims to improve the capability of social robots and to properly deal with autism spectrum disorder (ASD). Therefore, modelling and benchmarking healthy humans' social behavior is fundamental to establishing a foundation for research on the emergence and enhancement of interpersonal coordination. In this research project, two different experimental settings were categorized depending on the interactants' distance: distal and proximal settings, in which the structure of the engaged cognitive systems changes and the level of socSMCs differs.
As a part of the project, this dissertation adopted this spatial framework. Additionally, single-sensor solutions were developed to reduce the cost and effort of measuring human behaviors, recognizing social behaviors, and enhancing interpersonal coordination. First, algorithms using a head-worn inertial measurement unit (H-IMU) were developed to measure human kinematics, as a baseline for social behaviors. The results confirmed that the H-IMU can measure individual gait parameters by analyzing only head kinematics. Secondly, as a distal sensorimotor contingency, interpersonal relationship was considered with respect to a dynamic structure of three interacting components: positivity, mutual attentiveness, and coordination. The H-IMUs monitored social behavioral events from the kinematics of head orientation and oscillation during walking and talking, which can contribute to estimating the level of rapport. Finally, in a new collaborative task with the proposed IMU-based tablet application, the results verified the effects of different auditory-motor feedback forms on the enhancement of interpersonal coordination in a proximal setting.
This dissertation has an intensive interdisciplinary character: technological developments in sensor engineering and software engineering had to be applied, and issues resolved, in direct relation to predefined behavioral-science questions in two different settings (distal and proximal). This given frame served as a reference in the development of the methods and settings in this dissertation. The proposed IMU-based solutions are also promising for various future applications, given the widespread availability of IMU-equipped wearable devices.
European Commission/HORIZON2020-FETPROACT-2014/641321/E
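A gait parameter such as cadence can in principle be read from head-worn IMU data because each heel strike shows up as an oscillation peak in the vertical head acceleration. The sketch below illustrates this idea only; the threshold, the refractory interval and all names are assumptions for the example, not values or code from the dissertation.

```python
import numpy as np

def detect_steps(acc_z, fs, min_interval=0.3, thresh=None):
    """Count heel-strike peaks in head vertical acceleration (rough sketch).

    acc_z: vertical acceleration samples with gravity removed, in m/s^2
    fs:    sampling rate in Hz
    The adaptive threshold and the refractory interval are illustrative defaults.
    """
    acc_z = np.asarray(acc_z, float)
    if thresh is None:
        thresh = acc_z.mean() + acc_z.std()   # adaptive peak threshold
    gap = int(min_interval * fs)              # refractory samples between steps
    steps, last = [], -gap
    for i in range(1, len(acc_z) - 1):
        is_peak = (acc_z[i] > thresh
                   and acc_z[i] >= acc_z[i - 1]
                   and acc_z[i] > acc_z[i + 1])
        if is_peak and i - last >= gap:
            steps.append(i)
            last = i
    cadence = len(steps) / (len(acc_z) / fs) * 60.0   # steps per minute
    return steps, cadence

# Synthetic head bounce: a 2 Hz oscillation sampled at 100 Hz for 5 s,
# i.e. two "steps" per second.
fs = 100
t = np.arange(0, 5, 1 / fs)
acc = np.sin(2 * np.pi * 2.0 * t)
steps, cadence = detect_steps(acc, fs)
```

On this synthetic signal the detector finds one peak per oscillation cycle; real head kinematics would need filtering and orientation compensation before such a peak pass is meaningful.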