
    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging, as crowdedness and extreme occlusions make it difficult to extract behavioral cues such as target locations, speaking activity, and head/body pose. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster presentation and cocktail party contexts that present difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations, and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth, and infrared sensors. In addition to raw data, we also provide annotations of individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.

    Identifying Dominant People in Meetings from Audio-Visual Sensors

    This paper provides an overview of the area of automated dominance estimation in group meetings. We describe research in social psychology and use it to explain the motivations behind proposed automated systems. With the growing availability of conversational data captured in meeting rooms, it is possible to investigate how multi-sensor data allows us to characterize the non-verbal behaviors that contribute to dominance. We use an overview of our own work to address the challenges and opportunities in this area of research.

    Speech/Non-Speech Detection in Meetings from Automatically Extracted Low Resolution Visual Features

    In this paper we address the problem of estimating who is speaking from automatically extracted low-resolution visual cues in group meetings. Traditionally, the task of speech/non-speech detection or speaker diarization tries to find who speaks and when from audio features only. Recent work has addressed the problem audio-visually, but often with less emphasis on the visual component. Because the audio stream can easily be lost during video conferences, this work proposes methods for estimating speech using only low-resolution visual cues. We carry out experiments to compare how context, through the observation of group behaviour and task-oriented activities, can help improve estimates of speaking status. We test on 105 minutes of natural meeting data with unconstrained conversations.
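    The core idea above, estimating speaking status from low-resolution visual activity alone, can be sketched with a toy motion-energy detector. This is only a stand-in for the learned visual features used in the paper; the frame shapes, threshold, and synthetic data are all hypothetical:

    ```python
    import numpy as np

    def speaking_score(frames, threshold=2.0):
        """Toy speaking/non-speaking estimate from low-resolution frames.

        frames: array of shape (T, H, W), grayscale crops around one person.
        Returns a boolean array (length T-1) of per-step speaking estimates
        based on frame-difference motion energy.
        """
        frames = np.asarray(frames, dtype=float)
        # Motion energy: mean absolute difference between consecutive frames.
        motion = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
        return motion > threshold

    # Synthetic check: a noisy "active" segment followed by a still one.
    rng = np.random.default_rng(0)
    active = rng.normal(128, 10, size=(10, 8, 8))  # high frame-to-frame change
    still = np.full((10, 8, 8), 128.0)             # no change at all
    est = speaking_score(np.concatenate([active, still]))
    ```

    A real system would replace the raw motion energy with features robust to body sway and camera noise, but the thresholding structure is the same.
    
    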

    Estimating the Dominant Person in Multi-Party Conversations Using Speaker Diarization Strategies

    In this paper, we apply speaker diarization strategies from a single source to the task of estimating the dominant person in a group meeting. Previous work has shown that speaking length is strongly correlated with perceived dominance. Here we investigate this in more depth by considering two dominance tasks, one with full and one with majority agreement amongst ground-truth annotators. In addition, we investigate how 24 combinations of speed-up strategies, algorithmic variants, and source types lead to different outcomes when applied to dominance estimation. We obtained the best performance of 77% using our slowest scheme and a single distant microphone (SDM). Among the top 3 of the 24 experiments in both dominance tasks, we show that we can use the furthest SDM, with no prior knowledge of the number of speakers, and the fastest diarization scheme, which runs 1.3 times faster than real time.
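    The speaking-length cue the paper builds on can be illustrated with a minimal sketch. The segment list below is a hypothetical diarization output (speaker, start, end in seconds), not data from the paper:

    ```python
    # Hypothetical diarization output: (speaker_id, start_s, end_s) segments.
    segments = [
        ("A", 0.0, 12.5), ("B", 12.5, 15.0), ("A", 15.0, 30.0),
        ("C", 30.0, 34.0), ("B", 34.0, 36.5), ("A", 36.5, 40.0),
    ]

    def dominant_speaker(segments):
        """Rank speakers by total speaking time, the cue reported as
        strongly correlated with perceived dominance."""
        totals = {}
        for spk, start, end in segments:
            totals[spk] = totals.get(spk, 0.0) + (end - start)
        return max(totals, key=totals.get), totals

    winner, totals = dominant_speaker(segments)
    ```

    In this toy example speaker "A" accumulates 31 of the 40 seconds and is flagged as dominant; the paper's contribution lies in how robustly the diarization front-end recovers these totals under different speed/source trade-offs.
    
    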

    Non-Verbal Communication Analysis in Victim-Offender Mediations

    In this paper we present a non-invasive ambient intelligence framework for the semi-automatic analysis of non-verbal communication applied to the restorative justice field. In particular, we propose the use of computer vision and social signal processing technologies in real scenarios of Victim-Offender Mediations, applying feature extraction techniques to multi-modal audio-RGB-depth data. We compute a set of behavioral indicators that define communicative cues from the fields of psychology and observational methodology. We test our methodology on data captured in real-world Victim-Offender Mediation sessions in Catalonia, in collaboration with the regional government. We define the ground truth based on expert opinions when annotating the observed social responses. Using different state-of-the-art binary classification approaches, our system achieves recognition accuracies of 86% when predicting satisfaction, and 79% when predicting both agreement and receptivity. Applying a regression strategy, we obtain a mean deviation of between 0.5 and 0.7 for predictions in the range [1-5] for the computed social signals.
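    As a rough illustration of the binary-classification setup described above, here is a minimal nearest-centroid baseline over synthetic behavioral-indicator vectors. The features, labels, and numbers are invented for illustration and are unrelated to the paper's actual data, indicators, or classifiers:

    ```python
    import numpy as np

    # Hypothetical 3-dimensional behavioral-indicator vectors (e.g. gesture
    # rate, posture changes, gaze shifts) with binary "satisfaction" labels.
    rng = np.random.default_rng(1)
    X_pos = rng.normal(1.0, 0.3, size=(20, 3))   # satisfied sessions
    X_neg = rng.normal(0.0, 0.3, size=(20, 3))   # unsatisfied sessions
    X = np.vstack([X_pos, X_neg])
    y = np.array([1] * 20 + [0] * 20)

    # Nearest-centroid classifier: one of the simplest binary baselines.
    c1, c0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1)
            < np.linalg.norm(X - c0, axis=1)).astype(int)
    accuracy = (pred == y).mean()
    ```

    The paper uses stronger state-of-the-art classifiers and expert-annotated ground truth; the sketch only shows the indicator-vector-to-label structure of the task.
    
    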

    Social Network Analysis for Automatic Role Recognition

    The computing community has shown significant interest in the analysis of social interactions over the last decade. Different aspects of social interactions have been studied, such as dominance, emotions, and conflicts. However, the recognition of roles has been neglected, even though roles are a key aspect of social interactions. In fact, sociologists have shown not only that people play roles each time they interact, but also that roles shape the behavior and expectations of interacting participants. The aim of this thesis is to fill this gap by investigating the problem of automatic role recognition in a wide range of interaction settings, including production environments, e.g. news and talk-shows, and spontaneous exchanges, e.g. meetings. The proposed role recognition approach includes two main steps. The first step represents the individuals involved in an interaction with feature vectors accounting for their relationships with others. This step includes three main stages, namely segmentation of the audio into turns (i.e. time intervals during which only one person talks), conversion of the sequence of turns into a social network, and use of the social network as a tool to extract features for each person. The second step uses machine learning methods to map the feature vectors into roles. The experiments have been carried out over roughly 90 hours of material. This is not only one of the largest databases ever used in the literature on role recognition, but also the only one, to the best of our knowledge, that includes different interaction settings. In the experiments, the percentage of data correctly labeled in terms of roles is roughly 80% in production environments and 70% in spontaneous exchanges (lexical features have been added in the latter case). The importance of roles has been assessed in an application scenario as well. In particular, the thesis shows that roles help to segment talk-shows into stories, i.e. time intervals during which a single topic is discussed, with satisfactory performance. The main contributions of this thesis are as follows: to the best of our knowledge, this is the first work where social network analysis is applied to the automatic analysis of conversation recordings; the thesis provides the first quantitative measure of how much roles constrain conversations, and a large corpus of recordings annotated in terms of roles. The results of this work have been published in one journal paper and in five conference articles.
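    The pipeline described above (audio turns, then a social network, then per-person features for a classifier) can be sketched roughly as follows. The turn sequence and the adjacency-based centrality feature are simplified assumptions for illustration, not the thesis's actual representation:

    ```python
    import numpy as np

    # Toy turn sequence (speaker ids in talking order), standing in for the
    # output of the audio segmentation stage.
    turns = ["anchor", "guest1", "anchor", "guest2", "anchor", "guest1"]
    speakers = sorted(set(turns))
    idx = {s: i for i, s in enumerate(speakers)}

    # Stage 2: build a social network where an edge weight counts how often
    # one speaker's turn immediately follows another's.
    adj = np.zeros((len(speakers), len(speakers)))
    for a, b in zip(turns, turns[1:]):
        adj[idx[a], idx[b]] += 1
        adj[idx[b], idx[a]] += 1

    # Stage 3: per-speaker features from the network, e.g. normalized degree.
    degree = adj.sum(axis=1)
    features = degree / degree.sum()
    ```

    The second step would then feed such feature vectors to any standard classifier to predict role labels; here the "anchor" already stands out as the most central node.
    
    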