10 research outputs found

    Role Recognition in Radio Programs using Social Affiliation Networks and Mixtures of Discrete Distributions: an Approach Inspired by Social Cognition

    This paper presents an approach for recognizing the roles played by speakers participating in radio programs. The approach is inspired by social cognition, i.e. by the way humans make sense of people they do not know, and it includes unsupervised speaker clustering performed with Hidden Markov Models, Social Network Analysis and Mixtures of Bernoulli and Multinomial Distributions. The experiments are performed over two corpora of radio programs for a total of around 45 hours of material. The results show that more than 80 percent of the data time can be labeled correctly in terms of role.

    Automatic social role recognition and its application in structuring multiparty interactions

    Automatic processing of multiparty interactions is a research domain with important applications in content browsing, summarization and information retrieval. In recent years, several works have been devoted to finding the regular patterns that speakers exhibit in a multiparty interaction, also known as social roles. Most of the research in the literature has focused on the recognition of scenario-specific formal roles. More recently, role coding schemes based on informal social roles have been proposed, defining roles based on the behavior speakers have in the functioning of a small group interaction. Informal social roles represent a flexible classification scheme that can generalize across different scenarios of multiparty interaction. In this thesis, we focus on automatic recognition of informal social roles and exploit the influence of informal social roles on speaker behavior for structuring multiparty interactions. To model speaker behavior, we systematically explore various verbal and nonverbal cues extracted from turn-taking patterns, vocal expression and linguistic style. The influence of social roles on the behavior cues exhibited by a speaker is modeled using a discriminative approach based on conditional random fields. Experiments performed on several hours of meeting data reveal that classification using conditional random fields improves the role recognition performance. We demonstrate the effectiveness of our approach by evaluating it on previously unseen scenarios of multiparty interaction. Furthermore, we also consider whether formal roles and informal roles can be automatically predicted by the same verbal and nonverbal features. We exploit the influence of social roles on turn-taking patterns to improve speaker diarization under distant microphone conditions.
    Our work extends the Hidden Markov model (HMM)/Gaussian mixture model (GMM) speaker diarization system, and is based on jointly estimating both the speaker segmentation and the social roles in an audio recording. We modify the minimum duration constraint in the HMM-GMM diarization system by using role information to model the expected duration of a speaker's turn. We also use social role n-grams as prior information to model speaker interaction patterns. Finally, we demonstrate the application of social roles to the problem of topic segmentation in meetings. We exploit our finding that social roles can change dynamically in conversations and use this information to predict topic changes in meetings. We also present an unsupervised method for topic segmentation which combines social roles and lexical cohesion. Experimental results show that social roles improve the performance of both speaker diarization and topic segmentation.

    Automatic role recognition

    The computing community is making significant efforts towards the development of automatic approaches for the analysis of social interactions. The way people interact depends on the context, but there is one aspect that all social interactions seem to have in common: humans behave according to roles. Therefore, recognizing the roles of participants is an essential step towards understanding social interactions and constructing socially aware computers. This thesis addresses the problem of automatically recognizing the roles of participants in multi-party recordings. The objective is to assign a role to each participant. All the proposed approaches use a similar strategy. They all start by segmenting the audio into turns, which are used as the basic analysis units. The next step is to extract features accounting for the organization of turns. The more sophisticated approaches extend these features with prosodic or semantic features. Finally, the mapping of people or turns to roles is done using statistical models. The goal of this thesis is to gain a better understanding of role recognition, and we investigate three aspects that can influence the performance of the system: (1) the impact of modelling the dependency between the roles; (2) the contribution of different modalities to the effectiveness of the role recognition approach; and (3) the effectiveness of the approach for different scenarios. Three models are proposed and tested on three different corpora totaling more than 90 hours of audio. The first contribution of this thesis is to investigate the combination of turn-taking features and semantic information for role recognition, improving the accuracy of role recognition from a baseline of 46.4% to 67.9% on the AMI meeting corpus. The second contribution is to use features extracted from the prosody to assign roles.
    The performance of this model is 89.7% on broadcast news and 87.0% on talk-shows. Finally, the third contribution is the development of a model robust to changes in the social setting. This model achieved an accuracy of 86.7% on a database composed of a mixture of broadcast news and talk-shows.
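
    As a rough illustration of the "features accounting for the organization of turns" step described in this abstract, the sketch below computes a few simple per-speaker statistics from an already segmented recording. It is not the thesis's feature set: the function name `turn_taking_features`, the `(speaker, start, end)` tuple layout and the four statistics are assumptions made for this example only, and the toy turns are invented.

```python
from collections import defaultdict


def turn_taking_features(turns):
    """Compute simple per-speaker turn statistics.

    `turns` is a list of (speaker, start_sec, end_sec) tuples that is
    assumed to come from an earlier segmentation step (hypothetical format).
    """
    total_time = defaultdict(float)  # seconds spoken per speaker
    n_turns = defaultdict(int)       # number of turns per speaker
    for speaker, start, end in turns:
        total_time[speaker] += end - start
        n_turns[speaker] += 1

    all_time = sum(total_time.values())
    return {
        spk: {
            "n_turns": n_turns[spk],
            "speaking_time": total_time[spk],
            # Fraction of the total speaking time held by this speaker.
            "time_share": total_time[spk] / all_time if all_time > 0 else 0.0,
            "mean_turn": total_time[spk] / n_turns[spk],
        }
        for spk in total_time
    }


# Invented toy segmentation: three speakers, four turns.
example_turns = [
    ("A", 0.0, 10.0),
    ("B", 10.0, 12.0),
    ("A", 12.0, 20.0),
    ("C", 20.0, 25.0),
]
features = turn_taking_features(example_turns)
```

    Feature vectors of this kind could then be passed to any statistical classifier to map speakers to roles; which model to use is left open here, since the abstract only says "statistical models".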

    Computational modeling of turn-taking dynamics in spoken conversations

    The study of human interaction dynamics has been at the center of multiple research disciplines, including computer and social sciences, conversational analysis and psychology, for decades. Recent interest has focused on designing computational models to improve human-machine interaction systems and to support humans in their decision-making process. Turn-taking is one of the key aspects of conversational dynamics in dyadic conversations and is an integral part of human-human and human-machine interaction systems. It is used for discourse organization of a conversation by means of explicit phrasing, intonation, and pausing, and it involves intricate timing. In verbal (e.g., telephone) conversation, turn transitions are facilitated by inter- and intra-speaker silences and overlaps. Early research on turn-taking in the speech community studied durational aspects of turns, cues for turn-yielding intention and, lastly, the design of turn transition models for spoken dialog agents. Compared to the studies of turn transitions, very little work has been done on classifying overlap discourse, especially the competitive act of overlaps and the function of silences. Given the limitations of the current state of the art, this dissertation focuses on two aspects of conversational dynamics: 1) designing automated computational models for analyzing turn-taking behavior in a dyadic conversation, and 2) predicting the outcome of the conversations, i.e., observed user satisfaction, using turn-taking descriptors. These two aspects are later used to design a conversational profile for each speaker using turn-taking behavior and the outcome of the conversations. The analysis, experiments, and evaluation have been done on a large dataset of Italian call-center spoken conversations where customers and agents are engaged in real problem-solving tasks.
    The challenges in reaching our research goal include automatically segmenting and aligning the speakers' channels from the speech signal, and identifying and labeling the turn types and their functional aspects. The task becomes more challenging due to the presence of overlapping speech. To model turn-taking behavior, the intention behind these overlapping turns needs to be considered. However, the most critical question is how to model observed user satisfaction in a dyadic conversation and what properties of turn-taking behavior can be used to represent and predict the outcome. Thus, the computational models for analyzing turn-taking dynamics in this dissertation include automatic segmentation and labeling of turn types, categorization of competitive vs non-competitive overlaps, silences (e.g., lapses, pauses) and functions of turns in terms of dialog acts. The novel contributions of the work presented here are: 1. the design of a fully automated turn segmentation and labeling (e.g., agent vs customer's turn, lapse within the speaker, and overlap) system; 2. the design of annotation guidelines for segmenting and annotating the speech overlaps with the competitive and non-competitive labels; 3. a demonstration of how different channels of information, such as acoustic, linguistic, and psycholinguistic feature sets, perform in the classification of competitive vs non-competitive overlaps; 4. a study of the role of speakers and context (i.e., agents' and customers' speech) in conveying the information of competitiveness for each individual feature set and their combinations; 5. an investigation of the function of long silences in the information flow of a dyadic conversation. The extracted turn-taking cues are then used to automatically predict the outcome of the conversation, which is modeled from continuous manifestations of emotion. The contributions include: 1. modeling the state of the observed user satisfaction in terms of the final emotional manifestation of the customer (i.e., user); 2. analysis and modeling of turn-taking properties to show how each turn type influences user satisfaction; 3. a study of how turn-taking behavior changes within each emotional state.
    Based on the studies conducted in this work, it is demonstrated that turn-taking behavior, especially the competitiveness of overlaps, is more than just an organizational tool in daily human interactions. It carries useful information and has the power to predict the outcome of the conversation in terms of satisfaction vs non-satisfaction. Combining the turn-taking behavior and the outcome of the conversation, the final goal is to design a conversational profile for each speaker. Such profile information would not only help domain experts but would also be useful to the call center agent in real time. These systems are fully automated and no human intervention is required. The findings are potentially relevant to research on overlapping speech and the automatic analysis of human-human and human-machine interactions.

    Sociometry based multiparty audio recordings summarization

    This paper shows how Social Network Analysis, the study of relational data in specific social environments, can be used to summarize multiparty radio news recordings. A social network is extracted from each recording and analyzed in order to detect the role of each speaker (e.g. anchorman, guest, etc.). The role is then used as a criterion to select the segments that are most representative of the recording content. The results show that the length of the recordings can be reduced by more than 90 percent while still preserving most of the information about their content.

    Algoritmos bio-inspirados para la detección de comunidades dinámicas en redes complejas

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 22-07-202

    Promoting Andean children's learning of science through cultural and digital tools

    Conference Theme: To see the world and a grain of sand: Learning across levels of space, time, and scale. In Peru, there is a large achievement gap in rural schools. To address this problem, the study aims to design environments that enhance science learning by integrating ICT with cultural artifacts, respecting the Andean culture and empowering rural children to pursue lifelong learning. The investigation employs the Cultural-Historical Activity Theory (CHAT) framework and the Design-Based Research (DBR) methodology, using an iterative process of design, implementation and evaluation of the innovative practice.