    Auditory dialog analysis and understanding by generative modelling of interactional dynamics

    In the last few years, interest in the analysis of human behavioral schemes has grown dramatically, in particular for the interpretation of the communication modalities called social signals. These are well-defined interaction patterns, possibly unconscious, that characterize different conversational situations and behaviors in general. In this paper, we present an automatic system based on a generative structure able to analyze conversational scenarios. The generative model integrates a Gaussian mixture model with the (observed) influence model, and it is fed with a novel kind of simple low-level auditory social signal, termed steady conversational periods (SCPs). These are built on the duration of continuous slots of silence or speech, also taking conversational turn-taking into account. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features. Our contribution is to show the effectiveness of our model on dialog classification and clustering tasks, considering dialogs between adults and between children and adults, in both flat and arguing discussions, and showing excellent performance also in comparison with state-of-the-art frameworks.
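
    As a rough illustration of the idea of "continuous slots of silence or speech", the sketch below run-length encodes a per-frame speech/silence sequence for a single speaker. It is only an assumption-laden sketch and not the authors' SCP definition: the function name `steady_periods`, the boolean frame representation, and the `frame_seconds` parameter are hypothetical, and turn-taking between speakers is not modelled.

```python
from itertools import groupby


def steady_periods(frames, frame_seconds=0.01):
    """Run-length encode a per-frame speech/silence sequence.

    frames: iterable of booleans (True = speech, False = silence).
    frame_seconds: assumed duration of one frame, in seconds.
    Returns a list of (label, duration_seconds) tuples, one per
    maximal run of identical consecutive frames.
    """
    periods = []
    for is_speech, run in groupby(frames):
        n_frames = sum(1 for _ in run)  # length of this run in frames
        label = "speech" if is_speech else "silence"
        periods.append((label, n_frames * frame_seconds))
    return periods


# Hypothetical voice-activity frames for one speaker (illustrative data only)
vad = [True, True, True, False, False, True, False, False, False, False]
periods = steady_periods(vad, frame_seconds=0.5)
```

    The resulting sequence of labelled durations is the kind of low-level input from which transitions between periods could then be counted; how the paper quantizes durations or combines speakers is not specified in the abstract.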

    Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review

    Stress during public speaking is common and adversely affects performance and self-confidence. Extensive research has been carried out to develop models that recognize emotional states. However, minimal research has been conducted on detecting stress during public speaking in real time using voice analysis. In this context, the current review showed that the application of algorithms has not been properly explored, and it helped identify the main obstacles in creating a suitable testing environment while accounting for current complexities and limitations. In this paper, we present our main idea and propose a computational stress detection model that could be integrated into a Virtual Reality (VR) application to create an intelligent virtual audience for improving public speaking skills. The developed model, when integrated with VR, will be able to detect excessive stress in real time by analysing voice features correlated with physiological parameters indicative of stress, and help users gradually control excessive stress and improve public speaking performance. Comment: 41 pages, 7 figures, 4 tables

    Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge

    More than a decade has passed since research on automatic recognition of emotion from speech became a new field of research in line with its 'big brothers', speech and speaker recognition. This article attempts to provide a short overview of where we are today, how we got there, and what this can tell us about where to go next and how we could arrive there. In the first part, we address the basic phenomenon, reflecting on the last fifteen years and commenting on databases, modelling and annotation, the unit of analysis, and prototypicality. We then shift to automatic processing, including discussions of features, classification, robustness, evaluation, and implementation and system integration. From there we go to the first comparative challenge on emotion recognition from speech, the INTERSPEECH 2009 Emotion Challenge, organised by (part of) the authors, including the description of the Challenge's database, Sub-Challenges, participants and their approaches, the winners, and the fusion of results, through to the lessons actually learnt, before we finally address the ever-lasting problems and promising future attempts. (C) 2011 Elsevier B.V. All rights reserved. Schuller B., Batliner A., Steidl S., Seppi D., "Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge", Speech Communication, vol. 53, no. 9-10, pp. 1062-1087, November 2011. Status: published

    System for acquisition, processing and visualization of biophysiological signals and contextual information

    Thesis (S.M.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2012. If we are to learn the effects of the environment and of our day-to-day actions and choices on our physiology, we must develop systems that label biophysiological sensor data with contextual information. In this thesis I first present the architecture and implementation of FEEL: a system for the acquisition, processing and visualization of biophysiological signals and contextual information. The system comprises a mobile client application (FMC) and a backend server. The mobile client collects contextual information (phone call details, email reading details, calendar entries, and user location) at a fixed interval and transmits it to the backend server. The backend server stores the contextual information and the biophysiological signal data uploaded by the user, processes the information, and provides a novel interface for viewing the combined data. Next, I present the results of a 10-day user study in which users wore Electrodermal Activity (EDA) wrist sensors that measured their autonomic arousal levels. These users were asked to upload the sensor data and annotate it, at first at the end of the day and then after two days. One group of users had access to both the signal and the full contextual information collected by the mobile phone, and the other group could access only the biophysiological signal. At the end of the study the users were asked to fill in a System Usability Scale (SUS) questionnaire, a user experience survey and a Toronto Alexithymia Scale (TAS-20) questionnaire. My results show that the FEEL system enables users to annotate biophysiological signals more effectively than the current state of the art.
Finally, I showed that there is a correlation between a person's ability to determine their own arousal level and their score on the Toronto Alexithymia test: the less alexithymic they were, the better the correlation between the EDA and their self-reported arousal. By Yadid Ayzenberg. S.M.

    Dynamic Estimation of Rater Reliability using Multi-Armed Bandits

    One of the critical success factors for supervised machine learning is the quality of target values, or predictions, associated with training instances. Predictions can be discrete labels (such as a binary variable specifying whether a blog post is positive or negative) or continuous ratings (for instance, how boring a video is on a 10-point scale). In some areas, predictions are readily available, while in others, the effort of human workers has to be involved. For instance, in the task of emotion recognition from speech, a large corpus of speech recordings is usually available, and humans denote which emotions are present in which recordings.

    Investigating and extending the methods in automated opinion analysis through improvements in phrase based analysis

    Opinion analysis is an area of research which deals with the computational treatment of opinion statements and subjectivity in textual data. It has emerged over the past couple of decades as an active area of research, as it provides solutions to the issues raised by information overload. The problem of information overload arose with advancements in communication technologies, which gave rise to exponential growth in user-generated subjective data available online. Opinion analysis has a rich set of applications that open up opportunities for organisations, ranging from tracking user opinions about products and social issues in communities through to engagement in political participation. The opinion analysis area has been highly active in recent years, and research at different levels of granularity has been, and is being, undertaken. However, there are limitations in the state of the art, especially as dealing with each level of granularity on its own does not solve current research issues. Therefore a novel sentence-level opinion analysis approach utilising clause- and phrase-level analysis is proposed. This approach uses linguistic and syntactic analysis of sentences to understand the interdependence of words within sentences, and further uses rule-based analysis at the phrase level to calculate the opinion at each hierarchical structure of a sentence. The proposed opinion analysis approach requires lexical and contextual resources for implementation. In the context of this Thesis the approach is further presented as part of an extended unifying framework for opinion analysis, resulting in the design and construction of a novel corpus. The above contributions to the field (approach, framework and corpus) are evaluated within the Thesis and are found to improve on existing limitations in the field, particularly with regard to opinion analysis automation.
Further work is required in integrating a mechanism for greater word sense disambiguation and in lexical resource development.

    The importance of the voice in business communication. The search for the most influential voice (La importància de la veu en la comunicació empresarial. Recerca de la veu més influent)

    This doctoral thesis first lays out a theoretical foundation of respiratory anatomy, detailing the organs through which air passes, both in the external airways (the mouth, the nose, the pharynx and the larynx) and in the internal ones (the trachea, the bronchi and, finally, the lungs). It then explains respiratory and ventilatory physiology, in which the diaphragm plays the leading role, together with the different types of ventilation that exist: clavicular, diaphragmatic, costo-abdominal and total, among others. It goes on to describe the anatomy of the voice. The human phonatory apparatus centres mainly on the larynx and the vocal folds. Next, the physiology of the voice, phonation, is defined. This part describes the mechanisms needed for the voice to be produced and clearly defines the interrelation between the respiratory and phonatory systems.