14 research outputs found

    Auditory dialog analysis and understanding by generative modelling of interactional dynamics

    In the last few years, interest in the analysis of human behavioral schemes has grown dramatically, in particular for interpreting the communication modalities called social signals. These are well-defined interaction patterns, possibly unconscious, that characterize different conversational situations and behaviors in general. In this paper, we illustrate an automatic system based on a generative structure able to analyze conversational scenarios. The generative model is built by integrating a Gaussian mixture model and the (observed) influence model, and it is fed with a novel kind of simple low-level auditory social signals, termed steady conversational periods (SCPs). These are built on the duration of continuous slots of silence or speech, also taking conversational turn-taking into account. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features. Our contribution here is to show the effectiveness of our model when applied to dialog classification and clustering tasks, considering dialogs between adults and between children and adults, in both flat and arguing discussions, and showing excellent performance also in comparison with state-of-the-art frameworks.

    Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review

    Stress during public speaking is common and adversely affects performance and self-confidence. Extensive research has been carried out to develop various models to recognize emotional states. However, minimal research has been conducted to detect stress during public speaking in real time using voice analysis. In this context, the current review showed that the application of algorithms was not properly explored and helped identify the main obstacles in creating a suitable testing environment while accounting for current complexities and limitations. In this paper, we present our main idea and propose a stress detection computational algorithmic model that could be integrated into a Virtual Reality (VR) application to create an intelligent virtual audience for improving public speaking skills. The developed model, when integrated with VR, will be able to detect excessive stress in real time by analysing voice features correlated to physiological parameters indicative of stress and help users gradually control excessive stress and improve public speaking performance. Comment: 41 pages, 7 figures, 4 tables

    Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge

    More than a decade has passed since research on automatic recognition of emotion from speech became a new field of research in line with its 'big brothers', speech and speaker recognition. This article attempts to provide a short overview of where we are today, how we got there, and what this can reveal to us about where to go next and how we could arrive there. In the first part, we address the basic phenomenon, reflecting on the last fifteen years and commenting on databases, modelling and annotation, the unit of analysis, and prototypicality. We then shift to automatic processing, including discussions on features, classification, robustness, evaluation, and implementation and system integration. From there we go to the first comparative challenge on emotion recognition from speech, the INTERSPEECH 2009 Emotion Challenge, organised by (part of) the authors, including the description of the Challenge's database, Sub-Challenges, participants and their approaches, the winners, and the fusion of results, through to the actual lessons learnt, before we finally address the ever-lasting problems and promising future attempts. (C) 2011 Elsevier B.V. All rights reserved. Schuller B., Batliner A., Steidl S., Seppi D., "Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge", Speech Communication, vol. 53, no. 9-10, pp. 1062-1087, November 2011. Status: published

    System for acquisition, processing and visualization of biophysiological signals and contextual information

    Thesis (S.M.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 69-73). If we are to learn the effects of the environment and of our day-to-day actions and choices on our physiology, we must develop systems that label biophysiological sensor data with contextual information. In this thesis I first present an architecture and implementation of FEEL: a system for the acquisition, processing and visualization of biophysiological signals and contextual information. The system comprises a mobile client application (FMC) and a backend server. The mobile client collects contextual information (phone call details, email reading details, calendar entries, and user location) at a fixed interval and transmits it to the backend server. The backend server stores the contextual information and the biophysiological signal data uploaded by the user, processes the information, and provides a novel interface for viewing the combined data. Next, I present the results of a 10-day user study in which users wore Electrodermal Activity (EDA) wrist sensors that measured their autonomic arousal levels. These users were requested to upload the sensor data and annotate it at the end of the day at first, and then after two days. One group of users had access to both the signal and the full contextual information collected by the mobile phone, and the other group could only access the biophysiological signal. At the end of the study the users were asked to fill in a System Usability Scale (SUS) questionnaire, a user experience survey and a Toronto Alexithymia Scale (TAS-20) questionnaire. My results show that the FEEL system enables users to annotate biophysiological signals more effectively than the current state of the art. Finally, I show that there is a correlation between a person's ability to determine their own arousal level and their score on the Toronto alexithymia test: the less alexithymic they were, the better the correlation between the EDA and their self-reported arousal. By Yadid Ayzenberg. S.M.

    Dynamic Estimation of Rater Reliability using Multi-Armed Bandits

    One of the critical success factors for supervised machine learning is the quality of target values, or predictions, associated with training instances. Predictions can be discrete labels (such as a binary variable specifying whether a blog post is positive or negative) or continuous ratings (for instance, how boring a video is on a 10-point scale). In some areas, predictions are readily available, while in others, the effort of human workers has to be involved. For instance, in the task of emotion recognition from speech, a large corpus of speech recordings is usually available, and humans denote which emotions are present in which recordings.

    Investigating and extending the methods in automated opinion analysis through improvements in phrase based analysis

    Opinion analysis is an area of research which deals with the computational treatment of opinion statements and subjectivity in textual data. It has emerged over the past couple of decades as an active area of research, as it provides solutions to the issues raised by information overload. The problem of information overload has emerged with the advancements in communication technologies, which gave rise to an exponential growth in user-generated subjective data available online. Opinion analysis has a rich set of applications which open up opportunities for organisations, from tracking user opinions about products and social issues in communities through to engagement in political participation. The area has been highly active in recent years, and research at different levels of granularity has been, and is being, undertaken. However, there are limitations in the state of the art, especially as dealing with each level of granularity on its own does not solve current research issues. Therefore a novel sentence-level opinion analysis approach utilising clause- and phrase-level analysis is proposed. This approach uses linguistic and syntactic analysis of sentences to understand the interdependence of words within sentences, and further uses rule-based analysis at the phrase level to calculate the opinion at each hierarchical structure of a sentence. The proposed approach requires lexical and contextual resources for implementation. In the context of this Thesis the approach is further presented as part of an extended unifying framework for opinion analysis, resulting in the design and construction of a novel corpus. The above contributions to the field (approach, framework and corpus) are evaluated within the Thesis and are found to improve on existing limitations in the field, particularly with regard to opinion analysis automation. Further work is required in integrating a mechanism for greater word sense disambiguation and in lexical resource development.

    The importance of the voice in business communication: the search for the most influential voice (La importància de la veu en la comunicació empresarial. Recerca de la veu més influent)

    This doctoral thesis first lays out a theoretical basis of respiratory anatomy, detailing the organs through which air passes, both the external airways (the mouth, the nose, the pharynx and the larynx) and the internal ones (the trachea, the bronchi and finally the lungs). It then explains respiratory and ventilatory physiology, in which the diaphragm plays the leading role, along with the different types of ventilation that exist: clavicular, diaphragmatic, costo-abdominal and total, among others. Next it describes the anatomy of the voice; the human vocal apparatus centres on the larynx and the vocal folds. It then defines the physiology of the voice, phonation, describing the mechanisms needed to produce the voice and clearly setting out the interrelation between the respiratory and vocal apparatus. A detailed description follows of the voice and all its qualities and characteristics (intensity, tone, timbre, placement, projection and articulation). We then present an acoustic and aerodynamic analysis of the voice and an exhaustive evaluation of its possible relationships with the emotional world. The second part explains and justifies the empirical process, which is based on three instruments: interviews with experts, which help us grasp the existing evidence on the subject; an online survey on the possible non-verbal impacts of the voice in the business world; and a survey based on ten voices considered to be models, administered to ten people from five age groups, who answered a Likert-scale test whose results were then studied. Finally, conclusions are drawn and the voices with the greatest emotional impact are identified, along with their possible applications. The thesis concludes that there is a voice model that can improve business communication performance, as evidenced by the instruments used in this work. Based on this evidence, we favour speech with a fairly agile tempo, without too many pauses, that includes tonal inflections, with a prosody rich in tonalities and a fairly moderate volume, and that is not influenced by gender, although in this last respect there may be context-related exceptions.