14 research outputs found
Using Context to Improve Emotion Detection in Spoken Dialog Systems
Most research that explores the emotional state of users of spoken dialog systems does not fully utilize the contextual nature that the dialog structure provides. This paper reports results of machine learning experiments designed to automatically classify the emotional state of user turns using a corpus of 5,690 dialogs collected with the "How May I Help You" spoken dialog system. We show that augmenting standard lexical and prosodic features with contextual features that exploit the structure of spoken dialog and track user state increases classification accuracy by 2.6%.
Computational Approaches to Modeling Speaker State in the Medical Domain
Recently, researchers in computer science and engineering have begun to explore the possibility of finding speech-based correlates of various medical conditions using automatic, computational methods. If such language cues can be identified and quantified automatically, this information can be used to support diagnosis and treatment of medical conditions in clinical settings and to further fundamental research in understanding cognition. This chapter reviews computational approaches that explore communicative patterns of patients who suffer from medical conditions such as depression, autism spectrum disorders, schizophrenia, and cancer. There are two main approaches discussed: research that explores features extracted from the acoustic signal and research that focuses on lexical and semantic features. We also present some applied research that uses computational methods to develop assistive technologies. In the final sections we discuss issues related to and the future of this emerging field of research.
Auditory dialog analysis and understanding by generative modelling of interactional dynamics
In the last few years, interest in the analysis of human behavioral schemes has grown dramatically, in particular for the interpretation of the communication modalities called social signals. They represent well-defined interaction patterns, possibly unconscious, characterizing different conversational situations and behaviors in general. In this paper, we illustrate an automatic system based on a generative structure able to analyze conversational scenarios. The generative model is composed by integrating a Gaussian mixture model and the (observed) influence model, and it is fed with a novel kind of simple low-level auditory social signals, termed steady conversational periods (SCPs). These are built on the duration of continuous slots of silence or speech, also taking into account conversational turn-taking. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features. Our contribution here is to show the effectiveness of our model when applied to dialog classification and clustering tasks, considering dialogs between adults and between children and adults, in both flat and arguing discussions, and showing excellent performance also in comparison with state-of-the-art frameworks.
Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review
Stress during public speaking is common and adversely affects performance and
self-confidence. Extensive research has been carried out to develop various
models to recognize emotional states. However, minimal research has been
conducted to detect stress during public speaking in real time using voice
analysis. In this context, the current review showed that the application of
algorithms was not properly explored and helped identify the main obstacles in
creating a suitable testing environment while accounting for current
complexities and limitations. In this paper, we present our main idea and
propose a stress detection computational algorithmic model that could be
integrated into a Virtual Reality (VR) application to create an intelligent
virtual audience for improving public speaking skills. The developed model,
when integrated with VR, will be able to detect excessive stress in real time
by analysing voice features correlated to physiological parameters indicative
of stress and help users gradually control excessive stress and improve public
speaking performance. Comment: 41 pages, 7 figures, 4 tables
Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge
More than a decade has passed since research on automatic recognition of emotion from speech became a new field of research in line with its 'big brothers' speech and speaker recognition. This article attempts to provide a short overview of where we are today, how we got there, and what this can reveal about where to go next and how we could arrive there. In the first part, we address the basic phenomenon, reflecting on the last fifteen years and commenting on databases, modelling and annotation, the unit of analysis and prototypicality. We then shift to automatic processing, including discussions on features, classification, robustness, evaluation, and implementation and system integration. From there we go to the first comparative challenge on emotion recognition from speech, the INTERSPEECH 2009 Emotion Challenge, organised by (part of) the authors, including the description of the Challenge's database, Sub-Challenges, participants and their approaches, the winners, and the fusion of results, and on to the lessons actually learnt, before we finally address the ever-lasting problems and promising future attempts. (C) 2011 Elsevier B.V. All rights reserved.
Schuller B., Batliner A., Steidl S., Seppi D., "Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge", Speech Communication, vol. 53, no. 9-10, pp. 1062-1087, November 2011. Status: published
System for acquisition, processing and visualization of biophysiological signals and contextual information
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 69-73).
If we are to learn the effects of the environment and of our day-to-day actions and choices on our physiology, we must develop systems that label biophysiological sensor data with contextual information. In this thesis I first present an architecture and implementation of FEEL: a system for the acquisition, processing and visualization of biophysiological signals and contextual information. The system comprises a mobile client application (FMC) and a backend server. The mobile client collects contextual information (phone call details, email reading details, calendar entries, and user location) at a fixed interval and transmits it to the backend server. The backend server stores the contextual information and the biophysiological signal data uploaded by the user, processes the information and provides a novel interface for viewing the combined data. Next, I present the results of a 10-day user study in which users wore Electrodermal Activity (EDA) wrist sensors that measured their autonomic arousal levels. These users were asked to upload the sensor data and annotate it at the end of the day at first, and then after two days. One group of users had access to both the signal and the full contextual information collected by the mobile phone, and the other group could access only the biophysiological signal. At the end of the study the users were asked to fill in a System Usability Scale (SUS) questionnaire, a user experience survey and a Toronto Alexithymia Scale (TAS-20) questionnaire. My results show that the FEEL system enables users to annotate biophysiological signals more effectively than the current state of the art. Finally, I show that there is a correlation between a person's ability to determine their own arousal level and their score on the Toronto alexithymia test: the less alexithymic they were, the better the correlation between the EDA and their self-reported arousal.
by Yadid Ayzenberg. S.M.
Dynamic Estimation of Rater Reliability using Multi-Armed Bandits
One of the critical success factors for supervised machine learning is the quality of target values, or predictions, associated with training instances. Predictions can be discrete labels (such as a binary variable specifying whether a blog post is positive or negative) or continuous ratings (for instance, how boring a video is on a 10-point scale). In some areas, predictions are readily available, while in others, the effort of human workers has to be involved. For instance, in the task of emotion recognition from speech, a large corpus of speech recordings is usually available, and humans denote which emotions are present in which recordings.
Investigating and extending the methods in automated opinion analysis through improvements in phrase based analysis
Opinion analysis is an area of research which deals with the computational treatment of opinion statements and subjectivity in textual data. Opinion analysis has emerged over the past couple of decades as an active area of research, as it provides solutions to the issues raised by information overload. The problem of information overload has emerged with the advancements in communication technologies, which gave rise to an exponential growth in user-generated subjective data available online. Opinion analysis has a rich set of applications which are used to enable opportunities for organisations, such as tracking user opinions about products and social issues in communities, through to engagement in political participation.
The opinion analysis area has been highly active in recent years, and research at different levels of granularity has been, and is being, undertaken. However, it is observed that there are limitations in the state of the art, especially as dealing with the levels of granularity on their own does not solve current research issues. Therefore a novel sentence-level opinion analysis approach utilising clause- and phrase-level analysis is proposed. This approach uses linguistic and syntactic analysis of sentences to understand the interdependence of words within sentences, and further uses rule-based analysis at the phrase level to calculate the opinion at each hierarchical structure of a sentence. The proposed opinion analysis approach requires lexical and contextual resources for implementation. In the context of this Thesis the approach is further presented as part of an extended unifying framework for opinion analysis, resulting in the design and construction of a novel corpus. The above contributions to the field (approach, framework and corpus) are evaluated within the Thesis and are found to make improvements on existing limitations in the field, particularly with regard to opinion analysis automation. Further work is required in integrating a mechanism for greater word sense disambiguation and in lexical resource development.
La importància de la veu en la comunicació empresarial. Recerca de la veu més influent
This doctoral thesis first sets out a theoretical basis in respiratory anatomy, detailing the
organs through which air passes, both the external airways (the mouth, the nose, the pharynx
and the larynx) and the internal ones (the trachea, the bronchi and finally the lungs). It then
explains respiratory and ventilatory physiology, in which the diaphragm plays the leading role,
together with the different types of ventilation: clavicular, diaphragmatic, costo-abdominal
and total, among others. Next it describes the anatomy of the voice; the human vocal apparatus
centres on the larynx and the vocal folds. It then defines the physiology of the voice, that
is, phonation: the mechanisms needed to produce the voice are described, and the interrelation
between the respiratory and vocal apparatus is clearly defined. A detailed description of the
voice and all its qualities and characteristics follows (intensity, tone, timbre, placement,
projection and articulation). The thesis proceeds with an acoustic and aerodynamic analysis of
the voice and an exhaustive evaluation of the relationships it may have with the emotional
world.
The second part explains and justifies the empirical process, based on three instruments:
interviews with experts, which bring us closer to the existing evidence on the subject; an
online survey on the possible non-verbal impacts of the voice in the business world; and a
survey based on ten voices considered to be models, administered to ten people from five age
groups, who answer a Likert-scale test whose results are then studied.
Finally, conclusions are drawn and the voices with the greatest emotional impact are
determined, along with their possible applications. The thesis closes by concluding that there
is a voice model that can improve business communication performance, as evidenced by the
instruments used in this work. According to this evidence, we favour speech with a fairly agile
tempo, without too many pauses, that includes tonal inflections, with a prosody rich in
tonalities and a fairly moderate volume, and on which gender has no influence, although in this
last respect there may be context-related exceptions.