
    Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based Features Extracted in Different Acoustic Conditions

    ABSTRACT: In recent years there has been great progress in automatic speech recognition. The challenge now is not only to recognize the semantic content of speech but also its so-called "paralinguistic" aspects, including the emotions and the personality of the speaker. This research work aims to develop a methodology for automatic emotion recognition from speech signals in non-controlled noise conditions. For that purpose, different sets of acoustic, non-linear, and wavelet-based features are used to characterize emotions in different databases created for this purpose.
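    Below is a minimal sketch of the kind of frame-level feature extraction the abstract describes, assuming NumPy, SciPy, and PyWavelets; the specific feature set and the file name `utterance.wav` are illustrative assumptions, not the authors' exact configuration.

    ```python
    # Hedged sketch: frame-level acoustic and wavelet features for speech
    # emotion recognition. Feature choices are illustrative only.
    import numpy as np
    import pywt
    from scipy.io import wavfile

    def frame_signal(x, frame_len, hop):
        # Split the signal into overlapping frames (assumes len(x) >= frame_len).
        n = 1 + (len(x) - frame_len) // hop
        return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

    def frame_features(frame, wavelet="db4", levels=4):
        # Acoustic descriptors: log-energy and zero-crossing rate.
        energy = np.log(np.sum(frame ** 2) + 1e-12)
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
        # Wavelet descriptors: log-energy per decomposition band.
        coeffs = pywt.wavedec(frame, wavelet, level=levels)
        return np.array([energy, zcr] + [np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])

    fs, speech = wavfile.read("utterance.wav")   # hypothetical input file
    speech = speech.astype(np.float64)
    if speech.ndim > 1:                          # keep a single channel
        speech = speech.mean(axis=1)
    speech /= np.max(np.abs(speech)) + 1e-12
    frames = frame_signal(speech, int(0.025 * fs), int(0.010 * fs))
    X = np.stack([frame_features(f) for f in frames])  # (n_frames, n_features)
    ```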

    Automatic Emotion Recognition: Quantifying Dynamics and Structure in Human Behavior.

    Emotion is a central part of human interaction, one that has a huge influence on its overall tone and outcome. Today's human-centered interactive technology can greatly benefit from automatic emotion recognition, as the extracted affective information can be used to measure, transmit, and respond to user needs. However, developing such systems is challenging due to the complexity of emotional expressions and their dynamics, in terms of both the inherent multimodality between audio and visual expressions and the mixed factors of modulation that arise when a person speaks. To overcome these challenges, this thesis presents data-driven approaches that can quantify the underlying dynamics in audio-visual affective behavior.

    The first set of studies lays the foundation and central motivation of this thesis. We discover that it is crucial to model complex non-linear interactions between audio and visual emotion expressions, and that dynamic emotion patterns can be used in emotion recognition. Next, the understanding of the complex characteristics of emotion from the first set of studies leads us to examine multiple sources of modulation in audio-visual affective behavior. Specifically, we focus on how speech modulates facial displays of emotion. We develop a framework that uses speech signals, which alter the temporal dynamics of individual facial regions, to temporally segment and classify facial displays of emotion.

    Finally, we present methods to discover regions of emotionally salient events in given audio-visual data. We demonstrate that different modalities, such as the upper face, lower face, and speech, express emotion with different timings and time scales, varying for each emotion type. We further extend this idea to another aspect of human behavior: human action events in videos. We show how transition patterns between events can be used for automatically segmenting and classifying action events. Our experimental results on audio-visual datasets show that the proposed systems not only improve performance but also provide descriptions of how affective behaviors change over time. We conclude this dissertation with future directions that will innovate in three main research topics: machine adaptation for personalized technology, human-human interaction assistant systems, and human-centered multimedia content analysis.

    PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133459/1/yelinkim_1.pd

    Characterization of the autonomic nervous system response under emotional stimuli through linear and non-linear analysis of physiological signals

    This dissertation presents linear and non-linear methodologies applied to physiological signals, with the aim of characterizing the response of the autonomic nervous system under emotional stimuli. The study is motivated by the need for a tool that identifies emotions through their effect on cardiac activity, which could have a potential impact on clinical practice for diagnosing psycho-neural disorders. The hypotheses of this doctoral thesis are that emotions induce notable changes in the autonomic nervous system and that these changes can be captured through the analysis of physiological signals, in particular the joint analysis of heart rate variability (HRV) and respiration. The analyzed database contains simultaneous recordings of the electrocardiogram and respiration of 25 subjects whose emotions were elicited with videos, covering joy, fear, sadness, and anger.

    Two methodological studies are described. In the first, a method based on respiration-guided linear analysis of HRV is proposed. The method redefines the high-frequency (HF) band, not only centering it on the respiratory frequency but also using a bandwidth that depends on the respiratory spectrum. The method was first validated on simulated HRV signals, yielding minimal estimation errors compared with the classical HF band definition, and even with an HF band centered on the respiratory frequency but with a constant bandwidth, regardless of the sympathovagal ratio. The proposed method was then applied to a database of video-induced emotion elicitation to discriminate between emotions. Not only did the proposed redefined HF band outperform the other HF band definitions in emotional discrimination, but the maximum correlation between the HRV and respiration spectra also discriminated joy from relaxation, joy from each negative-valence emotion, and fear from sadness, with p-value ≤ 0.05 and AUC ≥ 0.70.

    In the second study, non-linear techniques, the Auto Mutual Information Function (AMIF) and the Cross Mutual Information Function (CMIF), are also proposed for human emotion recognition. The AMIF technique was applied to HRV signals to study complex interdependencies, and the CMIF technique was used to quantify the complex coupling between the HRV and respiration signals. Both algorithms were adapted to short-duration RR time series. The RR series were filtered in the low- and high-frequency bands, and RR series filtered in a respiration-based bandwidth were also investigated. The results revealed that the AMIF technique applied to the RR series filtered in the redefined HF band was able to discriminate between relaxation and both joy and fear, between joy and each negative valence, and finally between fear and both sadness and anger, all with statistical significance (p-value ≤ 0.05, AUC ≥ 0.70). In addition, the parameters derived from AMIF and CMIF characterized the low complexity that the signal exhibited during fear compared with any other emotional state studied.

    Finally, a linear classifier is used to investigate which linear and non-linear features discriminate between pairs of emotions and between emotional valences, determining which parameters differentiate the groups and how many of them are needed to achieve the best possible classification. The results suggest that the following can be classified through HRV analysis: relaxation and joy, positive valence versus all negative valences, joy and fear, joy and sadness, joy and anger, and fear and sadness. The joint analysis of HRV and respiration increases the discriminatory power of HRV, the maximum correlation between the HRV and respiration spectra being one of the best indices for emotion discrimination. Mutual information analysis, even on short-duration signals, adds relevant information to the linear indices for emotion discrimination.
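    The respiration-guided HF band and the HRV-respiration spectral correlation index lend themselves to a short sketch. The one below uses `scipy.signal.welch`; the bandwidth rule (growing the band around the respiratory peak until it holds a fixed fraction of respiratory power) is a stand-in assumption, not the dissertation's exact definition, and it assumes both signals are resampled to the same length and rate.

    ```python
    # Sketch of a respiration-guided HF band for HRV analysis. The
    # power_fraction rule below is an illustrative stand-in for the
    # respiratory-spectrum-dependent bandwidth described above.
    import numpy as np
    from scipy.signal import welch

    def hf_band_from_respiration(resp, fs, power_fraction=0.65):
        f, pxx = welch(resp, fs=fs, nperseg=min(len(resp), 1024))
        lo = hi = int(np.argmax(pxx))            # respiratory peak
        total = pxx.sum()
        # Grow the band around the peak until it holds the requested
        # fraction of total respiratory power.
        while pxx[lo:hi + 1].sum() < power_fraction * total:
            if lo > 0 and (hi == len(pxx) - 1 or pxx[lo - 1] >= pxx[hi + 1]):
                lo -= 1
            else:
                hi += 1
        return f[lo], f[hi]

    def band_power(x, fs, band):
        f, pxx = welch(x, fs=fs, nperseg=min(len(x), 1024))
        mask = (f >= band[0]) & (f <= band[1])
        return pxx[mask].sum() * (f[1] - f[0])   # rectangle-rule integral

    def max_spectral_correlation(hrv, resp, fs):
        # Maximum cross-correlation between the two normalized spectra
        # (assumes hrv and resp have equal length and sampling rate).
        _, a = welch(hrv, fs=fs, nperseg=min(len(hrv), 1024))
        _, b = welch(resp, fs=fs, nperseg=min(len(resp), 1024))
        a = (a - a.mean()) / a.std()
        b = (b - b.mean()) / b.std()
        return np.correlate(a, b, mode="full").max() / len(a)
    ```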

    Voice source characterization for prosodic and spectral manipulation

    The objective of this dissertation is to study and develop techniques to decompose the speech signal into its two main components: voice source and vocal tract. Our main efforts are on glottal pulse analysis and characterization. We want to explore the utility of this model in different areas of speech processing: speech synthesis, voice conversion, and emotion detection, among others. Thus, we study different techniques for prosodic and spectral manipulation. One of our requirements is that the methods should be robust enough to work with the large databases typical of speech synthesis.

    We use a speech production model in which the glottal flow produced by the vibrating vocal folds passes through the vocal (and nasal) tract cavities and is radiated by the lips. Removing the effect of the vocal tract from the speech signal to obtain the glottal pulse is known as inverse filtering. We use a parametric model of the glottal pulse directly in the source-filter decomposition phase. To validate the accuracy of the parametrization algorithm, we designed a synthetic corpus using LF glottal parameters reported in the literature, complemented with our own results from the vowel database. The results show that our method performs satisfactorily over a wide range of glottal configurations and at different levels of SNR. Our method using the whitened residual compared favorably with the reference, achieving high quality ratings (Good-Excellent). Our fully parametrized system scored lower than the other two, ranking third, but still above the acceptance threshold (Fair-Good).

    Next, we proposed two methods for prosody modification, one for each of the residual representations described above. The first used our full parametrization system and frame interpolation to perform the desired changes in pitch and duration. The second used resampling of the residual waveform and a frame selection technique to generate a new sequence of frames to be synthesized. The results showed that both methods are rated similarly (Fair-Good) and that more work is needed to reach quality levels similar to those of the reference methods.

    As part of this dissertation, we studied the application of our models in three different areas: voice conversion, voice quality analysis, and emotion recognition. We included our speech production model in a reference voice conversion system to evaluate the impact of our parametrization on this task. The evaluators preferred our method over the original one, rating it with a higher score on the MOS scale. To study voice quality, we recorded a small database of isolated, sustained Spanish vowels in four different phonations (modal, rough, creaky, and falsetto). Comparing the results with those reported in the literature, we found them to generally agree with previous findings; some differences existed, but they could be attributed to the difficulty of comparing voice qualities produced by different speakers. We also conducted experiments in voice quality identification, with very good results. Finally, we evaluated the performance of an automatic emotion classifier based on GMMs using glottal measures. For each emotion, we trained a specific model with different features, comparing our parametrization to a baseline system using spectral and prosodic characteristics. The test results were very satisfactory, showing a relative error reduction of more than 20% with respect to the baseline system. Accuracy on the individual emotions was also high, improving on previously reported results using the same database. Overall, we conclude that the glottal source parameters extracted with our algorithm have a positive impact on automatic emotion classification.
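    As a rough illustration of the source-filter decomposition step, the sketch below performs generic LPC-based inverse filtering with librosa and SciPy; it is not the dissertation's full LF-model parametrization, and `vowel.wav` is a hypothetical input.

    ```python
    # Generic LPC inverse filtering: estimate the vocal tract with linear
    # prediction, then filter the speech through the inverse filter. The
    # residual approximates the glottal flow derivative. This is a sketch,
    # not the dissertation's full LF parametrization.
    import librosa
    from scipy.signal import lfilter

    y, sr = librosa.load("vowel.wav", sr=16000)  # hypothetical sustained vowel
    y = lfilter([1.0, -0.97], [1.0], y)          # pre-emphasis

    order = 2 + sr // 1000                       # common rule of thumb
    a = librosa.lpc(y, order=order)              # all-pole vocal-tract model
    residual = lfilter(a, [1.0], y)              # inverse filter -> excitation

    # Leaky integration of the residual approximates the glottal flow itself.
    glottal_flow = lfilter([1.0], [1.0, -0.99], residual)
    ```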

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The MAVEBA Workshop proceedings, published on a biennial basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and the classification of vocal pathologies.

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Connecting people through physiosocial technology

    Social connectedness is one of the most important predictors of health and well-being. The goal of this dissertation is to investigate technologies that can support social connectedness. Such technologies can build upon the notion that disclosing emotional information has a strong positive influence on social connectedness. As physiological signals are strongly related to emotions, they might provide a solid base for emotion communication technologies. Moreover, physiological signals are largely lacking in unmediated communication, have been used successfully by machines to recognize emotions, and can be measured relatively unobtrusively with wearable sensors. Therefore, this doctoral dissertation examines the following research question: How can we use physiological signals in affective technology to improve social connectedness?

    First, a series of experiments was conducted to investigate whether computer interpretations of physiological signals can be used to automatically communicate emotions and improve social connectedness (Chapters 2 and 3). The results of these experiments showed that computers can be more accurate at recognizing emotions than humans are. Physiological signals turned out to be the most effective information source for machine emotion recognition. One advantage of machine-based emotion recognition for communication technology may be the increase in the rate at which emotions can be communicated. As expected, experiments showed that increases in the number of communicated emotions increased feelings of closeness between interacting people. Nonetheless, these effects on feelings of closeness are limited if users attribute the cause of the increases in communicated emotions to the technology and not to their interaction partner. Therefore, I discuss several possibilities to incorporate emotion recognition technologies in applications in such a way that users attribute the communication to their interaction partner.

    Instead of using machines to interpret physiological signals, the signals can also be represented to a user directly. This way, the interpretation of the signal is left to the user. To explore this, I conducted several studies that employed heartbeat representations as a direct physiological communication signal. These studies showed that people can interpret such signals in terms of emotions (Chapter 4) and that perceiving someone's heartbeat increases feelings of closeness between the perceiver and the sender of the signal (Chapter 5). Finally, we used a field study (Chapter 6) to investigate the potential of heartbeat communication mechanisms in practice. This again confirmed that heartbeat can provide an intimate connection to another person, showing the potential of communicating physiological signals directly to improve connectedness.

    The last part of the dissertation builds upon the notion that empathy has a positive influence on social connectedness. Therefore, I developed a framework for empathic computing that employed automated empathy measurement based on physiological signals (Chapter 7). This framework was applied in a system that can train empathy (Chapter 8). The results showed that providing users frequent feedback about their physiological synchronization with others can help them improve empathy as measured through self-report and physiological synchronization. In turn, this improves understanding of the other and helps people signal validation and caring, which are types of communication that improve social connectedness.

    Taking the results presented in this dissertation together, I argue that physiological signals form a promising modality to apply in communication technology (Chapter 9). This dissertation provides a basis for future communication applications that aim to improve social connectedness.
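    The abstract does not specify how physiological synchronization was computed; a common stand-in is the windowed correlation of two heart-rate series, sketched below with assumed window parameters.

    ```python
    # Illustrative synchrony measure: Pearson correlation of two equally
    # sampled heart-rate series over sliding windows. This is a common
    # stand-in, not the dissertation's exact measure.
    import numpy as np

    def windowed_synchrony(hr_a, hr_b, win=30, hop=5):
        scores = []
        for start in range(0, len(hr_a) - win + 1, hop):
            a = hr_a[start:start + win]
            b = hr_b[start:start + win]
            scores.append(np.corrcoef(a, b)[0, 1])
        return np.array(scores)

    # A feedback application could display the running mean of these
    # scores to users as an empathy/synchrony indicator.
    ```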

    Intelligent Biosignal Analysis Methods

    This book describes recent efforts in improving intelligent systems for automatic biosignal analysis. It focuses on machine learning and deep learning methods used for classification of different organism states and disorders based on biomedical signals such as EEG, ECG, HRV, and others.
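    As a minimal illustration of the kind of pipeline such books cover, the sketch below classifies subject states from precomputed biosignal features with scikit-learn; the feature matrix and labels are random placeholders.

    ```python
    # Minimal biosignal-classification pipeline. X would hold precomputed
    # features (e.g. HRV indices per recording); here it is a placeholder.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))       # placeholder feature matrix
    y = rng.integers(0, 2, size=200)    # placeholder state labels

    clf = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=200))
    print(cross_val_score(clf, X, y, cv=5).mean())
    ```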

    Image Sentiment Analysis of Social Media Data

    A picture is often worth a thousand words, a short statement that captures one of the biggest challenges in image sentiment analysis. The main theme of this dissertation is the sentiment analysis of social media images, mainly from Twitter, so that situations that represent risks can be identified (identification of negative situations) or anticipated before they become one (prediction of negative situations). Despite the diversity of work done in the area, image sentiment analysis is still a challenging task. Several factors contribute to the difficulty: global factors such as sociocultural issues; issues specific to image sentiment analysis, such as the difficulty of finding reliable and properly labeled data; and factors faced during classification. For example, it is common to associate dark, low-brightness images with negative feelings, and most such images do express them, but some cases escape this rule, and it is these cases that hurt the accuracy of the developed models. To overcome these classification problems, a multitask model was developed that considers the whole image, the salient areas in the image, the facial expressions of faces contained in the image, and textual information, so that each component complements the others during classification. During the experiments it was observed that the proposed models can bring advantages for image sentiment classification and even work around some problems evidenced in existing work, such as irony in the text. This work therefore presents the state of the art and the study carried out, the proposed model and its implementation, and the experiments and discussion of the results, in order to verify the effectiveness of the proposal. Finally, conclusions about the work done and future work are presented.
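    The multitask fusion idea lends itself to a schematic sketch. The PyTorch module below fuses four pooled feature branches (whole image, salient regions, faces, text); encoder choices, dimensions, and class count are placeholder assumptions, not the dissertation's actual architecture.

    ```python
    # Schematic multi-branch sentiment model: one small encoder per
    # information source, concatenated and classified jointly. Sizes and
    # encoders are placeholders; real branches would use pretrained
    # backbones (a CNN for image crops, word embeddings for text).
    import torch
    import torch.nn as nn

    class MultiBranchSentiment(nn.Module):
        def __init__(self, img_dim=512, txt_dim=300, hidden=256, n_classes=3):
            super().__init__()
            self.whole = nn.Linear(img_dim, hidden)
            self.salient = nn.Linear(img_dim, hidden)
            self.face = nn.Linear(img_dim, hidden)
            self.text = nn.Linear(txt_dim, hidden)
            self.head = nn.Sequential(nn.ReLU(), nn.Linear(4 * hidden, n_classes))

        def forward(self, whole, salient, face, text):
            z = torch.cat([self.whole(whole), self.salient(salient),
                           self.face(face), self.text(text)], dim=-1)
            return self.head(z)

    model = MultiBranchSentiment()
    logits = model(torch.randn(2, 512), torch.randn(2, 512),
                   torch.randn(2, 512), torch.randn(2, 300))
    print(logits.shape)  # torch.Size([2, 3])
    ```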

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.