
    Auditory Affective Norms for German: Testing the Influence of Depression and Anxiety on Valence and Arousal Ratings

    BACKGROUND: The study of emotional speech perception and emotional prosody requires stimuli with reliable affective norms. Ratings, however, may be affected by participants' current emotional state, as increased anxiety and depression have been shown to alter neural responses to emotional stimuli. The present study therefore had two aims: first, to provide a database of emotional speech stimuli, and second, to probe the influence of depression and anxiety on the affective ratings. METHODOLOGY/PRINCIPAL FINDINGS: We selected 120 words from the Leipzig Affective Norms for German database (LANG), which includes visual ratings of positive, negative, and neutral word stimuli. These words were spoken by a male and a female native speaker of German with the respective emotional prosody, creating a set of 240 auditory emotional stimuli. The recordings were rated for valence and arousal by an independent sample of subjects, yielding groups of highly arousing negative or positive stimuli and neutral stimuli low in arousal. These ratings were correlated with participants' emotional state as measured by the Depression Anxiety Stress Scales (DASS). Higher depression scores were related to more negative valence ratings of negative and positive, but not neutral, words. Anxiety scores correlated with increased arousal and more negative valence ratings of negative words. CONCLUSIONS/SIGNIFICANCE: These results underscore the importance of representatively distributed depression and anxiety scores among participants in affective rating studies. The LANG-audition database, which provides well-controlled, short-duration auditory word stimuli for the experimental investigation of emotional speech, is available in Supporting Information S1.
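
    The state/rating analysis described here amounts to relating each listener's questionnaire scores to the mean ratings they gave per stimulus category. A minimal sketch of that correlation in Python follows; the file names and the long-format column layout are assumptions for illustration, not the authors' materials.

```python
# Sketch: correlating DASS depression scores with mean valence ratings
# of negative-prosody words. Files and column names are hypothetical.
import pandas as pd
from scipy.stats import pearsonr

ratings = pd.read_csv("lang_audition_ratings.csv")  # participant, word_type, valence, arousal
dass = pd.read_csv("dass_scores.csv")               # participant, depression, anxiety

# Mean valence each participant gave to negative-prosody words.
neg_valence = (ratings[ratings["word_type"] == "negative"]
               .groupby("participant")["valence"].mean())

merged = dass.set_index("participant").join(neg_valence)
r, p = pearsonr(merged["depression"], merged["valence"])
print(f"depression vs. valence of negative words: r = {r:.2f}, p = {p:.3f}")
```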

    Speech emotion recognition through statistical classification

    The purpose of this dissertation is to discuss speech emotion recognition. A validated database of acted Portuguese emotional speech, named the European Portuguese Emotional Discourse Database (EPEDD), was created, and statistical classification algorithms were applied to it. EPEDD is an acted database featuring 12 utterances (2 single words, 5 short sentences and 5 long sentences) per actor and per emotion, with 8 actors, both genders equally represented, and 9 emotions (anger, joy, disgust, excitement, fear, apathy, surprise, sadness and neutral) based on Lövheim's emotion model. Inexperienced evaluators rated 40% of the database, which allowed a validated subset to be produced by filtering out 60% of the evaluated utterances. The full database contains 718 instances, while the validated one contains 116 instances. The average acting quality of the original database was rated 2.3 on a scale from 1 to 5. The emotions in the validated database's utterances were recognized by inexperienced judges at an average rate of 69.6%; anger had the highest recognition rate at 79.7%, while disgust had the lowest at 40.5%. Feature extraction and statistical classification were performed with the openSMILE and Weka software, respectively. The classification algorithms were run on both the full and the validated databases, with the best results obtained by SVMs: recognition rates of 48.7% and 44.0%, respectively. Apathy had the highest recognition rate (79.0%), while excitement had the lowest (32.9%).
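
    The pipeline this abstract describes (openSMILE functionals fed to an SVM) can be approximated in a few lines of Python. The sketch below uses the opensmile package and scikit-learn rather than Weka; the eGeMAPSv02 feature set, the file layout, and the label-from-filename convention are assumptions, not details from the dissertation.

```python
# Sketch: openSMILE functionals + SVM, approximating the EPEDD pipeline.
import glob
import opensmile
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,      # one functional vector per file
    feature_level=opensmile.FeatureLevel.Functionals,
)

files = sorted(glob.glob("epedd/*.wav"))              # hypothetical layout
labels = [f.split("_")[-1].removesuffix(".wav") for f in files]  # e.g. "anger"
X = [smile.process_file(f).iloc[0].to_numpy() for f in files]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print(cross_val_score(clf, X, labels, cv=5).mean())   # mean accuracy
```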

    Construction of a database of emotional auditory stimuli and definition of affective norms for European Portuguese

    Several studies have been conducted to understand the mechanisms underlying emotional processing, and databases of emotional stimuli currently exist in different formats. Research on emotional prosody, however, is scarcer, and until now no database of emotional auditory stimuli existed for European Portuguese. The present study, part of a research line dedicated to intervention in rumination using stimuli in this format (Rumination Room), aimed to fill this gap with three goals: 1) to build a database of emotional auditory stimuli in European Portuguese; 2) to establish affective norms (valence and arousal) for these stimuli; and 3) to investigate whether the degree of maladaptive "brooding" rumination has an impact on the affective evaluation of these stimuli. From the European Portuguese adaptation of the Affective Norms for English Words (ANEW-EP), 40 negative words (low valence, high arousal), 40 positive words (high valence and arousal) and 40 neutral words (neutral valence and arousal) were selected. This set of 120 words was recorded by a male and a female native speaker of Portuguese with the corresponding emotional prosody. A sample of 126 participants rated the valence and arousal of the recordings on the SAM scale, resulting in the Auditory ANEW (EP) stimulus database. The affective ratings of the auditory stimuli showed the following patterns: valence, positive > neutral > negative; arousal, positive > negative > neutral. The positive recordings also showed higher values for duration and for the mean pitch acoustic parameter. The affective ratings were related to participants' degree of "brooding", measured with the short version of the Ruminative Responses Scale (RRS); a non-significant trend was observed for participants with higher "brooding" scores to assign lower valence to negative words, higher valence to positive words and lower valence to neutral words, as well as higher arousal to positive and negative words. The Auditory ANEW (EP) database is an important resource for research using emotional speech.
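
    Affective norms of this kind are typically the per-stimulus mean and standard deviation of the collected ratings. A minimal sketch under an assumed long-format layout (one row per participant and stimulus; column names are illustrative):

```python
# Sketch: computing affective norms (mean/SD of valence and arousal per recording).
import pandas as pd

ratings = pd.read_csv("anew_ep_auditory_ratings.csv")  # participant, stimulus, valence, arousal

norms = (ratings
         .groupby("stimulus")[["valence", "arousal"]]
         .agg(["mean", "std", "count"]))
norms.columns = ["_".join(c) for c in norms.columns]    # e.g. valence_mean

# Category check: positive > neutral > negative on mean valence, per the abstract.
print(norms.sort_values("valence_mean").head())
```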

    Developing human-robot interaction based on multimodal emotion recognition

    Automatic multimodal emotion recognition is a fundamental subject of interest in affective computing, with its main applications in human-computer interaction. Systems developed for this purpose combine different modalities based on vocal and visual cues. This thesis draws on both modalities to develop an automatic multimodal emotion recognition system, exploiting information extracted from speech and face signals. From the speech signal, Mel-frequency cepstral coefficients, filter-bank energies and prosodic features are extracted. Two different strategies are used to analyze the facial data. First, geometric relations between facial landmarks, i.e. distances and angles, are computed. Second, each emotional video is summarized into a reduced set of key-frames, which are fed to a convolutional neural network trained to discriminate the emotions visually. The output confidence values of all the classifiers from both modalities (one acoustic, two visual) are then used to define a new feature space, and these values are learned for the final emotion label prediction in a late fusion. Experiments are conducted on the SAVEE, Polish, Serbian, eNTERFACE'05 and RML datasets. The results show significant performance improvements over existing alternatives, defining the current state of the art on all the datasets. Additionally, a review of emotional body gesture recognition systems proposed in the literature is provided, to help identify future research directions: incorporating data representing gestures, another major component of the visual modality, could make the framework still more effective.
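
    The late-fusion step described above, stacking per-modality classifier confidences into a new feature space for a final-stage learner, can be sketched as follows. The synthetic arrays stand in for the acoustic, landmark-geometry and CNN key-frame features (all assumptions); only the fusion structure reflects the abstract.

```python
# Sketch: late fusion of per-modality classifier confidences.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, n_emotions = 300, 6
y = rng.integers(0, n_emotions, n)
X_audio = rng.normal(size=(n, 40))   # stand-in for MFCC/filter-bank/prosody features
X_geom = rng.normal(size=(n, 20))    # stand-in for landmark distances/angles
X_cnn = rng.normal(size=(n, 64))     # stand-in for CNN key-frame features

def confidences(X):
    """Out-of-fold class-probability outputs of one modality-specific classifier."""
    return cross_val_predict(SVC(probability=True), X, y, cv=5, method="predict_proba")

# Concatenated confidences define the new feature space (1 acoustic + 2 visual).
meta = np.hstack([confidences(X) for X in (X_audio, X_geom, X_cnn)])

# Final-stage classifier learns the fused representation.
fusion = LogisticRegression(max_iter=1000).fit(meta, y)
print(f"fusion train accuracy: {fusion.score(meta, y):.2f}")
```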

    Optimization of automatic speech emotion recognition systems

    The basis for successfully integrating emotional intelligence into sophisticated artificial intelligence systems is reliable recognition of emotional states, with the paralinguistic content of speech standing out as a particularly significant carrier of information about the speaker's emotional state. This work presents a comparative analysis of the speech-signal features and classification methods most often used for automatic recognition of speakers' emotional states, and then considers ways to improve the performance of automatic speech emotion recognition systems. Discrete hidden Markov models are improved by using the QQ plot to determine the codevectors for vector quantization, and further model improvements are considered. The possibilities for a more faithful representation of the speech signal are examined, extending the analysis to a large number of features from different groups. Building large feature sets imposes the need for dimensionality reduction, for which an alternative method based on the Fibonacci sequence is analyzed alongside known methods. Finally, the advantages of the different approaches are integrated into a single system: a parallel multi-classifier structure is proposed with a combination rule that uses information about the classifiers' characteristics in addition to the classification results of the individual ensemble members. A procedure is also proposed for automatically forming an ensemble of classifiers of arbitrary size using the Fibonacci-based dimensionality reduction.
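
    For orientation, the discrete-HMM setup this abstract refines works roughly as follows: frame-level features are vector-quantized against a codebook, and one HMM per emotion is trained on the resulting symbol sequences. The sketch below shows that standard baseline with a k-means codebook; the QQ-plot codevector selection is the thesis's own contribution and is not reproduced here, and librosa, hmmlearn and the file names are assumptions.

```python
# Sketch: the standard VQ + discrete-HMM baseline that QQ-plot codevector
# selection refines. Codebook via k-means; files are hypothetical.
import librosa
import numpy as np
from hmmlearn.hmm import CategoricalHMM
from sklearn.cluster import KMeans

def mfcc_frames(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (frames, 13)

train_files = {"anger": ["anger_01.wav"], "neutral": ["neutral_01.wav"]}  # hypothetical

# 1) Vector quantization: one global codebook over all training frames.
all_frames = np.vstack([mfcc_frames(f) for fs in train_files.values() for f in fs])
codebook = KMeans(n_clusters=32, n_init=10).fit(all_frames)

# 2) One discrete HMM per emotion, trained on codeword sequences.
models = {}
for emotion, files in train_files.items():
    seqs = [codebook.predict(mfcc_frames(f)).reshape(-1, 1) for f in files]
    X, lengths = np.vstack(seqs), [len(s) for s in seqs]
    models[emotion] = CategoricalHMM(n_components=5, n_iter=50).fit(X, lengths)

# 3) Classify an unseen utterance by maximum log-likelihood over the models.
test = codebook.predict(mfcc_frames("unknown.wav")).reshape(-1, 1)
print(max(models, key=lambda e: models[e].score(test)))
```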