    Speech emotion recognition through statistical classification

    The purpose of this dissertation is to discuss speech emotion recognition. A validated database of acted Portuguese emotional speech, named the European Portuguese Emotional Discourse Database (EPEDD), was created, and statistical classification algorithms were applied to it. EPEDD is an acted database featuring 12 utterances (2 single words, 5 short sentences and 5 long sentences) per actor and per emotion, 8 actors with both genders equally represented, and 9 emotions (anger, joy, disgust, excitement, fear, apathy, surprise, sadness and neutral), based on Lövheim’s emotion model. Inexperienced evaluators assessed 40% of the database, and 60% of the evaluated utterances were filtered out to produce a validated database. The full database contains 718 instances, while the validated one contains 116 instances. The average acting quality of the original database was rated 2.3 on a scale from 1 to 5. The validated database is composed of emotional utterances whose emotions are recognized by inexperienced judges at an average rate of 69.6%. Anger had the highest recognition rate at 79.7%, while disgust had the lowest at 40.5%. Feature extraction and statistical classification were performed with the Opensmile and Weka software, respectively. The classification algorithms were run on the full database and on the validated one; the best results were obtained with SVMs, with emotion recognition rates of 48.7% and 44.0%, respectively. Apathy had the highest recognition rate at 79.0%, while excitement had the lowest at 32.9%.

    Eesti emotsionaalse kõne korpuse loomine ja emotsioonide taju

    The electronic version of the dissertation does not contain the publications. The aim of the thesis was to develop a theoretical base for the Estonian Emotional Speech Corpus and to test the validity of the theoretical starting points on the Corpus material. The Corpus now exists as designed and currently contains sentences from one female voice, classified as anger, joy, sadness and neutral (see http://peeter.eki.ee:5000). The results of the research reveal the importance of detailed planning and of the design elements of the Corpus. The theoretical starting points of the study are relevant and applicable in real situations, so the results could be taken into consideration in the creation of other emotional speech corpora. What makes this Corpus unique among corpora of its kind is that its sentences are labeled according to whether their emotion is carried by the sound of the sentence alone or whether the recognition of the emotion from vocal expression may be influenced by the verbal-semantic content. This classification enables the study of emotions in both speech and writing. The Estonian Emotional Speech Corpus is one of the few documented, freely available corpora containing moderately expressed emotions. Acted emotions were abandoned because of their possible stereotypicality and overacting. The sentences recorded for the Corpus were read aloud by a so-called ordinary person who was not told what emotion to use while reading. The Corpus contains 1,234 Estonian sentences that have passed both reading and listening tests. Test takers identified 908 sentences that expressed anger, joy, sadness, or were neutral.
As the emotions of the sentences in the Corpus were determined by listeners, some issues of emotion perception came to the fore: 1) Is sentence emotion identifiable purely from vocal cues, without the speaker being seen? 2) Can age affect the identification of emotion? 3) Is the identification of emotion culturally bound? 4) Does identification depend on the listeners’ empathy? On the first question, the results confirmed that listeners can recognize moderately expressed, non-acted emotions from the voice of a non-professional reader. The results also support the decision that the emotions of the sentences in the Estonian Emotional Speech Corpus should be determined by Estonian adults aged over 30 who speak Estonian as their native language, because they are more likely to have acquired the skills for decoding the culture-specific expression of emotions. Furthermore, the results imply that the understanding of emotions depends on cultural factors and social interactions, including the social norms specific to one culture; the interpretation of emotional messages is therefore learned in the course of social interactions. The research showed that, in the recognition of emotion from vocal cues, empathy is less important than clinical results would suggest, although it did reveal a difference between men and women in identifying emotions. In conducting emotion studies for speech technology purposes, it is therefore unnecessary to exclude people with low empathy from the testers on the grounds that they may not recognize the emotions expressed, provided their low empathy level is not due to mental or developmental disorders. Because the Corpus is easily extensible, it continues to be developed according to the requirements of new research directions. As the Corpus is publicly available and accessible for free, its data can be used for tackling different research challenges.

    An ongoing review of speech emotion recognition

    Recognition of the user's emotional state is becoming a key feature in advanced Human Computer Interfaces (HCI). A key source of emotional information is spoken expression, which may be part of the interaction between the human and the machine. Speech emotion recognition (SER) is a very active area of research that involves the application of current machine learning and neural network tools. This ongoing review covers recent and classical approaches to SER reported in the literature. This work has been carried out with the support of project PID2020-116346GB-I00 funded by the Spanish MICIN

    Proceedings of the Sempre MET2018: Researching Music, Education, Technology

    MET2018: Researching Music - Education - Technology, 26–27 March 2018. Following the great success of its inaugural conference held by the University of Hull in 2010, and of MET2014 and MET2016 at IOE London, this fourth two-day conference (#sempreMET) was hosted by the Department of Culture, Communication & Media, IOE, University College London, at the University of London’s iconic Senate House. Although 'musicking' humanity has relied on technology from the very beginning of its musical journey, technology now changes and develops, and its role is being redefined, at a dramatically greater rate. This Sempre conference aimed to celebrate technology's challenging role(s) and provide a platform for critical discourse and the presentation of scholarly work in the broader fields of digital technologies in: music composition and creation; music performance; music production (recording, studio work, archival and/or communication of music); diverse musical genres (e.g. popular, classical, world, etc.); creativity/ies; real world praxial contexts (e.g. classroom, studio, etc.); assessment of musical development and/or assessment of performance; computational musicology; music and Big Data (a special call for chapters for an edited OUP volume will be posted soon); the music industry; and special educational contexts/needs. The conference provided opportunities for colleagues to present and discuss ideas in a friendly and supportive environment, and a meeting point for academics, scholars, teachers, and practitioners seeking to form connections and synergies with participants from around the world.

    New directions in corpus-based translation studies

    Corpus-based translation studies has become a major paradigm and research methodology and has investigated a wide variety of topics in the last two decades. The contributions to this volume add to the range of corpus-based studies by providing examples of some less explored applications of corpus analysis methods to translation research. They show that the area keeps evolving as it constantly opens up to different frameworks and approaches, from appraisal theory to process-oriented analysis, and encompasses multiple translation settings, including (indirect) literary translation, machine(-assisted) translation and the practical work of professional legal translators. The studies included in the volume also broaden the range of corpus applications in terms of the tools used to accomplish the research tasks outlined.

    IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech

    IberSPEECH2020 is a two-day event bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentations of projects, laboratory activities, recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions, distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ukraine, Slovenia). Furthermore, extended versions of selected papers will be published as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session. Red Española de Tecnologías del Habla. Universidad de Valladoli