
    A Vowel-Stress Emotional Speech Analysis Method

    The analysis of speech, particularly for emotional content, is an open area of current research. This paper documents the development of a vowel-stress analysis framework for emotional speech, intended to provide a suitable assessment of the recorded assets in terms of their prosodic attributes. Considering different levels of vowel stress provides a means by which the salient points of a signal may be analysed in terms of their overall priority to the listener. The prosodic attributes of these events can thus be assessed in terms of their overall significance, in an effort to categorise the acoustic correlates of emotional speech. Vowel-stress analysis is performed in conjunction with the definition of pitch and intensity contours, alongside other micro-prosodic information relating to voice quality.
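
    As an illustration of the kind of contour extraction such a framework relies on, the following is a minimal sketch using the Parselmouth library (a Python interface to Praat); the file name and analysis parameters are placeholders, not details taken from the paper.

        # Minimal sketch: pitch and intensity contours via Parselmouth (Praat).
        # The file path and time step are illustrative assumptions.
        import parselmouth

        snd = parselmouth.Sound("utterance.wav")   # hypothetical input file

        # Pitch contour (F0 in Hz); unvoiced frames are returned as 0.0
        pitch = snd.to_pitch(time_step=0.01)
        f0 = pitch.selected_array["frequency"]
        times = pitch.xs()

        # Intensity contour in dB
        intensity = snd.to_intensity(time_step=0.01)
        db = intensity.values[0]
        print(f"mean intensity: {db.mean():.1f} dB")

        # Contour values around stressed vowels could then be compared with
        # the rest of the utterance to rank salient events for the listener.
        for t, hz in zip(times[:5], f0[:5]):
            print(f"{t:.2f}s  F0={hz:.1f} Hz")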

    LinguaTag: an Emotional Speech Analysis Application

    The analysis of speech, particularly for emotional content, is an open area of current research. Ongoing work has developed an emotional speech corpus for analysis, and defined a vowel-stress method by which this analysis may be performed. This paper documents the development of LinguaTag, an open-source speech analysis application which implements this vowel-stress emotional speech analysis method, developed as part of research into the acoustic and linguistic correlates of emotional speech. The analysis output is contained within a file format combining SMIL and SSML markup tags, to facilitate search and retrieval within an emotional speech corpus database. In this manner, analysis performed using LinguaTag aims to combine acoustic, emotional and linguistic descriptors in a single metadata framework.
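
    To make the combined markup concrete, the snippet below builds a small, hypothetical example with Python's standard xml.etree.ElementTree; it mixes standard SMIL timing attributes with an SSML prosody element, but the exact schema LinguaTag emits is an assumption here.

        # Hypothetical sketch of a combined SMIL/SSML annotation record.
        # Element and attribute names are illustrative only; they do not
        # reproduce LinguaTag's actual output schema.
        import xml.etree.ElementTree as ET

        root = ET.Element("smil")                      # SMIL container for timing
        clip = ET.SubElement(root, "audio", {
            "src": "utterance.wav",                    # placeholder asset name
            "clipBegin": "0.42s", "clipEnd": "0.78s",  # stressed-vowel event span
        })
        # SSML prosody element carrying acoustic measurements for the event
        ET.SubElement(clip, "prosody", {
            "pitch": "212Hz",
            "volume": "68dB",
            "rate": "medium",
        })

        print(ET.tostring(root, encoding="unicode"))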

    Task-Based Mood Induction Procedures for the Elicitation of Natural Emotional Responses

    This paper details experimental procedures designed to elicit real emotional responses from participants within a controlled acoustic environment. The experiments use Mood Induction Procedures (MIPs), specifically MIP 4, to implement a co-operative task between two participants. These co-operative tasks are designed to engender emotional responses of activation and evaluation from the participants, who are situated in separate isolation booths, thus reducing unwanted noise in the signal, preventing the participants from being distracted, and ensuring a cleanly recorded audio signal. The audio is recorded at a professional level of quality (24-bit/192 kHz). The emotional dimensions of each audio recording will be evaluated using listening tests in conjunction with the FeelTrace tool, providing a statistical evaluation of these recordings that will be used to compile an emotional speech corpus. This corpus can then be analysed to define a set of rules for the detection of basic emotional dimensions in speech.
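
    As an illustration of how continuous FeelTrace traces might be reduced to per-clip labels, the sketch below averages each dimension across time and raters; the column names and CSV layout are assumptions, not the authors' actual evaluation pipeline.

        # Sketch: collapsing continuous FeelTrace traces into per-clip
        # activation/evaluation summaries. Column names and file layout
        # are assumptions; the paper does not specify its aggregation step.
        import pandas as pd

        # One row per (rater, clip, timestamp) with the two FeelTrace dimensions
        traces = pd.read_csv("feeltrace_ratings.csv")   # hypothetical export

        per_clip = (
            traces.groupby("clip")[["activation", "evaluation"]]
                  .agg(["mean", "std"])                 # central tendency + spread
        )
        print(per_clip.head())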

    InproTKs: A Toolkit for Incremental Situated Processing

    Kennington C, Kousidis S, Schlangen D. InproTKs: A Toolkit for Incremental Situated Processing. In: Proceedings of SIGdial 2014: Short Papers. 2014: 84-88.

    Generation of High Quality Audio Natural Emotional Speech Corpus using Task Based Mood Induction

    Detecting emotional dimensions [1] in speech is an area of great research interest, notably as a means of improving human-computer interaction in areas such as speech synthesis [2]. In this paper, a method of obtaining high-quality emotional audio speech assets is proposed. The methods of obtaining emotional content are subject to considerable debate, with distinctions between acted [3] and natural [4] speech being made on the grounds of authenticity. Mood Induction Procedures (MIPs) [5] are often employed to stimulate emotional dimensions in a controlled environment. This paper details experimental procedures based around MIP 4, using performance-related tasks to engender activation and evaluation responses from the participant. Tasks are specified involving two participants, who must co-operate in order to complete a given task [6] within the allotted time. Experiments designed in this manner also allow for the specification of high-quality audio assets (notably 24-bit/192 kHz [7]) within an acoustically controlled environment [8], thus providing a means of reducing unwanted acoustic factors within the recorded speech signal. Once suitable assets are obtained, they will be assessed for the purposes of segregation into differing emotional dimensions. The most statistically robust method of evaluation involves the use of listening tests to determine the perceived emotional dimensions within an audio clip. In this experiment, the FeelTrace [9] rating tool is employed within user listening tests to specify the categories of emotional dimensions for each audio clip.
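
    On the recording specification itself, a check like the following sketch (using the soundfile library and a placeholder filename) could verify that an asset meets the stated 24-bit/192 kHz target before it enters the corpus; this is an illustrative aside, not part of the published procedure.

        # Sketch: verifying that a recorded asset meets a 24-bit/192 kHz spec.
        # The filename is a placeholder; soundfile reports the sample rate and
        # PCM subtype without loading the whole file.
        import soundfile as sf

        info = sf.info("take_01.wav")                   # hypothetical asset
        ok = info.samplerate == 192000 and info.subtype == "PCM_24"
        print(f"{info.samplerate} Hz, {info.subtype} -> {'OK' if ok else 'reject'}")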

    The Use of Task Based Mood-Induction Procedures to Generate High Quality Emotional Assets

    Detecting emotion in speech is important in advancing human-computer interaction, especially in the area of speech synthesis. This poster details experimental procedures based on Mood Induction Procedure 4, using performance-related tasks to engender natural emotional responses in participants. These tasks are aided or hindered by the researcher to elicit the desired emotional response. These responses will then be recorded and their emotional content graded to form the basis of an emotional speech corpus. This corpus will then be used to develop a rule-set for basic emotional dimensions in speech.

    DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter

    Hough J, Tian Y, de Ruiter L, et al. DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter. In: Proceedings of the 10th Language Resources and Evaluation Conference (LREC). 2016.

    Monitoring Convergence of Temporal Features in Spontaneous Dialogue Speech

    This paper presents ongoing research on the convergence of speech features in human dialogues, with a view to simulating this behaviour in spoken dialogue systems. The TAMA (time-aligned moving average) method, previously used for monitoring convergence of acoustic-prosodic (a/p) features, is applied to temporal properties of speech (between-turn pauses and overlaps). The results are compared to those of an earlier study on the same features.
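
    As a rough illustration of the TAMA idea, not the authors' implementation, the sketch below averages a per-utterance temporal feature, such as the preceding between-turn pause, over overlapping time windows, weighting each utterance by how much of it falls inside the window; the window length and step are assumptions.

        # Sketch of a time-aligned moving average (TAMA) over per-utterance
        # feature values. Window length, step, and the duration-weighting
        # scheme are assumptions; the published method is only approximated.
        def tama(utterances, window=20.0, step=10.0):
            """utterances: list of (start_s, end_s, feature_value) tuples."""
            if not utterances:
                return []
            t_end = max(end for _, end, _ in utterances)
            out, t = [], 0.0
            while t < t_end:
                w_lo, w_hi = t, t + window
                num = den = 0.0
                for start, end, value in utterances:
                    # weight = duration of the utterance inside this window
                    overlap = max(0.0, min(end, w_hi) - max(start, w_lo))
                    num += overlap * value
                    den += overlap
                out.append((t + window / 2, num / den if den else None))
                t += step
            return out

        # e.g. feature_value = pause (s) before each turn in the dialogue
        print(tama([(0.0, 2.1, 0.40), (3.0, 5.5, 0.25), (12.0, 14.0, 0.60)]))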