
    Audiovisual Generation of Social Attitudes from Neutral Stimuli

    The focus of this study is the generation of expressive audiovisual speech from neutral utterances for 3D virtual actors. Taking into account the segmental and suprasegmental aspects of audiovisual speech, we propose and compare several computational frameworks for the generation of expressive speech and face animation. In particular, we evaluate a standard frame-based conversion approach against two other methods that postulate the existence of global prosodic audiovisual patterns characteristic of social attitudes. The proposed approaches are tested on a database of "Exercises in Style" [1] performed by two semi-professional actors, and the results are evaluated using crowdsourced perceptual tests. The first test performs a qualitative validation of the animation platform, while the second is a comparative study of several expressive speech generation methods. We evaluate how the expressiveness of our audiovisual performances is perceived in comparison to resynthesized original utterances and to the outputs of a purely frame-based conversion system.

    Nonparallel Emotional Speech Conversion

    We propose a nonparallel, data-driven emotional speech conversion method. It enables the transfer of emotion-related characteristics of a speech signal while preserving the speaker's identity and linguistic content. Most existing approaches require parallel data and time alignment, which are not available in most real applications. We achieve nonparallel training with an unsupervised style transfer technique, which learns a translation model between two distributions instead of a deterministic one-to-one mapping between paired examples. The conversion model consists of an encoder and a decoder for each emotion domain. We assume that the speech signal can be decomposed into an emotion-invariant content code and an emotion-related style code in latent space. Emotion conversion is performed by extracting and recombining the content code of the source speech and the style code of the target emotion. We tested our method on a nonparallel corpus with four emotions. Both subjective and objective evaluations show the effectiveness of our approach. Comment: Published in INTERSPEECH 2019, 5 pages, 6 figures. Simulation available at http://www.jian-gao.org/emoga
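
    A minimal sketch of the content/style decomposition described in this abstract, assuming mel-spectrogram inputs and illustrative layer sizes (the module names and dimensions below are hypothetical, not the authors' architecture): the source utterance's content code is recombined with a style code taken from a reference utterance in the target emotion.

```python
# Sketch of content/style recombination for emotion conversion (PyTorch).
# Dimensions and module structure are illustrative assumptions only.
import torch
import torch.nn as nn

N_MELS, CONTENT_DIM, STYLE_DIM = 80, 128, 16  # hypothetical sizes

class ContentEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MELS, CONTENT_DIM, batch_first=True)
    def forward(self, mel):                  # mel: (batch, frames, N_MELS)
        codes, _ = self.rnn(mel)
        return codes                         # frame-level, emotion-invariant content codes

class StyleEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MELS, STYLE_DIM, batch_first=True)
    def forward(self, mel):
        _, last = self.rnn(mel)
        return last[-1]                      # one emotion-related style vector per utterance

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(CONTENT_DIM + STYLE_DIM, 256, batch_first=True)
        self.out = nn.Linear(256, N_MELS)
    def forward(self, content, style):
        style = style.unsqueeze(1).expand(-1, content.size(1), -1)
        hidden, _ = self.rnn(torch.cat([content, style], dim=-1))
        return self.out(hidden)              # converted mel-spectrogram

def convert(src_mel, tgt_ref_mel, c_enc, s_enc, dec):
    """Emotion conversion: source content code + target-emotion style code."""
    return dec(c_enc(src_mel), s_enc(tgt_ref_mel))
```

    In the actual method each emotion domain has its own encoder/decoder pair trained without parallel data; the sketch only illustrates the recombination step at conversion time.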

    eXTRA: A Culturally Enriched Malay Text to Speech System

    This paper concerns the incorporation of naturalness into Malay text-to-speech (TTS) systems through the addition of a culturally localized affective component. Previous studies on emotion theories were examined to draw up assumptions about emotions. These studies also include findings from observations by anthropologists and researchers on culture-specific emotions, particularly those of the Malay culture. These findings were used to elicit the requirements for modeling affect in a TTS system that conforms to the Malay culture in Malaysia. The goal is to introduce a novel method for generating Malay expressive speech by embedding a localized ‘emotion layer’ called the eXpressive Text Reader Automation Layer, abbreviated as eXTRA. In a pilot project, the prototype is used with Fasih, the first Malay text-to-speech system developed by MIMOS Berhad, which can read unrestricted Malay text in four emotions: anger, sadness, happiness and fear. In this paper, however, attention is concentrated on the first two emotions. eXTRA is evaluated through open perception tests with both native and non-native listeners. The results show a recognition rate of more than sixty percent, which confirms the satisfactory performance of the approach.

    On the Coding of Sentential Modality


    QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

    Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected. In this paper, we propose QI-TTS, which aims to better transfer and control intonation in order to deliver the speaker's questioning intention while transferring emotion from reference speech. We propose a multi-style extractor to extract style embeddings at two different levels: the sentence level represents emotion, while the final-syllable level represents intonation. For fine-grained intonation control, we use relative attributes to represent intonation intensity at the syllable level. Experiments have validated the effectiveness of QI-TTS for improving intonation expressiveness in emotional speech synthesis. Comment: Accepted by ICASSP 202
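
    A minimal sketch of the two-level style extraction suggested by this abstract, assuming per-syllable feature vectors are already available; the function name, the mean-pooling choice, and the `intensity` scalar standing in for the relative attribute are assumptions rather than the paper's implementation.

```python
# Sketch of sentence-level (emotion) and final-syllable (intonation) style extraction.
# Pooling strategy and the intensity scalar are illustrative assumptions only.
import numpy as np

def extract_styles(syllable_feats: np.ndarray, intensity: float = 1.0):
    """syllable_feats: (n_syllables, dim) features, one vector per syllable."""
    emotion_embedding = syllable_feats.mean(axis=0)         # sentence-level emotion style
    intonation_embedding = intensity * syllable_feats[-1]   # final-syllable intonation style,
                                                            # scaled by a relative-attribute score
    return emotion_embedding, intonation_embedding
```

    The two embeddings would then condition the TTS decoder together with the text, with `intensity` giving fine-grained control over how strongly the questioning intonation is rendered.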

    Expressing Robot Personality through Talking Body Language

    Social robots must master the nuances of human communication as a means to convey an effective message and generate trust. It is well known that non-verbal cues are very important in human interactions, and therefore a social robot should produce body language coherent with its discourse. In this work, we report on a system that endows a humanoid robot with the ability to adapt its body language according to the sentiment of its speech. A combination of talking beat gestures with emotional cues such as eye lighting, body posture, and voice intonation and volume permits a rich variety of behaviors. The developed approach is not purely reactive, and it easily allows a kind of personality to be assigned to the robot. We present several videos with the robot in two different scenarios, showing discreet and histrionic personalities. This work has been partially supported by the Basque Government (IT900-16 and Elkartek 2018/00114) and the Spanish Ministry of Economy and Competitiveness (RTI 2018-093337-B-100, MINECO/FEDER, EU).
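
    A minimal sketch of the sentiment-to-behavior mapping idea described above, in plain Python; the parameter names, value ranges, and the `personality_gain` factor are hypothetical illustrations rather than the authors' actual system.

```python
# Sketch: map sentence sentiment plus a personality gain to body-language parameters.
# All names and ranges are illustrative assumptions, not the published implementation.
def behaviour_for_sentence(sentiment: float, personality_gain: float = 1.0) -> dict:
    """sentiment in [-1, 1]; a larger personality_gain yields a more histrionic robot."""
    s = max(-1.0, min(1.0, sentiment)) * personality_gain
    return {
        "gesture_amplitude": 0.3 + 0.5 * abs(s),          # larger beat gestures for stronger sentiment
        "eye_rgb": (0, 255, 0) if s >= 0 else (255, 0, 0),  # eye lighting colour by polarity
        "torso_lean_deg": 5.0 * s,                          # lean forward when positive
        "voice_pitch_shift": 0.1 * s,                       # relative pitch offset
        "voice_volume": 0.6 + 0.2 * abs(s),
    }
```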