10 research outputs found

    HMM-Based Emotional Speech Synthesis Using Average Emotion Model

    Abstract. This paper presents a technique for synthesizing emotional speech based on an emotion-independent model called the “average emotion” model. The average emotion model is trained on a multi-emotion speech database. By applying an MLLR-based model adaptation method, we can transform the average emotion model to represent a target emotion that is not included in the training data. A multi-emotion speech database covering four emotions, “neutral”, “happiness”, “sadness”, and “anger”, is used in our experiment. The results of subjective tests show that the average emotion model can effectively synthesize neutral speech and can be adapted to a target emotion model using very limited training data.
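
    As a rough illustration of the adaptation step, an MLLR-style transform updates each Gaussian mean of the average emotion model with an affine map estimated from a small amount of target-emotion data. The sketch below is a minimal, self-contained example of that idea; the array shapes and transform values are placeholders, not the paper's actual models or estimation procedure.

    ```python
    import numpy as np

    def mllr_transform_means(means, A, b):
        """Apply an MLLR-style affine transform mu' = A @ mu + b to every
        Gaussian mean vector of an average-emotion model (illustrative only)."""
        return np.array([A @ mu + b for mu in means])

    # Toy example: 3 state means in a 4-dimensional acoustic feature space.
    rng = np.random.default_rng(0)
    avg_emotion_means = rng.normal(size=(3, 4))  # stands in for the average emotion model
    A = np.eye(4) * 1.1                          # transform matrix estimated from adaptation data
    b = np.full(4, 0.2)                          # bias term estimated from adaptation data
    target_emotion_means = mllr_transform_means(avg_emotion_means, A, b)
    ```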

    Syllabic Pitch Tuning for Neutral-to-Emotional Voice Conversion

    Prosody plays an important role in both the identification and the synthesis of emotionalized speech. Prosodic features like pitch are usually estimated and altered at a segmental level, based on short windows of speech where the signal is expected to be quasi-stationary. This results in a frame-wise change of acoustic parameters when synthesizing emotionalized speech. In order to convert neutral speech to emotional speech from the same speaker, it may be better to alter the pitch parameters at a suprasegmental level such as the syllable, since the changes in the signal are then more subtle and smooth. In this paper we aim to show that the pitch transformation in a neutral-to-emotional voice conversion system may yield better speech quality if it is performed at the suprasegmental (syllable) level rather than as a frame-level change. Subjective evaluation results are presented to examine whether the naturalness, speaker similarity, and emotion recognition tasks show any performance difference.
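
    To make the contrast concrete, the sketch below applies a single pitch adjustment per syllable (shifting the syllable's mean F0) instead of modifying every frame independently. It assumes a frame-wise F0 contour and known syllable boundaries; the transform coefficients and data are illustrative placeholders, not values from the paper.

    ```python
    import numpy as np

    def syllabic_pitch_transform(f0, syllable_bounds, scale, shift):
        """Shift each syllable's mean F0 instead of altering every frame
        independently, which keeps the within-syllable contour shape intact."""
        out = f0.copy()
        for start, end in syllable_bounds:
            seg = f0[start:end]
            voiced = seg > 0                       # unvoiced frames (F0 == 0) are left alone
            if not np.any(voiced):
                continue
            mean_f0 = seg[voiced].mean()
            target_mean = scale * mean_f0 + shift  # neutral-to-emotional mapping of the syllable mean
            out[start:end][voiced] = seg[voiced] + (target_mean - mean_f0)
        return out

    # Toy usage: three syllables over a 30-frame contour, with an unvoiced stretch.
    f0 = np.concatenate([np.linspace(120, 140, 10), np.zeros(5), np.linspace(150, 130, 15)])
    converted = syllabic_pitch_transform(f0, [(0, 10), (10, 15), (15, 30)], scale=1.3, shift=10.0)
    ```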

    Expression of basic emotions in Estonian parametric text-to-speech synthesis

    The goal of this study was to conduct modelling experiments aimed at expressing three basic emotions (joy, sadness and anger) in Estonian parametric text-to-speech synthesis, on the basis of both a male and a female voice. For each emotion, three different test models were constructed and presented to subjects for evaluation in perception tests. The test models were based on the parameter values characteristic of the basic emotions that had been determined from human speech. In synthetic speech, the test subjects recognized the emotion of sadness most accurately and the emotion of joy least accurately. The results of the test showed that, for the synthesized male voice, the model with enhanced parameter values performed best for all three emotions, whereas for the synthetic female voice different emotions called for different models: the model with decreased values was the most suitable for the expression of joy, and the model with enhanced values was the most suitable for the expression of sadness and anger. Logistic regression was applied to the results of the perception tests in order to determine the significance and contribution of each acoustic parameter in the emotion models, and the possible need to adjust the values of the parameters.

    Summary. Kairi Tamuri and Meelis Mihkla: Possibilities of expressing basic emotions in Estonian parametric speech synthesis. The aim of the study was to carry out modelling experiments for expressing three basic emotions (joy, sadness and anger) in Estonian parametric speech synthesis, based on both a male and a female synthetic voice. For this purpose, three test models were constructed for each emotion and rated by subjects in perception tests. The test models were based on parameter values characteristic of the basic emotions, determined from human speech. Of the emotions, sadness was recognized best in synthetic speech and joy the worst. The test results showed that while the model with enhanced values worked best for all three emotions with the male synthetic voice, the female synthetic voice required different models for different emotions: the model with decreased values was most suitable for expressing joy, and the model with enhanced values for expressing sadness and anger. The perception test results were analysed with logistic regression to determine the significance and contribution of the individual acoustic parameters in the emotion models and the need to adjust parameter values. Keywords: Estonian, emotions, speech synthesis, acoustic model, speech rate, intensity, fundamental frequency.
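
    The final analysis step can be sketched as a standard logistic regression over the acoustic parameter settings of each stimulus, with listener recognition as the binary outcome. The feature names and numbers below are invented placeholders to show the shape of the analysis, not the study's data.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical per-stimulus rows: acoustic parameter settings used for a
    # synthesized utterance, and whether listeners recognized the intended
    # emotion (1) or not (0).
    X = np.array([
        [1.2, 0.9, 1.1],   # speech rate factor, intensity factor, F0 factor
        [0.8, 1.1, 1.3],
        [1.0, 1.0, 1.0],
        [1.3, 0.8, 0.9],
        [0.9, 1.2, 1.2],
        [1.1, 1.0, 0.8],
    ])
    y = np.array([1, 1, 0, 0, 1, 0])

    model = LogisticRegression().fit(X, y)
    # The sign and magnitude of each coefficient indicate how much that acoustic
    # parameter contributes to the emotion being recognized.
    print(dict(zip(["rate", "intensity", "f0"], model.coef_[0])))
    ```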

    Emotion transplantation through adaptation in HMM-based speech synthesis

    This paper proposes an emotion transplantation method capable of modifying a synthetic speech model through CSMAPLR adaptation in order to incorporate emotional information learned from a different speaker model while preserving the identity of the original speaker as much as possible. The proposed method relies on learning both the emotional and the speaker identity information by means of their respective adaptation functions from an average voice model, and combining them into a single cascade transform capable of imbuing the desired emotion into the target speaker. The method is then applied to the task of transplanting four emotions (anger, happiness, sadness and surprise) into three male and three female speakers and evaluated in a number of perceptual tests. The evaluation results show that, for emotional text, perceived naturalness significantly favors the proposed transplanted emotional speech synthesis over traditional neutral speech synthesis, as evidenced by a large increase in the perceived emotional strength of the synthesized utterances at a slight cost in speech quality. A final evaluation with a robotic laboratory assistant application shows that using emotional speech significantly increases the students’ satisfaction with the dialog system, demonstrating that the proposed emotion transplantation system provides benefits in real applications.
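
    The central mechanism is the composition of two affine transforms learned against an average voice model, one carrying the emotion and one carrying the speaker identity, into a single cascade transform. The toy sketch below only illustrates how two affine transforms collapse into one; it does not reproduce the CSMAPLR estimation itself, and all values are placeholders.

    ```python
    import numpy as np

    def compose_affine(A_outer, b_outer, A_inner, b_inner):
        """Collapse mu -> A_outer @ (A_inner @ mu + b_inner) + b_outer
        into a single affine transform (A, b)."""
        return A_outer @ A_inner, A_outer @ b_inner + b_outer

    dim = 4
    rng = np.random.default_rng(1)
    A_emotion = np.eye(dim) + 0.05 * rng.normal(size=(dim, dim))  # emotion transform (placeholder)
    b_emotion = rng.normal(size=dim)
    A_speaker = np.eye(dim) + 0.05 * rng.normal(size=(dim, dim))  # speaker-identity transform (placeholder)
    b_speaker = rng.normal(size=dim)

    # One cascade transform that imbues the emotion and then restores the speaker identity.
    A_cascade, b_cascade = compose_affine(A_speaker, b_speaker, A_emotion, b_emotion)

    mu = rng.normal(size=dim)
    assert np.allclose(A_cascade @ mu + b_cascade,
                       A_speaker @ (A_emotion @ mu + b_emotion) + b_speaker)
    ```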

    Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis


    Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis

    SUMMARY This paper describes the modeling of various emotional expressions and speaking styles in synthetic speech using HMM-based speech synthesis. We show two methods for modeling speaking styles and emotional expressions. In the first method, called style-dependent modeling, each speaking style and emotional expression is modeled individually. In the second, called style-mixed modeling, each speaking style and emotional expression is treated as one of the contexts, in the same way as phonetic, prosodic, and linguistic features, and all speaking styles and emotional expressions are modeled simultaneously by a single acoustic model. We chose four styles of read speech (neutral, rough, joyful, and sad) and compared the two modeling methods using these styles. The results of subjective evaluation tests show that both modeling methods have almost the same accuracy, and that it is possible to synthesize speech with a speaking style and emotional expression similar to those of the target speech. In a style-classification test on synthesized speech, more than 80% of the speech samples generated with either model were judged to be similar to the target styles. We also show that the style-mixed modeling method yields fewer output and duration distributions than the style-dependent modeling method. Key words: HMM-based speech synthesis, expressive speech synthesis, speaking style, emotional expression, acoustic modeling, decision tree
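
    The difference between the two methods can be pictured as whether speaking style selects a separate model or is simply one more context factor available to decision-tree clustering. The sketch below uses a made-up context structure and question list to illustrate the style-mixed view; it is not the paper's actual label format or question set.

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Context:
        """One HMM modelling context; in style-mixed modelling the style is just
        another factor alongside phonetic and prosodic features."""
        phone: str
        accented: bool
        position_in_phrase: int
        style: str  # e.g. "neutral", "rough", "joyful", "sad"

    def decision_tree_questions():
        # Style questions sit next to phonetic/prosodic ones, so a single tree
        # (and a single acoustic model) can split states by style when useful.
        return [
            ("is_vowel",  lambda c: c.phone in "aeiou"),
            ("is_joyful", lambda c: c.style == "joyful"),
            ("is_sad",    lambda c: c.style == "sad"),
        ]

    ctx = Context(phone="a", accented=True, position_in_phrase=2, style="joyful")
    print([name for name, question in decision_tree_questions() if question(ctx)])
    ```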