12 research outputs found

    HMM-Based Emotional Speech Synthesis Using Average Emotion Model

    Abstract. This paper presents a technique for synthesizing emotional speech based on an emotion-independent model, called the “average emotion” model. The average emotion model is trained using a multi-emotion speech database. By applying an MLLR-based model adaptation method, we can transform the average emotion model to present a target emotion that is not included in the training data. A multi-emotion speech database including four emotions, “neutral”, “happiness”, “sadness”, and “anger”, is used in our experiment. The results of subjective tests show that the average emotion model can effectively synthesize neutral speech and can be adapted to the target emotion model using very limited training data.

    Improvements of Hungarian Hidden Markov Model-based text-to-speech synthesis

    Statistical parametric, especially Hidden Markov Model-based, text-to-speech (TTS) synthesis has received much attention recently. The quality of HMM-based speech synthesis approaches that of state-of-the-art unit selection systems, and the method has numerous favorable features, e.g. a small runtime footprint, speaker interpolation, and speaker adaptation. This paper presents improvements to a Hungarian HMM-based speech synthesis system, including speaker-dependent and adaptive training, and speech synthesis with pulse-noise and mixed excitation. Listening tests and their evaluation are also described.

    Rejtett Markov-modell alapú szövegfelolvasó adaptációja félig spontán magyar beszéddel (Adaptation of a Hidden Markov Model-Based Text-to-Speech System Using Semi-Spontaneous Hungarian Speech)

    Many automatic text-to-speech methods exist today, but in recent years the most attention has gone to statistical parametric speech synthesis, and within it to Hidden Markov Model (HMM) based text-to-speech. The quality of HMM-based text-to-speech approaches that of unit selection synthesis, currently considered the best, and it also has numerous advantages: its database takes up little space, new voices can be created without separate recordings, emotions can be expressed with it, and the voice character of a given speaker can be reproduced from as little as a few sentences of recordings. In this article we present the basics of HMM-based speech synthesis, the possibilities for speaker adaptation, the speaker-independent HMM database built for Hungarian, and the speaker adaptation process for semi-spontaneous Hungarian speech. To evaluate the results, we conduct a listening test on the adaptation of four different voices, which we also describe in the article.

    Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion

    In voice conversion, frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independence of the frame-level conversion while yielding high-quality conversion. We achieve these goals by (1) introducing a text-independent tri-frame alignment method, (2) including delta features of F0 in Gaussian mixture model (GMM) conversion and (3) reducing the well-known GMM oversmoothing effect by F0 histogram equalization. Our objective and subjective experiments on the CMU Arctic corpus indicate improvements over both the mean/variance normalization and the baseline GMM conversion.
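
The frame-level mean and variance normalization mentioned in this abstract as the usual baseline can be sketched roughly as follows. This is a minimal illustration and not the paper's code: working in the log-F0 domain, treating frames with F0 of 0 as unvoiced, and the function names `log_f0_stats` and `convert_f0` are all assumptions made for the example, and the contours are made-up toy values.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_frames):
    """Return (mean, std) of log-F0 over voiced frames (assumed: F0 > 0)."""
    voiced = [math.log(f) for f in f0_frames if f > 0]
    return mean(voiced), pstdev(voiced)


def convert_f0(source_f0, src_stats, tgt_stats):
    """Shift and scale each voiced frame so log-F0 matches the target statistics."""
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    converted = []
    for f in source_f0:
        if f <= 0:
            converted.append(0.0)  # unvoiced frames are passed through unchanged
        else:
            z = (math.log(f) - src_mean) / src_std  # normalize with source stats
            converted.append(math.exp(z * tgt_std + tgt_mean))
    return converted


# Toy F0 contours in Hz (hypothetical values; 0.0 marks an unvoiced frame).
source_train = [110.0, 0.0, 120.0, 130.0, 0.0, 125.0]
target_train = [210.0, 220.0, 0.0, 250.0, 240.0]

src_stats = log_f0_stats(source_train)
tgt_stats = log_f0_stats(target_train)
converted = convert_f0(source_train, src_stats, tgt_stats)
```

Because each frame is mapped independently and only global statistics of each speaker are needed, this kind of transform requires neither a transcript nor parallel recordings, which is the property the abstract highlights.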

    Prosody in Swiss French Accents: Investigation using Analysis by Synthesis

    It is very common for a language to have different dialects or accents. Different pronunciations of the same words are one of the reasons for different accents within the same language. Swiss French accents have similar pronunciation to standard French, but noticeable differences in prosody. In this paper we investigate the use of standard French synthetic acoustic parameters combined with Swiss French prosody in order to evaluate the importance of prosody in modelling Swiss French accents. We use speech synthesis techniques to produce standard French pronunciation with Swiss French duration and intonation. A subjective evaluation rating the degree of Swiss accent was conducted and showed that prosody modification alone reduces the perceived difference between original Swiss accented speech and standard French coupled with original duration and intonation by 29%.

    Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis

    Acta Cybernetica: Volume 19, Number 4

    Akustische Phonetik und ihre multidisziplinären Aspekte (Acoustic Phonetics and Its Multidisciplinary Aspects)

    The aim of this book is to honor the multidisciplinary work of Doz. Dr. Sylvia Moosmüller (†) in the field of acoustic phonetics. The essays in this volume range from sociophonetics, language diagnostics, and dialectology to language technology. They thus exemplify the breadth of acoustic phonetics, which has been shaped by influences from the humanities and technical sciences since its beginnings.