36 research outputs found

    Accent Group modeling for improved prosody in statistical parametric speech synthesis

    Statistical parametric speech synthesis based on sinusoidal models

    This study focuses on improving the quality of statistical speech synthesis based on sinusoidal models. Vocoders play a crucial role in parametrisation and reconstruction, so we first conduct an experimental comparison of a broad range of leading vocoder types. Although our study shows that, for analysis/synthesis, sinusoidal models with complex amplitudes can generate high-quality speech compared with source-filter ones, the component sinusoids are correlated with each other, and the number of parameters is high and varies from frame to frame, which constrains their application in statistical speech synthesis. Therefore, we first propose a perceptually based dynamic sinusoidal model (PDM) to decrease and fix the number of components typically used in the standard sinusoidal model. Then, in order to use the proposed vocoder with an HMM-based speech synthesis system (HTS), two strategies for modelling sinusoidal parameters are compared. In the first method (DIR parameterisation), features extracted from the fixed- and low-dimensional PDM are statistically modelled directly. In the second method (INT parameterisation), we convert both the static amplitude and the dynamic slope of all the harmonics of a signal, which we term the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients (RDC)) for modelling. Our results show that HDM with intermediate parameters can generate quality comparable to STRAIGHT. As correlations between features in the dynamic model cannot be modelled satisfactorily by a typical HMM-based system with diagonal covariance, we have applied and tested a deep neural network (DNN) for modelling features from these two methods. To fully exploit DNN capabilities, we investigate ways to combine INT and DIR at the level of both DNN modelling and waveform generation. For DNN training, we propose to use multi-task learning to model cepstra (from INT) and log amplitudes (from DIR) as primary and secondary tasks. We conclude from our results that sinusoidal models are indeed highly suited to statistical parametric synthesis. The proposed method outperforms the state-of-the-art STRAIGHT-based equivalent when used in conjunction with DNNs. To further improve voice quality, phase features generated from the proposed vocoder also need to be parameterised and integrated into statistical modelling. Here, an alternative statistical model referred to as the complex-valued neural network (CVNN), which treats complex coefficients as a whole, is proposed to model complex amplitude explicitly. A complex-valued back-propagation algorithm using a logarithmic minimisation criterion that includes both amplitude and phase errors is used as the learning rule. Three parameterisation methods are studied for mapping text to acoustic features: RDC / real-valued log amplitude, complex-valued amplitude with minimum phase, and complex-valued amplitude with mixed phase. Our results show the potential of using CVNNs for modelling both real and complex-valued acoustic features. Overall, this thesis has established competitive alternative vocoders for speech parametrisation and reconstruction. The use of the proposed vocoders with various acoustic models (HMM / DNN / CVNN) clearly demonstrates that it is compelling to apply them to statistical parametric speech synthesis.

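    Unkarin fokuksen prosodisesta toteutumisesta: Intonaatio ja kesto merkityksiä rakentamassa [On the prosodic realisation of focus in Hungarian: intonation and duration in the construction of meaning]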

    Background: This thesis investigates whether Hungarian uses prosodic means to express focus. The properties examined are variations in fundamental frequency and in vowel duration across focus conditions. The topic was chosen because prosody is a very important but little-studied area in translation studies. In addition, the diverging results of earlier studies call for further work on the topic. The thesis was prompted by studies of the same question in Finnish, and a comparison of Hungarian and Finnish prosodic features serves as the perspective throughout. Material and methods: The material consists of SVO sentences read aloud by 10 Hungarian-speaking participants in four focus conditions: 1) broad focus, 2) narrow focus on the subject, 3) narrow focus on the verb, and 4) focus on both the subject and the verb. There are nine sentence types, in each of which the subject and the verb are disyllabic. In total, 1074 sentences are analysed. The recordings were made with an interactive program built on the PsychoPy platform. The subjects and verbs of the recorded sentences were segmented in Praat into syllables and syllable nucleus vowels. The data are the maximum fundamental frequencies of the syllables and the durations of the nucleus vowels, extracted with the ProsodyPro script for Praat. The effect of focus on fundamental frequency and vowel duration is analysed statistically with a linear mixed model. Results: Focus condition affects how much the maximum fundamental frequency changes between the first and second syllables of the subject. When the subject is in focus, the fundamental frequency falls especially steeply compared with the other focus conditions. The other findings on fundamental frequency are not statistically significant, although trends can be observed in them as well. The effect of focus on the duration of the first syllable of the subject and of both syllables of the verb is statistically significant: each syllable is longer when its word is focused than under broad focus, and shorter when focus falls on the other word under study. The results point in the same direction as earlier results for Finnish and some earlier results for Hungarian. In further research it would be justified to run perception experiments and to address contrastive focus. The results can be used, for example, in text-to-speech and speech-to-text systems, in language modelling more generally, and in language teaching. Awareness of prosody is also important for interpreters, for example.

    Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

    This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Solutions are proposed and developed for a range of conditions, from the straightforward scenario in which the target language is present in written form on the Internet and the mapping between speech and written language is close, up to the difficult scenario in which no written form of the target language exists.

    Danish activities concerning noise in the environment (A)

    On looking into words (and beyond): Structures, Relations, Analyses

    On Looking into Words is a wide-ranging volume spanning current research into word structure and morphology, with a focus on historical linguistics and linguistic theory. The papers are offered as a tribute to Stephen R. Anderson, the Dorothy R. Diebold Professor of Linguistics at Yale, who is retiring at the end of the 2016-2017 academic year. The contributors are friends, colleagues, and former students of Professor Anderson, all important contributors to linguistics in their own right. As is typical for such volumes, the contributions span a variety of topics relating to the interests of the honorand. In this case, the central contributions that Anderson has made to so many areas of linguistics and cognitive science, drawing on synchronic and diachronic phenomena in diverse linguistic systems, are represented through the papers in the volume. The 26 papers that constitute this volume are unified by their discussion of the interplay between synchrony and diachrony, theory and empirical results, and the role of diachronic evidence in understanding the nature of language. Central concerns of the volume include morphological gaps, learnability, increases and declines in productivity, and the interaction of different components of the grammar. The papers deal with a range of linked synchronic and diachronic topics in phonology, morphology, and syntax (in particular, cliticization), and their implications for linguistic theory.
