3,506 research outputs found

    Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training

    Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics of native utterances are extracted, transplanted onto the learner's own speech input, and given back to the learner as corrective feedback. Previous work focused on speech conversion using prosodic transplantation techniques based on the PSOLA algorithm. Motivated by the visual differences found in spectrograms of native and non-native speech, we investigated applying a GAN to generate self-imitating feedback, exploiting the generator's ability acquired through adversarial training. Because this mapping is highly under-constrained, we also adopt a cycle consistency loss to encourage the output to preserve the global structure that is shared by native and non-native utterances. Trained on 97,200 spectrogram images of short utterances produced by native and non-native speakers of Korean, the generator successfully transforms a non-native spectrogram input into a spectrogram with the properties of self-imitating feedback. Furthermore, the transformed spectrogram shows segmental corrections that cannot be obtained by prosodic transplantation. A perceptual test comparing the self-imitating and correcting abilities of our method with the baseline PSOLA method shows that the generative approach with cycle consistency loss is promising.
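
The cycle consistency idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes an L1 reconstruction penalty, as is common for cycle-consistent GANs; the abstract does not state which norm the authors used. The names `G`, `F`, `toy_G`, `toy_F` and `lossy_F` are hypothetical stand-ins for the two generators, not the authors' models.

```python
# Minimal sketch of a cycle consistency loss on spectrogram-like 2-D arrays.
# Assumptions (not from the abstract): L1 distance, plain Python lists,
# and toy mappings in place of trained generators.

def l1_distance(a, b):
    """Mean absolute difference between two equally sized 2-D arrays."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count


def cycle_consistency_loss(x, y, G, F):
    """Return |F(G(x)) - x|_1 + |G(F(y)) - y|_1 for one (x, y) pair.

    G maps non-native -> native, F maps native -> non-native.
    """
    forward = l1_distance(F(G(x)), x)   # non-native -> native -> non-native
    backward = l1_distance(G(F(y)), y)  # native -> non-native -> native
    return forward + backward


# Hypothetical toy mappings: toy_F exactly inverts toy_G, lossy_F does not.
def toy_G(spec):
    return [[v + 1.0 for v in row] for row in spec]


def toy_F(spec):
    return [[v - 1.0 for v in row] for row in spec]


def lossy_F(spec):
    return [[0.0 for _ in row] for row in spec]


x = [[0.0, 1.0], [2.0, 3.0]]  # stands in for a non-native spectrogram
y = [[1.0, 1.0], [1.0, 1.0]]  # stands in for a native spectrogram

loss_invertible = cycle_consistency_loss(x, y, toy_G, toy_F)
loss_lossy = cycle_consistency_loss(x, y, toy_G, lossy_F)
```

When the two mappings invert each other, the loss is zero; a mapping that discards the input structure is penalized. In training, this term would be added to the adversarial loss so that the generator cannot satisfy the discriminator by ignoring the learner's input.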

    New Perspectives in Teaching Pronunciation

    pp.165-18

    Emotion Recognition from Acted and Spontaneous Speech

    This doctoral thesis deals with recognizing speakers' emotional states from speech signals. The thesis is divided into two main parts. The first part describes the proposed approaches for emotion recognition from acted emotional speech, with results reported on two different databases in different languages. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotions into a two-dimensional space. The second part is devoted to emotion recognition from spontaneous emotional speech, based on telephone recordings obtained from real call centers. The knowledge gained from the experiments with acted speech was used to design a new approach for classifying seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker's emotional state on gender recognition performance and proposes a system for automatically identifying successful phone calls in call centers by means of dialogue features.

    What we learn about language from Spoken Corpus Linguistics?

    Over the last few decades, Spoken Corpus Linguistics (SCL) has achieved a great deal in terms of the quantity and quality of its work (O’Keeffe, McCarthy 2010). Enormous progress has been made in the last thirty years, and the growth of multimodal corpora stimulates sophisticated investigations into the relationship between the verbal and non-verbal components of spoken communication (Knight 2011). SCL is a vital field of research, able to provide essential data and tools for the advancement of language knowledge. In this article I focus on the contribution that SCL and the resulting data make to general linguistics. In § 2, I discuss the contribution of SCL to a better understanding of linguistic variation; in § 3, I show how SCL can improve the descriptive adequacy of grammar; finally, § 4 is dedicated to the contribution that speech data can make to a better knowledge of the grammaticality of languages. Throughout the article I mainly use data from Italian corpora, widely validated by comparison with data from corpora of other languages.

    Using the ToBI transcription to record the intonation of Slovene

    The paper presents ToBI, a transcription method for prosodic annotation. ToBI is an acronym for Tones and Break Indices, which originally denoted an intonation system developed in the 1990s for annotating intonation and prosody in a database of spoken Mainstream American English. The MAE_ToBI transcription originally consists of six parts: the audio recording of the utterance, the fundamental frequency contour, and four parallel tiers for the transcription of the tone sequence, the orthographic transcription, the break indices between words, and additional observations. The core of the transcription, i.e. of the phonological analysis of the intonation pattern, is the tone tier, where tonal variation is transcribed using labels for high tone and low tone, and where a tone can appear as a pitch accent, a phrase accent, or a boundary tone. Owing to its simplicity and flexibility, the system soon began to be used for the prosodic annotation of other varieties of English and many other languages, as well as in various non-linguistic fields, leading to the creation of many new ToBI systems adapted to individual languages and dialects. The author is the first to use this method for Slovene, more precisely for the intonational transcription and analysis of a corpus of spontaneous speech from Slovene Istria, in order to investigate to what extent the ToBI system is suitable for the annotation of Slovene and its regional variants.

    Methods in prosody

    This book presents a collection of pioneering papers reflecting current methods in prosody research with a focus on Romance languages. The rapid expansion of the field of prosody research in recent decades has given rise to a proliferation of methods that has left little room for their critical assessment. The aim of this volume is to bridge this gap by bringing together original contributions in which experts in the field assess, reflect on, and discuss different methods of data gathering and analysis. The book may thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore the topic of prosody, an expanding and promising area of study.

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The wide international origin of the 235 authors, from 21 countries and 95 institutions, led to papers on many different languages. The 89 papers in this volume reflect the themes of the conference: spoken corpora compilation and annotation, together with the related technological fields; the relation between prosody and pragmatics; speech pathologies; and various papers on phonetics, speech and linguistic analysis, pragmatics, and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to the sound and video linked to the papers (when downloaded).