    Prosodic Event Recognition using Convolutional Neural Networks with Context Information

    This paper demonstrates the potential of convolutional neural networks (CNN) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient and yields strong results not only in speaker-dependent but also in speaker-independent cases. Comment: Interspeech 2017, 4 pages, 1 figure
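
    The position feature mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the encoding, so the binary indicator used here, the function name add_position_feature, and the toy frame values are all assumptions made for illustration only.

```python
from typing import List, Sequence


def add_position_feature(
    frames: Sequence[Sequence[float]],
    word_start: int,
    word_end: int,
) -> List[List[float]]:
    """Append a binary indicator to every frame.

    The indicator is 1.0 for frames inside [word_start, word_end), i.e. the
    current word, and 0.0 for frames that belong to the surrounding context.
    This encoding is an assumption, not taken from the paper.
    """
    if not 0 <= word_start <= word_end <= len(frames):
        raise ValueError("word span must lie inside the frame sequence")
    return [
        list(frame) + [1.0 if word_start <= i < word_end else 0.0]
        for i, frame in enumerate(frames)
    ]


# Hypothetical input: 6 frames with 2 acoustic features each, covering the
# previous word (frames 0-1), the current word (2-3) and the next word (4-5).
context = [[0.1, 0.2], [0.2, 0.1], [0.5, 0.4], [0.6, 0.5], [0.3, 0.2], [0.1, 0.1]]
augmented = add_position_feature(context, word_start=2, word_end=4)
```

    The augmented matrix has one extra column per frame and could be passed to a CNN in place of the raw context window, so that the network can tell the target word apart from its neighbours.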

    Exploring complex vowels as phrase break correlates in a corpus of English speech with ProPOSEL, a prosody and POS English lexicon

    Real-world knowledge of syntax is seen as integral to the machine learning task of phrase break prediction but there is a deficiency of a priori knowledge of prosody in both rule-based and data-driven classifiers. Speech recognition has established that pauses affect vowel duration in preceding words. Based on the observation that complex vowels occur at rhythmic junctures in poetry, we run significance tests on a sample of transcribed, contemporary British English speech and find a statistically significant correlation between complex vowels and phrase breaks. The experiment depends on automatic text annotation via ProPOSEL, a prosody and part-of-speech English lexicon. Copyright © 2009 ISCA
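
    As a rough illustration of the kind of significance test described above, the following sketch runs a Pearson chi-square test on a 2x2 contingency table of words (complex vowel or not) against positions (phrase break or not). The abstract does not name the test used, so the choice of chi-square is an assumption, and the counts are invented toy numbers, not results from the paper.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function is
    # erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented toy counts (NOT from the paper):
#                      break   no break
# complex vowel          30        70
# no complex vowel       20       180
chi2, p_value = chi_square_2x2(30, 70, 20, 180)
significant = p_value < 0.05
```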

    Increasing the Naturalness of Synthesized Speech with an Automatic, Speech-Signal-Based Stress Labelling Algorithm (Gépi beszéd természetességének növelése automatikus, beszédjel alapú hangsúlycímkéző algoritmussal)

    Making machine-generated speech sound as natural as possible remains an important research area. Among many other factors, prosody strongly influences how natural the speech sounds, so a precisely annotated corpus, from which accurate generative models can be built by machine learning, is a basic requirement. Manual labelling of a corpus is costly and time-consuming, even when restricted to prosodic units and stress. Moreover, international experience confirms that even expert labellers' judgements are subjective: the agreement between stress annotations produced by different experts rarely exceeds 80%. For these reasons, automatic labelling procedures are often used. Stress labelling is most commonly performed on the basis of the text transcript, which, however, yields lower accuracy than human annotation. As an alternative, in this work we implement a speech-signal-based stress labelling algorithm. To verify the resulting stress labels, we train six HMM-TTS systems (three male and three female voices) and then compare the systems using subjective listening tests (CMOS).

    Complex vowels as boundary correlates in a multi-speaker corpus of spontaneous English speech

    We have found empirical evidence of a correlation in English between words containing complex vowels (diphthongs and triphthongs) and ‘gold-standard’ phrase break annotations in datasets as apparently different as seventeenth-century verse and a late twentieth-century Reith lecture transcript on economics. Spontaneous speech in the form of BBC radio news reportage from the 1980s again exhibits this statistically significant correlation for five out of ten speakers, leading to speculation as to why speakers should fall into two distinct groups. The experiment depends on the automatic annotation of text with a priori knowledge from ProPOSEL, a prosody and part-of-speech English lexicon.

    Sound Pattern Matching for Automatic Prosodic Event Detection

    Prosody in speech is manifested by variations of loudness, exaggeration of pitch, and specific phonetic variations of prosodic segments. For example, stressed and unstressed syllables differ in place or manner of articulation: vowels in unstressed syllables may have a more central articulation, and vowel reduction may occur when a vowel changes from a stressed to an unstressed position. In this paper, we characterize the sound patterns using phonological posteriors to capture the phonetic variations in a concise manner. The phonological posteriors quantify the posterior probabilities of the phonological features given the input speech acoustics, and they are obtained using a deep neural network (DNN). Built on the assumption that there are unique sound patterns in different prosodic segments, we devise a sound pattern matching (SPM) method based on a 1-nearest neighbour classifier. In this work, we focus on automatic detection of prosodic stress placed on words, also called emphasized words. We evaluate the SPM method on English and French data with emphasized words. Word emphasis detection also works very well in cross-lingual tests, that is, using a French classifier on English data and vice versa.
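
    The 1-nearest neighbour matching described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the three-dimensional toy "posterior" vectors, and the names nearest_label and templates are assumptions introduced here.

```python
import math
from typing import List, Sequence, Tuple

Template = Tuple[Sequence[float], str]


def nearest_label(query: Sequence[float], templates: List[Template]) -> str:
    """Return the label of the stored pattern closest to the query.

    Distance is Euclidean, which is an assumption; the abstract does not
    state which similarity measure the SPM method uses.
    """
    if not templates:
        raise ValueError("at least one template is required")
    best_label, best_dist = templates[0][1], math.inf
    for vector, label in templates:
        dist = math.dist(query, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy stand-ins for phonological posterior vectors (values are invented).
templates: List[Template] = [
    ([0.9, 0.1, 0.8], "emphasized"),
    ([0.8, 0.2, 0.9], "emphasized"),
    ([0.2, 0.7, 0.1], "neutral"),
    ([0.1, 0.8, 0.2], "neutral"),
]

prediction = nearest_label([0.85, 0.2, 0.8], templates)
```

    Because the classifier only stores and compares patterns, templates built from one language can be applied directly to queries from another, which is the cross-lingual setup the abstract mentions.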

    Acoustic identification of sentence accent in speakers with dysarthria: cross-population validation and severity related patterns

    Dysprosody is a hallmark of dysarthria, which can affect the intelligibility and naturalness of speech. This includes sentence accent, which helps to draw listeners’ attention to important information in the message. Although some studies have investigated this feature, we currently lack properly validated automated procedures that can distinguish between subtle performance differences observed across speakers with dysarthria. This study aims for cross-population validation of a set of acoustic features that have previously been shown to correlate with sentence accent. In addition, the impact of dysarthria severity levels on sentence accent production is investigated. Two groups of adults were analysed (Dutch and English speakers). Fifty-eight participants with dysarthria and 30 healthy control participants (HCP) produced sentences with varying accent positions. All speech samples were evaluated perceptually and analysed acoustically with an algorithm that extracts ten meaningful prosodic features and allows a classification between accented and unaccented syllables based on a linear combination of these parameters. The data were statistically analysed using discriminant analysis. Within the Dutch and English dysarthric populations, the algorithm correctly identified 82.8% and 91.9% of the accented target syllables, respectively, indicating that the capacity to discriminate between accented and unaccented syllables in a sentence is consistent with perceptual impressions. Moreover, different strategies for accent production across dysarthria severity levels could be demonstrated, which is an important step toward a better understanding of the nature of the deficit and the automatic classification of dysarthria severity using prosodic features.
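
    Classification by a linear combination of prosodic features, as described above, might look like the sketch below. The study uses ten features; this illustration uses three for brevity, and the feature names, weights, and bias are hypothetical values chosen for the example, not parameters reported in the study.

```python
from typing import Sequence

# Hypothetical weights for three z-scored features:
# (F0 peak, syllable duration, intensity). Not taken from the study.
WEIGHTS = (1.5, 1.0, 0.8)
BIAS = -2.0


def accent_score(features: Sequence[float]) -> float:
    """Linear discriminant score: weighted sum of the features plus a bias."""
    if len(features) != len(WEIGHTS):
        raise ValueError("expected %d features" % len(WEIGHTS))
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def is_accented(features: Sequence[float]) -> bool:
    """Classify a syllable as accented when the score is positive."""
    return accent_score(features) > 0.0


# Invented example syllables (z-scored feature values).
prominent = [1.2, 0.9, 0.5]
weak = [-0.3, 0.1, -0.2]
labels = [is_accented(prominent), is_accented(weak)]
```

    In a discriminant analysis the weights and bias would be estimated from perceptually labelled syllables rather than set by hand.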

    Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence

    Variation in Basque and the Teaching of Variation - II (Euskararen bariazioa eta bariazioaren irakaskuntza - II)

    144 pp. Contents: - Introduction (Iglesias, A.; Romero, A.; Ensunza, A.). - Transcribing intonation with ETI_TOBI (Elvira-García, W.). - A new version of the Diatech software tool (Aurrekoetxea, G.; Iglesias, A.; Santander, G.; Usobiaga, I.). - An analysis of the geo-morphological variation of Basque (Videgain, X.; Aurrekoetxea, G.). - Subject pronoun expression in Basque: description and pedagogical implications (Sainz-Maza, L.; Rodríguez, I.). - The influence of prosodic variation on the comprehension of read texts (Etxebarria, A.; Romero, A.; Gaminde, I.; Garay, U.). - Two decades of intonational dialectometry (Roseano, P.). - The understanding of new and native Basque speakers regarding attitudes (Asensio, N.; Barrios, H.; Lázaro, A.; Sáez, I.). - On attitudes and prestige regarding language change (Ensunza, A.). - Relationships between gestures and vocalizations in early communicative functions: using the ELAN software (Romero, A.; Etxebarria, A.; De Pablo, I.; Sanz, A.). - Linguistic variability in the verb morphology of Larrabetzu (Etxebarria, A.; Gaminde, I.; Olalde, A.; Gaminde, U.). - On the analysis of the evolution of auxiliary verbs in Bizkaia (Gaminde, I.; Romero, A.; Etxebarria, A.; Eguskiza, N.).