    Sociophonetic perspectives on stylistic diversity in speech research

    Broadcasting Your Variety: Namibian English(es) on YouTube

    Zähres F. Broadcasting Your Variety: Namibian English(es) on YouTube. Presented at the IAWE 2019 - 24th Conference of the International Association of World Englishes, Limerick, Ireland. Submitted Abstract: "English, despite its limited history within the country, gained significant ground in Namibia over the last 30 years, which has become evident through recent quantitative and qualitative research on English in Namibia (cf. e.g. Buschfeld & Kautzsch 2014; Kautzsch & Schröder 2016; Stell 2016). This research also suggests that English is moving from foreign to second language status with nativization being observable on several linguistic levels. Additionally, in Namibia’s urban center, the former Afrikaans-dominated diglossic situation is progressively changing towards a “triglossic pattern dominated by English” (Stell 2016: 326). What could complement this traditional picture, and is missing thus far, is a digital perspective: Young Namibians use online social media services in all their facets, including content creation on the video-sharing platform YouTube. Since the audience of these channels transcends national and ethnic boundaries, this data could shed light on general questions of Namibian sociolinguistic standards and identities as well as details of the role and features of English in Namibia. The majority of Namibian YouTubers use English for their broadcasts from the periphery and almost exclusively produce natural videos, according to the typology proposed by Schneider (2016), which renders this digital content a valuable yet largely unexplored resource in the World Englishes context. The present paper aims to address this methodological gap by using a corpus consisting of five hours of YouTube data to scrutinize recent hypotheses on the status and features of Namibian English, focusing on phonological aspects such as splits in the lexical sets NURSE and KIT as well as mergers of the DRESS, TRAP, and NURSE vowels (cf. Kautzsch et al. 2017)."

    Multilingual markers of depression in remotely collected speech samples: A preliminary analysis

    Background: Speech contains neuromuscular, physiological and cognitive components, and so is a potential biomarker of mental disorders. Previous studies indicate that speaking rate and pausing are associated with major depressive disorder (MDD). However, results are inconclusive, as many studies are small and underpowered and do not include clinical samples. These studies have also been unilingual and have used speech collected in controlled settings. If speech markers are to help understand the onset and progress of MDD, we need to uncover markers that are robust to language and establish the strength of associations in real-world data. // Methods: We collected speech data from 585 participants with a history of MDD in the United Kingdom, Spain, and the Netherlands as part of the RADAR-MDD study. Participants recorded their speech via smartphones every two weeks for 18 months. Linear mixed models were used to estimate the strength of specific markers of depression from a set of 28 speech features. // Results: Increased depressive symptoms were associated with reduced speech rate, articulation rate and intensity of speech elicited from a scripted task. These features had consistently stronger effect sizes than pauses. // Limitations: Our findings are derived at the cohort level, so may have limited impact on identifying intra-individual speech changes associated with changes in symptom severity. The analysis of features averaged over the entire recording may have underestimated the importance of some features. // Conclusions: Participants with more severe depressive symptoms spoke more slowly and quietly. Our findings are from a real-world, multilingual, clinical dataset, so represent a step-change in the usefulness of speech as a digital phenotype of MDD.
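
The abstract above distinguishes speech rate, articulation rate and pauses without defining them. As a minimal sketch, assuming the common convention that speech rate counts syllables over the whole recording while articulation rate counts them over the time left after pauses are removed, the two can be computed as follows. The function name and the example numbers are hypothetical and are not taken from the RADAR-MDD study.

```python
def speech_and_articulation_rate(n_syllables, total_duration_s, pause_durations_s):
    """Return (speech_rate, articulation_rate) in syllables per second.

    Assumed definitions (a common convention, not confirmed by the abstract):
      speech rate       = syllables / total duration, pauses included
      articulation rate = syllables / duration with pauses removed
    """
    if total_duration_s <= 0:
        raise ValueError("total duration must be positive")
    pause_time = sum(pause_durations_s)
    phonation_time = total_duration_s - pause_time
    if phonation_time <= 0:
        raise ValueError("pauses cannot fill the whole recording")
    speech_rate = n_syllables / total_duration_s
    articulation_rate = n_syllables / phonation_time
    return speech_rate, articulation_rate


# Hypothetical recording: 40 syllables in 12 s, with pauses of 1.5 s and 0.5 s.
speech_rate, articulation_rate = speech_and_articulation_rate(40, 12.0, [1.5, 0.5])
```

Under these definitions, longer pauses lower the speech rate but leave the articulation rate unchanged, which is why the two measures can carry different information about a speaker.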

    Acoustics and discourse function of two types of breathing signals

    Cwiek A, Wlodarczak M, Heldner M, Wagner P. Acoustics and discourse function of two types of breathing signals. In: Abrahamsen JE, Koreman J, van Dommelen WA, eds. Nordic Prosody: Proceedings of the XIIth Conference, Trondheim 2016. Frankfurt a.M.: Peter Lang Publishing Group; 2017: 83-91. Breathing is fundamental for living and speech, and it has been a subject of linguistic research for years. Recently, there has been a renewed interest in tackling the question of possible communicative functions of breathing (e.g. Rochet-Capellan & Fuchs, 2014; Aare, Włodarczak & Heldner, 2014; Włodarczak & Heldner, 2015; Włodarczak, Heldner, & Edlund, 2015). The present study set out to determine acoustic markedness and communicative functions of pauses accompanied and non-accompanied by breathing. We hypothesised that an articulatory reset occurring in breathing pauses and an articulatory freeze in non-breathing pauses differentiates between the two types. A production experiment was conducted and some evidence in favour of such a phenomenon was found. Namely, in the case of non-breathing pauses, we observed more coarticulation, evidenced by a more frequent omission of plosive releases. Our findings thus give some evidence in favour of the communicative function of breathing.

    The asymmetric Lombard effect: simultaneous conversation in a noisy and a quiet environment

    Humans increase their vocal efforts in a noisy environment in a reflex-like manner. This phenomenon is called the Lombard effect. The effect causes the speaker to produce Lombard speech, which has been researched for over a century from different standpoints.
Lombard speech is characterized by increased mean energy intensity level, increased fundamental frequency, changes in the formant frequencies, and in other spectral qualities of the voice. In addition, vowel durations tend to increase and in extreme noise conditions, a speaker might hyperarticulate. The communicative aspect of a speech situation is essential to the emergence of the phenomenon. The goal of this thesis was to examine speech production in a conversational situation where one of the interlocutors is subjected to noise and is thus producing Lombard speech, while the other interlocutor is simultaneously communicating in silence without the direct effects of background noise, and to determine whether there are differences in the acoustics or the intelligibility of speech in such an asymmetrical speech situation compared to a symmetrical situation where the noise environment of the interlocutors is the same. Two pairs of Finnish speakers (4 participants altogether, all female) were recorded doing sudoku-based tasks in three different background noise conditions: (1) in quiet, (2) with both interlocutors in noise (symmetrical), and (3) with only one of the interlocutors subjected to noise (asymmetrical). The background noise, played at 75 dB, was cocktail noise, which includes unintelligible speech from simultaneous speakers. Altogether 453 target syllables were collected, and the mean energy intensity level was extracted from each syllable. Mean fundamental frequency (f0) data was extracted from 417 target syllables. The values of f0 and intensity were normalized and statistical tests comparing means and variances were carried out on the data. As expected, all participants increased their intensity level and f0 from the quiet to the symmetrical condition, where both interlocutors produced Lombard speech.
The participants who were in silence during the asymmetrical condition and communicated to an interlocutor who was in noise increased both their intensity and f0 in the asymmetrical condition compared to the quiet condition. In addition, one of these participants increased both measures to nearly the levels that were measured from her Lombard speech in the symmetrical condition. The participants who were subjected to noise during the asymmetrical condition on average used lower intensity levels in the asymmetrical condition than in the symmetrical condition, even though they produced Lombard speech during both. No target syllables were misheard during the asymmetrical condition; rather, the participants who were in silence during said condition managed to increase their vocal efforts to a level that ensured the communication of crucial information to the person in noise. This experiment demonstrated that when the sound environments of two interlocutors are different, neither of the interlocutors produces speech that would be completely suitable for her own environment; instead, each is indirectly affected by the sound environment of her conversational partner. In addition, it was shown that while communicativeness can increase the effects of the Lombard effect, it can also decrease them. For further research into the topic, more data should be gathered and wider analyses should be carried out.
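
The abstract states that f0 and intensity values were normalized before the statistical tests, but it does not say how. The sketch below shows one common choice, per-speaker z-scoring, purely as an assumption: each value is expressed in standard deviations from that speaker's own mean, so that speakers with different habitual pitch or loudness can be compared. The function name and the sample values are hypothetical.

```python
from statistics import mean, stdev


def zscore_by_speaker(measurements):
    """Z-score each value against the mean and SD of its own speaker.

    measurements: list of (speaker_id, value) pairs, e.g. mean f0 per syllable.
    Returns a list of (speaker_id, z) pairs in the same order.
    Note: z-scoring is an assumed method; the thesis abstract does not name one.
    """
    by_speaker = {}
    for speaker, value in measurements:
        by_speaker.setdefault(speaker, []).append(value)

    stats = {}
    for speaker, values in by_speaker.items():
        if len(values) < 2:
            raise ValueError(f"need at least two values for speaker {speaker!r}")
        sd = stdev(values)
        if sd == 0:
            raise ValueError(f"zero variance for speaker {speaker!r}")
        stats[speaker] = (mean(values), sd)

    return [(s, (v - stats[s][0]) / stats[s][1]) for s, v in measurements]


# Hypothetical mean f0 values (Hz) for two speakers across three syllables each.
sample = [("A", 200.0), ("A", 210.0), ("A", 220.0),
          ("B", 180.0), ("B", 190.0), ("B", 200.0)]
normalized = zscore_by_speaker(sample)
```

After this step both speakers' values lie on the same scale, so a rise from the quiet to a noisy condition can be compared across speakers regardless of their baseline pitch.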

    Analysis of the latent prosody space and control of speaking styles in Finnish end-to-end speech synthesis

    In recent years, advances in deep learning have made it possible to develop neural speech synthesizers that not only generate near-natural speech but also enable us to control its acoustic features. This means it is possible to synthesize expressive speech with different speaking styles that fit a given context. One way to achieve this control is by adding a reference encoder to the synthesizer that works as a bottleneck modeling a prosody-related latent space. The aim of this study was to analyze how the latent space of a reference encoder models diverse and realistic speaking styles, and what correlation there is between the phonetic features of encoded utterances and their latent space representations. Another aim was to analyze how the synthesizer output could be controlled in terms of speaking styles. The model used in the study was a Tacotron 2 speech synthesizer with a reference encoder that was trained with read speech uttered in various styles by one female speaker. The latent space was analyzed with principal component analysis on the reference encoder outputs for all of the utterances in order to extract salient features that differentiate the styles. Based on the assumption that there are acoustic correlates to speaking styles, a possible connection between the principal components and measured acoustic features of the encoded utterances was investigated. For the synthesizer output, two evaluations were conducted: an objective evaluation assessing acoustic features and a subjective evaluation assessing appropriateness of synthesized speech in regard to the uttered sentence.
The results showed that the reference encoder modeled stylistic differences well, but the styles were complex, with major internal variation. The principal component analysis disentangled the acoustic features somewhat, and a statistical analysis showed a correlation between the latent space and prosodic features. The objective evaluation suggested that the synthesizer did not produce all of the acoustic features of the styles, but the subjective evaluation showed that it did enough to affect judgments of appropriateness, i.e., speech synthesized in an informal style was deemed more appropriate than formal style for informal style sentences and vice versa.

    100 years of Basque acoustic phonetics

    In this work, a database has been compiled of the studies carried out so far in Basque acoustic phonetics (restricted to the segmental domain), with the aim of collecting as many studies as possible. The analysis was carried out according to several variables: year of publication, general topic (fricatives, stops, vowels, etc.), the variety studied, the characteristics of the speakers and of the materials used in the recordings, the number of speakers and data points, and the type of analysis (whether a statistical analysis was performed). In total, 97 studies were collected: the oldest date from 1923 and the most recent from 2023. The work thus offers a general snapshot of what has been done in the field over the last 100 years, making it easier to see what has been done so far and what remains to be done.