Search CORE

13 research outputs found

INSPECT: Innovating Speech Elicitation Techniques

Author: Niebuhr Oliver

University of Southern Denmark Research Output

Sociophonetic perspectives on stylistic diversity in speech research

Author: Boyd Zachary
Hall-Lew Lauren
Publication venue: Walter de Gruyter GmbH
Publication date: 29/01/2020

Edinburgh Research Explorer

Broadcasting Your Variety: Namibian English(es) on YouTube

Author: Zähres Frederic
Publication date: 01/01/2019

Zähres F. Broadcasting Your Variety: Namibian English(es) on YouTube. Presented at IAWE 2019, the 24th Conference of the International Association of World Englishes, Limerick, Ireland.

Submitted abstract: "Despite its limited history in the country, English has gained significant ground in Namibia over the last 30 years, as recent quantitative and qualitative research on English in Namibia has shown (cf. e.g. Buschfeld & Kautzsch 2014; Kautzsch & Schröder 2016; Stell 2016). This research also suggests that English is moving from foreign-language to second-language status, with nativization observable on several linguistic levels. Additionally, in Namibia’s urban center, the former Afrikaans-dominated diglossic situation is progressively changing towards a ‘triglossic pattern dominated by English’ (Stell 2016: 326). What is missing so far, and could complement this traditional picture, is a digital perspective: young Namibians use online social media in all their facets, including creating content on the video-sharing platform YouTube. Since the audience of these channels transcends national and ethnic boundaries, this data could shed light on general questions of Namibian sociolinguistic standards and identities, as well as on the role and features of English in Namibia. Most Namibian YouTubers use English for their broadcasts from the periphery and, according to the typology proposed by Schneider (2016), almost exclusively produce natural videos, which makes this digital content a valuable yet largely unexplored resource in the World Englishes context. The present paper aims to address this methodological gap by using a five-hour corpus of YouTube data to scrutinize recent hypotheses on the status and features of Namibian English, focusing on phonological aspects such as splits in the lexical sets NURSE and KIT and mergers of the DRESS, TRAP, and NURSE vowels (cf. Kautzsch et al. 2017)."

Publications at Bielefeld University

Multilingual markers of depression in remotely collected speech samples: A preliminary analysis

Author: Bailón Raquel
Bruce Stuart
Campbell Edward L
Carr Ewan
Conde Pauline
Cummins Nicholas
Dineley Judith
Dobson Richard JB
Folarin Amos A
Haro Josep Maria
Hotopf Matthew
Lamers Femke
Lavelle Grace
Leightley Daniel
Matcham Faith
Narayan Vaibhav A
Oetzmann Carolin
Penninx Brenda WJH
Ranjan Yatharth
Rashid Zulqarnain
Schuller Björn W
Siddi Sara
Simblett Sara
Stewart Callum
The RADAR-CNS Consortium
Vairavan Srinivasan
White Katie M
Wykes Til
Publication venue: Elsevier BV
Publication date: 18/08/2023

Background: Speech has neuromuscular, physiological and cognitive components and is therefore a potential biomarker of mental disorders. Previous studies indicate that speaking rate and pausing are associated with major depressive disorder (MDD). However, results are inconclusive, as many studies are small and underpowered and do not include clinical samples. These studies have also been monolingual and have used speech collected in controlled settings. If speech markers are to help us understand the onset and progress of MDD, we need to uncover markers that are robust across languages and establish the strength of associations in real-world data.
Methods: We collected speech data from 585 participants with a history of MDD in the United Kingdom, Spain, and the Netherlands as part of the RADAR-MDD study. Participants recorded their speech via smartphones every two weeks for 18 months. Linear mixed models were used to estimate the strength of specific markers of depression from a set of 28 speech features.
Results: Increased depressive symptoms were associated with lower speech rate, articulation rate and intensity of speech elicited from a scripted task. These features had consistently stronger effect sizes than pauses.
Limitations: Our findings are derived at the cohort level, so they may have limited value for identifying intra-individual speech changes associated with changes in symptom severity. The analysis of features averaged over the entire recording may have underestimated the importance of some features.
Conclusions: Participants with more severe depressive symptoms spoke more slowly and quietly. Because our findings come from a real-world, multilingual, clinical dataset, they represent a step-change in the usefulness of speech as a digital phenotype of MDD.

UCL Discovery

Acoustics and discourse function of two types of breathing signals

Author: Abrahamsen Jardar Eggesbø
Cwiek Aleksandra
Heldner Mattias
Koreman Jacques
van Dommelen Wim A.
Wagner Petra
Wlodarczak Marcin
Publication venue: Peter Lang Publishing Group
Publication date: 01/01/2017

Cwiek A, Wlodarczak M, Heldner M, Wagner P. Acoustics and discourse function of two types of breathing signals. In: Abrahamsen JE, Koreman J, van Dommelen WA, eds. Nordic Prosody: Proceedings of the XIIth Conference, Trondheim 2016. Frankfurt a.M.: Peter Lang Publishing Group; 2017: 83-91.

Breathing is fundamental to life and to speech, and it has long been a subject of linguistic research. Recently, there has been renewed interest in the possible communicative functions of breathing (e.g. Rochet-Capellan & Fuchs, 2014; Aare, Włodarczak & Heldner, 2014; Włodarczak & Heldner, 2015; Włodarczak, Heldner, & Edlund, 2015). The present study set out to determine the acoustic markedness and communicative functions of pauses with and without breathing. We hypothesised that the two types are differentiated by an articulatory reset in breathing pauses and an articulatory freeze in non-breathing pauses. A production experiment was conducted, and some evidence for such a phenomenon was found: non-breathing pauses showed more coarticulation, evidenced by more frequent omission of plosive releases. Our findings thus give some evidence in favour of a communicative function of breathing.

Publications at Bielefeld University

Letter to the Editor: Towards open data policies in phonetics: What we can gain and how we can avoid pitfalls

Author: Garellek Marc
Gordon Matthew
Kirby James
Lee Wai-Sum
Michaud Alexis
Mooshammer Christina
Niebuhr Oliver
Recasens Daniel
Roettger Timo
Simpson Adrian
Yu Kristine
Publication date: 14/09/2020

Edinburgh Research Explorer

Asymmetrinen Lombard-efekti – Yhtäaikainen keskustelu meluisassa ja hiljaisessa ympäristössä (Asymmetric Lombard effect: Simultaneous conversation in noisy and quiet environments)

Author: Wikström Alexandra
Publication venue: Helsingfors universitet
Publication date: 01/01/2022

Humans increase their vocal effort in noisy environments in a reflex-like manner. This phenomenon is called the Lombard effect, and the resulting Lombard speech has been researched for over a century from different standpoints. Lombard speech is characterized by increased mean intensity, increased fundamental frequency, and changes in formant frequencies and other spectral qualities of the voice. In addition, vowel durations tend to increase, and in extreme noise conditions a speaker might hyperarticulate. The communicative aspect of the speech situation is essential to the emergence of the phenomenon.
The goal of this thesis was to examine speech production in a conversation in which one interlocutor is exposed to noise, and therefore produces Lombard speech, while the other communicates in silence without the direct effects of background noise. It also sought to determine whether the acoustics or intelligibility of speech in such an asymmetrical situation differ from those in a symmetrical situation, where both interlocutors share the same noise environment. Two pairs of Finnish speakers (four participants in total, all female) were recorded doing sudoku-based tasks in three background noise conditions: (1) in quiet, (2) with both interlocutors in noise (symmetrical), and (3) with only one interlocutor in noise (asymmetrical). The background noise, played at a sound pressure level of 75 dB, was cocktail noise, which contains unintelligible babble from several simultaneous speakers. In total, 453 target syllables were collected and the mean intensity was extracted from each; mean fundamental frequency (f0) was extracted from 417 of them. The f0 and intensity values were normalized, and statistical tests comparing means and variances were carried out.
As expected, all participants increased their intensity and f0 from the quiet condition to the symmetrical condition, in which both interlocutors produced Lombard speech. The participants who were in silence during the asymmetrical condition, communicating with an interlocutor in noise, also increased both intensity and f0 relative to the quiet condition; one of them raised both measures to nearly the levels of her own Lombard speech in the symmetrical condition. The participants who were exposed to noise during the asymmetrical condition used, on average, lower intensity in the asymmetrical than in the symmetrical condition, even though they produced Lombard speech in both. No target syllables were misheard in the asymmetrical condition; the participants in silence raised their vocal effort enough to convey the crucial information to their partner in noise. The experiment demonstrated that when two interlocutors are in different sound environments, neither produces speech fully suited to their own environment; each is indirectly affected by the sound environment of the conversational partner. It also showed that while the communicative nature of a speech situation can amplify the Lombard effect, it can also attenuate it. Future research should gather more data and carry out broader analyses.

Helsingin yliopiston digitaalinen arkisto

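Latentin prosodia-avaruuden analysointi ja puhetyylien hallinta suomenkielisessä end-to-end puhesynteesissä (Analysis of the latent prosody space and control of speaking styles in Finnish end-to-end speech synthesis)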

Author: Törö Tuukka
Publication venue: Helsingfors universitet
Publication date: 01/01/2022

In recent years, advances in deep learning have made it possible to develop neural speech synthesizers that not only generate near-natural speech but also allow its acoustic features to be controlled. This makes it possible to synthesize expressive speech in different speaking styles that fit a given context. One way to achieve this control is to add to the synthesizer a reference encoder that acts as a bottleneck modeling a prosody-related latent space. The aim of this study was to analyze how the latent space of a reference encoder models diverse and realistic speaking styles, and how the phonetic features of encoded utterances correlate with their latent-space representations. A further aim was to analyze how the synthesizer output could be controlled in terms of speaking style. The model used was a Tacotron 2 speech synthesizer with a reference encoder, trained on read speech produced in various styles by one female speaker. The latent space was analyzed by applying principal component analysis to the reference encoder outputs for all utterances, in order to extract the salient features that differentiate the styles. Assuming that speaking styles have acoustic correlates, the study investigated a possible connection between the principal components and measured acoustic features of the encoded utterances. The synthesizer output was assessed in two evaluations: an objective evaluation of acoustic features and a subjective evaluation of how appropriate the synthesized speech was for the sentence being spoken.
The results showed that the reference encoder modeled stylistic differences well, but the styles were complex, with considerable internal variation. The principal component analysis disentangled the acoustic features to some extent, and a statistical analysis showed a correlation between the latent space and prosodic features. The objective evaluation suggested that the synthesizer did not reproduce all of the acoustic features of the styles, but the subjective evaluation showed that it did enough to affect judgments of appropriateness: speech synthesized in an informal style was judged more appropriate than formal-style speech for informal-style sentences, and vice versa.

Helsingin yliopiston digitaalinen arkisto

Euskal fonetika akustikoaren 100 urte (100 years of Basque acoustic phonetics)

Author: Krajewska Dorota
Publication venue: UPV/EHU Press
Publication date: 01/01/2024

In this work, a database has been compiled of the studies carried out to date in Basque acoustic phonetics (limited to the segmental level), with the aim of collecting as many studies as possible. The studies were analyzed according to several variables: year of publication, general topic (fricatives, stops, vowels, etc.), the variety studied, the characteristics of the speakers and of the recording materials, the number of speakers and amount of data, and the type of analysis (whether a statistical analysis was performed). In total, 97 studies were collected: the oldest date from 1923 and the most recent from 2023. The work thus offers an overall picture of the field over the last 100 years, giving a clearer view of what has been done so far and what remains to be done.

Directory of Open Access Journals

Per un approccio multidimensionale allo studio dell’intonazione: le domande in genovese (Towards a multidimensional approach to the study of intonation: questions in Genoese)

Author: Cangemi Francesco
Dipino Dalila
Garassino Davide
Publication venue: Officinaventuno
Publication date: 31/12/2021

ZORA