Fundamental frequency height as a resource for the management of overlap in talk-in-interaction.

Author: Brown G. J.
Kurtic E.
Wells B.
Publication venue: Emerald Group Publishing Limited
Publication date: 01/01/2009

Overlapping talk is common in talk-in-interaction. Much of the previous research on this topic agrees that speaker overlaps can be either turn competitive or noncompetitive. An investigation of the differences in prosodic design between these two classes of overlaps can offer insight into how speakers use and orient to prosody as a resource for turn competition. In this paper, we investigate the role of fundamental frequency (F0) as a resource for turn competition in overlapping speech. Our methodological approach combines detailed conversation analysis of overlap instances with acoustic measurements of F0 in the overlapping sequence and in its local context. The analyses are based on a collection of overlap instances drawn from the ICSI Meeting corpus. We found that overlappers mark an overlapping incoming as competitive by raising F0 above their norm for turn beginnings, and retaining this higher F0 until the point of overlap resolution. Overlappees may respond to these competitive incomings by returning competition, in which case they raise their F0 too. Our results thus provide instrumental support for earlier claims made on impressionistic evidence, namely that participants in talk-in-interaction systematically manipulate F0 height when competing for the turn.

White Rose Research Online

INSPECT: Innovating Speech Elicitation Techniques

Author: Niebuhr Oliver

University of Southern Denmark Research Output

Re-enacted and Spontaneous Conversational Prosody — How Different?

Author: Wagner Petra
Windmann Andreas
Publication venue: Proceedings of Speech Prosody 2016
Publication date: 01/01/2016

Wagner P, Windmann A. Re-enacted and Spontaneous Conversational Prosody — How Different? In: Proceedings of Speech Prosody 2016. Boston; 2016

Crossref

Publications at Bielefeld University

Speech data acquisition: the underestimated challenge

Author: Michaud Alexis
Niebuhr Oliver
Publication venue: Institut für Skandinavistik, Frisistik und Allgemeine Sprachwissenschaft (ISFAS), Abt. Allgemeine Sprachwissenschaft, Christian-Albrechts-Universität zu Kiel
Publication date: 04/02/2015

(This version makes one correction to the references: BARBOSA 2012 was cited in the text but missing from the list of references.) The second half of the 20th century was the dawn of information technology, and we now live in the digital age. Experimental studies of prosody develop at a fast pace, in the context of an "explosion of evidence" (Janet Pierrehumbert, Speech Prosody 2010, Chicago). The ease with which anyone can now make recordings should not, however, veil the complexity of the data collection process. This article aims at sensitizing students and scientists from the various fields of speech and language research to the fact that speech-data acquisition is an underestimated challenge. Eliciting data that reflect the communicative processes at play in language requires special precautions in devising experimental procedures and a fundamental understanding of both ends of the elicitation process: speaker and recording facilities. The article compiles basic information on each of these requirements and recapitulates some pieces of practical advice, drawing many examples from prosody studies, a field where the thoughtful conception of experimental protocols is especially crucial.

Hal - Université Grenoble Alpes

Kysyvän funktion vaikutus spontaanin ja luetun suomen intonaatioon (The effect of interrogative function on the intonation of spontaneous and read Finnish)

Author: Anttila Hanna
Publication venue: Helsingin yliopisto
Publication date: 01/01/2008

Goals: This study aims to map the effect of interrogative function on the intonation of spontaneous and read Finnish. Earlier research shows that the most prominent feature in Finnish question intonation is an appeal to the listener. Question-word questions typically start with a high peak, which is followed by falling intonation. In yes/no questions, F0 remains at a high level until the word carrying sentence stress and then falls. Final rises are mainly found in intonation clichés such as "Ai mitä?" ("What?"). These earlier results are based on read speech and enacted dialogues. In this study, questions and statements found in spontaneous dialogues were compared with each other and with read versions of the same utterances. Fundamental frequency values were compared using a mixed model. Contours were also grouped by auditory and visual inspection, which made it possible to compare the frequencies of contour types according to utterance type and speech style. The position of questions in the F0 distribution of the whole material was also investigated.
Method: The material consisted of four spontaneous dialogues and their read versions. The speakers were young adults from the Helsinki metropolitan area, four females and four males. The whole material was first divided into broad dialogue function categories arising from the material, and F0 curves were calculated for each category. After this, 277 questions and 244 statements were selected for closer inspection. Values reflecting F0 distribution and contour shape were measured from the F0 contours of these utterances. A mixed model was used to analyse the differences, with utterance type, question type, speech style and speaker gender as fixed effects. The frequencies of F0 contour types were compared using a chi-square test. Additional material consisted of read speech from eight young female speakers from central Finland.
Results and conclusions: In the mixed model analysis, significant differences were found both between questions and statements and between spontaneous and read speech. Generally, utterance type affected the variables reflecting contour shape, while speech style affected the variables reflecting F0 distribution. The effect of question type was not clearly visible in material of this size. In read speech the contours resembled earlier results more closely. Speakers had different strategies for differentiating between questions and statements. In the whole material, F0 was slightly higher in questions than in statements. The effect of dialectal background could be seen as a different distribution of contour types among the speakers from central Finland. The results show that interrogative function affects intonation in both spontaneous and read Finnish.

Helsingin yliopiston digitaalinen arkisto

Modelling prosodic and dialogue information for automatic speech recognition

Author: Wright Helen Frances
Publication venue: The University of Edinburgh
Publication date: 01/01/2000

Edinburgh Research Archive

Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge

Author: Schuller Björn
Batliner Anton
Steidl Stefan
Seppi Dino
Publication venue: Elsevier BV
Publication date: 01/11/2011

More than a decade has passed since research on automatic recognition of emotion from speech became a new field of research in line with its 'big brothers', speech and speaker recognition. This article attempts to provide a short overview of where we are today, how we got there, and what this can reveal about where to go next and how we could arrive there. In the first part, we address the basic phenomenon, reflecting on the last fifteen years and commenting on databases, modelling and annotation, the unit of analysis, and prototypicality. We then shift to automatic processing, including discussions of features, classification, robustness, evaluation, and implementation and system integration. From there we go to the first comparative challenge on emotion recognition from speech, the INTERSPEECH 2009 Emotion Challenge, organised by (part of) the authors, including the description of the Challenge's database, Sub-Challenges, participants and their approaches, the winners, and the fusion of results, and then to the lessons actually learnt, before we finally address the ever-lasting problems and promising future attempts. (C) 2011 Elsevier B.V. All rights reserved.

Schuller B., Batliner A., Steidl S., Seppi D., "Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge", Speech Communication, vol. 53, no. 9-10, pp. 1062-1087, November 2011.

Lirias

OPUS Augsburg

Crossref

Spiral - Imperial College Digital Repository

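Latentin prosodia-avaruuden analysointi ja puhetyylien hallinta suomenkielisessä end-to-end puhesynteesissä (Analysis of the latent prosody space and control of speaking styles in Finnish end-to-end speech synthesis)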

Author: Törö Tuukka
Publication venue: Helsingfors universitet
Publication date: 01/01/2022

In recent years, advances in deep learning have made it possible to develop neural speech synthesizers that not only generate near-natural speech but also enable us to control its acoustic features. This means it is possible to synthesize expressive speech with different speaking styles that fit a given context. One way to achieve this control is by adding a reference encoder to the synthesizer that works as a bottleneck modeling a prosody-related latent space. The aim of this study was to analyze how the latent space of a reference encoder models diverse and realistic speaking styles, and what correlation there is between the phonetic features of encoded utterances and their latent space representations. Another aim was to analyze how the synthesizer output could be controlled in terms of speaking styles. The model used in the study was a Tacotron 2 speech synthesizer with a reference encoder that was trained with read speech uttered in various styles by one female speaker. The latent space was analyzed with principal component analysis on the reference encoder outputs for all of the utterances in order to extract salient features that differentiate the styles. Based on the assumption that there are acoustic correlates to speaking styles, a possible connection between the principal components and measured acoustic features of the encoded utterances was investigated. For the synthesizer output, two evaluations were conducted: an objective evaluation assessing acoustic features and a subjective evaluation assessing the appropriateness of synthesized speech with regard to the uttered sentence.
The results showed that the reference encoder modeled stylistic differences well, but the styles were complex, with major internal variation. The principal component analysis disentangled the acoustic features somewhat, and a statistical analysis showed a correlation between the latent space and prosodic features. The objective evaluation suggested that the synthesizer did not produce all of the acoustic features of the styles, but the subjective evaluation showed that it did enough to affect judgments of appropriateness, i.e., speech synthesized in an informal style was deemed more appropriate than formal style for informal style sentences and vice versa.

Helsingin yliopiston digitaalinen arkisto

Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

Publication venue: Firenze University Press
Publication date: 31/05/2022

The proceedings of the MAVEBA Workshop, which is held on a biennial basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies. The Workshop is sponsored by Ente Cassa Risparmio di Firenze, COST Action 2103, Biomedical Signal Processing and Control Journal (Elsevier Eds.), and IEEE Biomedical Engineering Soc. Special issues of international journals have been, and will be, published, collecting selected papers from the conference.

Directory of Open Access Books (DOAB)

Adapting the use of attributes to the task environment in joint action: results and a model

Author: Bard Ellen
Guhe Markus
Publication date: 01/06/2008

Edinburgh Research Explorer