3,077 research outputs found

    Prosodic Phrasing in Spontaneous Swedish

    One of the most important functions of prosody is to divide the flow of speech into chunks. The chunking, or prosodic phrasing, of speech plays an important role in both the production and perception of speech. This study represents a move away from the laboratory speech examined in previous, related studies on prosodic phrasing in Swedish, since a spontaneous, Southern Swedish speech material is investigated. The study is, however, not primarily intended as a study of the Southern Swedish dialect; rather, Southern Swedish is used as a convenient object on which to test various hypotheses about the phrasing function of prosody in spontaneous speech. The study comprises both analyses of production data and perception experiments, and both the phonetics and phonology of prosodic phrasing are dealt with. First, the distribution of prosodic phrase boundaries in spontaneous speech is examined by considering it as a reflection of optimality-theoretic constraints that restrain the production and perception of speech. Secondly, the phonetic realization of prosodic phrase boundaries is investigated in a study on articulation rate changes within the prosodic phrase. Evidence of phrase-final lengthening, a reduction of the articulation rate in the final part of the prosodic phrase, is found. The tonal means used to signal coherence within the prosodic phrase are subsequently investigated. An attempt is made to test the two Lund intonation models' capacities for describing spontaneous speech. The two approaches have different implications for the amount of preplanning needed, which makes them particularly interesting to compare by testing spontaneous data. The results indicate that little or no preplanning is needed to produce tonally coherent phrases. No evidence is found to suggest, for example, that speakers accommodate the length of the upcoming phrase by starting longer phrases with a higher F0 than short phrases.
An explanation is sought for variation in F0 starting points found in the data despite F0's insensitivity to phrase length. It is concluded that F0 is used to signal coherence even across prosodic phrase boundaries. It is furthermore found that tonal coherence signals are used to override strong boundary signals in spontaneous speech, thereby making initially unplanned additions possible. Finally, the perception of boundary strength is examined in two perception experiments. Listeners are found to agree well in their perceptual judgments of boundary strength, and it is shown that the main correlate of perceived boundary strength in spontaneous speech is pause length. The distinction between weak prosodic phrase boundaries and strong prosodic utterance boundaries, useful in descriptions of read speech, is found to be inappropriate for descriptions of spontaneous speech. It fails to capture the conflicting local and global signals of boundary strength and coherence that arise when strong boundary signals are overridden by coherence signals. The possibility of using conflicting signals in this way is seen as an important asset to the speaker, as it makes changes in the speech plan possible, and it is regarded as a characteristic of prosodic phrasing in spontaneous speech.

    Fluency-related Temporal Features and Syllable Prominence as Prosodic Proficiency Predictors for Learners of English with Different Language Backgrounds

    Prosodic features are important in achieving intelligibility, comprehensibility, and fluency in a second or foreign language (L2). However, research on the assessment of prosody as part of oral proficiency remains scarce. Moreover, the acoustic analysis of L2 prosody has often focused on fluency-related temporal measures, neglecting language-dependent stress features that can be quantified in terms of syllable prominence. Introducing the evaluation of prominence-related measures can be of use in developing both teaching and assessment of L2 speaking skills. In this study we compare temporal measures and syllable prominence estimates as predictors of prosodic proficiency in non-native speakers of English with respect to the speaker's native language (L1). The predictive power of temporal and prominence measures was evaluated for utterance-sized samples produced by language learners from four different L1 backgrounds: Czech, Slovak, Polish, and Hungarian. First, the speech samples were assessed using the revised Common European Framework of Reference scale for prosodic features. The assessed speech samples were then analyzed to derive articulation rate and three fluency measures. Syllable-level prominence was estimated by a continuous wavelet transform analysis using combinations of F0, energy, and syllable duration. The results show that the temporal measures serve as reliable predictors of prosodic proficiency in the L2, with prominence measures providing a small but significant improvement to prosodic proficiency predictions. The predictive power of the individual measures varies both quantitatively and qualitatively depending on the L1 of the speaker. We conclude that the possible effects of the speaker's L1 on the production of L2 prosody, in terms of temporal features as well as syllable prominence, deserve more attention in applied research and in developing teaching and assessment methods for spoken L2. Peer reviewed

    The prosody underlying spoken language proficiency : Cross-lingual investigation of non-native fluency and syllable prominence

    Prosodic structures are one of the most challenging features for second or foreign language (L2) speakers to learn. Since prosody is also crucial for speech intelligibility and fluency, the ability to quantify language learners' proficiency in terms of prosody can be of use not only to language teaching but also to the developers of language testing and assessment methods or tools. This doctoral dissertation explores non-native prosody with new multidisciplinary methods and cross-lingual research data. The focus is on investigating the relations between the assessment of prosodic proficiency and fluency-related temporal features as well as syllable-level prominence realizations. This dissertation presents three original publications (Studies I-III). In these studies, the relations of the selected prosodic features to human assessments are investigated in Finland Swedish as an L2 (produced by Finnish-speaking students) and in L2 English produced by Czech, Slovak, Hungarian, and Polish speakers. Objective temporal fluency features are measured based on previous research on L2 speech fluency. In addition, a state-of-the-art method based on continuous wavelet transform (CWT) is used for estimating syllable prominence. All analyzed speech data were assessed using the Common European Framework of Reference (CEFR) scale for prosodic proficiency. The results of Studies I and III indicate that articulation rate and certain types of disfluencies in speech can reliably predict the perceived prosodic proficiency level regardless of the language context. However, results from Study I reveal that assessors seem to weigh temporal features differently depending on the speech type (read vs. spontaneous) as well as their individual foci. Study II provides promising results on the use of CWT-based prominence estimation in predicting L2 proficiency.
Correlations of prominence estimates for L2 utterances with estimates for native speakers' corresponding productions were used as a predictive measure, and the level of agreement conceptualized this way correlated significantly with the human assessments of prosodic proficiency. In Study III, manually annotated temporal fluency measures were compared to CWT-based prominence estimates as predictors of prosodic proficiency. Temporal measures served as more reliable predictors of prosodic proficiency, but prominence measures provided a significant improvement to the prediction of prosodic proficiency. The predictive power of the individual measures varied both quantitatively and qualitatively with respect to the speaker's first language (L1). In conclusion, this dissertation supports the earlier observations on the role of temporal fluency measures, especially articulation rate, in estimating L2 speakers' oral proficiency. The CWT method, in turn, revealed differences in the productions of L2 prominence with regard to the speaker's L1 and thus provided complementary information for the prediction of prosodic proficiency. The acoustic features underlying L2 stress production should therefore be further studied with respect to the speaker's L1. Furthermore, the speech type as well as the speaker's L1 should be acknowledged in developing robust and reliable automatic spoken language learning and assessment tools. [Translated from the Finnish abstract:] This dissertation consists of studies that examine the prosodic features of speech underlying the assessment of oral language proficiency. Earlier research has found that prosody (the intonation, stress, and rhythm of speech) is one of the most challenging areas of language proficiency for language learners. At the same time, command of prosody has been found to be highly important for the intelligibility and fluency of speech. Studying prosodic features in language learners' speech helps to develop not only the teaching of oral proficiency but also automatic assessment methods.
The dissertation provides new knowledge about language learners' prosody using multilingual data and introduces a new speech analysis method, based on wavelet transforms, that has not previously been used to study learner speech. In three sub-studies, temporal features associated with fluency, such as articulation rate and pausing, are analyzed in language learners' speech. In addition, the realization of word and sentence stress is analyzed with a wavelet-transform-based tool. The relations of acoustic parameters to human assessments are examined with logistic regression models in two data sets in different languages: Swedish spoken by Finnish speakers (Studies I and II) and English spoken by Czech, Slovak, Polish, and Hungarian speakers (Study III). The results of Studies I and III confirm the language-independent role of temporal fluency features in the objective measurement of oral proficiency. Study I further shows that the importance of different features depends both on the assessors' individual preferences and on whether the assessed speech is read or spontaneous. Study II, in turn, shows that realizations of word and sentence stress measured with wavelet transforms can be used to predict learners' prosodic proficiency level. Study III compared the power of temporal fluency features and wavelet-measured word and sentence stress as predictors of prosodic proficiency level in learners of English with different native languages. The results show that temporal fluency features predict human ratings more reliably than the measured word and sentence stress, but that taking word and sentence stress into account improves the explanatory power of the statistical model. The results also show that the learner's native language probably affects the means the learner uses to produce word and sentence stress.
Based on the findings, articulation rate is the single most important feature in assessing a language learner's prosodic proficiency level, and it can also be used in the automatic assessment of oral proficiency regardless of speech type and language context. Pausing, by contrast, appears to follow different standards in read and spontaneous speech. In addition, the effect of the native language on the production of stress in a foreign language should be studied more comprehensively so that this feature can be used reliably in developing automatic assessment of oral proficiency.

    The Development of Speech Rhythm and Fluency of Advanced English Learners: A Mixed Methods Study of the Correlation between Native Speaker Evaluations and Acoustic Measures

    This thesis examined the changes in speech rhythm and fluency of advanced English learners during a pronunciation course. The research sought to answer the following research questions: 'Do speech rhythm and fluency of advanced English learners change after a pronunciation course according to native speaker ratings?', 'If yes, which acoustic measures do these changes correlate with?', and 'What is the correlation between the perceived speech rhythm, fluency, accentedness, and comprehensibility?'. I approached these issues through mixed methods, both quantitative and qualitative. First, 20 advanced Finnish learners of English were selected out of the 45 first-year English majors available. The number of participants was limited to keep the duration of the native-speaker questionnaire manageable. The questionnaire consisted of background information questions and 42 speech samples to be rated on a 9-point Likert scale, cropped from the learner recordings and from a recording of one native speaker. After collecting responses from 31 native speakers of English, I conducted a statistical analysis, whose results were then used for extreme case sampling. The speech of the four learners with the biggest changes in their speech rhythm and fluency was then analyzed acoustically to find the contributing factors. The results showed both positive and negative changes on an individual level, but the differences were not statistically significant on a group level. The acoustic analysis demonstrated that higher fluency scores correlated with a faster articulation rate, fewer unfilled pauses, the location of pauses at phrase or clause boundaries, and fewer repairs. Rhythm measures revealed that pitch and amplitude peaks generally matched better and that the use of durational cues as well as vowel reduction and linking increased in the posttest speech.
All four rated aspects correlated significantly, particularly speech rhythm and fluency scores in both pre- and posttest samples (r = 0.98). Thus, these two can be said to be closely intertwined. Based on the results, speech rhythm should not be neglected in pronunciation instruction, as it strongly influences the perceptions of fluency, accentedness, and comprehensibility. It is suggested that further research on rhythm focus on its nature as both a perceived and produced phenomenon, as well as on defining its relationship to fluency.

    Immediate and Distracted Imitation in Second-Language Speech: Unreleased Plosives in English

    The paper investigates immediate and distracted imitation in second-language speech using unreleased plosives. Unreleased plosives are fairly frequently found in English sequences of two stops. Polish, on the other hand, is characterised by a significant rate of releases in such sequences. This cross-linguistic difference served as material to look into how and to what extent non-native properties of sounds can be produced in immediate and distracted imitation. Thirteen native speakers of Polish first read and then imitated sequences of words with two stops straddling the word boundary. Stimuli for imitation had no release of the first stop. The results revealed that (1) a non-native feature such as the lack of the release burst can be imitated; (2) distracting imitation impedes imitative performance; (3) the type of a sequence interacts with the magnitude of an imitative effect.

    Juncture prosody across languages: Similar production but dissimilar perception

    How do speakers of languages with different intonation systems produce and perceive prosodic junctures in sentences with identical structural ambiguity? Native speakers of English and of Mandarin produced potentially ambiguous sentences with a prosodic juncture either earlier in the utterance (e.g., “He gave her # dog biscuits,” “他给她 # 狗饼干”), or later (e.g., “He gave her dog # biscuits,” “他给她狗 # 饼干”). These production data showed that prosodic disambiguation is realised very similarly in the two languages, despite some differences in the degree to which individual juncture cues (e.g., pausing) were favoured. In perception experiments with a new disambiguation task, requiring speeded responses to select the correct meaning for structurally ambiguous sentences, language differences in disambiguation response time appeared: Mandarin speakers correctly disambiguated sentences with earlier juncture faster than those with later juncture, while English speakers showed the reverse. Mandarin speakers with L2 English did not show their native-language response time pattern when they heard the English ambiguous sentences. Thus even with identical structural ambiguity and identically cued production, prosodic juncture perception across languages can differ.

    Juncture prosody across languages : similar production but dissimilar perception

    How do speakers of languages with different intonation systems produce and perceive prosodic junctures in sentences with identical structural ambiguity? Native speakers of English and of Mandarin produced potentially ambiguous sentences with a prosodic juncture either earlier in the utterance (e.g., “He gave her # dog biscuits,” “他给她 # 狗饼干”), or later (e.g., “He gave her dog # biscuits,” “他给她狗 # 饼干”). These production data showed that prosodic disambiguation is realized very similarly in the two languages, despite some differences in the degree to which individual juncture cues (e.g., pausing) were favoured. In perception experiments with a new disambiguation task, requiring speeded responses to select the correct meaning for structurally ambiguous sentences, language differences in disambiguation response time appeared: Mandarin speakers correctly disambiguated sentences with earlier juncture faster than those with later juncture, while English speakers showed the reverse. Mandarin speakers also showed higher levels of accuracy in disambiguation compared to English speakers, indicating language-specific differences in the extent to which prosodic cues are used. However, Mandarin, but not English, speakers showed a decrease in accuracy when pausing cues were removed. Thus even with high similarity in both structural ambiguity and production cues, prosodic juncture perception across languages can differ.

    The phonetics of speech breathing : pauses, physiology, acoustics, and perception

    Speech is made up of a continuous stream of speech sounds that is interrupted by pauses and breathing. As phoneticians are primarily interested in describing the segments of the speech stream, pauses and breathing are often neglected in phonetic studies, even though they are vital for speech. The present work adds to a more detailed view of both pausing and speech breathing with a special focus on the latter and the resulting breath noises, investigating their acoustic, physiological, and perceptual aspects. We present an overview of how a selection of corpora annotate pauses and pause-internal particles, as well as a recording setup that can be used for further studies on speech breathing. For pauses, this work emphasizes their optionality and variability under different tempos, as well as the temporal composition of silence and breath noise in breath pauses. For breath noises, we first focused on acoustic and physiological characteristics: We explored alignment between the onsets and offsets of audible breath noises and the start and end of expansion of both rib cage and abdomen. Further, we found similarities between speech breath noises and aspiration phases of /k/, and found that breath noises may be produced with a more open and slightly more front place of articulation than realizations of schwa. We found positive correlations between acoustic and physiological parameters, suggesting that when speakers inhaled faster, the resulting breath noises were more intense and produced further forward in the mouth. Inspecting the entire spectrum of speech breath noises, we showed relatively flat spectra and several weak peaks. These peaks largely overlapped with resonances reported for inhalations produced with a central vocal tract configuration. We used 3D-printed vocal tract models representing four vowels and four fricatives to simulate in- and exhalations by reversing airflow direction.
We found that airflow direction did not have a general effect for all models, but only for those with high-tongue configurations, as opposed to those that were more open. Then, we compared inhalations produced with the schwa model to human inhalations in an attempt to approach the vocal tract configuration in speech breathing. There were some similarities; however, several complexities of human speech breathing not captured in the models complicated comparisons. In two perception studies, we investigated how much information listeners could auditorily extract from breath noises. First, we tested categorizing different breath noises into six different types, based on airflow direction and airway usage, e.g. oral inhalation. Around two thirds of all answers were correct. Second, we investigated how well breath noises could be used to discriminate between speakers and to extract coarse information on speaker characteristics, such as age (old/young) and sex (female/male). We found that listeners were able to distinguish between two breath noises coming from the same or different speakers in around two thirds of all cases. Hearing one breath noise, classification of sex was successful in around 64% of cases, while for age it was 50%, suggesting that sex was more perceivable than age in breath noises. Deutsche Forschungsgemeinschaft (DFG), project number 418659027: "Pause-internal phonetic particles in speech communication"

    Discourse structure and information structure : interfaces and prosodic realization

    In this paper we review the current state of research on the discourse structure (DS) / information structure (IS) interface. This field has received a lot of attention from discourse semanticists and pragmatists, and has made substantial progress in recent years. We summarize the relevant studies. In addition, we look at the issue of DS/IS interaction at a different level: that of phonetics. It is known that both information structure and discourse structure can be realized prosodically, but the issue of phonetic interaction between the prosodic devices they employ has hardly ever been discussed in this context. We think that a proper consideration of this aspect of DS/IS interaction would enrich our understanding of the phenomenon, and hence we formulate some related research-programmatic positions.