1,519 research outputs found

    The analysis of breathing and rhythm in speech

    Speech rhythm can be described as the temporal patterning by which speech events, such as vocalic onsets, occur. Despite efforts to quantify and model speech rhythm across languages, it remains a scientifically enigmatic aspect of prosody. For instance, one challenge lies in determining how to best quantify and analyse speech rhythm. Techniques range from manual phonetic annotation to the automatic extraction of acoustic features. It is currently unclear how closely these differing approaches correspond to one another. Moreover, the primary means of speech rhythm research has been the analysis of the acoustic signal only. Investigations of speech rhythm may instead benefit from a range of complementary measures, including physiological recordings, such as of respiratory effort. This thesis therefore combines acoustic recording with inductive plethysmography (breath belts) to capture temporal characteristics of speech and speech breathing rhythms. The first part examines the performance of existing phonetic and algorithmic techniques for acoustic prosodic analysis in a new corpus of rhythmically diverse English and Mandarin speech. The second part addresses the need for an automatic speech breathing annotation technique by developing a novel function that is robust to the noisy plethysmography typical of spontaneous, naturalistic speech production. These methods are then applied in the following section to the analysis of English speech and speech breathing in a second, larger corpus. Finally, behavioural experiments were conducted to investigate listeners' perception of speech breathing using a novel gap detection task. The thesis establishes the feasibility, as well as limits, of automatic methods in comparison to manual annotation. In the speech breathing corpus analysis, they help show that speakers maintain a normative, yet contextually adaptive breathing style during speech. The perception experiments in turn demonstrate that listeners are sensitive to the violation of these speech breathing norms, even if unconsciously so. The thesis concludes by underscoring breathing as a necessary, yet often overlooked, component in speech rhythm planning and production
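    The thesis abstract does not spell out the breathing-annotation function itself, so the following Python sketch only illustrates the general kind of processing a noisy breath-belt trace invites: low-pass smoothing followed by trough picking to mark putative inhalation onsets. The filter cutoff, minimum breath spacing, and function name here are assumptions for illustration, not the method developed in the thesis.

```python
# Illustrative sketch only: not the thesis's annotation function.
# Locate putative inhalation onsets in a noisy breath-belt (plethysmography)
# trace by low-pass smoothing and picking troughs in the respiratory signal.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def inhalation_onsets(belt, fs, cutoff_hz=2.0, min_breath_s=1.0):
    """Return sample indices of putative inhalation onsets (signal troughs)."""
    # Low-pass filter to suppress movement- and speech-related noise (assumed cutoff).
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    smooth = filtfilt(b, a, belt)
    # Inhalation onsets roughly coincide with local minima of belt circumference,
    # so search for peaks in the inverted signal, at least min_breath_s apart.
    troughs, _ = find_peaks(-smooth, distance=int(min_breath_s * fs))
    return troughs

# Usage with synthetic data: a 0.3 Hz breathing cycle plus noise.
fs = 100
t = np.arange(0, 30, 1 / fs)
belt = np.sin(2 * np.pi * 0.3 * t) + 0.2 * np.random.randn(t.size)
print(inhalation_onsets(belt, fs)[:5])
```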

    Pushing the envelope: Evaluating speech rhythm with different envelope extraction techniques

    The amplitude of the speech signal varies over time, and the speech envelope is an attempt to characterise this variation in the form of an acoustic feature. Although tacitly assumed, the similarity between the speech envelope-derived time series and that of phonetic objects (e.g., vowels) remains empirically unestablished. The current paper, therefore, evaluates several speech envelope extraction techniques, such as the Hilbert transform, by comparing different acoustic landmarks (e.g., peaks in the speech envelope) with manual phonetic annotation in a naturalistic and diverse dataset. Joint speech tasks are also introduced to determine which acoustic landmarks are most closely coordinated when voices are aligned. Finally, the acoustic landmarks are evaluated as predictors for the temporal characterisation of speaking style using classification tasks. The landmark that performed most closely to annotated vowel onsets was peaks in the first derivative of a human audition-informed envelope, consistent with converging evidence from neural and behavioural data. However, differences also emerged based on language and speaking style. Overall, the results show that both the choice of speech envelope extraction technique and the form of speech under study affect how sensitive an engineered feature is at capturing aspects of speech rhythm, such as the timing of vowels
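    The paper's exact extraction pipeline is not reproduced here; as a hedged illustration of the kind of landmarks it compares, the sketch below derives a Hilbert amplitude envelope, smooths it with a simple low-pass filter (a crude stand-in for an audition-informed envelope), and picks peaks in the envelope and in its first derivative as candidate rhythm landmarks. The cutoff frequency and minimum peak spacing are assumptions.

```python
# Minimal sketch of two candidate acoustic landmarks: peaks in a Hilbert
# amplitude envelope, and peaks in the first derivative of a low-pass-smoothed
# envelope (a simple 10 Hz filter stands in for an audition-informed envelope).
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, find_peaks

def envelope_landmarks(x, fs, lp_hz=10.0):
    env = np.abs(hilbert(x))                        # Hilbert amplitude envelope
    b, a = butter(2, lp_hz / (fs / 2), btype="low")
    env_smooth = filtfilt(b, a, env)                # smoothed envelope
    d_env = np.gradient(env_smooth) * fs            # first derivative (per second)
    env_peaks, _ = find_peaks(env_smooth, distance=int(0.1 * fs))
    rate_peaks, _ = find_peaks(d_env, distance=int(0.1 * fs))
    return env_peaks / fs, rate_peaks / fs          # landmark times in seconds

# Usage: landmark times for one second of synthetic, 4 Hz amplitude-modulated tone.
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 150 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
print(envelope_landmarks(x, fs))
```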

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame, which circumvents strict attribution of speaker-turns by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour. Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems
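    As a rough illustration of the TAMA idea, the sketch below averages a prosodic feature (here, pitch samples) within fixed, time-aligned frames for each speaker and then correlates the two frame series as a crude accommodation index. The 20 s frame length, 10 s step, and use of Pearson correlation are assumptions for illustration, not the thesis's exact parameters.

```python
# Illustrative Time Aligned Moving Average (TAMA) sketch: per-speaker feature
# values are averaged within fixed, time-aligned frames, and the resulting
# series can be compared across interlocutors.
import numpy as np

def tama_series(times, values, frame_len=20.0, step=10.0, total_dur=None):
    """Average feature `values` (sampled at `times`, in seconds) per frame."""
    total_dur = total_dur if total_dur is not None else times[-1]
    starts = np.arange(0.0, total_dur - frame_len + step, step)
    series = []
    for s in starts:
        in_frame = (times >= s) & (times < s + frame_len)
        series.append(values[in_frame].mean() if in_frame.any() else np.nan)
    return np.array(series)

# Usage: correlate two speakers' frame-averaged pitch as a crude accommodation index.
rng = np.random.default_rng(0)
t_a, t_b = np.sort(rng.uniform(0, 300, 400)), np.sort(rng.uniform(0, 300, 400))
f0_a = 120 + 10 * np.sin(t_a / 40) + rng.normal(0, 5, t_a.size)
f0_b = 210 + 10 * np.sin(t_b / 40) + rng.normal(0, 5, t_b.size)
a, b = tama_series(t_a, f0_a, total_dur=300), tama_series(t_b, f0_b, total_dur=300)
mask = ~np.isnan(a) & ~np.isnan(b)
print(np.corrcoef(a[mask], b[mask])[0, 1])
```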

    Temporal entrainment in overlapping speech

    Wlodarczak M. Temporal entrainment in overlapping speech. Bielefeld: Bielefeld University; 2014

    THE RELATIONSHIP BETWEEN ACOUSTIC FEATURES OF SECOND LANGUAGE SPEECH AND LISTENER EVALUATION OF SPEECH QUALITY

    Second language (L2) speech is typically less fluent than native speech, and differs from it phonetically. While the speech of some L2 English speakers seems to be easily understood by native listeners despite the presence of a foreign accent, other L2 speech seems to be more demanding, such that listeners must expend considerable effort in order to understand it. One reason for this increased difficulty may simply be the speaker’s pronunciation accuracy or phonetic intelligibility. If an L2 speaker’s pronunciations of English sounds differ sufficiently from the sounds that native listeners expect, these differences may force native listeners to work much harder to understand the divergent speech patterns. However, L2 speakers also tend to differ from native ones in terms of fluency – the degree to which a speaker is able to produce appropriately structured phrases without unnecessary pauses, self-corrections or restarts. Previous studies have shown that measures of fluency are strongly predictive of listeners’ subjective ratings of the acceptability of L2 speech: Less fluent speech is consistently considered less acceptable (Ginther, Dimova, & Yang, 2010). However, since less fluent speakers tend also to have less accurate pronunciations, it is unclear whether or how these factors might interact to influence the amount of effort listeners exert to understand L2 speech, nor is it clear how listening effort might relate to perceived quality or acceptability of speech. In this dissertation, two experiments were designed to investigate these questions
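    The dissertation's operational definitions are not given in the abstract; the sketch below merely illustrates the sort of fluency measures it refers to (speech rate and silent pausing), computed from a toy word-level time alignment. The 250 ms pause threshold and the measure names are assumptions.

```python
# Hedged sketch of simple fluency measures from a word-level alignment:
# speech rate and silent-pause statistics. Thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class Word:
    start: float  # seconds
    end: float

def fluency_measures(words, min_pause=0.25):
    """Return speech rate (words/s) and silent-pause statistics."""
    total_time = words[-1].end - words[0].start
    pauses = [b.start - a.end for a, b in zip(words, words[1:])
              if b.start - a.end >= min_pause]
    return {
        "speech_rate": len(words) / total_time,
        "pause_count": len(pauses),
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
    }

# Usage with a toy alignment of four words.
words = [Word(0.0, 0.4), Word(0.5, 0.9), Word(1.6, 2.0), Word(2.1, 2.6)]
print(fluency_measures(words))
```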

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    Max Planck Institute for Psycholinguistics: Annual report 1996


    The Effect Of Input Modality On Pronunciation Accuracy Of English Language Learners

    The issues relative to foreign accent continue to puzzle second language researchers, educators, and learners today. Although once thought to be at the root, maturational constraints have fallen short of definitively accounting for the myriad levels and rates of phonological attainment (Bialystok & Miller, 1999, p. 128). This study, a Posttest-only Control Group Design, examined how the pronunciation accuracy of adult English language learners, as demonstrated by utterance length, was related to two input stimuli: auditory-only input and auditory-orthographic input. Utterance length and input modality were further examined with the added variables of native language, specifically Arabic and Spanish, and second language proficiency as defined by unofficial TOEFL Listening Comprehension and Reading Comprehension section scores. Results from independent t-tests indicated a statistically significant difference in utterance length based on input modality (t(192) = -3.285, p = .001), while with the added variable of native language, factorial ANOVA results indicated no statistically significant difference for the population studied. In addition, multiple linear regression analyses examined input modality and second language proficiency as predictors of utterance length accuracy and revealed a statistically significant relationship (R² = .108, adjusted R² = .089, F(3, 144) = 5.805, p = .001), with 11% of the utterance length variance accounted for by these two predictors. Lastly, hierarchical regressions applied to two blocks of factors revealed statistical significance: (a) input modality/native language (R² = .069, adjusted R² = .048, F(2, 87) = 3.230, p = .044) and ListenComp (R² = .101, adjusted R² = .070, F(3, 86) = 3.232, p = .026), with ListenComp increasing the predictive power by 3%; (b) input modality/native language (R² = .069, adjusted R² = .048, F(2, 87) = 3.230, p = .044) and ReadComp (R² = .112, adjusted R² = .081, F(1, 86) = 3.629, p = .016), with ReadComp increasing the predictive power by 4%; and (c) input modality/native language (R² = .069, adjusted R² = .048, F(2, 87) = 3.230, p = .044) and ListenComp/ReadComp (R² = .114, adjusted R² = .072, F(2, 85) = 2.129, p = .035), with ListenComp/ReadComp increasing the predictive power by 4%. The implications of this research are that by considering input modality and second language proficiency levels, especially when teaching new vocabulary to adult second language learners, the potential for improved pronunciation accuracy is maximized. Furthermore, the heightened attention to the role of input modality as a cognitive factor in phonological output in second language teaching and learning may redirect the manner in which target language phonology is approached
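    To make the hierarchical-regression logic concrete, the hedged sketch below fits a baseline model (input modality plus native language) and then adds a proficiency predictor, reporting the change in R². The simulated data, variable names, and effect sizes are purely illustrative assumptions; the study itself used utterance-length scores and TOEFL section scores.

```python
# Illustrative hierarchical regression: baseline block, then add a proficiency
# predictor and compare R-squared. Data and names are simulated stand-ins.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 90
df = pd.DataFrame({
    "modality": rng.integers(0, 2, n),      # 0 = auditory-only, 1 = auditory-orthographic
    "native_lang": rng.integers(0, 2, n),   # 0 = Arabic, 1 = Spanish
    "listen_comp": rng.normal(50, 10, n),   # stand-in proficiency score
})
df["utterance_len"] = (2 + 0.5 * df["modality"] + 0.02 * df["listen_comp"]
                       + rng.normal(0, 1, n))

m1 = smf.ols("utterance_len ~ modality + native_lang", data=df).fit()
m2 = smf.ols("utterance_len ~ modality + native_lang + listen_comp", data=df).fit()
print(f"Block 1 R^2 = {m1.rsquared:.3f}")
print(f"Block 2 R^2 = {m2.rsquared:.3f} (change = {m2.rsquared - m1.rsquared:.3f})")
```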