
    Phonetic Variation and Self-Recorded Data

    Self-recordings, in which speakers record themselves without a researcher present, are attractive because they can potentially elicit a wider range of styles than interviews do. To compare the stylistic differences between self-recorded speech and interview speech, we present an analysis of sibilant production by four speakers in both contexts. Our results show that recording context (self-recording versus interview) can be a reliable predictor of sibilant production, with differences often exceeding those between interview speech and read speech. We suggest that self-recordings may be stylistically different enough from interviews to justify overcoming the practical challenges of collecting them and integrating self-recording into standard sociolinguistic methodology, at least for studies of intraspeaker variation and for the description of variable phenomena.

    Creaky Voice as a Stylistic Feature of Young American Female Speech: An Intraspeaker Variation Study of Scarlett Johansson

    This study examines the stylistic use of 'creaky voice' by a single speaker, the American actress Scarlett Johansson. Media and academic interest in creaky voice has recently increased markedly, with work by Yuasa (2010) and Wolk et al. (2011) confirming the prevalence of this feature among young American female speakers. Our study was directly motivated by the work of Barry Pennock-Speck (2005), who took a qualitative approach to analyzing the speech of three American actresses for stylistic modulation of their voice quality. The present study focuses on one American actress, Johansson, chosen because at the time of research she was an established, successful young American woman and therefore an appropriate representative of the social group under discussion. Our materials comprised six of Johansson's films, made while she was between 18 and 24 years old. This age range is in line with previous work on creaky voice by Wolk et al. (2011), who defined their age bracket of study as 18–25 years. We contrasted American and British character roles and assessed the level of creak through both quantitative and qualitative analysis of the six films: three in which she played an American and three in which she used an English (UK) accent. The acoustic evaluation involved coding syllabic nuclei for creak and carrying out a statistical analysis to identify significant influences on the observed pattern. Our qualitative analysis covers the following variables: character traits and personality, the period in which the film is set, and the age of Johansson's character. Results showed significantly more creak in Johansson's speech when she performed an American role, in line with Pennock-Speck's earlier study. Our qualitative findings suggest that creak is modulated at an additional level, indexing seductiveness and intimacy with the interlocutor.
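
    As a minimal sketch of the quantitative side, assume each coded syllabic nucleus is stored as an (accent of role, creaky or not) pair. The abstract does not name the statistical test used, so the Pearson chi-square on a 2×2 table below is only an illustrative stand-in; the function names and the counts are hypothetical, not the study's data.

```python
from collections import Counter


def creak_rates(codes):
    """Proportion of creaky nuclei per role accent.

    codes: iterable of (accent, is_creaky) pairs, one per coded syllabic nucleus.
    """
    totals, creaky = Counter(), Counter()
    for accent, is_creaky in codes:
        totals[accent] += 1
        creaky[accent] += bool(is_creaky)
    return {accent: creaky[accent] / totals[accent] for accent in totals}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are American/British roles, columns creaky/non-creaky.
codes = ([("American", True)] * 30 + [("American", False)] * 70
         + [("British", True)] * 10 + [("British", False)] * 90)
print(creak_rates(codes))              # {'American': 0.3, 'British': 0.1}
print(chi_square_2x2(30, 70, 10, 90))  # 12.5
```

    With one degree of freedom, a statistic of 12.5 exceeds the 3.84 critical value at α = 0.05; a real analysis would also need to account for repeated measures within films.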

    Duration of voicing and silence periods of continuous speech in different acoustic environments

    This work deals with the duration of voicing and silence periods of continuous speech in rooms with very different reverberation times (RTs). Measurements were conducted using the Ambulatory Phonation Monitoring (APM) 3200 (KayPentax) and Voice-Care devices (developed at the Politecnico di Torino, Italy), both of which have a contact microphone placed on the base of the neck to detect skin vibrations during phonation. Six university professors and 22 university students gave short laboratory monologues in which they explained something they knew well to a listener 6 m away. Seven students also described a map with the aim of correctly explaining directions to a listener who drew the path on a blank chart. Longer speech samples were recorded from primary school teachers in classrooms. On average, voicing periods tended to lengthen as RT increased for the university professors, the school teachers, and the university students who described a map. These students also showed longer silence periods than the students who gave short monologues. The observed trends thus concerned voice professionals and subjects who were highly motivated to make themselves understood in a perturbed speaking situation. Nonparametric statistical tests, applied to detect differences in the distributions of voicing and silence periods, largely supported these findings.
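
    A minimal sketch of how voicing and silence periods can be derived, assuming a frame-wise voiced/unvoiced decision from the contact-microphone signal is already available. The function name and the 10 ms frame length are hypothetical; the abstract does not describe the APM 3200 or Voice-Care detection algorithms.

```python
from itertools import groupby


def voicing_silence_periods(voiced_frames, frame_ms=10):
    """Split frame-wise voicing decisions into period durations.

    voiced_frames: sequence of booleans, one per analysis frame (True = voiced).
    frame_ms: frame length in milliseconds (an assumed value, not a device setting).
    Returns (voicing_periods_ms, silence_periods_ms), each in temporal order.
    """
    voicing, silence = [], []
    # groupby yields consecutive runs of identical voicing decisions.
    for is_voiced, run in groupby(voiced_frames):
        duration = sum(1 for _ in run) * frame_ms
        (voicing if is_voiced else silence).append(duration)
    return voicing, silence


frames = [True, True, True, False, False, True, False, True, True]
print(voicing_silence_periods(frames))  # ([30, 10, 20], [20, 10])
```

    The two duration lists per room are the kind of distributions a nonparametric test (for example a Mann–Whitney U test) would compare across reverberation conditions.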

    Evaluation of glottal characteristics for speaker identification

    Based on the assumption that the physical characteristics of a person's vocal apparatus give their voice distinctive characteristics, this thesis investigates the use of the long-term average glottal response for speaker identification. The long-term average glottal response is a new feature obtained by overlaying successive vocal tract responses within an utterance. How the long-term average glottal response varies with accent and gender is examined using a population of 352 American English speakers from eight accent regions. Descriptors are defined that characterize the shape of the long-term average glottal response. Factor analysis of these descriptors shows that the most important factor contains significant contributions from descriptors consisting of the coefficients of cubics fitted to the long-term average glottal response. Discriminant analysis demonstrates that the long-term average glottal response is potentially useful for classifying speakers by gender, but not for distinguishing American accents. The identification accuracy of the long-term average glottal response is compared with that of vocal tract features in identification experiments on a database of twenty speakers uttering the digits zero to nine. Vocal tract features, consisting of cepstral coefficients, partial correlation coefficients, and linear prediction coefficients, are shown to be more accurate than the long-term average glottal response. Although analysis of the training data indicated that the long-term average glottal response was uncorrelated with the vocal tract features, combining features gave only insignificant improvements in identification accuracy. The effect of noise and distortion on speaker identification is examined for each feature.
    The identification performance of the long-term average glottal response is found to be relatively insensitive to noise compared with cepstral coefficients, partial correlation coefficients, and the long-term average spectrum, but highly sensitive to variations in the phase response of the speech transmission channel. Before reporting on the identification experiments, the thesis introduces speech production, speech models, and background on the various features used. Investigations into the long-term average glottal response demonstrate that it approximates the glottal pulse convolved with the long-term average impulse response, a relationship verified using synthetic speech. Furthermore, the spectrum of the long-term average glottal response extracted from pre-emphasized speech is shown to be similar to the long-term average spectrum of pre-emphasized speech, but computationally much simpler.
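
    The abstract compares the glottal-response spectrum with the long-term average spectrum (LTAS) of pre-emphasized speech. As a minimal sketch of that reference quantity only (not of the thesis's long-term average glottal response), the code below pre-emphasizes a signal and averages the power spectra of windowed frames. The filter coefficient 0.97, the 512-sample Hann frames with 50% overlap, the 8 kHz sampling rate, and the synthetic sine used in place of speech are all assumptions.

```python
import numpy as np


def pre_emphasize(x, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n - 1]."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])


def long_term_average_spectrum(x, frame_len=512, hop=256):
    """Mean power spectrum over Hann-windowed frames (an LTAS)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spectra = [
        np.abs(np.fft.rfft(window * x[i * hop : i * hop + frame_len])) ** 2
        for i in range(n_frames)
    ]
    return np.mean(spectra, axis=0)


fs = 8000                                    # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                       # one second of signal
speech_like = np.sin(2 * np.pi * 1000 * t)   # stand-in for a speech signal
ltas = long_term_average_spectrum(pre_emphasize(speech_like))
peak_hz = np.argmax(ltas) * fs / 512         # bin spacing is fs / frame_len
print(peak_hz)  # 1000.0
```

    Averaging power spectra over many frames smooths out phonetic content, which is why an LTAS is a common speaker-level descriptor.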

    Trait evaluations of faces and voices: Comparing within- and between-person variability

    Human faces and voices are rich sources of information that can vary in many different ways. Most of the literature on face and voice perception has focussed on understanding how people look and sound different from each other (between-person variability). However, recent studies highlight the ways in which the same person can look and sound different on different occasions (within-person variability). Across three experiments, we examined how within- and between-person variability relate to one another for social trait impressions by collecting trait ratings for multiple face images and voice recordings of the same people. We find that within-person variability in social trait evaluations is at least as great as between-person variability. Across the different stimulus sets, trait impressions of voices are consistently more variable within people than between people, a pattern that is only occasionally evident when judging faces. Our findings highlight the importance of understanding within-person variability, showing how judgements of the same person can vary widely across encounters and quantifying how this pattern differs between voice and face perception. The work consequently has implications for theoretical models proposing that voices can be considered 'auditory faces' and places limits on the 'kernel of truth' hypothesis of trait evaluations.
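
    One simple way to put the two kinds of variability on a common scale is sketched below, assuming each stimulus (image or recording) already has a single rating averaged across raters. The function name, the variance-based measures, and the ratings are hypothetical; the paper's own analysis may quantify variability differently.

```python
from statistics import mean, pvariance


def within_between_variability(ratings):
    """Compare within- and between-person spread of trait ratings.

    ratings: dict mapping a person ID to the ratings of that person's
    different images or recordings (one averaged rating per stimulus).
    Returns (within, between): the mean of each person's variance, and the
    variance of the per-person mean ratings.
    """
    within = mean(pvariance(r) for r in ratings.values())
    between = pvariance([mean(r) for r in ratings.values()])
    return within, between


# Hypothetical trustworthiness ratings on a 1-7 scale for two voices.
ratings = {"speaker_A": [2.0, 6.0, 4.0], "speaker_B": [3.0, 5.0, 4.0]}
within, between = within_between_variability(ratings)
print(within, between)
```

    In this toy example both speakers have the same mean rating, so between-person variance is zero while within-person variance is not, the direction of the pattern the abstract reports for voices.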

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings on human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. The review thus provides a roadmap for future work on improving the robustness of speech output.

    Acoustic Changes during Passage Reading in Speakers with Parkinson's Disease

    Purpose: The purpose of this study was to evaluate speech changes in Parkinson's disease (PD) during passage reading, using both local (i.e., segment-level) and global (i.e., utterance-level) acoustic measures. Methods: Twenty speakers participated in the study (10 with PD and 10 neurologically healthy controls). The speakers were asked to read The Caterpillar passage in a conversational mode. Five acoustic measures were included (local: vowel duration and Euclidean distance between corner vowels and schwa; global: articulation rate, F0 range, and intensity range). These measures were compared between two sentences, one at the beginning and one at the end of the passage. Results: The findings indicated (1) overall speech differences between the two groups, such as increased vowel duration and reduced vowel contrast, and (2) speech differences between the beginning and end of the passage, such as increased articulation rate toward the end. In addition, unlike control speakers, speakers with PD did not show a greater F0 and intensity range at the end of the passage than at the beginning, which points to a limited capability for prosodic modulation in PD that becomes apparent toward the end of passage reading. Discussion: The findings of this study support the notion that within- or across-task acoustic variation should be considered in speech sampling in clinical practice and research.
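
    The "Euclidean distance between corner vowels and schwa" measure can be sketched as below, assuming mean F1 and F2 values in hertz for each vowel. The abstract does not say whether formants were expressed in Hz or on a perceptual scale such as Bark, nor how distances were aggregated; the function name and formant values are hypothetical.

```python
import math


def vowel_distances_to_schwa(corner_vowels, schwa):
    """Euclidean distance from each corner vowel to schwa in the F1-F2 plane.

    corner_vowels: dict mapping a vowel label to its mean (F1, F2) in Hz.
    schwa: the speaker's mean (F1, F2) for schwa, in Hz.
    """
    return {v: math.dist(formants, schwa) for v, formants in corner_vowels.items()}


# Hypothetical formant means (Hz) for one speaker.
corners = {"i": (200.0, 1900.0), "a": (800.0, 1500.0), "u": (200.0, 1100.0)}
distances = vowel_distances_to_schwa(corners, schwa=(500.0, 1500.0))
print(distances)  # {'i': 500.0, 'a': 300.0, 'u': 500.0}
```

    Smaller distances mean the corner vowels sit closer to the centre of the vowel space, which is one way the reduced vowel contrast reported for the PD group could show up.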