    Speech with pauses sounds deceptive to listeners with and without hearing impairment

    Purpose: Communication is as much persuasion as it is the transfer of information. This creates a tension between the interests of the speaker and those of the listener, as dishonest speakers naturally attempt to conceal deceptive speech while listeners are faced with the challenge of sorting truths from lies. Hearing-impaired listeners in particular may have differing levels of access to the acoustical cues that give away deceptive speech. A greater tendency towards speech pauses has been hypothesised to result from the cognitive demands of lying convincingly. Higher vocal pitch has also been hypothesised to mark the increased anxiety of a dishonest speaker.
    Method: Listeners with or without hearing impairments heard short utterances from natural conversations, some of which had been digitally manipulated to contain either increased pausing or raised vocal pitch. Listeners were asked to guess whether each statement was a lie in a two-alternative forced-choice task. Participants were also asked explicitly which cues they believed had influenced their decisions.
    Results: Statements were more likely to be perceived as a lie when they contained pauses, but not when vocal pitch was raised. This pattern held regardless of hearing ability. In contrast, both groups of listeners self-reported using vocal pitch cues to identify deceptive statements, though at lower rates than pauses.
    Conclusions: Listeners may have only partial awareness of the cues that influence their impression of dishonesty. Hearing-impaired listeners may place greater weight on acoustical cues according to the differing degrees of access provided by hearing aids.
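
    A minimal sketch of the two stimulus manipulations described in the abstract, assuming Python with the librosa library; the pause position, pause length, pitch shift, and file name are illustrative assumptions, not values from the study:

        import numpy as np
        import librosa

        def insert_pause(y, sr, at_sec, pause_sec=0.5):
            """Insert pause_sec of silence into signal y at time at_sec."""
            cut = int(at_sec * sr)
            silence = np.zeros(int(pause_sec * sr), dtype=y.dtype)
            return np.concatenate([y[:cut], silence, y[cut:]])

        def raise_pitch(y, sr, semitones=2.0):
            """Shift vocal pitch upward by semitones without changing duration."""
            return librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)

        # Hypothetical utterance file; sr=None keeps the native sample rate.
        y, sr = librosa.load("utterance.wav", sr=None)
        paused = insert_pause(y, sr, at_sec=1.2)   # increased-pausing condition
        higher = raise_pitch(y, sr)                # raised-pitch condition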

    Overrated gaps: Inter-speaker gaps provide limited information about the timing of turns in conversation

    Corpus analyses have shown that turn-taking in conversation is much faster than laboratory studies of speech planning would predict. To explain fast turn-taking, Levinson and Torreira (2015) proposed that speakers are highly proactive: They begin to plan a response to their interlocutor's turn as soon as they have understood its gist, and launch this planned response when the turn-end is imminent. Thus, fast turn-taking is possible because speakers use the time while their partner is talking to plan their own utterance. In the present study, we asked how much time upcoming speakers actually have to plan their utterances. Following earlier psycholinguistic work, we used transcripts of spoken conversations in Dutch, German, and English. These transcripts consisted of segments, which are continuous stretches of speech by one speaker. In the psycholinguistic and phonetic literature, such segments have often been used as proxies for turns. We found that in all three corpora, a large proportion of the segments consisted of only one or two words, which by our estimate does not give the next speaker enough time to fully plan a response. Further analyses showed that speakers indeed often did not respond to the immediately preceding segment of their partner, but continued an earlier segment of their own. More generally, our findings suggest that speech segments derived from transcribed corpora do not necessarily correspond to turns, and the gaps between speech segments therefore provide only limited information about the planning and timing of turns.
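
    A minimal sketch of the kind of corpus measure discussed above: the word count of each speech segment and the gap (negative for overlap) before the next segment by a different speaker. The Segment structure is a simplifying assumption about the transcript format, not the corpora's actual encoding:

        from dataclasses import dataclass

        @dataclass
        class Segment:
            speaker: str
            start: float  # onset in seconds
            end: float    # offset in seconds
            text: str

        def turn_gaps(segments):
            """Yield (word_count, gap_sec) for each segment followed by a
            segment from a different speaker; gap_sec < 0 means overlap."""
            for cur, nxt in zip(segments, segments[1:]):
                if cur.speaker != nxt.speaker:
                    yield len(cur.text.split()), nxt.start - cur.end

        segs = [Segment("A", 0.0, 1.4, "did you see that"),
                Segment("B", 1.6, 1.9, "yeah"),           # one-word segment
                Segment("B", 2.1, 3.0, "it was great")]   # same-speaker continuation
        for words, gap in turn_gaps(segs):
            print(words, round(gap, 2))  # -> 4 0.2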

    Segmentation and Intonation in Childhood Apraxia of Speech

    Childhood apraxia of speech (CAS) is a motor speech disorder that affects the programming of spatial and temporal parameters for speech patterns, characterized by sound distortions, segmented units, and deficits with lexical stress. Children with CAS show notably longer intervals between speech segments and within syllables than children with phonological impairments or children who are developing typically. This segmentation may impact prosody at the lexical level. Prosody also includes declination of the fundamental frequency (F0) and reset at the intonational level, impacting the intelligibility of speech production. This study assessed segmentation and intonational effects on prosody across an entire utterance for 11 children with CAS and 10 typically developing (TD) children aged 5-11 years. Acoustic analyses of real and non-word multisyllabic words, each paired with a carrier phrase of 3-4 words, measured the average inter-segment duration (ms) between and within words and the average slope of F0. Stimuli were generated from Treating Establishment of Motor Program Organization (TEMPO), which targets motor speech errors in CAS (Miller et al., 2018). The current study provides a TD comparison to the CAS group from the TEMPO study prior to treatment. Results showed that CAS participants produced significantly longer inter-segment durations both between and within words. A correlation analysis revealed a strong positive relationship between inter-segment duration and the number of words in the sentence. Both groups showed F0 declination over utterances, with no detectable difference between groups. F0 change over target words likewise did not show notable declination differences between groups. Further correlation testing suggests that as between-word duration increases, the F0 regression slope over utterances flattens. Comparing speech production patterns in children with CAS and TD children pre- and post-treatment will better establish treatment efficacy in improving the communication of children with CAS. Further data from additional participants will better differentiate prosodic intonation between TD children and children with CAS.
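
    A minimal sketch of one acoustic measure from the study, the F0 regression slope over an utterance, estimated here by least-squares fitting on an F0 track; the frame times and F0 values below are fabricated placeholders purely for illustration:

        import numpy as np

        def f0_slope(times, f0):
            """Least-squares slope of F0 (Hz) over time (s); unvoiced
            frames (f0 == 0 or NaN) are excluded before fitting."""
            t = np.asarray(times, dtype=float)
            f = np.asarray(f0, dtype=float)
            voiced = np.isfinite(f) & (f > 0)
            slope, _intercept = np.polyfit(t[voiced], f[voiced], 1)
            return slope  # Hz per second; negative values indicate declination

        t = np.linspace(0.0, 2.0, 9)                    # frame times (s)
        f0 = [210, 205, 0, 198, 192, 190, 0, 184, 180]  # 0 = unvoiced frame
        print(f"{f0_slope(t, f0):.1f} Hz/s")            # negative: declination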

SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION

    Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility due to slow, uncoordinated control of the speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. In this dissertation, we investigate dysarthric speech augmentation and synthesis methods. To better understand differences in the prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels, a comparative study between typical and dysarthric speech was conducted. These characteristics are important components for dysarthric speech modeling, synthesis, and augmentation. For augmentation, prosodic transformation and time-feature masking have been proposed. For dysarthric speech synthesis, this dissertation introduces a modified neural multi-talker TTS with a dysarthria severity level coefficient and a pause insertion model, allowing dysarthric speech to be synthesized at varying severity levels. In addition, we extend this work by using a label propagation technique to create more meaningful control variables, such as a continuous Respiration, Laryngeal and Tongue (RLT) parameter, even for datasets that provide only discrete dysarthria severity level information. This approach increases the controllability of the system, enabling the generation of dysarthric speech over a broader range. To evaluate the effectiveness of the synthesized training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Overall, results on the TORGO database demonstrate that using synthetic dysarthric speech to increase the amount of dysarthric-patterned training speech has a significant impact on dysarthric ASR systems.
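
    A minimal sketch of the evaluation metric reported above, word error rate (WER): the Levenshtein edit distance between reference and hypothesis word sequences divided by the reference length. The example sentences are hypothetical:

        def wer(reference, hypothesis):
            """Word error rate: edit distance over reference word count."""
            ref, hyp = reference.split(), hypothesis.split()
            # dp[i][j] = edit distance between ref[:i] and hyp[:j]
            dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                dp[i][0] = i
            for j in range(len(hyp) + 1):
                dp[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                    dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
            return dp[len(ref)][len(hyp)] / len(ref)

        # One substitution plus one insertion over a 4-word reference: WER 0.5.
        print(wer("the quick brown fox", "the kwik brown fox jumps"))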