    Generating expressive speech for storytelling applications

    Work on expressive speech synthesis has long focused on the expression of basic emotions. In recent years, however, interest in other expressive styles has been increasing. The research presented in this paper aims at the generation of a storytelling speaking style, which is suitable for storytelling applications and, more generally, for applications aimed at children. Based on an analysis of human storytellers' speech, we designed and implemented a set of prosodic rules for converting "neutral" speech, as produced by a text-to-speech system, into storytelling speech. An evaluation of our storytelling speech generation system showed encouraging results.

    The Neurocognition of Prosody

    Prosody is one of the most undervalued components of language, despite its fulfillment of manifold purposes. It can, for instance, help assign the correct meaning to compounds such as “white house” (linguistic function), or help a listener understand how a speaker feels (emotional function). However, brain-based models that take into account the role prosody plays in dynamic speech comprehension are still rare. This is probably due to the fact that it has proven difficult to fully denote the neurocognitive architecture underlying prosody. This review discusses clinical and neuroscientific evidence regarding both linguistic and emotional prosody. It will become obvious that prosody processing is a multistage operation and that its temporally and functionally distinct processing steps are anchored in a functionally differentiated brain network

    Unusual Prosodic Descriptors in Young, Verbal Children with Autism Spectrum Disorders

    This study aimed to determine which prosodic descriptors best characterized the speech of children with autism spectrum disorders (ASD) and whether these descriptors (e.g., sing-song and monotone) are acoustically different. Two listeners' auditory perceptions of the speech of the children with ASD and the pitch of the speech samples were analyzed. The results suggest that individual children are characterized by a variety of prosodic descriptors. Some thought groups were described as both sing-song and monotone; however, most children appear to be either more monotone or more sing-song. Furthermore, the subjective and acoustic data suggest a strong relationship between atypical intonation and sing-song perceptions as well as atypical rhythm and monotone perceptions. Implications for an earlier diagnosis of ASD and for the development of therapy tasks to target these deficits are discussed.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Does speech prosody matter in health communication? Evidence from native and non-native English speaking medical students in a simulated clinical interaction

    The impact of the UK’s multilingual and multicultural society today can be seen in its healthcare services and has contributed towards shaping communication skills training as a core part of the UK undergraduate medical curriculum. NHS complaints statistics involving perceived staff attitudes have remained high, despite extensive communication skills training. Furthermore, foreign doctors have received a higher proportion of complaints than UK doctors. Finally, how linguistic and social factors shape the conveyance and perception of attitudes related to professionalism in medical communication remains poorly understood. The ultimate aim of this study was to ascertain whether speech prosody contributes to the perception of professionalism in medical communication. Research questions on the role of speech prosody in conveying professional attitudes in medical communication, the prosodic differences between native and non-native English speaking medical students in a simulated clinical interaction, and the influence of prosodic features on listeners’ perceptions of professional attitudes were addressed. A set of acoustic parameters representing the speech prosody of native and non-native medical students in the simulated clinical setting was analysed. A perceptual experiment was then carried out to investigate the factors affecting perceived professionalism in extracts of the analysed simulated clinical interaction. The examined acoustic parameters were found to be sensitive to the English language background and the task within the simulated consultation. Interestingly, the attitudinal information associated with some of these acoustic parameters was perceived by listeners and was reflected in higher professional scale scores in the perceptual experiment, even after adjusting for the English language background. Training level and consultation task also emerged as factors affecting professional scale scores. Initial findings have confirmed that speech prosody contributes to the perception of professionalism in medical communication. Incorporating how messages are delivered to patients into current models of communication skills training may have positive outcomes.

    Speech intelligibility and prosody production in children with cochlear implants

    Objectives: The purpose of the current study was to examine the relation between speech intelligibility and prosody production in children who use cochlear implants. Methods: The Beginner's Intelligibility Test (BIT) and Prosodic Utterance Production (PUP) task were administered to 15 children who use cochlear implants and 10 children with normal hearing. Adult listeners with normal hearing judged the intelligibility of the words in the BIT sentences, identified the PUP sentences as one of four grammatical or emotional moods (i.e., declarative, interrogative, happy, or sad), and rated the PUP sentences according to how well they thought the child conveyed the designated mood. Results: Percent correct scores were higher for intelligibility than for prosody and higher for children with normal hearing than for children with cochlear implants. Declarative sentences were most readily identified and received the highest ratings by adult listeners; interrogative sentences were least readily identified and received the lowest ratings. Correlations between intelligibility and all mood identification and rating scores except declarative were not significant. Discussion: The findings suggest that the development of speech intelligibility progresses ahead of prosody in both children with cochlear implants and children with normal hearing; however, children with normal hearing still perform better than children with cochlear implants on measures of intelligibility and prosody even after accounting for hearing age. Problems with interrogative intonation may be related to more general restrictions on rising intonation, and th

    Discourse Intonation

    This paper addresses the different notions of ‘discourse’ that underlie various studies of ‘discourse prosody.’ It describes the prosodic resources available to speakers to convey different kinds of discourse meaning. In so doing, I distinguish between discourse as structure (information structure and text structure), discourse as language in use (pragmatics and conversation), and discourse as a reflection of society (power and persuasion). In addressing the final aspect of discourse, its ability to manipulate and persuade, I recall the classical origins of rhetoric and revisit the all-important notion of ‘delivery.’

    How tone, intonation and emotion shape the development of infants' fundamental frequency perception

    Fundamental frequency (ƒ0), perceived as pitch, is the first and arguably most salient auditory component humans are exposed to since the beginning of life. It carries multiple linguistic (e.g., word meaning) and paralinguistic (e.g., speakers’ emotion) functions in speech and communication. The mappings between these functions and ƒ0 features vary within a language and differ cross-linguistically. For instance, a rising pitch can be perceived as a question in English but a lexical tone in Mandarin. Such variations mean that infants must learn the specific mappings based on their respective linguistic and social environments. To date, canonical theoretical frameworks and most empirical studies do not view or consider the multi-functionality of ƒ0, but typically focus on individual functions. More importantly, despite the eventual mastery of ƒ0 in communication, it is unclear how infants learn to decompose and recognize these overlapping functions carried by ƒ0. In this paper, we review the symbioses and synergies of the lexical, intonational, and emotional functions that can be carried by ƒ0 and are being acquired throughout infancy. On the basis of our review, we put forward the Learnability Hypothesis that infants decompose and acquire multiple ƒ0 functions through native/environmental experiences. Under this hypothesis, we propose representative cases such as the synergy scenario, where infants use visual cues to disambiguate and decompose the different ƒ0 functions. Further, viable ways to test the scenarios derived from this hypothesis are suggested across auditory and visual modalities. Discovering how infants learn to master the diverse functions carried by ƒ0 can increase our understanding of linguistic systems, auditory processing and communication functions