    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

    A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models -- for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.
    Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 200

    Effects of acoustic features modifications on the perception of dysarthric speech - preliminary study (pitch, intensity and duration modifications)

    Marking stress is important in conveying meaning and drawing the listener's attention to specific parts of a message. Extensive research has shown that healthy speakers mark stress using three main acoustic cues: pitch, intensity, and duration. The relationship between acoustic and perceptual cues is vital in the development of a computer-based tool that aids therapists in providing effective treatment to people with dysarthria. It is, therefore, important to investigate the acoustic cue deficiencies in dysarthric speech and the potential compensatory techniques needed for effective treatment. In this paper, the relationship between acoustic and perceptual cues in dysarthric speech is investigated. This is achieved by modifying stress-marked sentences from 10 speakers with ataxic dysarthria. Each speaker produced 30 sentences using the 10 Subject-Verb-Object-Adjective (SVOA) structured sentences across three stress conditions. These stress conditions are stress on the initial (S), medial (O) and final (A) target words respectively. To effectively measure the deficiencies in dysarthric speech, the acoustic features (pitch, intensity, and duration) are modified incrementally. The paper presents the techniques involved in the modification of these acoustic features. The effects of these modifications are analysed based on steps of 25% increments in pitch, intensity and duration. For robustness and validation, 50 untrained listeners participated in the listening experiment. The results and the relationship between acoustic modifications (what is measured) and perception (what is heard) in dysarthric speech are discussed.

    Automatic Feedback for L2 Prosody Learning

    We have designed automatic feedback for the realisation of the prosody of a foreign language. Besides classical F0 displays, two kinds of feedback are provided to learners, each of them based upon a comparison between a reference and the learner's production. The first feedback, a diagnosis, provided both in the form of a short text and visual displays such as arrows, comes from an acoustic evaluation of the learner's realisation; it deals with two prosodic cues: the melodic curve and phoneme duration. The second feedback is perceptual and consists of replacing the learner's prosodic cues (duration and F0) with those of the reference. A pilot experiment has been undertaken to test the immediate impact of the "advanced" feedback proposed here. We have chosen to test the production of English lexical accent in isolated words by French speakers. It shows that feedback based upon diagnosis and speech modification enables French learners with a low production level to improve their realisations of English lexical accents more than (simple) auditory feedback. In contrast, for the advanced learners involved in this study, auditory feedback appears to be as efficient as more elaborate feedback.