    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and then examines previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated, and thereby provides a roadmap for future work on improving the robustness of speech output.
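    Energy-reallocation modifications of the kind surveyed in such reviews can be as simple as flattening spectral tilt. The sketch below is a generic illustration, not any specific algorithm from the review: a first-order pre-emphasis filter that boosts high frequencies relative to low ones.

```python
import numpy as np

def preemphasis(x, alpha=0.95):
    """First-order high-pass pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    Flattening spectral tilt this way is one of the simplest
    intelligibility-oriented modifications; alpha = 0.95 is a
    conventional, illustrative value.
    """
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

# Illustrative check: a low-frequency tone is attenuated far more
# than a tone near the top of the speech band.
fs = 16000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 100 * t)
high = np.sin(2 * np.pi * 7000 * t)
gain_low = np.std(preemphasis(low)) / np.std(low)
gain_high = np.std(preemphasis(high)) / np.std(high)
```

    In practice such a filter would be applied with an overall level constraint, since simply adding high-frequency energy also raises loudness.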

    Effect of being seen on the production of visible speech cues. A pilot study on Lombard speech

    Speech produced in noise (Lombard speech) is characterized by increased vocal effort, but also by amplified lip gestures. The current study examines whether this enhancement of visible speech cues may be sought by the speaker, even unconsciously, in order to improve visual intelligibility. One subject played an interactive game in quiet and then in 85 dB of cocktail-party noise, under three interaction conditions: no interaction, face-to-face interaction, and audio-only interaction. The audio signal was recorded simultaneously with articulatory movements, using 3D electromagnetic articulography. The results showed that acoustic modifications of speech in noise were greater when the interlocutor could not see the speaker. Furthermore, tongue movements, which are hardly visible, were not particularly amplified in noise, and highly visible lip movements were not more enhanced in noise when the interlocutors could see each other; in fact, they were more enhanced in the audio-only condition. These results suggest that this speaker did not exploit the visual channel to improve his intelligibility, and that his hyperarticulation was simply an indirect correlate of increased vocal effort.

    Speakers exhibit a multimodal Lombard effect in noise

    In everyday conversation, we are often challenged with communicating in non-ideal settings, such as in noise. Increased speech intensity and larger mouth movements are used to overcome noise in constrained settings (the Lombard effect). How we adapt to noise in face-to-face interaction, the natural environment of human language use, where manual gestures are ubiquitous, is currently unknown. We asked Dutch adults to wear headphones with varying levels of multi-talker babble while attempting to communicate action verbs to one another. Using quantitative motion capture and acoustic analyses, we found that (1) noise is associated with increased speech intensity and enhanced gesture kinematics and mouth movements, and (2) acoustic modulation only occurs when gestures are not present, while kinematic modulation occurs regardless of co-occurring speech. Thus, in face-to-face encounters the Lombard effect is not constrained to speech but is a multimodal phenomenon in which the visual channel carries most of the communicative burden.

    Perceptual Calibration of F0 Production: Evidence from Feedback Perturbation

    Hearing one’s own speech is important for language learning and for maintaining accurate articulation. For example, people with postlinguistically acquired deafness often show a gradual deterioration of many aspects of speech production. In this manuscript, data are presented that address the role played by acoustic feedback in the control of voice fundamental frequency (F0). Eighteen subjects produced vowels under a control condition (normal F0 feedback) and two experimental conditions: F0 shifted up and F0 shifted down. In each experimental condition, subjects produced vowels during a training period in which their F0 was slowly shifted without their awareness. Following this exposure to transformed F0, their acoustic feedback was returned to normal. Two effects were observed: subjects compensated for the change in F0, and they showed negative aftereffects, modifying their produced F0 in the direction opposite to the shift once feedback was returned to normal. The results suggest that fundamental frequency is controlled using auditory feedback with reference to an internal pitch representation, consistent with current work on internal models of speech motor control.
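    The compensation-plus-aftereffect pattern falls out of a very simple error-corrective feedback model. The simulation below is a toy illustration only; the target, shift magnitude, learning rate, and trial counts are assumptions, not parameters estimated in the study.

```python
import numpy as np

def simulate_perturbation(target=200.0, shift=50.0, lr=0.3,
                          n_shift=40, n_post=40):
    """Toy model of F0 control via auditory feedback.

    The talker maintains an internal correction `c`; on each trial the
    heard F0 is compared with an internal target and `c` is updated to
    reduce the mismatch. All numeric values are illustrative.
    """
    c = 0.0
    produced = []
    shifts = [shift] * n_shift + [0.0] * n_post
    for s in shifts:
        f0 = target + c          # produced F0 on this trial
        heard = f0 + s           # auditory feedback, possibly shifted
        c -= lr * (heard - target)
        produced.append(f0)
    return np.array(produced)

f0 = simulate_perturbation()
# During the shifted phase, produced F0 drifts opposite to the shift
# (compensation); when feedback is restored, the first post-shift
# trials remain displaced in that same direction (negative aftereffect)
# before decaying back to the target.
```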

    Physiological characteristics of dysphagia following thermal burn injury

    The study aim was to document the acute physiological characteristics of swallowing impairment following thermal burn injury. A series of 19 participants admitted to a specialised burn centre with thermal burn injury were identified with suspected aspiration risk by a clinical swallow examination (CSE) conducted by a speech-language pathologist and referred to the study. Once medically stable, each then underwent more detailed assessment using both a CSE and fiberoptic evaluation of swallowing (FEES). FEES confirmed six individuals (32%) had no aspiration risk and were excluded from further analyses. Of the remaining 13, CSE confirmed that two had specific oral-phase deficits due to orofacial scarring and contractures, and all 13 had generalised oromotor weakness. FEES revealed numerous pharyngeal-phase deficits; those evident in more than 50% of cases were impaired secretion management, laryngotracheal edema, delayed swallow initiation, impaired sensation, inadequate movement of structures within the hypopharynx and larynx, and diffuse pharyngeal residue. Penetration and/or aspiration occurred in 83% (n = 10/12) of thin-fluid trials, with a lack of response to the penetration/aspiration noted in 50% (n = 6/12 penetration/aspiration events) of the cases. Most events occurred post swallow. These findings indicate that individuals with dysphagia following thermal burn injury present with multiple risk factors for aspiration that appear predominantly related to generalised weakness and inefficiency, further impacted by edema and sensory impairments. Generalised oromotor weakness and orofacial contractures (when present) impact oral-stage swallow function.
    This study has identified a range of factors that may contribute to both oral- and pharyngeal-stage dysfunction in this clinical population and has highlighted the importance of using a combination of clinical and instrumental assessments to fully understand the influence of burn injury on oral intake and swallowing.

    The impact of automatic exaggeration of the visual articulatory features of a talker on the intelligibility of spectrally distorted speech

    Visual speech information plays a key role in supporting speech perception, especially when acoustic features are distorted or inaccessible. Recent research suggests that for spectrally distorted speech, the use of visual speech in auditory training improves not only subjects’ audiovisual speech recognition, but also their subsequent auditory-only speech recognition. Visual speech cues, however, can be affected by a number of facial visual signals that vary across talkers, such as lip emphasis and speaking style. In a previous study, we enhanced the visual speech videos used in perception training by automatically tracking and colouring a talker’s lips. This improved the subjects’ audiovisual and subsequent auditory speech recognition compared with those who were trained via unmodified videos or audio-only methods. In this paper, we report on two issues related to automatic exaggeration of the movement of the lips/mouth area. First, we investigated subjects’ ability to adapt to the conflict between the articulation energy in the visual signals and the vocal effort in the acoustic signals (since the acoustic signals remained unexaggerated). Second, we examined whether or not this visual exaggeration can improve subjects’ auditory and audiovisual speech recognition when used in perception training. To test this concept, we used spectrally distorted speech to train groups of listeners under four different training regimes: (1) audio only, (2) audiovisual, (3) audiovisual visually exaggerated, and (4) audiovisual visually exaggerated and lip-coloured. We used spectrally distorted speech (cochlear-implant-simulated speech) because the longer-term aim of our work is to employ these concepts in a training system for cochlear-implant (CI) users. The results suggest that after exposure to visually exaggerated speech, listeners were able to adapt to the conflicting audiovisual signals.
    In addition, subjects trained with enhanced visual cues (regimes 3 and 4) achieved better audiovisual recognition for a number of phoneme classes than those who were trained with unmodified visual speech (regime 2). There was no evidence of an improvement in subsequent audio-only listening skills, however. The subjects’ adaptation to the conflicting audiovisual signals may have slowed auditory perceptual learning and impeded the ability of the visual speech to improve the training gains.
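    One generic way to exaggerate visible articulation, offered purely as an illustration since the abstract does not describe the exaggeration algorithm used, is to scale each video frame's lip-landmark displacement about the mean pose.

```python
import numpy as np

def exaggerate(landmarks, factor=1.5):
    """Scale each frame's landmark displacement about the mean pose.

    landmarks: array of shape (frames, points, 2).
    factor > 1 amplifies the visible articulation. The factor, the
    data layout, and this method itself are illustrative assumptions,
    not the technique from the paper.
    """
    mean_pose = landmarks.mean(axis=0, keepdims=True)
    return mean_pose + factor * (landmarks - mean_pose)
```

    Because the scaling is applied about the temporal mean, the talker's average pose is preserved while every opening and closing gesture becomes proportionally larger.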

    Speech modifications in interactive speech: Effects of age, sex and noise type

    When attempting to maintain conversations in noisy communicative settings, talkers typically modify their speech to make themselves understood by the listener. In this study, we investigated the impact of background interference type and talker age on speech adaptations, vocal effort and communicative success. We measured speech acoustics (articulation rate, mid-frequency energy, fundamental frequency), vocal effort (correlation between mid-frequency energy and fundamental frequency) and task completion time in 114 participants aged 8–80 years carrying out an interactive problem-solving task in good and noisy listening conditions (quiet, non-speech noise, background speech). We found greater changes in fundamental frequency and mid-frequency energy in non-speech noise than in background speech, and similar reductions in articulation rate in both. However, older participants (50+ years) increased vocal effort in both background interference types, whereas younger children (less than 13 years) increased vocal effort only in background speech. The presence of background interference did not lead to longer task completion times. These results suggest that when the background interference involves a higher cognitive load, as in the case of background speech from other talkers, children and older talkers need to exert more vocal effort to ensure successful communication. We discuss these findings within the communication effort framework. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part II)’.
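    The vocal-effort proxy described above, the correlation between mid-frequency energy and fundamental frequency, is directly computable from per-utterance measurements. The function below is a minimal sketch; the variable names and one-value-per-utterance layout are assumptions, not the study's exact analysis pipeline.

```python
import numpy as np

def vocal_effort_index(midfreq_energy_db, f0_hz):
    """Pearson correlation between per-utterance mid-frequency energy
    (dB) and fundamental frequency (Hz), a vocal-effort proxy.
    Inputs: one measurement per utterance."""
    return float(np.corrcoef(midfreq_energy_db, f0_hz)[0, 1])
```

    A strongly positive index indicates that the talker raises F0 and mid-frequency energy together, the coupling associated with increased vocal effort.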

    Measuring communication difficulty through effortful speech production during conversation

    This study describes the use of a novel conversation elicitation framework to collect fluent, dynamic conversational speech in simulated realistic acoustic environments of varying complexity. Our aim is to quantify speech modifications during conversation, which characterize effortful speech, as a function of the difficulty of the acoustic environment. We report speech production data at the acoustic-phonetic level (vocal level, mid-frequency emphasis, formant frequencies and formant bandwidths), as well as at higher levels of analysis including utterance duration and turn overlap durations. The sensitivity of the different speech production measures to changes in the acoustic environment, and their test-retest reliability, are reported. We propose a multi-dimensional view of effortful speech modifications. Considering speech modifications across different linguistic levels provides a richer view of the effects of the acoustic environment on communication than consideration of low-level acoustic-phonetic markers alone. Finally, we describe how speech modification data may form the basis of a measure of communication effort, with scope for assessing the impacts of hearing impairment and amplification upon ease of spoken communication.
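    A mid-frequency emphasis measure of the kind listed above can be sketched as the ratio of band energy to total energy. The 1–3 kHz band below is a common choice for this measure but an assumption here, not the study's exact definition.

```python
import numpy as np

def mid_frequency_emphasis(x, fs, band=(1000.0, 3000.0)):
    """Energy in a mid-frequency band relative to total energy, in dB.

    x: mono signal, fs: sample rate in Hz. The band edges are
    illustrative; published definitions of this measure vary.
    """
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs < band[1])
    return 10 * np.log10(spec[in_band].sum() / spec.sum())
```

    Effortful (Lombard-style) speech typically shifts energy into this band, so the measure rises as talkers raise their voices against noise.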