
    Acoustic-phonetic characteristics of speech produced with communicative intent to counter adverse listening conditions

    This study investigated whether speech produced in spontaneous interactions when addressing a talker experiencing actual challenging conditions differs in acoustic-phonetic characteristics from speech produced: (a) with communicative intent under more ideal conditions, and (b) without communicative intent under imaginary challenging conditions (read, clear speech). It also investigated whether acoustic-phonetic modifications made to counteract the effects of a challenging listening condition are tailored to the condition under which communication occurs. Forty talkers were recorded in pairs while engaged in 'spot the difference' picture tasks in good and challenging conditions. In the challenging conditions, one talker heard the other: (1) via a three-channel noise vocoder (VOC); (2) with simultaneous babble noise (BABBLE). Read, clear speech showed more extreme changes in median F0, F0 range and speaking rate than speech produced to counter the effects of a challenging listening condition. In the VOC condition, where F0 and intensity enhancements are unlikely to aid intelligibility, talkers did not change their F0 median and range; mean energy and vowel F1 increased less than in the BABBLE condition. This suggests that speech production is listener-focused, and that talkers modulate their speech according to their interlocutors' needs, even when not directly experiencing the challenging listening condition.
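    The three-channel noise vocoder used in the VOC condition removes the fine spectral detail that carries F0, which is why F0 enhancements are unlikely to help there. A minimal sketch of a noise-excited vocoder of this general kind is given below, assuming numpy/scipy; the band edges, filter order, and function name are illustrative assumptions, not the study's actual parameters.

```python
# Sketch of a three-channel noise-excited vocoder: each band's fine
# structure is replaced by noise modulated with that band's envelope.
# Band edges and filter order are illustrative, not the study's values.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, band_edges=(100, 750, 2000, 6000)):
    out = np.zeros(len(x))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))            # per-band amplitude envelope
        carrier = sosfiltfilt(sos, np.random.randn(len(x)))  # band-limited noise
        out += env * carrier
    return out / np.max(np.abs(out))           # normalise to avoid clipping
```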

    Visual Speech Enhancement and its Application in Speech Perception Training

    This thesis investigates methods for visual speech enhancement to support auditory and audiovisual speech perception. Normal-hearing non-native listeners receiving cochlear implant (CI) simulated speech are used as 'proxy' listeners for CI users, a proposed user group who could benefit from such enhancement methods in speech perception training. Both CI users and non-native listeners share similarities with regard to audiovisual speech perception, including increased sensitivity to visual speech cues. Two enhancement methods are proposed: (i) an appearance-based method, which modifies the appearance of a talker's lips using colour and luminance blending to apply a 'lipstick effect' that increases the saliency of mouth shapes; and (ii) a kinematics-based method, which amplifies the kinematics of the talker's mouth to create the effect of more pronounced speech (an 'exaggeration effect'). The application used to test the enhancements is speech perception training, or audiovisual training, which can be used to improve listening skills. An audiovisual training framework is presented which structures the evaluation of the effectiveness of these methods. It is used in two studies. The first study, which evaluates the effectiveness of the lipstick effect, found a significant improvement in audiovisual and auditory perception. The second study, which evaluates the effectiveness of the exaggeration effect, found improvement in the audiovisual perception of a number of phoneme classes; no evidence was found of improvements in subsequent auditory perception, as audiovisual recalibration to visually exaggerated speech may have impeded learning when used in the audiovisual training. The thesis also investigates an example of kinematics-based enhancement observed in Lombard speech, by studying the behaviour of visual Lombard phonemes in different contexts. Because no suitable dataset existed for this analysis, the thesis presents a novel audiovisual Lombard speech dataset recorded under high SNR, which offers two fixed head-pose, synchronised views of each talker.
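    The appearance-based 'lipstick effect' amounts to blending a saturated colour into the tracked lip region of each video frame. The sketch below illustrates that blending step only, assuming the lip mask comes from a separate lip-tracking stage; the colour, blend weight, and function name are hypothetical, not the thesis's implementation.

```python
# Illustrative alpha blend of a lip tint into a frame; the boolean lip
# mask is assumed to be produced by a separate lip tracker (not shown).
import numpy as np

def apply_lipstick(frame, lip_mask, colour=(200, 20, 60), alpha=0.5):
    """frame: HxWx3 uint8 image; lip_mask: HxW boolean array."""
    out = frame.astype(float)
    tint = np.array(colour, dtype=float)
    out[lip_mask] = (1 - alpha) * out[lip_mask] + alpha * tint
    return np.clip(out, 0, 255).astype(np.uint8)
```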

    The impact of automatic exaggeration of the visual articulatory features of a talker on the intelligibility of spectrally distorted speech

    Visual speech information plays a key role in supporting speech perception, especially when acoustic features are distorted or inaccessible. Recent research suggests that for spectrally distorted speech, the use of visual speech in auditory training improves not only subjects' audiovisual speech recognition, but also their subsequent auditory-only speech recognition. Visual speech cues, however, can be affected by a number of facial visual signals that vary across talkers, such as lip emphasis and speaking style. In a previous study, we enhanced the visual speech videos used in perception training by automatically tracking and colouring a talker's lips. This improved the subjects' audiovisual and subsequent auditory speech recognition compared with those who were trained via unmodified videos or audio-only methods. In this paper, we report on two issues related to automatic exaggeration of the movement of the lips/mouth area. First, we investigate subjects' ability to adapt to the conflict between the articulation energy in the visual signals and the vocal effort in the acoustic signals (since the acoustic signals remained unexaggerated). Second, we examine whether this visual exaggeration can improve subjects' auditory and audiovisual speech recognition when used in perception training. To test this concept, we trained groups of listeners on spectrally distorted speech using four different training regimes: (1) audio only, (2) audiovisual, (3) audiovisual visually exaggerated, and (4) audiovisual visually exaggerated and lip-coloured. We used spectrally distorted speech (cochlear-implant-simulated speech) because the longer-term aim of our work is to employ these concepts in a training system for cochlear-implant (CI) users. The results suggest that after exposure to visually exaggerated speech, listeners were able to adapt to the conflicting audiovisual signals. In addition, subjects trained with enhanced visual cues (regimes 3 and 4) achieved better audiovisual recognition for a number of phoneme classes than those trained with unmodified visual speech (regime 2). There was no evidence of an improvement in subsequent audio-only listening skills, however. The subjects' adaptation to the conflicting audiovisual signals may have slowed auditory perceptual learning and impeded the ability of the visual speech to improve the training gains.
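    The kinematic exaggeration described above can be pictured as scaling mouth-landmark displacements away from a neutral mouth shape before re-rendering the talker. The sketch below shows only that scaling step; landmark extraction and re-rendering are assumed to happen elsewhere, and the gain value is illustrative rather than the one used in the study.

```python
# Sketch of the 'exaggeration effect': amplify mouth-landmark motion
# about a neutral (resting) shape. gain > 1 exaggerates articulation.
import numpy as np

def exaggerate_mouth(landmarks, neutral, gain=1.5):
    """landmarks: (frames, points, 2) trajectories; neutral: (points, 2)."""
    return neutral + gain * (landmarks - neutral)
```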

    Individual and environment-related acoustic-phonetic strategies for communicating in adverse conditions

    In many situations it is necessary to produce speech in 'adverse conditions': that is, conditions that make speech communication difficult. Research has demonstrated that speaker strategies, as described by a range of acoustic-phonetic measures, can vary both at the individual level and according to the environment, and are argued to facilitate communication. There has been debate as to the environmental specificity of these adaptations and their effectiveness in overcoming communication difficulty. Furthermore, the manner and extent to which adaptation strategies differ between individuals is not yet well understood. This thesis presents three studies that explore the acoustic-phonetic adaptations of speakers in noisy and degraded communication conditions and their relationship with intelligibility. Study 1 investigated the effects of temporally fluctuating maskers on global acoustic-phonetic measures associated with speech in noise (Lombard speech). The results replicated findings of increased power in the modulation spectrum in Lombard speech, but showed little evidence of adaptation to masker fluctuations via the temporal envelope. Study 2 collected a larger corpus of semi-spontaneous communicative speech in noise and under other degradations perturbing specific acoustic dimensions. Speakers showed different adaptations across the environments, likely suited to overcoming noise (steady and temporally fluctuating), spectral and pitch information restricted by a noise-excited vocoder, and a sensorineural hearing loss simulation. Analyses of inter-speaker variation in Studies 1 and 2 showed that behaviour was highly variable, and some strategy combinations were identified. Study 3 investigated the intelligibility of strategies 'tailored' to specific environments and the relationship between intelligibility and speaker acoustics, finding a benefit of tailored speech adaptations and discussing the potential roles of speaker flexibility, adaptation level, and intrinsic intelligibility. The overall results are discussed in relation to models of communication in adverse conditions, and a model accounting for individual variability in these conditions is proposed.
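    Study 1's modulation-spectrum measures rest on the speech temporal envelope: the slow amplitude fluctuations of the signal. A rough sketch of such an analysis is given below, assuming numpy/scipy; the envelope cut-off and the spectral estimate are illustrative and not the thesis's exact pipeline.

```python
# Sketch of a temporal-envelope modulation spectrum: extract a low-pass
# amplitude envelope, then inspect the power spectrum of its fluctuations.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_spectrum(x, fs, env_cutoff=32.0):
    env = np.abs(hilbert(x))                       # amplitude envelope
    sos = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)                    # keep slow modulations only
    power = np.abs(np.fft.rfft(env - env.mean()))**2
    freqs = np.fft.rfftfreq(len(env), 1 / fs)      # modulation frequency (Hz)
    return freqs, power
```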

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    On the interplay between speech perception and production: insights from research and theories

    The study of spoken communication has long been entrenched in a debate surrounding the interdependence of speech production and perception. This mini review summarizes findings from prior studies to elucidate the reciprocal relationships between speech production and perception. We also discuss key theoretical perspectives relevant to the speech perception-production loop, including hyper-articulation and hypo-articulation (H&H) theory, speech motor theory, direct realism theory, articulatory phonology, the Directions into Velocities of Articulators (DIVA) and Gradient Order DIVA (GODIVA) models, and predictive coding. Building on prior findings, we propose a revised auditory-motor integration model of speech and provide insights for future research in speech perception and production, focusing on the effects of impaired peripheral auditory systems.

    Design and Development of a Spanish Hearing Test for Speech in Noise (PAHRE)

    Background: There are few hearing tests in Spanish that assess speech discrimination in noise in the adult population while taking into account the Lombard effect. This study presents the design and development of a Spanish hearing test for speech in noise (Prueba Auditiva de Habla en Ruido en Español, PAHRE). The pattern of the Quick Speech in Noise test was followed when drafting sentences, each containing five key words, grouped into lists of six sentences; it was necessary to take into account the differences between English and Spanish. Methods: A total of 61 people (24 men and 37 women) with an average age of 46.9 years (range 18–84 years) participated in the study. The work was carried out in two phases. In the first phase, a list of Spanish sentences was drafted and subjected to a familiarity test based on the semantic and syntactic characteristics of the sentences; as a result, a list of sentences was selected for the final test. In the second phase, the selected sentences were recorded with and without the Lombard effect, the equivalence between both lists was analysed, and the test was applied to a first reference population. Results: The results obtained allow us to affirm that the test is representative of the Spanish spoken in peninsular Spain. Conclusions: These results point to the usefulness of the PAHRE test in assessing speech in noise by maintaining a fixed speech intensity while varying the intensity of the multi-speaker background noise. The incorporation of the Lombard effect in the test reveals discrimination differences at the same signal-to-noise ratio compared with the test without the Lombard effect.
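    The presentation scheme described in the conclusions, a fixed speech level with a varying babble level, reduces to rescaling the noise so each trial hits a target signal-to-noise ratio. A minimal RMS-based sketch follows; the function name, and the assumption that the babble is at least as long as the sentence, are ours.

```python
# Mix speech and babble at a target SNR, keeping the speech level fixed
# and rescaling the babble (RMS-based; babble assumed >= speech length).
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    rms = lambda s: np.sqrt(np.mean(s**2))
    babble = babble[:len(speech)]                  # trim to speech length
    gain = rms(speech) / (rms(babble) * 10**(snr_db / 20))
    return speech + gain * babble
```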

    The influence of channel and source degradations on intelligibility and physiological measurements of effort

    Despite the fact that everyday listening is compromised by acoustic degradations, individuals show a remarkable ability to understand degraded speech. However, recent trends in speech perception research emphasise the cognitive load imposed by degraded speech on both normal-hearing and hearing-impaired listeners. The perception of degraded speech is often studied through channel degradations such as background noise. However, source degradations determined by talkers' acoustic-phonetic characteristics have been studied to a lesser extent, especially in the context of listening effort models. Similarly, little attention has been given to speaking effort, i.e., the effort experienced by talkers when producing speech under channel degradations. This thesis aims to provide a holistic understanding of communication effort, taking into account both listener and talker factors. Three pupillometry studies are presented. In the first study, speech was recorded from 16 Southern British English speakers and presented to normal-hearing listeners in quiet and in combination with three degradations: noise-vocoding, masking and time-compression. Results showed that acoustic-phonetic talker characteristics predicted the intelligibility of degraded speech, but not listening effort, as likely indexed by pupil dilation. In the second study, older hearing-impaired listeners were presented with fast, time-compressed speech under simulated room acoustics, with intelligibility kept at high levels. Results showed that both fast speech and reverberant speech were associated with higher listening effort, as suggested by pupillometry. Discrepancies between pupillometry and perceived effort ratings suggest that both methods should be employed in speech perception research to pinpoint processing effort. While findings from the first two studies support models of degraded speech perception, emphasising the relevance of source degradations, they also have methodological implications for pupillometry paradigms. In the third study, pupillometry was combined with a speech production task, aiming to establish an equivalent of listening effort for talkers: speaking effort. Normal-hearing participants were asked to read and produce speech in quiet or in the presence of different types of masking: stationary and modulated speech-shaped noise, and competing-talker masking. Results indicated that while talkers acoustically enhanced their speech more under stationary masking, the larger pupil dilation associated with competing-talker masking reflected higher speaking effort. Results from all three studies are discussed in conjunction with models of degraded speech perception and production. Listening effort models are revisited to incorporate pupillometry results from speech production paradigms. Given the new approach of investigating source factors using pupillometry, methodological issues are discussed as well. The main insight provided by this thesis, the feasibility of applying pupillometry to situations involving both listener and talker factors, is suggested as a guide for future research employing naturalistic conversations.
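    Pupillometric effort measures of the kind used in these studies are typically reported as task-evoked dilation relative to a pre-stimulus baseline. A sketch of that standard computation is shown below; the window length and the use of subtractive (rather than divisive) baseline correction are assumptions, not the thesis's exact analysis.

```python
# Baseline-corrected pupil dilation for one trial: subtract the mean
# pre-stimulus diameter, then summarise the post-onset response.
import numpy as np

def pupil_dilation(trace, fs, baseline_s=1.0):
    """trace: pupil diameter samples; stimulus onset at baseline_s * fs."""
    onset = int(baseline_s * fs)
    baseline = np.nanmean(trace[:onset])           # pre-stimulus baseline
    dilation = trace[onset:] - baseline            # subtractive correction
    return np.nanmean(dilation), np.nanmax(dilation)
```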

    Manipulation of Auditory Feedback in Individuals with Normal Hearing and Hearing Loss

    Auditory feedback, the hearing of one's own voice, plays an important role in the detection of speech errors and the regulation of speech production. The limited auditory cues available with a hearing loss can reduce the ability of individuals with hearing loss to use their auditory feedback. Hearing aids are a common assistive device that amplifies inaudible sounds. Hearing aids can also change auditory feedback through digital signal processing, such as frequency lowering. Frequency lowering moves high-frequency information of an incoming auditory stimulus into a lower frequency region where audibility may be better. This can change how speech sounds are perceived; for example, the high-frequency information of /s/ is moved closer to the lower-frequency area of /ʃ/. Real-time signal processing in a laboratory setting can also manipulate various aspects of speech cues, such as intensity and vowel formants. These changes in auditory feedback may result in changes in speech production, as the speech motor control system may perceive these perturbations as speech errors. A series of experiments was carried out to examine changes in speech production resulting from perturbations of the auditory feedback in individuals with normal hearing and hearing loss. Intensity and vowel formant perturbations were conducted using real-time signal processing in the laboratory. In addition, changes in speech production were measured using auditory feedback that was processed with frequency lowering technology in hearing aids. Acoustic characteristics of vowel intensity, sibilant fricatives, and first and second formants were analyzed. The results showed that the speech motor control system is sensitive to changes in auditory feedback, because perturbations in auditory feedback can result in changes in speech production. However, speech production is not completely controlled by auditory feedback; other feedback systems, such as the somatosensory system, are also involved. An impairment of the auditory system can reduce the ability of the speech motor control system to use auditory feedback in the detection of speech errors, even when aided with hearing aids. Effects of frequency lowering in hearing aids on speech production depend on the parameters used and on acclimatization time.
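    Frequency lowering, as described above, relocates energy from above a cut-off into a narrower region just above it. The sketch below is a deliberately naive STFT bin-remapping version of that idea, loosely analogous to the nonlinear frequency compression used in hearing aids; the cut-off, compression ratio, and remapping scheme are illustrative only, not any manufacturer's algorithm.

```python
# Naive frequency-lowering sketch: STFT bins above the cut-off are summed
# into compressed target bins, then the signal is resynthesised.
import numpy as np
from scipy.signal import stft, istft

def lower_frequencies(x, fs, cutoff=2000.0, ratio=2.0):
    f, _, Z = stft(x, fs=fs, nperseg=512)
    out = np.zeros_like(Z)
    for i, freq in enumerate(f):
        if freq <= cutoff:
            j = i                                  # below cut-off: unchanged
        else:
            target = cutoff + (freq - cutoff) / ratio
            j = int(np.argmin(np.abs(f - target))) # nearest compressed bin
        out[j] += Z[i]
    _, y = istft(out, fs=fs, nperseg=512)
    return y
```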